Sources overview
Learn how to use sources, or source connectors, in the ÃÛ¶¹ÊÓƵ Experience Platform interface. Sources are easily configurable integrations that allow you to ingest data from ÃÛ¶¹ÊÓƵ, first-party, and third-party applications into Platform’s Real-Time Customer Profile and data lake. For more information, please see the sources documentation.
Transcript
Hi there. I’m going to give you a quick overview of source connectors in ÃÛ¶¹ÊÓƵ Experience Platform. Data ingestion is a fundamental step in getting your data into Experience Platform so you can use it to build 360-degree Real-Time Customer Profiles and use them to provide meaningful experiences. ÃÛ¶¹ÊÓƵ Experience Platform allows data to be ingested from various external sources while giving you the ability to structure, label, and enhance incoming data using Platform services. You can ingest data from a wide variety of sources, such as ÃÛ¶¹ÊÓƵ applications, cloud-based storage, databases, and many others. There are three mechanisms you can use to ingest your data: you can send it via a Platform API request using the batch or streaming option, you can drag and drop data files into the UI and ingest them in batch mode, or you can configure a source connector that ingests data from the system of origin using the most appropriate mode for that system.

Let’s dive deeper into sources. When you log in to Platform, you will see Sources in the left navigation. Clicking Sources takes you to the source catalog screen, where you can see all the sources currently available in Platform. Note that there are source connectors for ÃÛ¶¹ÊÓƵ applications, CRM solutions, cloud storage providers, and more. By clicking the My Sources option, you can view all sources connected to Platform. You can also use the search field to locate a specific source.

Let’s explore how to ingest data from cloud storage into Experience Platform. Each source has its own specific configuration details, but the general configuration for non-ÃÛ¶¹ÊÓƵ applications is similar. For this video, let’s use Amazon S3 cloud storage. Select the desired source, click Create a new account, and provide the source connection details. Complete the required fields for account authentication and then initiate a source connection request. If the connection is successful, click Next to proceed to data selection. In this step, we choose the source file for data ingestion and verify the file’s data format. Note that the ingested file data can be formatted as JSON, XDM Parquet, or delimited. Currently, for delimited files, you have the option to preview sample data from the source file.

Let’s proceed to the next step to assign a target dataset for the incoming data. You can choose an existing dataset or create a new dataset. Let’s choose the new dataset option and provide a dataset name and description. To create a dataset, you need an associated schema, so use the schema finder to assign a schema to this dataset. Upon selecting a schema for the dataset, Experience Platform performs a mapping between the source file fields and the target fields. This mapping is based on the title and type of each field. The pre-mapped standard fields are editable, and you can add a new field mapping to map a source field to a target field. The Add calculated field option lets you run functions on source fields to prepare the data for ingestion. For example, we can combine the first name field and the last name field into a calculated field using the concatenation function before ingesting the data into a dataset field. You can also preview the sample result of a calculated field. After reviewing the field mapping, you can preview data to see how the ingested data will be stored in your dataset. The mapping looks good, so let’s move to the next step.
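The concatenation example above boils down to joining two source fields into one target value before ingestion. The snippet below is a minimal, hypothetical sketch of that transformation in Python; the field names (first_name, last_name, full_name) are assumptions, and in Experience Platform the actual calculated field is configured in the mapping UI rather than in code.

```python
# Hypothetical illustration of the calculated-field concatenation described above.
# Field names are assumptions; in Platform this is configured in the mapping UI.

def add_full_name(record: dict) -> dict:
    """Combine first and last name into a single calculated field before ingestion."""
    first = record.get("first_name", "").strip()
    last = record.get("last_name", "").strip()
    record["full_name"] = f"{first} {last}".strip()
    return record

# Example source record (sample data, not from the video):
sample = {"first_name": "Ada", "last_name": "Lovelace", "loyalty_points": 1200}
print(add_full_name(sample))
# {'first_name': 'Ada', 'last_name': 'Lovelace', 'loyalty_points': 1200, 'full_name': 'Ada Lovelace'}
```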
Scheduling lets you choose the frequency at which data flows from the source to a dataset. Let’s select a frequency of 15 minutes for this video and set a start time for the data flow. To allow historical data to be ingested, enable the backfill option. Backfill is a Boolean value that determines what data is initially ingested. If backfill is enabled, all current files in the specified path will be ingested during the first scheduled ingestion. If backfill is disabled, only the files that are loaded between the start time and the first run of ingestion will be ingested; files loaded before the start time will not be ingested.

Let’s move to the data flow step and provide a name for the data flow. In the data flow detail step, the partial ingestion toggle allows you to enable or disable the use of partial batch ingestion. The error threshold allows you to set the percentage of acceptable errors before the entire batch fails. By default, this value is set to 5%. Let’s review the source configuration details and save our changes.

We don’t see any data flow run statuses yet because we set a frequency of 15 minutes for our data flow runs, so let’s wait for the data flow to run a few times. Returning to this page after an hour and refreshing it, you can see that our data flow run status has changed. Open the data flow to view more details about the activity. During these 45 minutes, three data flow run jobs completed between our source and the target dataset. The data flow activity tab provides a quick summary of each data flow run. Two of them finished with zero failed records, and the most recent run completed successfully with a few failed records. You might wonder why the most recent data flow run was successful when it had failed records. That’s because we enabled partial ingestion when we set up the data flow and chose an error threshold of 5%. Since we enabled error diagnostics for our data flows, you can also see the error code and description in the data flow run overview window. Experience Platform lets users download or preview the error diagnostics to determine what went wrong with the failed records. Let’s go back to the data flow activity tab. At this point, we have verified that the data flow completed successfully from the source to our dataset.

Let’s open the dataset to verify the data flow activities. You can open the Luma Customer Loyalty dataset right from the data flow window, or you can access it using the Datasets option in the left navigation. Under the dataset activity, you can see a quick summary of ingested batches and failed batches during a specific time window. Scroll down to view the ingested batch ID. Each batch represents an actual data ingestion from a source connector to a target dataset. Let’s also preview the dataset to make sure that data ingestion was successful. Finally, you can add governance labels to the dataset fields to restrict their usage under the data governance tab.
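The partial ingestion behavior described above comes down to a simple percentage check: a batch is accepted as long as the share of failed records stays under the configured error threshold. Here is a minimal sketch of that decision logic; the record counts and the 5% default mirror the example in the video, but the function itself is illustrative, not Platform code.

```python
# Illustrative sketch of the partial-batch-ingestion error threshold decision.
# This is not Platform code; it only mirrors the rule described in the transcript.

def batch_succeeds(total_records: int, failed_records: int, error_threshold_pct: float = 5.0) -> bool:
    """Return True if the batch is accepted under partial ingestion."""
    if total_records == 0:
        return True  # nothing to ingest, nothing to fail
    error_rate_pct = (failed_records / total_records) * 100
    return error_rate_pct <= error_threshold_pct

# With a 5% threshold, a run with a few failed records can still succeed:
print(batch_succeeds(total_records=1000, failed_records=12))   # True  (1.2% errors)
print(batch_succeeds(total_records=1000, failed_records=80))   # False (8.0% errors)
```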
To enable our dataset for Real-Time Profile, ensure that the associated schema is enabled for Real-Time Profile. Once a schema is enabled for profile, it cannot be disabled or deleted, and fields cannot be removed from the schema after this point. These implications are essential to keep in mind when working with your data in a production environment. It is recommended to verify and test the data ingestion process, and to capture and address any issues that may arise, before enabling the dataset and schema for profile. Now let’s enable profile for our dataset and save our changes. With the next successful batch run, the data ingested into our dataset will be used to create Real-Time Customer Profiles.

Finally, let’s take a quick look back at sources; we already discussed the catalog at the beginning of this video. Now let’s click the Accounts tab and view the Amazon S3 cloud account that we created. The Accounts tab lists all configured accounts used to create the various sources. In the Dataflows tab, you can see the list of data flow tasks scheduled to retrieve and ingest data from a source into a Platform dataset. The System View tab is a neat visualization that shows you the sources configured for the organization flowing into the Real-Time Customer Profile. If you have ÃÛ¶¹ÊÓƵ Real-Time Customer Data Platform, you will also see the destinations to which data is flowing out. You can see that the S3 source connection that we created is enabled for profile. ÃÛ¶¹ÊÓƵ Experience Platform allows data to be ingested from external sources while providing you with the ability to structure, label, and enhance incoming data using Platform services.
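If you prefer to check the same information programmatically, the dataflows shown in the UI are also exposed through the Flow Service API. The sketch below is a minimal example, assuming you already have a valid access token, API key, organization ID, and sandbox name (all placeholders here); consult the Flow Service API reference for the exact parameters and fields your organization needs.

```python
# Minimal sketch: listing source dataflows via the Flow Service API.
# All credential values below are placeholders; this is illustrative, not production code.
import requests

BASE_URL = "https://platform.adobe.io/data/foundation/flowservice"

headers = {
    "Authorization": "Bearer <ACCESS_TOKEN>",   # placeholder
    "x-api-key": "<API_KEY>",                   # placeholder
    "x-gw-ims-org-id": "<ORG_ID>",              # placeholder
    "x-sandbox-name": "<SANDBOX_NAME>",         # placeholder
}

# List the dataflows configured in the sandbox.
response = requests.get(f"{BASE_URL}/flows", headers=headers)
response.raise_for_status()

for flow in response.json().get("items", []):
    print(flow.get("id"), flow.get("name"), flow.get("state"))
```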