Create or edit a connection

The connection creation and edit workflow brings all dataset and connection configuration settings to the center of the screen in an assistive workflow. It provides a detailed dataset selection, configuration, and review experience, and lets you specify critical information such as dataset type, size, schema, dataset ID, batch status, backfill status, and Person IDs, to reduce the risk of a wrong connection configuration. Here is an overview of the capabilities:

  • You can enable a rolling data retention window when you create the connection.
  • You can add to and remove datasets from a connection. (Removing a dataset removes it from the connection and impacts any associated data views and underlying Analysis Workspace projects.)
  • You can enable and request backfill data per dataset.
  • You can edit datasets, for example to request another backfill.
  • You can import existing data per dataset.

Prerequisites

The maximum number of datasets you can add to a connection is capped at 100. The mix depends on which Customer Journey Analytics package your company has purchased.

Contact your administrator if you’re unsure which Customer Journey Analytics package you have.

  • Select package: Any combination of event/profile/lookup/summary datasets, adding up to 100.
  • Foundation package: One event dataset per connection, and up to 99 profile, lookup, or summary datasets per connection.
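
The package limits above can be expressed as a small pre-flight check. The following Python sketch is illustrative only: the function name, the package labels `"select"` and `"foundation"`, and the dataset type strings are assumptions made for this example and are not part of any Customer Journey Analytics API.

```python
from collections import Counter

MAX_DATASETS = 100  # overall cap per connection, as stated above


def check_dataset_mix(package, dataset_types):
    """Return a list of problems; an empty list means the mix fits the limits.

    `package` is "select" or "foundation" (hypothetical labels).
    `dataset_types` holds one entry per dataset: "event", "profile",
    "lookup", or "summary".
    """
    problems = []
    counts = Counter(dataset_types)

    if len(dataset_types) > MAX_DATASETS:
        problems.append(f"{len(dataset_types)} datasets exceeds the cap of {MAX_DATASETS}")

    if package == "foundation":
        if counts["event"] > 1:
            problems.append("Foundation package allows one event dataset per connection")
        others = counts["profile"] + counts["lookup"] + counts["summary"]
        if others > 99:
            problems.append("Foundation package allows up to 99 profile, lookup, or summary datasets")

    return problems


print(check_dataset_mix("foundation", ["event", "event", "profile"]))
```

With the Select package, the same two event datasets would pass, because only the overall cap of 100 applies.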

Create and configure the connection

  1. In Customer Journey Analytics, select the Connections tab.

  2. Select Create new connection.

    Untitled connection settings

  3. Configure the connection settings.

    Setting Description
    Connection name Enter a unique name for the connection.
    Connection description Describe the purpose of this connection.
    Sandbox

    Choose a sandbox in Experience Platform that contains the datasets to which you want to create a connection.

    Adobe Experience Platform provides sandboxes which partition a single Platform instance into separate virtual environments to help develop and evolve digital experience applications. You can think of sandboxes as “data silos” that contain datasets. Sandboxes are used to control access to datasets.

    Once you have selected the sandbox, the left rail shows all the datasets in that sandbox that you can pull from.

    Enable rolling data window

    This checkbox, if checked, lets you define Customer Journey Analytics data retention as a rolling window in months (1 month, 3 months, 6 months, and so on), at the connection level.

    Data retention is based on event dataset timestamps and applies to event datasets only. No rolling data window setting exists for profile or lookup datasets, since there are no applicable timestamps. However, if your connection includes any profile or lookup datasets (besides one or more event datasets), that data is retained for the same time period.

    The main benefit is that you store or report only on data that is applicable and useful and delete older data that is no longer useful. It helps you stay under your contract limits and reduces the risk of overage cost.

    If you leave the default (unchecked), the Adobe Experience Platform data retention setting supersedes the retention period. If you have 25 months’ worth of data in Experience Platform, Customer Journey Analytics gets 25 months of data through backfill. If you deleted 10 of those months in Platform, Customer Journey Analytics would retain the remaining 15 months.

    Add datasets (see below) Add datasets if no datasets appear in your dataset listing.
    Dataset name

    Select one or more datasets that you want to pull into Customer Journey Analytics and select Add.

    (If you have many datasets to choose from, you can search for the right one(s) using the Search datasets search bar above the list of datasets.)

    Last updated For event datasets only, this setting is automatically set to the default timestamp field from event-based schemas in Experience Platform. “N/A” means that this dataset contains no data.
    Number of records The total records in the previous month for the dataset in Experience Platform.
    Schema The schema based on which the dataset was created in Adobe Experience Platform.
    Dataset type For each dataset that you added to this connection, Customer Journey Analytics automatically sets the dataset type based on the data coming in. There are four different dataset types: Event data, Profile data, Lookup data, and Summary data. See Dataset types below for an explanation of dataset types.
    Granularity The granularity of the data in the dataset; only applicable for summary datasets.
    Data source type The data source type of the dataset. Not applicable for summary datasets.
    Person ID

    Select a Person ID from the drop-down list of available identities. These identities were defined in the dataset schema in Experience Platform. See below for information on how to use Identity Map as a Person ID.

    IMPORTANT: If there are no Person IDs to choose from, that means one or more Person IDs have not been defined in the schema. See how to define an identity in Experience Platform.

    Key For lookup datasets only (such as _id).
    Matching Key For lookup datasets only (such as _id).
    Import new data Set to On or Off.
    Backfill data

    You can request to backfill the data in a dataset. For example, you can request to backfill the last 7 days’ worth of data. Configure the dataset correctly and test your connection. If everything looks good, you can then backfill all the remaining data.

    In addition, you can enable the import of new data by dataset.

    Backfill status This status indicates whether any backfill data is processing.
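
To make the rolling data window setting described above more concrete, the following Python sketch computes the oldest date that a window of N months would keep. It is an assumption-laden illustration: the function name is hypothetical, and the exact day-level boundary that Customer Journey Analytics applies is not stated in this document.

```python
import calendar
from datetime import date


def retention_cutoff(today, window_months):
    """Return the date `window_months` calendar months before `today`.

    Assumption: the window is measured in whole calendar months and the
    day of month is clamped when the target month is shorter.
    """
    month_index = today.year * 12 + (today.month - 1) - window_months
    year, month_zero_based = divmod(month_index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(today.day, last_day))


# Event data with a timestamp older than the cutoff would fall outside the window.
print(retention_cutoff(date(2024, 1, 15), 3))
```

Because retention is based on event dataset timestamps, only event rows would be compared against such a cutoff; profile and lookup data carry no applicable timestamp.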

Add and configure datasets

The new workflow lets you add an Experience Platform dataset when you create a connection.

  1. In the Connection settings dialog, select Add datasets.

  2. In the Select datasets step, you see a list of the Experience Platform datasets.

    Select datasets

    For each dataset, the list shows:

    Column Description
    Dataset Name of the dataset. Select the name to direct you to the dataset in Experience Platform. Select Info to display a popup with more details for the dataset. You can select Edit in Platform to edit the dataset directly in Experience Platform.
    Dataset type The type of dataset: Event, Profile, Lookup, or Summary.
    Number of records The total records in the previous month for the dataset in Experience Platform.
    Schema The schema for the dataset. Select the name to direct you to the schema in Experience Platform.
    Last batch The state of the last batch ingested in Experience Platform. See Batch states for more information.
    Dataset ID The ID of the dataset.
    Last updated The last updated timestamp of the dataset.
  3. Select one or more datasets and select Next. At least one event dataset must be part of the connection.

    • To change the columns displayed for the list of datasets, select Column settings and select the columns to be displayed in the Customize table dialog.
    • To search for a specific dataset, use the Search search field.
    • To toggle between showing or hiding the selected datasets, select Hide selected or Show selected.
    • To remove a dataset from the list of selected datasets, use Close. To remove all selected datasets, select Clear all.
  4. Now, configure the datasets one by one.

    Configure datasets

    Setting Description
    Person ID

    Only available for event and profile datasets. Select a Person ID from the drop-down list of available identities. These identities were defined in the dataset schema in Experience Platform. See below for information on how to use Identity Map as a Person ID.

    If there are no Person IDs to choose from, that means one or more Person IDs have not been defined in the schema. See Define identity fields in the UI for more information.

    The value for the selected Person ID is considered to be case sensitive. For example, abc123 and ABC123 are two different values.

    Timestamp For event and summary datasets only, this setting is automatically set to the default timestamp field from event-based schemas in Experience Platform.
    Key Only available for lookup datasets. The key to use for a Lookup dataset.
    Matching key Only available for lookup datasets. The matching key to join on in one of the event datasets. If this list is empty, you probably haven’t added or configured an event dataset.
    Timezone Only available for summary data. Select the appropriate timezone for the time-series summary data.
    Data source type

    Select a type of data source.
    Types of data sources include:

    • Web data
    • Mobile App data
    • POS data
    • CRM data
    • Survey data
    • Call Center data
    • Product data
    • Accounts data
    • Transaction data
    • Customer Feedback data
    • Other

    This field is used to survey the types of data sources in use.

    Import new data Enable this option if you want to establish an ongoing connection. With an ongoing connection, new data batches that are added to the datasets are available automatically in Workspace.
    Dataset backfill

    Enable Backfill all existing data to ensure that all existing data is backfilled.

    Select Request backfill to backfill historical data for a specific period. You can define up to 10 dataset backfill periods.

    1. Define the period by entering start and end dates or selecting dates using Calendar.
    2. Select Queue backfill to add the backfill to the list, or Cancel to cancel.

    For each entry, select Edit to edit the period, or select Delete to delete the entry.

    On backfills:

    • You can backfill each dataset individually.
    • New data added to a dataset in the connection is prioritized, so this new data has the lowest latency.
    • Any backfill (historical) data is imported at a slower rate. The amount of historical data influences the latency.
    • The Analytics source connector imports up to 13 months of data (irrespective of size) for production sandboxes. The backfill in non-production sandboxes is limited to 3 months.
    Transform dataset For specific B2B lookup datasets, you can enable the transformation of a dataset for proper B2B person-based reporting scenarios. See Transform datasets for B2B lookups for more information.
    Backfill status

    Possible status indicators are:

    • Success
    • X backfill(s) processing
    • Off
    Dataset ID This ID is automatically generated.
    Description The description given to this dataset when it was created.
    Dataset size The dataset’s size.
    Schema The schema based on which the dataset was created in Adobe Experience Platform.
    Dataset The name of the dataset.
    Preview: dataset name Previews the dataset with date, my ID, and Identifier columns.
    Remove You can delete or remove the dataset and change the Person ID without deleting the whole connection. Deleting or removing reduces the costs involved in data ingestion and the cumbersome process of recreating the whole connection and associated data views.
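
The dataset backfill rules described above (a start and end date per period, and up to 10 periods per dataset) can be sketched as a simple queue. This Python example is a hypothetical model for illustration only; the class and function names are assumptions and do not correspond to a Customer Journey Analytics API.

```python
from dataclasses import dataclass
from datetime import date

MAX_BACKFILL_PERIODS = 10  # limit stated in the Dataset backfill setting


@dataclass(frozen=True)
class BackfillPeriod:
    start: date
    end: date


def queue_backfill(queue, period):
    """Return a new queue with `period` appended, enforcing the stated limits."""
    if period.start > period.end:
        raise ValueError("start date must not be after end date")
    if len(queue) >= MAX_BACKFILL_PERIODS:
        raise ValueError(f"at most {MAX_BACKFILL_PERIODS} backfill periods can be defined")
    return queue + [period]


# Start small: request the last 7 days first, then queue more once the data looks right.
queue = queue_backfill([], BackfillPeriod(date(2024, 5, 1), date(2024, 5, 7)))
print(len(queue))
```

Returning a new list instead of mutating the input keeps each queued state easy to inspect, which mirrors how each entry in the UI can still be edited or deleted individually.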

Connection preview

To preview the connection that you have created, select Connection preview in the Connection settings dialog.

Connection preview

This preview contains some columns listing the connection configuration. What column types are shown depends on your individual datasets.

Dataset types

For each dataset that you add to this connection, Customer Journey Analytics automatically sets the dataset type based on the data coming in.

IMPORTANT
Add at least one event or summary dataset as part of a connection.

There are different dataset types: Event data, Profile data, Lookup data and Summary data.

Dataset type: Event
  • Description: Data that represents events in time. For example, web visits, interactions, transactions, POS data, survey data, ad impression data, and so on. This data could be typical clickstream data, with a customer ID or a Cookie ID, and a timestamp. With event data, you have flexibility as to which ID is used as the Person ID.
  • Timestamp: Automatically set to the default timestamp field from event-based schemas in Experience Platform.
  • Schema: Any built-in or custom schema that is based on an XDM class with the “Time Series” behavior. Examples include “XDM Experience Event” or “XDM Decision Event.”
  • Person ID: You can pick which Person ID you want to include. Each dataset schema defined in Experience Platform can have its own set of one or more identities defined and associated with an Identity Namespace. Any of these identities can be used as the Person ID. Examples include Cookie ID, Stitched ID, User ID, Tracking Code, and so on.

Dataset type: Lookup
  • Description: You can add datasets as lookups of fields within all dataset types: Profile, Lookup, and Event datasets (the latter was always supported). This additional capability expands the capability of Customer Journey Analytics to support complex data models, including B2B. This data is used to look up values or keys found in your Event, Profile, or Lookup data. You can add up to two levels of lookups. (Note that Derived Fields cannot be used as matching keys for lookups within Connections.) For example, you might upload lookup data that maps numeric IDs in your event data to product names. See the B2B example for an example.
  • Timestamp: N/A
  • Schema: Any built-in or custom schema that is based on an XDM class with the “Record” behavior, except for the “XDM Individual Profile” class.
  • Person ID: N/A

Dataset type: Profile
  • Description: Data that is applied to your persons, users, or customers in the Event data. For example, you can upload CRM data about your customers.
  • Timestamp: N/A
  • Schema: Any built-in or custom schema that is based on the “XDM Individual Profile” class.
  • Person ID: You can pick which Person ID you want to include. Each dataset (except summary datasets), defined in Experience Platform, has its own set of one or more Person IDs defined. For example, Cookie ID, Stitched ID, User ID, Tracking Code, and so on. Note: If you create a connection that includes datasets with different IDs, the reporting reflects that. To merge datasets, you need to use the same Person ID.

Dataset type: Summary
  • Description: Time-series data that is not tied to an individual Person ID. Summary data represents aggregated data at a different level of aggregation, for example campaigns. You can use this data in Customer Journey Analytics to support various use cases. See Summary data for more information.
  • Timestamp: Automatically set to the default timestamp field from event-based summary metrics schemas in Experience Platform. Only hourly or daily granularity is supported.
  • Schema: Any built-in or custom schema that is based on the “XDM Summary Metrics” class.
  • Person ID: N/A
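
The schema rules for each dataset type can be summarized as a decision function. The Python sketch below only restates the schema conditions listed above; it is not how Customer Journey Analytics determines the type internally, and the function name and string arguments are assumptions for this example.

```python
def dataset_type(schema_class, behavior):
    """Map an XDM class name and its behavior to a dataset type.

    `schema_class` is the XDM class the schema is based on, and
    `behavior` is "Time Series" or "Record" (hypothetical inputs).
    """
    # Class-specific rules come first, because they are exceptions
    # to the general behavior-based rules.
    if schema_class == "XDM Summary Metrics":
        return "Summary"
    if schema_class == "XDM Individual Profile":
        return "Profile"
    if behavior == "Time Series":
        return "Event"
    if behavior == "Record":
        return "Lookup"
    raise ValueError(f"unrecognized behavior: {behavior!r}")


print(dataset_type("XDM Experience Event", "Time Series"))
```

A custom class with the Record behavior therefore maps to Lookup, while the XDM Individual Profile class is the one Record-based class that maps to Profile.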

Use numeric fields as lookup keys and lookup values

This lookup functionality is useful if you want to add a numeric field such as a cost or margin to a string-based key field. It allows numeric values to be part of lookups, either as keys or as values. In your lookup schema, you might have numeric values tied to, for example, your product names, COGS, campaign marketing cost, or margins. Here is an example lookup schema in Adobe Experience Platform:

Lookup schema

Customer Journey Analytics supports bringing in these values as metrics or dimensions in reporting. When you set up your connection and pull in lookup datasets, you can edit the datasets to select the Key and Matching Key:

Edit-dataset

When you set up a data view based on this connection, you add the numeric values as components to the data view. Any project based on this data view can then report on these numeric values.
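
As a rough illustration of numeric keys and numeric lookup values, the following Python sketch joins event rows to a lookup table on a numeric product ID and then reports on a numeric value (COGS). All field names and data are hypothetical, and the join is a plain dictionary lookup rather than the mechanism Customer Journey Analytics uses.

```python
# Hypothetical lookup dataset keyed by a numeric product ID (the Key).
lookup = {
    101: {"product_name": "Trail shoe", "cogs": 40.0},
    202: {"product_name": "Water bottle", "cogs": 12.5},
}

# Hypothetical event rows; "product_id" plays the role of the Matching Key.
events = [
    {"product_id": 101, "units": 2},
    {"product_id": 202, "units": 1},
    {"product_id": 999, "units": 5},  # no lookup row exists for this key
]


def enrich(events, lookup, matching_key):
    """Attach lookup fields to each event whose matching key has a lookup row."""
    enriched = []
    for event in events:
        extra = lookup.get(event[matching_key], {})
        enriched.append({**event, **extra})
    return enriched


def total_cogs(enriched):
    """Sum units * cogs, skipping events without a looked-up cost."""
    return sum(row["units"] * row["cogs"] for row in enriched if "cogs" in row)


rows = enrich(events, lookup, "product_id")
print(total_cogs(rows))
```

Here the numeric `cogs` value behaves like a metric, while `product_name` behaves like a dimension attached through the same key.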

Use Identity Map as a Person ID

Customer Journey Analytics supports the ability to use the Identity Map for its Person ID. Identity Map is a map data structure that allows you to upload key -> value pairs. The keys are identity namespaces and the value is a structure that holds the identity value. The Identity Map exists on each row/event uploaded and is populated for each row accordingly.
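
To visualize the structure described above, here is a minimal Python sketch of an Identity Map for a single event row. The namespace names, ID values, and the `id` field name are illustrative assumptions; only the key-to-value shape and the primary attribute come from the description in this document.

```python
# Hypothetical Identity Map for one event row: namespace -> list of identities.
identity_map = {
    "ECID": [{"id": "50353446361742627904113906918842", "primary": True}],
    "Email": [{"id": "jane@example.com", "primary": False}],
}


def primary_identities(identity_map):
    """Return (namespace, id) pairs for identities flagged as primary."""
    return [
        (namespace, identity["id"])
        for namespace, identities in identity_map.items()
        for identity in identities
        if identity.get("primary", False)
    ]


print(primary_identities(identity_map))
```

Each namespace holds a list, so a single row can carry several identities under the same namespace.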

The Identity Map is available for any dataset that uses a schema based on the ExperienceEvent XDM class. When you select such a dataset to be included in a Customer Journey Analytics Connection, you have the option of selecting either a field as the primary ID or the Identity Map:

If you select Identity Map, you get two additional configuration options:

Option
Description
Use Primary ID Namespace
This option instructs Customer Journey Analytics to find the identity in the Identity Map that is marked with a primary=true attribute and use that identity as the Person ID for that row. This identity is the primary key that is used in Experience Platform for partitioning, and it is also the prime candidate for use as the Customer Journey Analytics Person ID (depending on how the dataset is configured in a Customer Journey Analytics connection).
Namespace
(This option is only available if you do not use the Primary ID Namespace.) Identity namespaces are a component of the Experience Platform Identity Service. Namespaces serve as indicators of the context to which an identity relates. If you specify a namespace, Customer Journey Analytics searches each row’s Identity Map for this namespace key and uses the identity under that namespace as the Person ID for that row. Since Customer Journey Analytics cannot do a full dataset scan of all rows to determine which namespaces are present, all possible namespaces are displayed in the drop-down list. You must know which namespaces are specified in the data; these namespaces are not auto-detected.

Identity Map edge cases

The following shows how each of the two configuration options handles edge cases:

Option: Use Primary ID Namespace checked
  • No IDs are present in the Identity Map: Customer Journey Analytics drops the row.
  • Multiple IDs, none marked as primary: Customer Journey Analytics drops the row, as no primary ID is specified.
  • Multiple IDs are marked as primary: All IDs marked as primary, under all namespaces, are extracted into a list. They are then alphabetically sorted; with the new sorting, the first namespace with its first ID is used as the Person ID.
  • Single ID, marked as primary or not: The single ID is used as the Person ID.
  • Invalid namespace with an ID marked as primary: Even though the namespace may be invalid (not present in Adobe Experience Platform), Customer Journey Analytics uses the primary ID under that namespace as the Person ID.

Option: Specific Identity Map namespace selected
  • No IDs are present in the Identity Map: Customer Journey Analytics drops the row.
  • Multiple IDs, none marked as primary: All IDs under the selected namespace are extracted into a list and the first is used as the Person ID.
  • Multiple IDs are marked as primary: All IDs under the selected namespace are extracted into a list and the first is used as the Person ID.
  • Single ID, marked as primary or not: All IDs under the selected namespace are extracted into a list and the first is used as the Person ID.
  • Invalid namespace with an ID marked as primary: All IDs under the selected namespace are extracted into a list and the first is used as the Person ID. (Only a valid namespace can be selected at Connection creation time, so it is not possible for an invalid namespace/ID to be used as Person ID.)
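
The edge-case handling above can be sketched as a single resolution function. This Python example is one reading of the described behavior, not Adobe's implementation: the function name is hypothetical, the `id` and `primary` field names are assumed, and "alphabetically sorted" is interpreted as ordering by namespace name using Python's default string comparison.

```python
from typing import Optional


def resolve_person_id(identity_map, use_primary, namespace=None) -> Optional[str]:
    """Return the Person ID for one row, or None if the row would be dropped."""
    if use_primary:
        all_ids = [item["id"] for items in identity_map.values() for item in items]
        if not all_ids:
            return None  # no IDs present: drop the row
        if len(all_ids) == 1:
            return all_ids[0]  # single ID is used, primary or not

        # Keep only namespaces that contain at least one primary ID.
        primary_by_namespace = {}
        for ns, items in identity_map.items():
            ids = [item["id"] for item in items if item.get("primary", False)]
            if ids:
                primary_by_namespace[ns] = ids
        if not primary_by_namespace:
            return None  # multiple IDs, none primary: drop the row

        first_namespace = min(primary_by_namespace)  # assumed sort order
        return primary_by_namespace[first_namespace][0]

    # A specific namespace was selected: use its first ID, if any.
    ids = identity_map.get(namespace, [])
    return ids[0]["id"] if ids else None


row = {
    "Email": [{"id": "jane@example.com", "primary": True}],
    "ECID": [{"id": "abc123", "primary": True}],
}
print(resolve_person_id(row, use_primary=True))
```

In this sample row both identities are marked as primary, so the namespace that sorts first (ECID) supplies the Person ID. Returning None when a selected namespace is missing from a row is an assumption that extends the "drops the row" behavior listed for rows without IDs.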

Calculate the average number of daily events

This calculation is done for every dataset in the connection.

  1. Go to Adobe Experience Platform Query Services and create a query.

    The query would look like this:

    SELECT AVG(A.total_events)
    FROM (
      SELECT DISTINCT COUNT(*) AS total_events, date(TIMESTAMP)
      FROM analytics_demo_data
      GROUP BY 2
      HAVING total_events > 0
    ) A;
    

    In this example, “analytics_demo_data” is the name of the dataset.

  2. To show all the datasets that exist in Adobe Experience Platform, run the SHOW TABLES query.
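
To see what the query above computes, the following Python sketch runs an equivalent aggregation against a small in-memory SQLite table. This is a local approximation for illustration only: SQLite is not Query Service, the sample timestamps are made up, and the query is written with explicit `GROUP BY` and `HAVING` expressions so that it runs unchanged in SQLite.

```python
import sqlite3


def average_daily_events(timestamps):
    """Return the average number of events per day that has at least one event."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE analytics_demo_data (timestamp TEXT)")
    conn.executemany(
        "INSERT INTO analytics_demo_data VALUES (?)",
        [(ts,) for ts in timestamps],
    )
    row = conn.execute(
        """
        SELECT AVG(A.total_events)
        FROM (
            SELECT COUNT(*) AS total_events, date(timestamp) AS day
            FROM analytics_demo_data
            GROUP BY date(timestamp)
            HAVING COUNT(*) > 0
        ) A
        """
    ).fetchone()
    conn.close()
    return row[0]


sample = [
    "2024-01-01 08:00:00", "2024-01-01 09:30:00", "2024-01-01 17:45:00",
    "2024-01-02 12:00:00",
    "2024-01-03 10:15:00", "2024-01-03 22:05:00",
]
print(average_daily_events(sample))
```

The inner query counts events per calendar day, and the outer query averages those daily counts; here three days with 3, 1, and 2 events average to 2.0.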

Algorithmic pruning of large lookup datasets

When creating a connection, you can add large datasets for lookup purposes. For example, you can add a dataset representing a product catalog, so that descriptive product information can be looked up when building reports and visualizations. Such a large lookup dataset can exceed the maximum of 10 million unique lookups currently implemented as a guardrail, resulting in additional data being skipped.

You can request an algorithmic pruning of a large lookup dataset. This algorithmic pruning only keeps data in the lookup dataset that matches the keys in your event dataset. This way, you don’t need to load the entire unpruned lookup dataset. Old or less frequently used items are removed, which might slightly affect reports but brings significant benefits. The algorithm looks back 90 days and updates weekly.
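
The general idea behind this pruning can be sketched as follows. This Python example is only a conceptual illustration under stated assumptions: the actual algorithm is not described here beyond key matching, the 90-day look-back, and the weekly update, and the function and field names are hypothetical.

```python
from datetime import datetime, timedelta

LOOKBACK_DAYS = 90  # look-back period stated above


def prune_lookup(lookup, events, matching_key, now):
    """Keep only lookup rows whose key appears in events within the look-back window."""
    cutoff = now - timedelta(days=LOOKBACK_DAYS)
    seen_keys = {
        event[matching_key]
        for event in events
        if matching_key in event and event["timestamp"] >= cutoff
    }
    return {key: row for key, row in lookup.items() if key in seen_keys}


now = datetime(2024, 6, 1)
lookup = {"sku-1": {"name": "Trail shoe"}, "sku-2": {"name": "Water bottle"}, "sku-3": {"name": "Tent"}}
events = [
    {"sku": "sku-1", "timestamp": datetime(2024, 5, 20)},  # recent: kept
    {"sku": "sku-2", "timestamp": datetime(2023, 12, 1)},  # older than 90 days: pruned
]
print(sorted(prune_lookup(lookup, events, "sku", now)))
```

In this model, a product that no recent event references (sku-2 and sku-3) is removed from the lookup, which corresponds to the slight reporting impact mentioned above for old or less frequently used items.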

Contact your Adobe support team for further information and to enable this capability.
