Customer Data Feeds | Adobe Audience Manager

Customer Data Feeds

Last update: Tue Aug 30 2022 00:00:00 GMT+0000 (Coordinated Universal Time)

Basic information about Customer Data Feed (CDF) files and instructions on how to get started. Start here if you're interested in receiving CDF files or just want more information.

File Contents and Purpose

A CDF file contains the same data that an Audience Manager event call (/event) sends to our servers. This includes data like user IDs, trait IDs, segment IDs, and all the other parameters captured by an event call. Internal Audience Manager systems process event data into a CDF file with content organized into fields that appear in a set order. Audience Manager tries to generate CDF files hourly and stores them in a secure, customer-specific bucket on an Amazon S3 server. We provide these files so you can work with Audience Manager data outside of the limits imposed by our user interface.

IMPORTANT

Note the following restrictions when working with CDF files:

Prior to setting up CDF file delivery, please ensure you have the appropriate permissions from third-party data providers for the export of third-party traits. Audience Manager currently does not support functionality in the user interface to request CDF file delivery export permission from Third-Party Data Providers, so please reach out to them independently.
You should not use CDF files as a proxy for monitoring page traffic, reconciling report discrepancies, billing, or similar purposes.

Getting Started

There is no self-service process to start CDF file delivery. Contact your Audience Manager consultant or Customer Care to get started. During implementation, your Audience Manager representative will:

Set up your Amazon S3 storage bucket.
Provide read-only S3 authentication credentials to your file storage bucket. You will not be able to see or access directories and files that belong to other customers.

File notifications and CDF files will appear in your S3 bucket when they're ready for download. You're responsible for monitoring and downloading files from your assigned S3 directory. See Customer Data Feed File Processing Notifications.

Next Steps

The sections below and the Customer Data Feed FAQ can help you become more familiar with this service.

Customer Data Feed Contents Defined

Lists and defines the data elements and arrays in a CDF file, by order of appearance. Definitions include data types, but this information is not part of a CDF file.

IMPORTANT

Event pixels are excluded by default in CDF configurations. If you want event pixels included in your CDF files, specify this in your request to Customer Care. Each event pixel will populate as a unique row in your CDF files.

Definitions

A CDF file includes some or all of the fields defined below. For information about internal file organization, see Customer Data Feed File Structure.

| Field | Data Type | Description |
| --- | --- | --- |
| Event Time | Timestamp | The time a CDF file was processed by the Data Collection Servers (DCS). The timestamp uses the yyyy-mm-dd hh:mm:ss format and is set in the UTC time zone. Note: The Event Time is not (1) the time of the page event or the event call itself, although it may be close to those times, or (2) related to the DCS hour in the file name. See also, Customer Data Feed File Name Times and File Content Times.... |
| Device | String | This is the Unique User ID (UUID), which is a 38-digit device ID for your site visitor. See also, Index of IDs in Audience Manager. |
| Container ID | Numeric | The ID of the container that fires ID syncs. This field only populates if you set the container ID in the d_nsid field within your site implementation. Otherwise, the default value of 0 will not be included in CDF files. |
| Realized Traits | Numeric Array | An array of trait IDs that contains all the traits a visitor realized (qualified for) in the event call. Note that the array can contain traits for which the visitor had qualified before and for which they re-qualify through this event call. |
| Realized Segments | Numeric Array | An array of segment IDs that contains all the segments a visitor realized (qualified for) in the event call. |
| Request Parameters | String | A string that captures all the parameters (variables, IDs, key-value pairs, device advertising IDs, etc.) passed in on the event call. Shortened example: d_rtbd:json,c_contextData.a.CarrierName:mobile,c_contextData.a.adid:92D56353-49C5-431E-B474-FC528D585810,c_contextData.a.RunMode:Application,c_contextData.a.DaysSinceLastUpgrade:61,d_cid_ic:xid%01EACB6E40-AC65-4012-9FE9-ABD59965E9C4%011,c_contextData.a.PrevSessionLength:583 |
| Referer | String | The unencoded URL of the referring page (if any). |
| IP Address | String | The IP address for the visitor captured in the event call. |
| MCDevice | String | The Experience Cloud ID (MID) assigned to the site visitor. See also, Cookies and the Adobe Experience Platform Identity Service. |
| All Segments | Numeric Array | An array of segment IDs that contains previously realized segments and new segments the visitor is qualified for. |
| All Traits | Numeric Array | An array of first-party and third-party trait IDs that contains previously realized traits and new traits the visitor has qualified for since the last generated data feed. |

Customer Data Feed File Structure

Lists and defines the data structure of a CDF file. This includes data sequence, field delimiters and separators, a data file map, and sample file.

Data Field Identifiers and Sequence

CDF files do not contain labeled columns or field headers. Instead, a CDF file defines fields and arrays with non-printing ASCII characters. Also, the CDF file lists each field and array in a specific order. Understanding the field identifiers and order will help you parse the file properly.

CDF File Element

Description

Field Separators and Delimiters

These non-printing characters define the elements and structure of your CDF file:

Ctrl + a (ASCII 001 or ^A) separates data in individual fields with a non-printing space indicator.
Ctrl + b (ASCII 002 or ^B) separates the elements in an array and in the request parameters.
Ctrl + c (ASCII 003 or ^C) defines key-value pairs.

Field Sequence

Important: Audience Manager reserves the right to add new fields to the end of the CDF file in future releases. This means the technical design of your file parsing system should not assume a fixed number of columns (though it may assume a fixed order for existing columns).

Data in your CDF file appears in the order shown below. /N may appear in place of any of these fields, indicating a null value.

Event Time
Device
Container ID
Realized Traits
Realized Segments
Request Parameters
Referer
IP Address
Experience Cloud Device ID (or MID). See also, Cookies and the Adobe Experience Platform Identity Service
All Segments
All Traits

For field descriptions, see Customer Data Feed Contents Defined.
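The separators and field order described above can be sketched as a small line parser. This is a minimal illustration in Python, not an official Audience Manager tool. The field names, the `parse_cdf_line` function, and the `extra_fields` key are hypothetical. It assumes one record per line, numeric array elements, and a null marker that appears exactly as written above (/N).

```python
FIELD_SEP = "\x01"   # Ctrl + a: separates fields
ARRAY_SEP = "\x02"   # Ctrl + b: separates array elements and request parameters
KV_SEP = "\x03"      # Ctrl + c: separates a key from its value
NULL_MARKER = "/N"   # assumption: null marker exactly as written in this document

# Hypothetical names for the documented field sequence.
FIELD_NAMES = [
    "event_time", "device", "container_id", "realized_traits",
    "realized_segments", "request_parameters", "referer", "ip_address",
    "mc_device", "all_segments", "all_traits",
]
ARRAY_FIELDS = {"realized_traits", "realized_segments", "all_segments", "all_traits"}


def parse_cdf_line(line: str) -> dict:
    """Split one CDF record into named fields, by position."""
    parts = line.rstrip("\n").split(FIELD_SEP)
    record = dict.fromkeys(FIELD_NAMES)  # every field defaults to None
    for name, raw in zip(FIELD_NAMES, parts):
        if raw == NULL_MARKER:
            continue  # leave the field as None
        if name in ARRAY_FIELDS:
            record[name] = [int(item) for item in raw.split(ARRAY_SEP) if item]
        elif name == "request_parameters":
            pairs = {}
            for item in raw.split(ARRAY_SEP):
                if item:
                    key, _, value = item.partition(KV_SEP)
                    pairs[key] = value
            record[name] = pairs
        else:
            record[name] = raw  # scalar fields are kept as strings
    # New fields may be appended in future releases, so keep any extras
    # instead of failing on an unexpected column count.
    record["extra_fields"] = parts[len(FIELD_NAMES):]
    return record
```

Reading fields by position and collecting any trailing extras keeps the parser tolerant of columns added to the end of the file later.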

CDF File Map

CDF file data appears in the order shown below.

Identifying Arrays

Arrays in a CDF file start and end with the Ctrl + a field separator, which makes the first element in an array look like a standalone data field. For example, the realized traits array starts with ^A1234, and the array delimiter and ID ^B5678 follow this entry. You might be tempted to think that the first element in the realized traits array is ID 5678 (because it starts with ^B). This is not the case: the first element is 1234. The position of a field in the file, not its leading character, defines the start of an array, and the first element in any array is always separated from the preceding entry by ^A. This is why you need to be familiar with the sequence and structure of a data file.
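The point about array boundaries can be illustrated with a short Python sketch. The record fragment and the `array_at` helper are hypothetical and exist only to show that position, not the ^B delimiter, marks where an array begins.

```python
FIELD_SEP = "\x01"  # Ctrl + a
ARRAY_SEP = "\x02"  # Ctrl + b

# Realized Traits is the fourth field in the documented sequence (0-based index 3).
REALIZED_TRAITS_POSITION = 3


def array_at(line: str, position: int) -> list:
    """Return the array stored at a given field position."""
    fields = line.split(FIELD_SEP)
    return fields[position].split(ARRAY_SEP)


# Hypothetical fragment: event time, device, container ID, realized traits.
fragment = FIELD_SEP.join(
    ["2017-09-14 17:05:09", "12345", "0", "1234" + ARRAY_SEP + "5678"]
)

realized_traits = array_at(fragment, REALIZED_TRAITS_POSITION)
print(realized_traits)  # ['1234', '5678']: 1234 is the first element, not 5678
```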

Sample CDF File

A sample CDF file could look similar to the following. We've inserted line breaks into this example to help it fit the page.

Customer Data Feed File Naming Conventions

The sections below list and define the elements in your CDF file name.

CDF File Name: Syntax and Example

A typical CDF file name contains the elements listed below. Note: YOUR-S3-BUCKET-NAME, yyyy-mm-dd, hh, PARTNER-ID, and FILE-SEQUENCE are variable placeholders.

Syntax

s3://aam-cdf/YOUR-S3-BUCKET-NAME/day=yyyy-mm-dd/hour=hh/AAM_CDF_PARTNER-ID_FILE-SEQUENCE_0.gz

Example

s3://aam-cdf/dataCompany/day=2017-09-14/hour=17/AAM_CDF_1234_0_0_0.gz

In your S3 storage bucket, files are sorted in ascending order by Partner ID (PID), day, and hour.

CDF File Name Elements Defined

The following table lists and defines the elements in a CDF file name.

| File Name Element | Description |
| --- | --- |
| s3://aam-cdf/ | The default root storage bucket for your CDF files on an Amazon S3 server. |
| your S3 bucket name | The name of the read-only S3 bucket that holds your CDF files. |
| day=yyyy-mm-dd | The date your file was processed. |
| hour=hh | A time value expressed in 24-hour notation and set in the UTC time zone. See also, Customer Data Feed File Name Times and File Content Times.... |
| partner ID | Your partner ID. |
| File Sequence_0 | Values that identify the file sequence. The sequence increments as follows: 0_0_0, 0_1_0, 0_2_0 ... 1_0_0 |
| .gz | A gzip file extension. CDF files are gzip compressed. |
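The file name elements can be pulled apart with a regular expression. The following Python sketch is based only on the syntax and example shown in this section. The `parse_cdf_path` function and the group names are hypothetical, and the pattern assumes a numeric partner ID and a sequence made up of digits and underscores.

```python
import re
from typing import Optional

# Pattern derived from the documented syntax. Both "AAM_CDF" and "AAM-CDF"
# prefixes are accepted, since both spellings may be encountered.
CDF_PATH_PATTERN = re.compile(
    r"^s3://aam-cdf/(?P<bucket>[^/]+)"
    r"/day=(?P<day>\d{4}-\d{2}-\d{2})"
    r"/hour=(?P<hour>\d{2})"
    r"/AAM[-_]CDF_(?P<partner_id>\d+)_(?P<sequence>[0-9_]+)\.gz$"
)


def parse_cdf_path(path: str) -> Optional[dict]:
    """Return the name elements of a CDF file path, or None if it does not match."""
    match = CDF_PATH_PATTERN.match(path)
    if match is None:
        return None
    parts = match.groupdict()
    parts["hour"] = int(parts["hour"])  # 24-hour value, UTC
    return parts


example = "s3://aam-cdf/dataCompany/day=2017-09-14/hour=17/AAM_CDF_1234_0_0_0.gz"
print(parse_cdf_path(example))
```

Returning None for non-matching paths lets a caller skip other objects in the same directory, such as notification files.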

Customer Data Feed File Processing Notifications

Audience Manager writes a .info file to your S3 directory to let you know when your Customer Data Feed (CDF) files are ready for download. The .info file also includes JSON-formatted metadata about the contents of your CDF files. Review this section for information about the syntax and fields used by this notification file.

Sample Info File

Each .info file contains a Files section and a Totals section. The Files section contains an array that holds specific metrics for each hourly file. The Totals section contains metrics aggregated across all your CDF files for a particular day. The contents of your .info file could look similar to the following example.

{
    "Files": [
        {
            "FileByteSize": 2709730,
            "FileChecksumMD5": "a9ea418e79511642cff11c2a898037dc-1",
            "FileName": "AAM_CDF_1109_000000_0.gz",
            "FileSequenceNumber": 1
        },
        {
            "FileByteSize": 2783351,
            "FileChecksumMD5": "7b469485d60274b6991acd0817855840-3",
            "FileName": "AAM_CDF_1109_000001_0.gz",
            "FileSequenceNumber": 2
        }
    ],
    "Totals": {
        "Day": "2017-09-26",
        "Hour": "18",
        "TotalByteSize": 150092997,
        "TotalNumberFiles": 2
    }
}

Info File Fields Defined

The following tables list and define the elements in a CDF .info file.

Files Object

| Field | Description |
| --- | --- |
| Files | Starts the array that contains metadata about your CDF files. |
| FileByteSize | File size in bytes. |
| FileChecksumMD5 | The Amazon S3 ETag. The number following the hyphen shows the number of parts used to build the file during the multi-part upload. The ETag is not identical to the MD5 checksum of the file. |
| FileName | The file name. See Customer Data Feed File Naming Conventions. |
| FileSequenceNumber | An index number for each file. |

Totals Object

| Field | Description |
| --- | --- |
| Totals | Starts the object that contains aggregated data about all your CDF files. |
| Day | The day for which the data is available. Uses yyyy-mm-dd format. |
| Hour | The hour for which data is available. Uses 24-hour format set in UTC time zone. |
| TotalByteSize | Total size of all your CDF files for that date in bytes. |
| TotalNumberFiles | Total number of files uploaded to your S3 directory. |
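A .info file can be read with any JSON parser. The Python sketch below summarizes one, using only the fields defined above. The `summarize_info` and `etag_part_count` names are hypothetical. The sketch reports the sum of the listed file sizes next to TotalByteSize instead of asserting that they match, because the two values differ in the sample shown in this section.

```python
import json
from typing import Optional


def etag_part_count(etag: str) -> Optional[int]:
    """Return the number after the hyphen in an S3 ETag, or None if absent."""
    _, hyphen, suffix = etag.rpartition("-")
    return int(suffix) if hyphen and suffix.isdigit() else None


def summarize_info(info_text: str) -> dict:
    """Summarize the Files and Totals sections of a .info file."""
    info = json.loads(info_text)
    files = sorted(info["Files"], key=lambda entry: entry["FileSequenceNumber"])
    totals = info["Totals"]
    return {
        "day": totals["Day"],
        "hour": int(totals["Hour"]),
        "file_names": [entry["FileName"] for entry in files],
        # Does the number of listed files agree with TotalNumberFiles?
        "count_matches": len(files) == totals["TotalNumberFiles"],
        "listed_bytes": sum(entry["FileByteSize"] for entry in files),
        "total_bytes": totals["TotalByteSize"],
        "etag_parts": {
            entry["FileName"]: etag_part_count(entry["FileChecksumMD5"])
            for entry in files
        },
    }
```

Because FileChecksumMD5 holds an ETag and not a plain MD5 digest, the sketch extracts only the part count and does not attempt a checksum comparison.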

Customer Data Feed File Name Times and File Content Times are Different

Your CDF file contains timestamps in the file name and file contents. These timestamps record different event processes for the same CDF file. It is not uncommon to see different timestamps in the name and contents of the same file. Understanding each timestamp can help you avoid common mistakes when working with this data or trying to sort it by time.

Locating CDF File Timestamps

CDF files record time differently in 2 separate locations.

Understanding the Difference Between Timestamps

The following table provides additional details about your CDF file timestamps along with information about how to use them properly.

Timestamp Location

Description

File Name

The timestamp in your CDF file name marks the time when Audience Manager started preparing your file for delivery. This timestamp is set in the UTC time zone. It uses the hour= parameter, with time formatted as a 2-digit hour in 24-hour notation. This time can be different from the event time recorded in the file contents. When working with CDF files, you'll sometimes notice that your S3 bucket is empty for a particular hour. An empty bucket can mean either of the following:

There's no data for that particular hour.
Our servers are under heavy load and can't process files for a particular hour. When the server catches up, it puts files that should have gone into an earlier time bucket into a bucket with a later time value. For example, you'll see this when a file that should have been in the hour 17 bucket appears in the hour 18 bucket (with hour=18 in the file name). In this case, the server probably started processing your file in hour 17 but couldn't complete it within that time interval. Instead, the file gets pushed to the next hourly time bucket.

Important: Do not use the file name timestamp to group events by time. If you need to group by time, use the EventTime timestamp in the file contents.

File Contents

The timestamp in your CDF file contents marks the time the Data Collection Servers started processing the file. This timestamp is set in the UTC time zone. It uses the EventTime field, with time formatted as yyyy-mm-dd hh:mm:ss. This time is close to the actual time of the event on the page, but it can be different than the hour indicator in the file name.
Tip: Unlike the hour= timestamp in the file name, you can use EventTime to group data by time.
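Grouping by the EventTime value, as recommended above, could look like the following Python sketch. The `count_events_by_hour` function is hypothetical, and it assumes each EventTime string has already been extracted from a record and follows the yyyy-mm-dd hh:mm:ss format.

```python
from collections import Counter
from datetime import datetime

EVENT_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # yyyy-mm-dd hh:mm:ss, UTC


def count_events_by_hour(event_times) -> dict:
    """Count events per UTC hour using EventTime, not the hour= in the file name."""
    counts = Counter()
    for text in event_times:
        timestamp = datetime.strptime(text, EVENT_TIME_FORMAT)
        counts[timestamp.strftime("%Y-%m-%d %H")] += 1
    return dict(counts)


# Hypothetical EventTime values that could all arrive in a file named hour=18.
sample_times = [
    "2017-09-14 17:58:01",
    "2017-09-14 17:59:59",
    "2017-09-14 18:00:03",
]
print(count_events_by_hour(sample_times))
```

In this sample, two of the three events fall in hour 17 even though they could be delivered in an hour=18 file, which is why the file name is not a reliable grouping key.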
