Customer Data Feeds
Basic information about Customer Data Feed (CDF) files and instructions on how to get started. Start here if you’re interested in receiving CDF files or just want more information.
File Contents and Purpose
A CDF file contains the same data that an Audience Manager event call (/event) sends to our servers. This includes data like user IDs, trait IDs, segment IDs, and all the other parameters captured by an event call. Internal Audience Manager systems process event data into a CDF file with content organized into fields that appear in a set order. Audience Manager tries to generate CDF files hourly and stores them in a secure, customer-specific bucket on an Amazon S3 server. We provide these files so you can work with Audience Manager data outside of the limits imposed by our user interface.
- Prior to setting up CDF file delivery, please ensure you have the appropriate permissions from third-party data providers for the export of third-party traits. Audience Manager currently does not support functionality in the user interface to request CDF file delivery export permission from Third-Party Data Providers, so please reach out to them independently.
- Do not use CDF files as a proxy for monitoring page traffic, reconciling report discrepancies, or billing.
Getting Started
There is no self-service process to start CDF file delivery. Contact your Audience Manager consultant or Customer Care to get started. During implementation, your Audience Manager representative will:
- Set up your Amazon S3 storage bucket.
- Provide read-only S3 authentication credentials to your file storage bucket. You will not be able to see or access directories and files that belong to other customers.
File notifications and CDF files will appear in your S3 bucket when they’re ready for download. You’re responsible for monitoring and downloading files from your assigned S3 directory. See Customer Data Feed File Processing Notifications.
Next Steps
The sections below and the Customer Data Feed FAQ can help you become more familiar with this service.
Customer Data Feed Contents Defined
Lists and defines the data elements and arrays in a CDF file, by order of appearance. Definitions include data types, but this information is not part of a CDF file.
Definitions
A CDF file includes some or all of the fields defined below. For information about internal file organization, see Customer Data Feed File Structure.
Event Time
The time a CDF file was processed by the Data Collection Servers (DCS). The timestamp uses the yyyy-mm-dd hh:mm:ss format and is set in the UTC time zone.
Note: The Event Time is not:
- The time of the page event or the event call itself, although it may be close to those times.
- Related to the DCS hour in the file name. See also, Customer Data Feed File Name Times and File Content Times....
Device
Container ID
Realized Traits
An array of trait IDs that contains all the traits a visitor realized (qualified for) in the event call.
Note that the array can contain traits for which the visitor had qualified before and for which they re-qualify through this event call.
Realized Segments
Request Parameters
A string that captures all the parameters (variables, IDs, key-value pairs, device advertising IDs, etc.) passed in on the event call.
Shortened example:
d_rtbd:json,c_contextData.a.CarrierName:mobile,c_contextData.a.adid:92D56353-49C5-431E-B474-FC528D585810,c_contextData.a.RunMode:Application,c_contextData.a.DaysSinceLastUpgrade:61,d_cid_ic:xid%01EACB6E40-AC65-4012-9FE9-ABD59965E9C4%011,c_contextData.a.PrevSessionLength:583
Referer Data Type
IP Data Type
MCDevice
All Segments
All Traits
Customer Data Feed File Structure
Lists and defines the data structure of a CDF file. This includes data sequence, field delimiters and separators, a data file map, and sample file.
Data Field Identifiers and Sequence
CDF files do not contain labeled columns or field headers. Instead, a CDF file defines fields and arrays with non-printing ASCII characters. Also, the CDF file lists each field and array in a specific order. Understanding the field identifiers and order will help you parse the file properly.
These non-printing characters define the elements and structure of your CDF file:
- Ctrl + a (ASCII 001 or ^A) separates individual data fields with a non-printing space indicator.
- Ctrl + b (ASCII 002 or ^B) separates the elements of an array and the request parameters.
- Ctrl + c (ASCII 003 or ^C) defines key-value pairs.
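As a rough illustration of how the Ctrl + b and Ctrl + c separators work together, the sketch below splits a raw Request Parameters field into key-value pairs. The function name and the sample string are hypothetical, and the sketch assumes that each pair is delimited by ^B and that each key is split from its value by ^C, as the list above suggests.

```python
ITEM_SEP = "\x02"  # Ctrl + b: separates items in arrays and request parameters
KV_SEP = "\x03"    # Ctrl + c: separates a key from its value


def parse_request_parameters(raw):
    """Split a raw Request Parameters field into a key -> value dict.

    Hypothetical helper; assumes pairs are joined by ^B and each
    key is joined to its value by ^C.
    """
    params = {}
    if not raw:
        return params
    for item in raw.split(ITEM_SEP):
        key, found, value = item.partition(KV_SEP)
        # A key without a ^C separator is kept with an empty value.
        params[key] = value if found else ""
    return params


# Invented sample data, using keys from the shortened example above.
raw = "d_rtbd\x03json\x02c_contextData.a.CarrierName\x03mobile"
print(parse_request_parameters(raw))
```

Values are left exactly as they appear in the file; any percent-encoded sequences (such as %01) are not decoded here.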
Important: Audience Manager reserves the right to add new fields to the end of the CDF file in future releases. This means the technical design of your file parsing system should not assume a fixed number of columns (though it may assume a fixed order for existing columns).
Data in your CDF file appears in the order shown below. /N may appear in place of any of these fields, indicating a null value.
- Event Time
- Device
- Container ID
- Realized Traits
- Realized Segments
- Request Parameters
- Referer
- IP Address
- Experience Cloud Device ID (or MID). See also, Cookies and the Adobe Experience Platform Identity Service
- All Segments
- All Traits
For field descriptions, see Customer Data Feed Contents Defined.
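The field order above can be turned into a positional parser. The following is a minimal sketch, not an official parser: the snake_case field names, the parse_record helper, and the sample line are all invented for illustration, and it assumes one record per line with fields joined by ^A. It treats /N as null, as described above, and keeps any trailing fields it does not recognize, so that fields added in a future release do not break parsing.

```python
FIELD_SEP = "\x01"  # Ctrl + a
NULL_MARKER = "/N"  # null marker as written in this document

# Hypothetical names for the documented field order.
FIELD_NAMES = [
    "event_time", "device", "container_id", "realized_traits",
    "realized_segments", "request_parameters", "referer", "ip_address",
    "mcdevice", "all_segments", "all_traits",
]


def parse_record(line):
    """Map one ^A-delimited record onto the documented field order."""
    values = line.rstrip("\n").split(FIELD_SEP)
    # Start with every known field set to None so short records still work.
    record = dict.fromkeys(FIELD_NAMES)
    for name, value in zip(FIELD_NAMES, values):
        record[name] = None if value == NULL_MARKER else value
    # Keep unknown trailing fields instead of failing on them.
    record["extra_fields"] = values[len(FIELD_NAMES):]
    return record


# Invented sample record with one extra trailing field.
sample_line = FIELD_SEP.join([
    "2017-09-14 17:02:11", "/N", "12345", "1234\x025678", "/N",
    "d_rtbd\x03json", "/N", "/N", "/N", "/N", "/N", "future",
])
record = parse_record(sample_line)
print(record["event_time"], record["extra_fields"])
```

Array fields and request parameters are returned as raw strings here; they still need to be split on ^B and ^C.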
CDF File Map
CDF file data appears in the order shown below.
Identifying Arrays
Arrays in a CDF file start and end with the Ctrl + a field separator. This makes the first element in an array appear like a standalone data field. For example, the realized traits array starts with ^A1234. The array delimiter and ID ^B5678 follow this entry. As a result, you might be tempted to think that the first element in the realized traits array is ID 5678 (because it starts with ^B). This is not the case, which is why you need to be familiar with the sequence and structure of a data file. Even though the first element in the realized traits array (or any of the other arrays in a CDF file) starts with ^A, the order of appearance or position in the file defines the start of an array. The first element in an array is always separated from the preceding entry by ^A.
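The point about array boundaries can be shown with a short sketch. It assumes a record is first split on ^A and that the Realized Traits array sits in the fourth position (index 3), following the documented field order. The split_array helper and the sample values are invented for illustration.

```python
FIELD_SEP = "\x01"  # Ctrl + a
ITEM_SEP = "\x02"   # Ctrl + b


def split_array(field_value):
    """Split one array field into its elements; null or empty gives []."""
    if field_value in ("", "/N"):
        return []
    return field_value.split(ITEM_SEP)


# Invented record: event time, device, container ID, realized traits, ...
raw_record = "2017-09-14 17:02:11\x01device\x0142\x011234\x025678\x019012"
fields = raw_record.split(FIELD_SEP)

# Position, not the leading separator, identifies the array.
realized_traits = split_array(fields[3])
print(realized_traits)
```

Splitting on ^A first means 1234 is correctly treated as the first element of the array, with 5678 as the second.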
Sample CDF File
A sample CDF file could look similar to the following. We’ve inserted line breaks into this example to help it fit the page.
Customer Data Feed File Naming Conventions
The sections below list and define the elements in your CDF file name.
CDF File Name: Syntax and Example
A typical CDF file name contains the elements listed below. Note that italics indicate a variable placeholder:
Syntax
s3://aam-cdf/YOUR-S3-BUCKET-NAME/day=yyyy-mm-dd/hour=hh/AAM_CDF_PARTNER-ID_FILE-SEQUENCE_0.gz
Example
s3://aam-cdf/dataCompany/day=2017-09-14/hour=17/AAM_CDF_1234_0_0_0.gz
In your S3 storage bucket, files are sorted in ascending order by Partner ID (PID), day, and hour.
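If you need to pull the day, hour, and partner ID out of a file path, a regular expression along these lines may help. This is only a sketch based on the example path above: the pattern and the parse_cdf_key helper are assumptions, and everything after the partner ID is captured as a single unparsed string because the exact layout of the sequence portion is not assumed here.

```python
import re

# Hypothetical pattern derived from the example path in this section.
CDF_KEY_PATTERN = re.compile(
    r"day=(?P<day>\d{4}-\d{2}-\d{2})/"
    r"hour=(?P<hour>\d{2})/"
    r"AAM_CDF_(?P<partner_id>\d+)_(?P<sequence>.+)\.gz$"
)


def parse_cdf_key(key):
    """Return the day, hour, partner ID, and sequence part, or None."""
    match = CDF_KEY_PATTERN.search(key)
    if match is None:
        return None
    parts = match.groupdict()
    parts["hour"] = int(parts["hour"])  # 2-digit hour in 24-hour notation
    return parts


example = "s3://aam-cdf/dataCompany/day=2017-09-14/hour=17/AAM_CDF_1234_0_0_0.gz"
print(parse_cdf_key(example))
```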
CDF File Name Elements Defined
The following table lists and defines the elements in a CDF file name.
s3://aam-cdf/
your S3 bucket name
day=yyyy-mm-dd
hour=hh
partner ID
File Sequence_0
.gz
Customer Data Feed File Processing Notifications
Audience Manager writes a .info file to your S3 directory to let you know when your Customer Data Feed (CDF) file is ready for download. The .info file also includes JSON-formatted metadata about the contents of your CDF files. Review this section for information about the syntax and fields used by this notification file.
Sample Info File
Each .info file contains a Files and Totals section. The Files section contains an array that holds specific metrics for each hourly file. The Totals section contains metrics aggregated across all your CDF files for a particular day. The contents of your .info file could look similar to the following example.
{
    "Files": [
        {
            "FileByteSize": 2709730,
            "FileChecksumMD5": "a9ea418e79511642cff11c2a898037dc-1",
            "FileName": "AAM_CDF_1109_000000_0.gz",
            "FileSequenceNumber": 1
        },
        {
            "FileByteSize": 2783351,
            "FileChecksumMD5": "7b469485d60274b6991acd0817855840-3",
            "FileName": "AAM_CDF_1109_000001_0.gz",
            "FileSequenceNumber": 2
        }
    ],
    "Totals": {
        "Day": "2017-09-26",
        "Hour": "18",
        "TotalByteSize": 150092997,
        "TotalNumberFiles": 2
    }
}
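One possible use of the .info metadata is to check that every listed file has been downloaded with the expected size. The sketch below is illustrative only: find_size_mismatches is a hypothetical helper, and it assumes the files listed under Files have already been downloaded into a local directory under their original names. It compares sizes only and does not attempt to verify FileChecksumMD5.

```python
import json
import os


def find_size_mismatches(info_text, download_dir):
    """Compare downloaded files against the Files array of a .info file.

    Returns a list of (file name, problem) tuples; an empty list means
    every listed file exists locally with the expected byte size.
    """
    info = json.loads(info_text)
    problems = []
    for entry in info["Files"]:
        path = os.path.join(download_dir, entry["FileName"])
        if not os.path.exists(path):
            problems.append((entry["FileName"], "missing"))
        elif os.path.getsize(path) != entry["FileByteSize"]:
            problems.append((entry["FileName"], "size mismatch"))
    return problems
```

For example, calling find_size_mismatches with the text of a .info file and the path of your download folder returns an empty list when all files are present and complete.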
Info File Fields Defined
The following tables list and define the elements in a CDF .info file.
Files Object
Files
FileByteSize
FileChecksumMD5
ETag is not identical to the MD5 checksum of the file.
FileName
FileSequenceNumber
Totals Object
Totals
Day
Hour
TotalByteSize
TotalNumberFiles
Customer Data Feed File Name Times and File Content Times are Different
Your CDF file contains timestamps in the file name and file contents. These timestamps record different event processes for the same CDF file. It is not uncommon to see different timestamps in the name and contents of the same file. Understanding each timestamp can help you avoid common mistakes when working with this data or trying to sort it by time.
Locating CDF File Timestamps
CDF files record time differently in 2 separate locations.
Understanding the Difference Between Timestamps
The following table provides additional details about your CDF file timestamps along with information about how to use them properly.
The timestamp in your CDF file name marks the time when Audience Manager started preparing your file for delivery. This timestamp is set in the UTC time zone. It uses the hour= parameter, with time formatted as a 2-digit hour in 24-hour notation. This time can be different from the event time recorded in the file contents. When working with CDF files, you’ll sometimes notice that your S3 bucket is empty for a particular hour. An empty bucket can mean either of the following:
- There’s no data for that particular hour.
- Our servers are under heavy load and can’t process files for a particular hour. When the server catches up, it puts the files that should have gone into an earlier time bucket into a bucket with a later time value. For example, you’ll see this when a file that should have been in the hour 17 bucket appears in the hour 18 bucket (with hour=18 in the file name). In this case, the server probably started processing your file in hour 17 but couldn’t complete it within that time interval. Instead, the file gets pushed to the next hourly time bucket.
Important: Do not use the file name timestamp to group events by time. If you need to group by time, use the EventTime timestamp in the file contents.
The timestamp in the file contents uses the EventTime field, with time formatted as yyyy-mm-dd hh:mm:ss. This time is close to the actual time of the event on the page, but it can be different from the hour indicator in the file name.
Tip: Unlike the hour= timestamp in the file name, you can use EventTime to group data by time.
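Grouping by EventTime might look like the following minimal sketch. It assumes EventTime values have already been extracted from each record as strings in the yyyy-mm-dd hh:mm:ss format and are in UTC, as described above. The helper names and sample timestamps are invented for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def event_hour(event_time):
    """Parse an EventTime string and truncate it to the start of its UTC hour."""
    parsed = datetime.strptime(event_time, "%Y-%m-%d %H:%M:%S")
    return parsed.replace(minute=0, second=0, tzinfo=timezone.utc)


def count_events_by_hour(event_times):
    """Count events per UTC hour, based on EventTime rather than the file name."""
    return Counter(event_hour(value) for value in event_times)


# Invented sample values that straddle an hour boundary.
sample = ["2017-09-14 17:58:03", "2017-09-14 17:59:41", "2017-09-14 18:00:12"]
counts = count_events_by_hour(sample)
for hour, total in sorted(counts.items()):
    print(hour.isoformat(), total)
```

Because the grouping key comes from the record itself, events from a file that landed in a later hour= folder still fall into the hour in which they were processed.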