Batch ingestion developer guide
This document provides a comprehensive guide to using batch ingestion in ۶Ƶ Experience Platform. For an overview of batch ingestion APIs, including prerequisites and best practices, please begin by reading the batch ingestion API overview.
The appendix to this document provides information for formatting data to be used for ingestion, including sample CSV and JSON data files.
Getting started
The API endpoints used in this guide are part of the batch ingestion API. Batch ingestion is provided through a RESTful API where you can perform basic CRUD operations against the supported object types.
Before continuing, please review the batch ingestion API overview and the getting started guide.
Ingest JSON files
Create batch
First, you will need to create a batch with JSON as the input format. When creating the batch, you will need to provide a dataset ID. You will also need to ensure that all the files uploaded as part of the batch conform to the XDM schema linked to the provided dataset.
If the files you upload contain multi-line JSON rather than one complete JSON object per line, the isMultiLineJson flag will need to be set when creating the batch. For more information, please read the batch ingestion troubleshooting guide.
API format
POST /batches
Request
curl -X POST https://platform.adobe.io/data/foundation/import/batches \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'Content-Type: application/json' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-H 'x-api-key: {API_KEY}' \
-H 'x-sandbox-name: {SANDBOX_NAME}' \
-d '{
        "datasetId": "{DATASET_ID}",
        "inputFormat": {
            "format": "json"
        }
    }'
{DATASET_ID}: The ID of the dataset to ingest data into.
Response
{
    "id": "{BATCH_ID}",
    "imsOrg": "{ORG_ID}",
    "updated": 0,
    "status": "loading",
    "created": 0,
    "relatedObjects": [
        {
            "type": "dataSet",
            "id": "{DATASET_ID}"
        }
    ],
    "version": "1.0.0",
    "tags": {},
    "createdUser": "{USER_ID}",
    "updatedUser": "{USER_ID}"
}
{BATCH_ID}: The ID of the newly created batch, used in subsequent requests.
{DATASET_ID}: The ID of the dataset the batch will be ingested into.
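As noted above, each file in a JSON batch is expected to contain one complete record per line unless the isMultiLineJson flag is set. The following sketch, written in Python purely for illustration, shows both variants; the record fields are invented for the example, and placing the flag inside the inputFormat object is an assumption based on the troubleshooting guide.

import json

# Two illustrative records (field names are hypothetical and must
# match the XDM schema of the target dataset).
records = [
    {"_id": "1", "name": "Jane Doe", "email": "jane@example.com"},
    {"_id": "2", "name": "John Doe", "email": "john@example.com"},
]

# Single-line JSON: one complete object per line (the default expectation).
with open("records.json", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# If a file instead contains pretty-printed (multi-line) JSON, the batch
# must be created with the isMultiLineJson flag set.
# Assumption: the flag lives inside the inputFormat object.
multi_line_payload = {
    "datasetId": "{DATASET_ID}",
    "inputFormat": {
        "format": "json",
        "isMultiLineJson": True,
    },
}
print(json.dumps(multi_line_payload, indent=2))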
Upload files
Now that you have created a batch, you can use the batch ID from the batch creation response to upload files to the batch. You can upload multiple files to the same batch.
API format
PUT /batches/{BATCH_ID}/datasets/{DATASET_ID}/files/{FILE_NAME}
{BATCH_ID}: The ID of the batch to upload files to.
{DATASET_ID}: The ID of the dataset referenced by the batch.
{FILE_NAME}: The name the file will have within the batch.
Request
curl -X PUT https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}/datasets/{DATASET_ID}/files/{FILE_NAME}.json \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'content-type: application/octet-stream' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-H 'x-api-key: {API_KEY}' \
-H 'x-sandbox-name: {SANDBOX_NAME}' \
--data-binary "@{FILE_PATH_AND_NAME}.json"
{FILE_PATH_AND_NAME}: The path and file name of the file to upload, for example acme/customers/campaigns/summer.json.
Response
200 OK
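Since a batch can hold multiple files, uploads are often scripted rather than issued one curl command at a time. Below is a minimal sketch using the Python requests library; the environment-variable names and directory layout are illustrative assumptions.

import os
from pathlib import Path

import requests

BASE = "https://platform.adobe.io/data/foundation/import"
HEADERS = {
    "Authorization": f"Bearer {os.environ['ACCESS_TOKEN']}",
    "Content-Type": "application/octet-stream",
    "x-gw-ims-org-id": os.environ["ORG_ID"],
    "x-api-key": os.environ["API_KEY"],
    "x-sandbox-name": os.environ["SANDBOX_NAME"],
}

def upload_files(batch_id, dataset_id, directory):
    """PUT every .json file in `directory` into the given batch."""
    for path in Path(directory).glob("*.json"):
        url = f"{BASE}/batches/{batch_id}/datasets/{dataset_id}/files/{path.name}"
        resp = requests.put(url, headers=HEADERS, data=path.read_bytes())
        resp.raise_for_status()
        print(f"Uploaded {path.name}: {resp.status_code}")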
Complete batch
Once you have finished uploading all the files for the batch, you will need to signal that the data has been fully uploaded and that the batch is ready for promotion.
API format
POST /batches/{BATCH_ID}?action=COMPLETE
{BATCH_ID}: The ID of the batch to complete.
Request
curl -X POST "https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}?action=COMPLETE" \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-H 'x-api-key: {API_KEY}' \
-H 'x-sandbox-name: {SANDBOX_NAME}'
Response
200 OK
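For reference, the three calls above (create, upload, complete) can be chained together in a single script. The following is a minimal sketch with the Python requests library, assuming credentials are supplied via illustrative environment variables.

import os

import requests

BASE = "https://platform.adobe.io/data/foundation/import"
HEADERS = {
    "Authorization": f"Bearer {os.environ['ACCESS_TOKEN']}",
    "x-gw-ims-org-id": os.environ["ORG_ID"],
    "x-api-key": os.environ["API_KEY"],
    "x-sandbox-name": os.environ["SANDBOX_NAME"],
}
dataset_id = os.environ["DATASET_ID"]

# 1. Create the batch with JSON as the input format.
resp = requests.post(
    f"{BASE}/batches",
    headers={**HEADERS, "Content-Type": "application/json"},
    json={"datasetId": dataset_id, "inputFormat": {"format": "json"}},
)
resp.raise_for_status()
batch_id = resp.json()["id"]

# 2. Upload a single-line JSON file into the batch.
with open("records.json", "rb") as f:
    requests.put(
        f"{BASE}/batches/{batch_id}/datasets/{dataset_id}/files/records.json",
        headers={**HEADERS, "Content-Type": "application/octet-stream"},
        data=f,
    ).raise_for_status()

# 3. Signal that the batch is ready for promotion.
requests.post(f"{BASE}/batches/{batch_id}?action=COMPLETE", headers=HEADERS).raise_for_status()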
Ingest Parquet files
Create batch
First, you will need to create a batch with Parquet as the input format. When creating the batch, you will need to provide a dataset ID. You will also need to ensure that all the files uploaded as part of the batch conform to the XDM schema linked to the provided dataset.
Request
curl -X POST "https://platform.adobe.io/data/foundation/import/batches" \
-H "Authorization: Bearer {ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-H "x-gw-ims-org-id: {ORG_ID}" \
-H "x-api-key: {API_KEY}" \
-H "x-sandbox-name: {SANDBOX_NAME}"
-d '{
"datasetId": "{DATASET_ID}",
"inputFormat": {
"format": "parquet"
}
}'
{DATASET_ID}: The ID of the dataset to ingest data into.
Response
201 Created
{
    "id": "{BATCH_ID}",
    "imsOrg": "{ORG_ID}",
    "updated": 0,
    "status": "loading",
    "created": 0,
    "relatedObjects": [
        {
            "type": "dataSet",
            "id": "{DATASET_ID}"
        }
    ],
    "version": "1.0.0",
    "tags": {},
    "createdUser": "{USER_ID}",
    "updatedUser": "{USER_ID}"
}
{BATCH_ID}: The ID of the newly created batch, used in subsequent requests.
{DATASET_ID}: The ID of the dataset the batch will be ingested into.
{USER_ID}: The ID of the user who created or updated the batch.
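Files uploaded to a Parquet batch must conform to the dataset's XDM schema. If you need a small test file, one way to produce it is with the pyarrow library; the sketch below is illustrative, and its column names are invented rather than taken from a real schema.

import pyarrow as pa
import pyarrow.parquet as pq

# Column names are hypothetical; they must mirror the XDM schema
# of the dataset the batch was created against.
table = pa.table({
    "_id": ["1", "2"],
    "name": ["Jane Doe", "John Doe"],
    "email": ["jane@example.com", "john@example.com"],
})

# Write the table to a local Parquet file ready for upload.
pq.write_table(table, "summer.parquet")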
Upload files
Now that you have created a batch, you can use the batchId from the batch creation response to upload files to the batch. You can upload multiple files to the same batch.
API format
PUT /batches/{BATCH_ID}/datasets/{DATASET_ID}/files/{FILE_NAME}
{BATCH_ID}: The ID of the batch to upload files to.
{DATASET_ID}: The ID of the dataset referenced by the batch.
{FILE_NAME}: The name the file will have within the batch.
Request
curl -X PUT https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}/datasets/{DATASET_ID}/files/{FILE_NAME}.parquet \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'Content-Type: application/octet-stream' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-H 'x-api-key: {API_KEY}' \
-H 'x-sandbox-name: {SANDBOX_NAME}' \
--data-binary "@{FILE_PATH_AND_NAME}.parquet"
{FILE_PATH_AND_NAME}: The path and file name of the file to upload, for example acme/customers/campaigns/summer.parquet.
Response
200 OK
Complete batch
Once you have finished uploading all the files for the batch, you will need to signal that the data has been fully uploaded and that the batch is ready for promotion.
API format
POST /batches/{BATCH_ID}?action=COMPLETE
{BATCH_ID}: The ID of the batch to complete.
Request
curl -X POST "https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}?action=COMPLETE" \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-H 'x-api-key: {API_KEY}' \
-H 'x-sandbox-name: {SANDBOX_NAME}'
Response
200 OK
Ingest large Parquet files
Create batch
First, you will need to create a batch with Parquet as the input format. When creating the batch, you will need to provide a dataset ID. You will also need to ensure that all the files uploaded as part of the batch conform to the XDM schema linked to the provided dataset.
API format
POST /batches
Request
curl -X POST https://platform.adobe.io/data/foundation/import/batches \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'Content-Type: application/json' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-H 'x-api-key: {API_KEY}' \
-H 'x-sandbox-name: {SANDBOX_NAME}' \
-d '{
        "datasetId": "{DATASET_ID}",
        "inputFormat": {
            "format": "parquet"
        }
    }'
{DATASET_ID}: The ID of the dataset to ingest data into.
Response
201 Created
{
    "id": "{BATCH_ID}",
    "imsOrg": "{ORG_ID}",
    "updated": 0,
    "status": "loading",
    "created": 0,
    "relatedObjects": [
        {
            "type": "dataSet",
            "id": "{DATASET_ID}"
        }
    ],
    "version": "1.0.0",
    "tags": {},
    "createdUser": "{USER_ID}",
    "updatedUser": "{USER_ID}"
}
{BATCH_ID}: The ID of the newly created batch, used in subsequent requests.
{DATASET_ID}: The ID of the dataset the batch will be ingested into.
{USER_ID}: The ID of the user who created or updated the batch.
Initialize large file
After creating the batch, you will need to initialize the large file before uploading chunks to the batch.
API format
POST /batches/{BATCH_ID}/datasets/{DATASET_ID}/files/{FILE_NAME}
{BATCH_ID}: The ID of the batch to upload files to.
{DATASET_ID}: The ID of the dataset referenced by the batch.
{FILE_NAME}: The name the file will have within the batch.
Request
curl -X POST "https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}/datasets/{DATASET_ID}/files/{FILE_NAME}.parquet?action=INITIALIZE" \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-H 'x-api-key: {API_KEY}' \
-H 'x-sandbox-name: {SANDBOX_NAME}'
Response
201 Created
Upload large file chunks
Now that the file has been initialized, all subsequent chunks can be uploaded by making repeated PATCH requests, one for each section of the file.
API format
PATCH /batches/{BATCH_ID}/datasets/{DATASET_ID}/files/{FILE_NAME}
{BATCH_ID}: The ID of the batch to upload files to.
{DATASET_ID}: The ID of the dataset referenced by the batch.
{FILE_NAME}: The name of the file being uploaded in chunks.
Request
curl -X PATCH https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}/datasets/{DATASET_ID}/files/{FILE_NAME}.parquet \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'Content-Type: application/octet-stream' \
-H 'Content-Range: bytes {CONTENT_RANGE}' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-H 'x-api-key: {API_KEY}' \
-H 'x-sandbox-name: {SANDBOX_NAME}' \
--data-binary "@{FILE_PATH_AND_NAME}.parquet"
{CONTENT_RANGE}: The range of bytes of the file covered by this chunk, for example 0-1048575.
{FILE_PATH_AND_NAME}: The path and file name of the chunk to upload, for example acme/customers/campaigns/summer.parquet.
Response
200 OK
Complete large file
Once all the chunks of the file have been uploaded, you will need to signal that the file is complete by making a POST request with the action=COMPLETE query parameter.
API format
POST /batches/{BATCH_ID}/datasets/{DATASET_ID}/files/{FILE_NAME}
{BATCH_ID}: The ID of the batch the file belongs to.
{DATASET_ID}: The ID of the dataset referenced by the batch.
{FILE_NAME}: The name of the file to mark as complete.
Request
curl -X POST "https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}/datasets/{DATASET_ID}/files/{FILE_NAME}.parquet?action=COMPLETE" \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-H 'x-api-key: {API_KEY}' \
-H 'x-sandbox-name: {SANDBOX_NAME}'
Response
201 Created
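The initialize, chunked-upload, and complete steps lend themselves to scripting. The sketch below drives the whole large-file lifecycle with the Python requests library; the chunk size and environment-variable names are illustrative, and the total-size suffix in the Content-Range header follows standard HTTP syntax as an assumption.

import os

import requests

BASE = "https://platform.adobe.io/data/foundation/import"
CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB per PATCH request (illustrative choice)

AUTH_HEADERS = {
    "Authorization": f"Bearer {os.environ['ACCESS_TOKEN']}",
    "x-gw-ims-org-id": os.environ["ORG_ID"],
    "x-api-key": os.environ["API_KEY"],
    "x-sandbox-name": os.environ["SANDBOX_NAME"],
}

def upload_large_file(batch_id, dataset_id, local_path, file_name):
    file_url = f"{BASE}/batches/{batch_id}/datasets/{dataset_id}/files/{file_name}"

    # 1. Initialize the large file within the batch.
    requests.post(f"{file_url}?action=INITIALIZE", headers=AUTH_HEADERS).raise_for_status()

    # 2. PATCH each chunk, describing its byte span in Content-Range.
    total = os.path.getsize(local_path)
    offset = 0
    with open(local_path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            end = offset + len(chunk) - 1
            headers = {
                **AUTH_HEADERS,
                "Content-Type": "application/octet-stream",
                # Assumption: standard HTTP syntax "bytes {start}-{end}/{total}".
                "Content-Range": f"bytes {offset}-{end}/{total}",
            }
            requests.patch(file_url, headers=headers, data=chunk).raise_for_status()
            offset = end + 1

    # 3. Mark the file itself as complete.
    requests.post(f"{file_url}?action=COMPLETE", headers=AUTH_HEADERS).raise_for_status()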
Complete batch
Once the large file has been uploaded and marked complete, you will need to signal that the data has been fully uploaded and that the batch is ready for promotion.
API format
POST /batches/{BATCH_ID}?action=COMPLETE
{BATCH_ID}: The ID of the batch to complete.
Request
curl -X POST "https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}?action=COMPLETE" \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-H 'x-api-key: {API_KEY}' \
-H 'x-sandbox-name: {SANDBOX_NAME}'
Response
200 OK
Ingest CSV files
In order to ingest CSV files, you’ll need to create a class, a schema, and a dataset that supports CSV. For detailed information on how to create the necessary class and schema, follow the instructions provided in the ad-hoc schema creation tutorial.
Create dataset
After following the instructions above to create the necessary class and schema, you’ll need to create a dataset that can support CSV.
API format
POST /catalog/dataSets
Request
curl -X POST https://platform.adobe.io/data/foundation/catalog/dataSets \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'Content-Type: application/json' \
-H 'x-api-key: {API_KEY}' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-H 'x-sandbox-name: {SANDBOX_NAME}' \
-d '{
        "name": "{DATASET_NAME}",
        "schemaRef": {
            "id": "https://ns.adobe.com/{TENANT_ID}/schemas/{SCHEMA_ID}",
            "contentType": "application/vnd.adobe.xed+json;version=1"
        }
    }'
{TENANT_ID}: Your organization's unique tenant identifier, which namespaces the schema $id.
{SCHEMA_ID}: The ID of the ad-hoc schema created for the CSV data.
Create batch
Next, you will need to create a batch with CSV as the input format. When creating the batch, you will need to provide a dataset ID. You will also need to ensure that all of the files uploaded as part of the batch conform to the schema linked to the provided dataset.
API format
POST /batches
Request
curl -X POST https://platform.adobe.io/data/foundation/import/batches \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'Content-Type: application/json' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-H 'x-api-key: {API_KEY}' \
-H 'x-sandbox-name: {SANDBOX_NAME}' \
-d '{
        "datasetId": "{DATASET_ID}",
        "inputFormat": {
            "format": "csv"
        }
    }'
{DATASET_ID}: The ID of the dataset to ingest data into.
Response
201 Created
{
    "id": "{BATCH_ID}",
    "imsOrg": "{ORG_ID}",
    "updated": 0,
    "status": "loading",
    "created": 0,
    "relatedObjects": [
        {
            "type": "dataSet",
            "id": "{DATASET_ID}"
        }
    ],
    "version": "1.0.0",
    "tags": {},
    "createdUser": "{USER_ID}",
    "updatedUser": "{USER_ID}"
}
{BATCH_ID}: The ID of the newly created batch, used in subsequent requests.
{DATASET_ID}: The ID of the dataset the batch will be ingested into.
{USER_ID}: The ID of the user who created or updated the batch.
Upload files
Now that you have created a batch, you can use the batchId from the batch creation response to upload files to the batch. You can upload multiple files to the same batch.
API format
PUT /batches/{BATCH_ID}/datasets/{DATASET_ID}/files/{FILE_NAME}
{BATCH_ID}: The ID of the batch to upload files to.
{DATASET_ID}: The ID of the dataset referenced by the batch.
{FILE_NAME}: The name the file will have within the batch.
Request
curl -X PUT https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}/datasets/{DATASET_ID}/files/{FILE_NAME}.csv \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'Content-Type: application/octet-stream' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-H 'x-api-key: {API_KEY}' \
-H 'x-sandbox-name: {SANDBOX_NAME}' \
--data-binary "@{FILE_PATH_AND_NAME}.csv"
{FILE_PATH_AND_NAME}: The path and file name of the file to upload, for example acme/customers/campaigns/summer.csv.
Response
200 OK
Complete batch
Once you have finished uploading all the files for the batch, you will need to signal that the data has been fully uploaded and that the batch is ready for promotion.
API format
POST /batches/{BATCH_ID}?action=COMPLETE
Request
curl -X POST "https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}?action=COMPLETE" \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-H 'x-api-key: {API_KEY}' \
-H 'x-sandbox-name: {SANDBOX_NAME}'
Response
200 OK
Cancel a batch
While a batch is still processing, it can be cancelled. However, once a batch reaches a final state (such as success or failure), it can no longer be cancelled.
API format
POST /batches/{BATCH_ID}?action=ABORT
{BATCH_ID}: The ID of the batch to cancel.
Request
curl -X POST "https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}?action=ABORT" \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-H 'x-api-key: {API_KEY}' \
-H 'x-sandbox-name: {SANDBOX_NAME}'
Response
200 OK
Delete a batch
A batch can be deleted by performing the following POST request with the action=REVERT query parameter, specifying the ID of the batch you wish to delete. The batch is then marked as “inactive”, making it eligible for garbage collection. The batch will be collected asynchronously, at which time it will be marked as “deleted”.
API format
POST /batches/{BATCH_ID}?action=REVERT
{BATCH_ID}: The ID of the batch to delete.
Request
curl -X POST "https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}?action=REVERT" \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-H 'x-api-key: {API_KEY}' \
-H 'x-sandbox-name: {SANDBOX_NAME}'
Response
200 OK
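COMPLETE, ABORT, and REVERT all share the same request shape, so a single helper can drive any batch action. A minimal sketch with the Python requests library, using illustrative environment variables:

import os

import requests

BASE = "https://platform.adobe.io/data/foundation/import"

def batch_action(batch_id, action):
    """POST /batches/{id}?action=..., where action is COMPLETE, ABORT, or REVERT."""
    resp = requests.post(
        f"{BASE}/batches/{batch_id}?action={action}",
        headers={
            "Authorization": f"Bearer {os.environ['ACCESS_TOKEN']}",
            "x-gw-ims-org-id": os.environ["ORG_ID"],
            "x-api-key": os.environ["API_KEY"],
            "x-sandbox-name": os.environ["SANDBOX_NAME"],
        },
    )
    resp.raise_for_status()
    return resp.status_code

# Example: cancel a batch that is still loading.
# batch_action("{BATCH_ID}", "ABORT")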
Patch a batch
Occasionally it may be necessary to update data in your organization’s Profile store. For example, you may need to correct records or change an attribute value. ۶Ƶ Experience Platform supports the update or patch of Profile store data through an upsert action or “patching a batch”.
The following is required in order to patch a batch:
- A dataset enabled for Profile and attribute updates. This is done through dataset tags and requires a specific isUpsert:true tag to be added to the unifiedProfile array. For detailed steps showing how to create a dataset or configure an existing dataset for upsert, follow the tutorial for enabling a dataset for Profile updates.
- A Parquet file containing the fields to be patched and the identity fields for the Profile. The data format for patching a batch is similar to the regular batch ingestion process: the input is a Parquet file, and in addition to the fields to be updated, the uploaded data must contain the identity fields needed to match the data in the Profile store.
Once you have a dataset enabled for Profile and upsert, and a Parquet file containing the fields you wish to patch as well as the necessary identity fields, you can follow the steps for ingesting Parquet files in order to complete the patch via batch ingestion.
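As an illustration of such a patch file, the sketch below writes a Parquet file containing one identity field and one attribute to update. Both field names are hypothetical and must be replaced with fields that match your schema and identity configuration.

import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical fields: "email" serves as the identity used to match
# existing Profile records, and "loyaltyTier" is the attribute to update.
patch_table = pa.table({
    "email": ["jane@example.com"],
    "loyaltyTier": ["gold"],
})

pq.write_table(patch_table, "profile_patch.parquet")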
Replay a batch
If you want to replace an already ingested batch, you can do so with “batch replay”. This action is equivalent to deleting the old batch and ingesting a new one instead.
Create batch
First, you will need to create a batch with JSON as the input format. When creating the batch, you will need to provide a dataset ID. You will also need to ensure that all the files uploaded as part of the batch conform to the XDM schema linked to the provided dataset. Additionally, you will need to reference the old batch(es) in the replay section. In the example below, you are replaying batches with IDs batchIdA and batchIdB.
API format
POST /batches
Request
curl -X POST https://platform.adobe.io/data/foundation/import/batches \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'Content-Type: application/json' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-H 'x-api-key: {API_KEY}' \
-H 'x-sandbox-name: {SANDBOX_NAME}' \
-d '{
        "datasetId": "{DATASET_ID}",
        "inputFormat": {
            "format": "json"
        },
        "replay": {
            "predecessors": ["${batchIdA}", "${batchIdB}"],
            "reason": "replace"
        }
    }'
{DATASET_ID}: The ID of the dataset to ingest data into.
Response
201 Created
{
    "id": "{BATCH_ID}",
    "imsOrg": "{ORG_ID}",
    "updated": 0,
    "status": "loading",
    "created": 0,
    "relatedObjects": [
        {
            "type": "dataSet",
            "id": "{DATASET_ID}"
        }
    ],
    "replay": {
        "predecessors": ["batchIdA", "batchIdB"],
        "reason": "replace"
    },
    "version": "1.0.0",
    "tags": {},
    "createdUser": "{USER_ID}",
    "updatedUser": "{USER_ID}"
}
{BATCH_ID}: The ID of the newly created batch, used in subsequent requests.
{DATASET_ID}: The ID of the dataset the batch will be ingested into.
{USER_ID}: The ID of the user who created or updated the batch.
Upload files
Now that you have created a batch, you can use the batchId from the batch creation response to upload files to the batch. You can upload multiple files to the same batch.
API format
PUT /batches/{BATCH_ID}/datasets/{DATASET_ID}/files/{FILE_NAME}
{BATCH_ID}: The ID of the batch to upload files to.
{DATASET_ID}: The ID of the dataset referenced by the batch.
{FILE_NAME}: The name the file will have within the batch.
Request
curl -X PUT https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}/datasets/{DATASET_ID}/files/{FILE_NAME}.json \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'Content-Type: application/octet-stream' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-H 'x-api-key: {API_KEY}' \
-H 'x-sandbox-name: {SANDBOX_NAME}' \
--data-binary "@{FILE_PATH_AND_NAME}.json"
{FILE_PATH_AND_NAME}: The path and file name of the file to upload, for example acme/customers/campaigns/summer.json.
Response
200 OK
Complete batch
Once you have finished uploading all the files for the batch, you will need to signal that the data has been fully uploaded and that the batch is ready for promotion.
API format
POST /batches/{BATCH_ID}?action=COMPLETE
{BATCH_ID}: The ID of the batch to complete.
Request
curl -X POST "https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}?action=COMPLETE" \
-H 'Authorization: Bearer {ACCESS_TOKEN}' \
-H 'x-gw-ims-org-id: {ORG_ID}' \
-H 'x-api-key: {API_KEY}' \
-H 'x-sandbox-name: {SANDBOX_NAME}'
Response
200 OK
Appendix
The following section contains additional information for ingesting data in Experience Platform using batch ingestion.
Data transformation for batch ingestion
In order to ingest a data file into Experience Platform, the hierarchical structure of the file must comply with the Experience Data Model (XDM) schema associated with the dataset being uploaded to.
Information on how to map a CSV file to comply with an XDM schema can be found in the sample transformations document, along with an example of a properly formatted JSON data file. Sample CSV and JSON data files are provided in that document.
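To make the shape of these files concrete, the sketch below writes an illustrative CSV file and converts each row into a single-line JSON record of the kind used for batch ingestion. The column names are invented for the example; they are not the sample files referenced above and must be mapped to your XDM schema.

import csv
import json

# Illustrative CSV content; real files must match the XDM schema.
CSV_TEXT = """id,name,email
1,Jane Doe,jane@example.com
2,John Doe,john@example.com
"""

with open("sample.csv", "w") as f:
    f.write(CSV_TEXT)

# Convert each CSV row into a single-line JSON record.
with open("sample.csv") as src, open("sample.json", "w") as dst:
    for row in csv.DictReader(src):
        dst.write(json.dumps(row) + "\n")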