
Guardrails for Data Ingestion

IMPORTANT
Guardrails for batch and streaming ingestion are calculated at the organization level, not the sandbox level. This means that your data usage per sandbox is bound to the total license usage entitlement for your entire organization. Additionally, data usage in development sandboxes is limited to 10% of your total profiles. For more information about license usage entitlement, read the data management best practices guide.
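The 10% development sandbox limit is straightforward arithmetic against your licensed profile total. The figures in the sketch below are hypothetical and only illustrate the calculation; they are not taken from this document.

    # Hypothetical entitlement figure, for illustration only.
    licensed_profiles = 10_000_000
    # Development sandboxes are limited to 10% of total profiles.
    dev_sandbox_profile_limit = int(licensed_profiles * 0.10)
    print(dev_sandbox_profile_limit)  # 1000000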

Guardrails are thresholds that provide guidance for data and system usage, performance optimization, and avoidance of errors or unexpected results in Adobe Experience Platform. Guardrails can refer to your usage or consumption of data and processing in relation to your licensing entitlements.

IMPORTANT
Check your license entitlements in your Sales Order and the corresponding Product Description for actual usage limits, in addition to this guardrails page.

This document provides guidance on guardrails for data ingestion in Adobe Experience Platform.

Guardrails for batch ingestion

The following guardrails apply when using the batch ingestion API or batch sources:

Data lake ingestion using the batch ingestion API
Guidelines:
  • You can ingest up to 20 GB of data per hour to the data lake using the batch ingestion API.
  • The maximum number of files per batch is 1500.
  • The maximum batch size is 100 GB.
  • The maximum number of properties or fields per row is 10000.
  • The maximum number of batches per minute, per user, is 2000.
Notes: See the sketch after these guardrails for one way to check several of these limits before creating a batch.

Data lake ingestion using batch sources
Guidelines:
  • You can ingest up to 200 GB of data per hour to the data lake using batch ingestion sources such as Azure Blob, Amazon S3, and SFTP.
  • Batch size should be between 256 MB and 100 GB. This applies to both compressed and uncompressed data; when compressed data is decompressed in the data lake, these limits apply to the decompressed size.
  • The maximum number of files per batch is 1500.
  • The minimum size of a file or folder is 1 byte. You cannot ingest files or folders that are 0 bytes in size.
Notes: Read the sources overview for a catalog of sources you can use for data ingestion.

Batch ingestion to Profile
Guidelines:
  • The maximum size of a record class is 100 KB (hard limit).
  • The maximum size of an ExperienceEvent class is 10 KB (hard limit).

Number of Profile or ExperienceEvent batches ingested per day
Guidelines:
  • The maximum number of Profile or ExperienceEvent batches ingested per day is 90. That is, the combined total of Profile and ExperienceEvent batches ingested each day cannot exceed 90. Ingesting additional batches will affect system performance.
Notes: This is a soft limit. It is possible to exceed a soft limit; however, soft limits provide a recommended guideline for system performance.

Encrypted data ingestion
Guidelines:
  • The maximum supported size of a single encrypted file is 1 GB. For example, while you can ingest 2 GB or more of data in a single dataflow run, no individual file in that dataflow run can exceed 1 GB.
Notes: Ingesting encrypted data can take longer than regular data ingestion. Read the encrypted data ingestion API guide for more information.

Upsert batch ingestion
Guidelines:
  • Ingestion of upsert batches can be up to 10x slower than regular batches. To ensure an efficient runtime and to avoid blocking other batches from being processed in the sandbox, keep your upsert batches under two million records.
Notes: While you can ingest batches that exceed two million records, ingestion will take significantly longer due to the limitations of small sandboxes.
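The batch ingestion API guidelines above can be checked on the client side before a batch is created. The following Python sketch is illustrative only: the helper name check_batch_guardrails and the idea of scanning local files staged for upload are assumptions, not part of any Experience Platform SDK, and the thresholds simply mirror the file-count, batch-size, and zero-byte guidelines listed above.

    import os

    # Guardrail thresholds from the batch ingestion API guidelines above.
    MAX_FILES_PER_BATCH = 1500
    MAX_BATCH_SIZE_BYTES = 100 * 1024**3   # 100 GB
    MIN_FILE_SIZE_BYTES = 1                # 0-byte files and folders cannot be ingested

    def check_batch_guardrails(file_paths):
        """Return a list of guardrail violations for a planned batch (illustrative helper)."""
        problems = []
        if len(file_paths) > MAX_FILES_PER_BATCH:
            problems.append(f"{len(file_paths)} files exceeds the {MAX_FILES_PER_BATCH}-file limit")
        total_size = 0
        for path in file_paths:
            size = os.path.getsize(path)
            if size < MIN_FILE_SIZE_BYTES:
                problems.append(f"{path} is 0 bytes and cannot be ingested")
            total_size += size
        if total_size > MAX_BATCH_SIZE_BYTES:
            problems.append(f"total size {total_size} bytes exceeds the 100 GB batch limit")
        return problems

Limits that depend on record structure or scheduling, such as the 10000 fields-per-row and 2000 batches-per-minute guidelines, would need to be enforced by whatever process prepares and submits the data.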

Guardrails for streaming ingestion

Read the streaming ingestion overview for information on guardrails for streaming ingestion.

Guardrails for streaming sources

The following guardrails apply when using streaming sources:

Streaming sources
Guidelines:
  • The maximum record size is 1 MB, with a recommended size of 10 KB (see the sketch after these guardrails).
  • Streaming sources support between 4000 and 5000 requests per second when ingesting to the data lake. This applies to both newly created and existing source connections. Note: It can take up to 30 minutes for streaming data to be completely processed to the data lake.
  • Streaming sources support a maximum of 1500 requests per second when ingesting data to Profile or streaming segmentation.
Notes: Streaming sources such as Kafka, Azure Event Hubs, and Amazon Kinesis do not use the Data Collection Core Service (DCCS) route and can have different throughput limits. See the sources overview for a catalog of sources you can use for data ingestion.
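The streaming guidelines above can likewise be enforced before data reaches a streaming endpoint. The sketch below is a minimal illustration under stated assumptions: records are serialized as JSON, send_fn is a placeholder for your HTTP call, and the sleep-based pacing targets the 1500 requests-per-second Profile guideline; none of these names come from an Experience Platform SDK.

    import json
    import time

    MAX_RECORD_BYTES = 1024 * 1024        # 1 MB maximum record size
    RECOMMENDED_RECORD_BYTES = 10 * 1024  # 10 KB recommended record size
    MAX_REQUESTS_PER_SECOND = 1500        # guideline for Profile / streaming segmentation

    def validate_record(record):
        """Check a record against the size guidelines above (illustrative helper)."""
        size = len(json.dumps(record).encode("utf-8"))
        if size > MAX_RECORD_BYTES:
            raise ValueError(f"record is {size} bytes; the maximum record size is 1 MB")
        if size > RECOMMENDED_RECORD_BYTES:
            print(f"warning: record is {size} bytes; 10 KB or less is recommended")

    def send_with_pacing(records, send_fn):
        """Send records without exceeding the assumed requests-per-second guideline."""
        interval = 1.0 / MAX_REQUESTS_PER_SECOND
        for record in records:
            validate_record(record)
            send_fn(record)       # placeholder for the HTTP request to the streaming endpoint
            time.sleep(interval)  # naive pacing; a token bucket is a more realistic approach

This pacing is a simplification; in practice, throughput should be measured against the endpoint's observed behavior rather than assumed from a fixed sleep interval.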

Next steps

See the following documentation for more information on guardrails for other Experience Platform services, end-to-end latency, and licensing information from the Real-Time CDP Product Description documents:
