
Guardrails for Data Ingestion

IMPORTANT
Guardrails for batch and streaming ingestion are calculated at the organization level, not the sandbox level. This means that your data usage per sandbox is bound to the total license usage entitlement for your entire organization. Additionally, data usage in development sandboxes is limited to 10% of your total profiles. For more information about license usage entitlement, read the data management best practices guide.
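The 10% development sandbox limit is straightforward arithmetic against your licensed profile total. The figures in the sketch below are hypothetical and only illustrate the calculation; they are not taken from this document.

    # Hypothetical entitlement figure, for illustration only.
    licensed_profiles = 10_000_000
    # Development sandboxes are limited to 10% of total profiles.
    dev_sandbox_profile_limit = int(licensed_profiles * 0.10)
    print(dev_sandbox_profile_limit)  # 1000000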

Guardrails are thresholds that provide guidance for data and system usage, performance optimization, and avoidance of errors or unexpected results in Adobe Experience Platform. Guardrails can refer to your usage or consumption of data and processing in relation to your licensing entitlements.

IMPORTANT
Check your license entitlements in your Sales Order and the corresponding Product Description for actual usage limits, in addition to this guardrails page.

This document provides guidance on guardrails for data ingestion in Adobe Experience Platform.

Guardrails for batch ingestion

The following guardrails apply when using the batch ingestion API or batch sources:

Data lake ingestion using the batch ingestion API
Guidelines:
  • You can ingest up to 20 GB of data per hour to the data lake using the batch ingestion API.
  • The maximum number of files per batch is 1500.
  • The maximum batch size is 100 GB.
  • The maximum number of properties or fields per row is 10000.
  • The maximum number of batches per minute, per user, is 2000.
Notes: See the sketch after these guardrails for one way to check several of these limits before creating a batch.

Data lake ingestion using batch sources
Guidelines:
  • You can ingest up to 200 GB of data per hour to the data lake using batch ingestion sources such as Azure Blob, Amazon S3, and SFTP.
  • Batch size should be between 256 MB and 100 GB. This applies to both compressed and uncompressed data; when compressed data is decompressed in the data lake, these limits apply to the decompressed size.
  • The maximum number of files per batch is 1500.
  • The minimum size of a file or folder is 1 byte. You cannot ingest files or folders that are 0 bytes in size.
Notes: Read the sources overview for a catalog of sources you can use for data ingestion.

Batch ingestion to Profile
Guidelines:
  • The maximum size of a record class is 100 KB (hard limit).
  • The maximum size of an ExperienceEvent class is 10 KB (hard limit).

Number of Profile or ExperienceEvent batches ingested per day
Guidelines:
  • The maximum number of Profile or ExperienceEvent batches ingested per day is 90. That is, the combined total of Profile and ExperienceEvent batches ingested each day cannot exceed 90. Ingesting additional batches will affect system performance.
Notes: This is a soft limit. It is possible to exceed a soft limit; however, soft limits provide a recommended guideline for system performance.

Encrypted data ingestion
Guidelines:
  • The maximum supported size of a single encrypted file is 1 GB. For example, while you can ingest 2 GB or more of data in a single dataflow run, no individual file in that dataflow run can exceed 1 GB.
Notes: Ingesting encrypted data can take longer than regular data ingestion. Read the encrypted data ingestion API guide for more information.

Upsert batch ingestion
Guidelines:
  • Ingestion of upsert batches can be up to 10x slower than regular batches. To ensure an efficient runtime and to avoid blocking other batches from being processed in the sandbox, keep your upsert batches under two million records.
Notes: While you can ingest batches that exceed two million records, ingestion will take significantly longer due to the limitations of small sandboxes.
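The batch ingestion API guidelines above can be checked on the client side before a batch is created. The following Python sketch is illustrative only: the helper name check_batch_guardrails and the idea of scanning local files staged for upload are assumptions, not part of any Experience Platform SDK, and the thresholds simply mirror the file-count, batch-size, and zero-byte guidelines listed above.

    import os

    # Guardrail thresholds from the batch ingestion API guidelines above.
    MAX_FILES_PER_BATCH = 1500
    MAX_BATCH_SIZE_BYTES = 100 * 1024**3   # 100 GB
    MIN_FILE_SIZE_BYTES = 1                # 0-byte files and folders cannot be ingested

    def check_batch_guardrails(file_paths):
        """Return a list of guardrail violations for a planned batch (illustrative helper)."""
        problems = []
        if len(file_paths) > MAX_FILES_PER_BATCH:
            problems.append(f"{len(file_paths)} files exceeds the {MAX_FILES_PER_BATCH}-file limit")
        total_size = 0
        for path in file_paths:
            size = os.path.getsize(path)
            if size < MIN_FILE_SIZE_BYTES:
                problems.append(f"{path} is 0 bytes and cannot be ingested")
            total_size += size
        if total_size > MAX_BATCH_SIZE_BYTES:
            problems.append(f"total size {total_size} bytes exceeds the 100 GB batch limit")
        return problems

Limits that depend on record structure or scheduling, such as the 10000 fields-per-row and 2000 batches-per-minute guidelines, would need to be enforced by whatever process prepares and submits the data.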

Guardrails for streaming ingestion

Read the streaming ingestion overview for information on guardrails for streaming ingestion.

Guardrails for streaming sources

The following guardrails apply when using streaming sources:

Streaming sources
Guidelines:
  • The maximum record size is 1 MB, with a recommended size of 10 KB (see the sketch after these guardrails).
  • Streaming sources support between 4000 and 5000 requests per second when ingesting to the data lake. This applies to both newly created and existing source connections. Note: It can take up to 30 minutes for streaming data to be completely processed to the data lake.
  • Streaming sources support a maximum of 1500 requests per second when ingesting data to Profile or streaming segmentation.
Notes: Streaming sources such as Kafka, Azure Event Hubs, and Amazon Kinesis do not use the Data Collection Core Service (DCCS) route and can have different throughput limits. See the sources overview for a catalog of sources you can use for data ingestion.
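The streaming guidelines above can likewise be enforced before data reaches a streaming endpoint. The sketch below is a minimal illustration under stated assumptions: records are serialized as JSON, send_fn is a placeholder for your HTTP call, and the sleep-based pacing targets the 1500 requests-per-second Profile guideline; none of these names come from an Experience Platform SDK.

    import json
    import time

    MAX_RECORD_BYTES = 1024 * 1024        # 1 MB maximum record size
    RECOMMENDED_RECORD_BYTES = 10 * 1024  # 10 KB recommended record size
    MAX_REQUESTS_PER_SECOND = 1500        # guideline for Profile / streaming segmentation

    def validate_record(record):
        """Check a record against the size guidelines above (illustrative helper)."""
        size = len(json.dumps(record).encode("utf-8"))
        if size > MAX_RECORD_BYTES:
            raise ValueError(f"record is {size} bytes; the maximum record size is 1 MB")
        if size > RECOMMENDED_RECORD_BYTES:
            print(f"warning: record is {size} bytes; 10 KB or less is recommended")

    def send_with_pacing(records, send_fn):
        """Send records without exceeding the assumed requests-per-second guideline."""
        interval = 1.0 / MAX_REQUESTS_PER_SECOND
        for record in records:
            validate_record(record)
            send_fn(record)       # placeholder for the HTTP request to the streaming endpoint
            time.sleep(interval)  # naive pacing; a token bucket is a more realistic approach

This pacing is a simplification; in practice, throughput should be measured against the endpoint's observed behavior rather than assumed from a fixed sleep interval.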

Next steps

See the following documentation for more information on guardrails for other Experience Platform services, end-to-end latency, and licensing information from the Real-Time CDP Product Description documents:
