
Data quality in Adobe Experience Platform

Adobe Experience Platform provides well-defined guarantees of completeness, accuracy, and consistency for any data uploaded through batch or streaming ingestion. This document summarizes the supported checks and validation behaviors for both ingestion types in Experience Platform.

Supported checks

| Check | Batch ingestion | Streaming ingestion |
| --- | --- | --- |
| Data type check | Yes | Yes |
| Enum check | Yes | Yes |
| Range check (min, max) | Yes | Yes |
| Required field check | Yes | Yes |
| Pattern check | No | Yes |
| Format check | No | Yes |
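
To make the check types concrete, the following Python sketch applies each of them to a single record. It is illustrative only and is not Experience Platform code: the rule dictionary (loosely modelled on JSON Schema keywords), the field names, and the `check_record` function are all assumptions made for this example. The `mode` argument mirrors the table above by skipping pattern and format checks for batch data.

```python
import re
from datetime import datetime

# Hypothetical field rules, loosely modelled on JSON Schema keywords.
RULES = {
    "age": {"type": int, "minimum": 0, "maximum": 120},
    "status": {"type": str, "enum": ["active", "inactive"]},
    "zip": {"type": str, "pattern": r"^\d{5}$"},
    "created": {"type": str, "format": "date-time"},
}
REQUIRED = ["age", "status"]


def check_record(record, mode="streaming", rules=RULES, required=REQUIRED):
    """Return a list of violations found in one record (empty if none)."""
    errors = []
    # Required field check
    for name in required:
        if name not in record:
            errors.append(f"{name}: required field is missing")
    for name, value in record.items():
        rule = rules.get(name)
        if rule is None:
            continue
        # Data type check; later checks assume the type is correct
        if not isinstance(value, rule["type"]):
            errors.append(f"{name}: wrong data type")
            continue
        # Enum check
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{name}: value not in enum")
        # Range check (min, max)
        if "minimum" in rule and value < rule["minimum"]:
            errors.append(f"{name}: below minimum")
        if "maximum" in rule and value > rule["maximum"]:
            errors.append(f"{name}: above maximum")
        # Pattern and format checks are applied to streaming data only
        if mode != "streaming":
            continue
        if "pattern" in rule and not re.search(rule["pattern"], value):
            errors.append(f"{name}: pattern mismatch")
        if rule.get("format") == "date-time":
            try:
                datetime.fromisoformat(value)
            except ValueError:
                errors.append(f"{name}: not a valid date-time")
    return errors
```

Under these assumptions, a record with a malformed `zip` value is reported in streaming mode but passes in batch mode, because batch ingestion does not run the pattern check.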

Supported validation behaviors

Both batch and streaming ingestion prevent failed data from going downstream by moving bad data to the data lake, where it can be retrieved and analyzed. Each ingestion type performs the validations listed below.

Batch ingestion

Batch ingestion performs the following validations:

| Validation area | Description |
| --- | --- |
| Schema | Ensures that the schema is not empty and contains a reference to the union schema, as follows: "meta:immutableTags": ["union"] |
| identityField | Ensures that all valid identity descriptors are defined. |
| createdUser | Ensures that the user who ingested the batch is allowed to ingest the batch. |
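
The schema validation described above can be sketched as a short Python check. This is a minimal illustration, assuming the schema is available as a parsed dictionary. The key `meta:immutableTags` and the value `union` come from the table, while the function name `schema_has_union_tag` is hypothetical.

```python
def schema_has_union_tag(schema):
    """Return True if the schema is non-empty and tagged with "union"."""
    # An empty or missing schema fails validation outright.
    if not schema:
        return False
    tags = schema.get("meta:immutableTags")
    # The tag list must exist and contain the "union" reference.
    return isinstance(tags, list) and "union" in tags
```

For example, `schema_has_union_tag({"meta:immutableTags": ["union"]})` returns `True`, while an empty dictionary or a schema without the tag returns `False`.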

Streaming ingestion

Streaming ingestion performs the following validations:

| Validation area | Description |
| --- | --- |
| Schema | Ensures that the schema is not empty and contains a reference to the union schema, as follows: "meta:immutableTags": ["union"] |
| identityField | Ensures that all valid identity descriptors are defined. |
| JSON | Ensures that the JSON is valid. |
| Organization | Ensures that the organization that is listed is a valid organization. |
| Source name | Ensures that the name of the data source is specified. |
| Dataset | Ensures that the dataset is specified, enabled, and has not been removed. |
| Header | Ensures that the header is specified and is valid. |

More information about how Platform monitors and validates data can be found in the monitoring data flows documentation.

Identity value validation

The following table outlines the rules that an identity value must follow to pass validation.

| Namespace | Validation rule | System behavior when rule is violated |
| --- | --- | --- |
| ECID | The identity value of an ECID must be exactly 38 characters. | If the identity value of ECID is not exactly 38 characters, then the record is skipped. |
| ECID | The identity value of an ECID must consist of numbers only. | If the identity value of ECID contains non-numerical characters, then the record is skipped. |
| Non-ECID | The identity value cannot exceed 1024 characters. | If the identity value exceeds 1024 characters, then the record is skipped. |
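
The rules in this table can be checked before ingestion with a short Python sketch like the one below. The limits (38 digits for ECID, 1024 characters otherwise) come from the table. The function names, the record layout with `namespace` and `value` keys, and the use of the literal string "ECID" to identify the namespace are assumptions made for illustration. Length is measured in Python string characters, which may differ from how the service counts.

```python
ECID_LENGTH = 38            # ECID values must be exactly this long
MAX_NON_ECID_LENGTH = 1024  # upper bound for every other namespace


def is_valid_identity(namespace, value):
    """Return True if the identity value satisfies the rule for its namespace."""
    if namespace == "ECID":
        # Exactly 38 characters, ASCII digits only.
        return len(value) == ECID_LENGTH and value.isascii() and value.isdigit()
    return len(value) <= MAX_NON_ECID_LENGTH


def keep_valid(records):
    """Drop records whose identity value violates a rule, as ingestion would skip them."""
    return [r for r in records if is_valid_identity(r["namespace"], r["value"])]
```

Running `keep_valid` on a list of records returns only those that would not be skipped under the rules above, which can help catch malformed identities before data is sent.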

For more information on Identity Service guardrails, see the Identity Service guardrails overview.
