Graph based stitching
In graph based stitching, you specify an event dataset as well as the persistent ID (cookie) and the namespace of the transient ID (person ID) for that dataset. Graph-based stitching creates a new column for the stitched ID in the new stitched dataset. And then uses the persistent ID to query the identity graph from the Experience Platform Identity Service, using the namespace specified, to update the stitched ID.
How graph-based stitching works
Stitching makes a minimum of two passes on data in a given dataset.
-
Live stitching: attempts to stitch each hit (event) as it comes in, using the persistent ID to look up the transient id for the selected namespace by querying the identity graph. If the transient id is available from the lookup, this transient id is immediately stitched.
-
Replay stitching: replays data based on updated identities from the identity graph. This stage is where hits from previously unknown devices (persistent IDs) become stitched as the identity graph has resolved the identity for a namespace. The replay is determined by two parameters: frequency and lookback window. ÃÛ¶¹ÊÓƵ offers the following combinations of these parameters:
- Daily lookback on a daily frequency: Data replays every day with a 24-hour lookback window. This option holds an advantage that replays are much more frequent, but unauthenticated visitors must authenticate the same day that they visit your site.
- Weekly lookback on a weekly frequency: Data replays once a week with a weekly lookback window (see options). This option holds an advantage that allows unauthenticated sessions a much more lenient time to authenticate. However, unstitched data less than a week old is not reprocessed until the next weekly replay.
- Biweekly lookback on a weekly frequency: Data replays once every week with a biweekly lookback window (see options). This option holds an advantage that allows unauthenticated sessions a much more lenient time to authenticate. However, unstitched data less than two weeks old is not reprocessed until the next weekly replay.
- Monthly lookback on a weekly frequency: Data replays every week with a monthly lookback window (see options). This option holds an advantage that allows unauthenticated sessions a much more lenient time to authenticate. However, unstitched data less than a month old is not reprocessed until the next weekly replay.
-
Privacy: When privacy-related requests are received, in addition to removing the requested identity from the source dataset, any stitching of that identity across unauthenticated events must be undone. Also, the identity must be removed from the identity graph to prevent future graph-based stitching for that specific identity.
note important IMPORTANT The unstitching process, as part of privacy requests, changes at the start of 2025. The current unstitching process restitches events using the latest version of known identities. This reassignment of events to another identity might have undesirable legal consequences. To remedy these concerns, from 2025 on, the new unstitching process updates events that are subject of the privacy request with the persistent ID.
Data beyond the lookback window is not replayed. A visitor must authenticate within a given lookback window for an unauthenticated visit and an authenticated visit to be identified together. Once a device is recognized, it is live stitched from that point forward.
Consider the following two identity graphs for persistent id 246
and 3579
, how these identity graphs are updated over time, and how these updates impact the steps in graph-based stitching.
You can view an identity graph over time for a specific profile using the Identity Graph Viewer. See also Identity Service linking logic to get a better understanding of the logic used when linking identities.
Step 1: Live stitching
Live stitching attempts to stitch each event, upon collection, to known information at that time from the identity graph.
table 0-row-5 1-row-5 2-row-5 3-row-5 4-row-5 5-row-5 6-row-5 7-row-5 1-align-right 7-align-right 13-align-right 19-align-right 25-align-right 31-align-right 37-align-right 43-align-right layout-auto | ||||
---|---|---|---|---|
Time | Persistent IDECID |
NamespaceEmail
|
Stitched ID (after live stitch) | |
1 | 2023-05-12 11:00 | 246 |
246
undefined
|
246 |
2 | 2023-05-12 14:00 | 246 |
246
bob.a@gmail.com
|
bob.a@gmail.com |
3 | 2023-05-12 15:00 | 246 |
246
bob.a@gmail.com
|
bob.a@gmail.com |
4 | 2023-05-12 17:00 | 3579 |
3579
undefined
|
3579 |
5 | 2023-05-12 19:00 | 3579 |
3579
ted.w@gmail.com
|
ted.w@gmail.com |
6 | 2023-05-13 15:00 | 246 |
246
bob.a@gmail.com
|
bob.a@gmail.com |
7 | 2023-05-13 16:30 | 246 |
246
a.b@yahoo.co.uk 246
bob.ab@gmail.com
|
a.b@yahoo.co.uk |
You can see how for each event the stitched id is resolved. Based on the time, the persistent id, and the lookup of the identity graph for the specified namespace (at that same time).
When the lookup resolves to more than one stitched id (like for event 7), the lexicographic first id returned by the identity graph is selected (a.b@yahoo.co.uk
in the example).
Step 2: Replay stitching
At regular intervals (depending on the chosen lookback window), replay stitching recalculates historical data based on the most recent version of the identity graph, at the time of the interval.
With a replay stitching happening at 2023-05-13 16:30, with a 24-hour lookback window configuration, some events from the sample are re-stitched (indicated by ).
table 0-row-6 1-row-6 2-row-6 3-row-6 4-row-6 5-row-6 6-row-6 layout-auto | |||||
---|---|---|---|---|---|
Time | Persistent IDECID |
NamespaceEmail
|
Stitched ID (after live stitch) |
Stitched ID (after replay 24 hours) |
|
2 | 2023-05-12 14:00 | 246 |
246
bob.a@gmail.com
|
bob.a@gmail.com |
bob.a@gmail.com |
3 | 2023-05-12 15:00 | 246 |
246
bob.a@gmail.com
|
bob.a@gmail.com |
bob.a@gmail.com |
4 | 2023-05-12 17:00 | 3579 |
3579
ted.w@gmail.com
|
3579 |
ted.w@gmail.com |
5 | 2023-05-12 19:00 | 3579 |
3579
ted.w@gmail.com
|
ted.w@gmail.com |
ted.w@gmail.com |
6 | 2023-05-13 15:00 | 246 |
246
a.b@yahoo.co.uk
|
bob.a@gmail.com |
a.b@yahoo.co.uk |
7 | 2023-05-13 16:30 | 246 |
246
a.b@yahoo.co.uk 246
bob.ab@gmail.com
|
a.b@yahoo.co.uk |
a.b@yahoo.co.uk |
With replay stitching happening at 2023-05-13 16:30, with a 7-day lookback window configuration, all events from the sample are re-stitched.
table 0-row-6 1-row-6 2-row-6 3-row-6 4-row-6 5-row-6 6-row-6 7-row-6 layout-auto | |||||
---|---|---|---|---|---|
Time | Persistent IDECID |
NamespaceEmail
|
Stitched ID (after live stitch) |
Stitched ID (after replay 7 days) |
|
1 | 2023-05-12 11:00 | 246 |
246
undefined
|
246 |
a.b@yahoo.co.uk |
2 | 2023-05-12 14:00 | 246 |
246
bob.a@gmail.com
|
bob.a@gmail.com |
a.b@yahoo.co.uk |
3 | 2023-05-12 15:00 | 246 |
246
bob.a@gmail.com
|
bob.a@gmail.com |
a.b@yahoo.co.uk |
4 | 2023-05-12 17:00 | 3579 |
3579
ted.w@gmail.com
|
3579 |
ted.w@gmail.com |
5 | 2023-05-12 19:00 | 3579 |
3579
ted.w@gmail.com
|
ted.w@gmail.com |
ted.w@gmail.com |
6 | 2023-05-13 15:00 | 246 |
246
a.b@yahoo.co.uk
|
bob.a@gmail.com |
a.b@yahoo.co.uk |
7 | 2023-05-13 16:30 | 246 |
246
a.b@yahoo.co.uk 246
bob.ab@gmail.com
|
a.b@yahoo.co.uk |
a.b@yahoo.co.uk |
Step 3: Privacy Request
When you receive a privacy request, the stitched id is deleted in all records for the user subject of the privacy request.
The following table represents the same data as above, but shows the effect that a privacy request (for example at 2023-05-13 18:00) has for the sample events.
table 0-row-5 1-row-5 2-row-5 3-row-5 4-row-5 5-row-5 6-row-5 7-row-5 1-align-right 7-align-right 13-align-right 19-align-right 25-align-right 31-align-right 37-align-right 43-align-right layout-auto | ||||
---|---|---|---|---|
Time | Persistent IDECID |
NamespaceEmail
|
Stitched ID (after privacy request) | |
1 | 2023-05-12 11:00 | 246 |
246
a.b@yahoo.co.uk
|
246 |
2 | 2023-05-12 14:00 | 246 |
246
a.b@yahoo.co.uk
|
246 |
3 | 2023-05-12 15:00 | 246 |
246
a.b@yahoo.co.uk
|
246 |
4 | 2023-05-12 17:00 | 3579 |
3579
ted.w@gmail.com
|
3579 |
5 | 2023-05-12 19:00 | 3579 |
3579
ted.w@gmail.com
|
3579 |
6 | 2023-05-13 15:00 | 246 |
246
a.b@yahoo.co.uk
|
246 |
7 | 2023-05-13 16:30 | 246 |
246
a.b@yahoo.co.uk 246
bob.ab@gmail.com
|
246 |
Prerequisites
The following prerequisites apply specifically to graph-based stitching:
- The event dataset in ÃÛ¶¹ÊÓƵ Experience Platform, to which you want to apply stitching, must have one column that identifies a visitor on every row, the persistent ID. For example, a visitor ID generated by an ÃÛ¶¹ÊÓƵ Analytics AppMeasurement library or an ECID generated by the Experience Platform Identity Service.
- The persistent ID must also be defined as an identity in the schema.
- The identity graph from Experience Platform Identity Service must have a namespace (for example
Email
, orPhone
) you want to use during stitching to resolve the transient ID. See Experience Platform Identity Service for more information.
Limitations
The following limitations apply specifically to graph-based stitching:
-
Timestamps are not taken into account when querying for the transient id using the specified namespace. So, it is possible that a persistent ID is stitched with a transient ID from a record that has an earlier timestamp.
-
No shared device support. When multiple identities are returned, by querying the identity graph using a namespace, the first lexicographic identity is used.
-
There is a hard limit of three months of backfilling identities into the identity graph. You would use backfilling identities in case you are not using an Experience Platform application, like Real-time Customer Data Platform, to populate the identity graph.
-
The Identity Service guardrails apply. See, for example, the following static limits:
- Maximum number of identities in a graph: 50.
- Maximum number of links to an identity for a single batch ingestion: 50.
- Maximum number of identities in an XDM record for graph ingestion: 20.
- Minimum number of identities in an XDM record for graph ingestion: 2.