Fundamentals of data management with ÃÛ¶¹ÊÓƵ Campaign workflows
Learn what targeting dimensions and working tables are, and how ÃÛ¶¹ÊÓƵ Campaign manages data across different data sources.
Transcript
In this video, I will cover the fundamentals of data management with ÃÛ¶¹ÊÓƵ Campaign, focusing on how the system handles multiple data sources. I will explain the two key fundamentals of data management with Campaign: targeting dimensions and working tables. We will then look at the basic use cases for data management with workflows, and I will explain how the system handles them.

The targeting dimension is the type of data a workflow is handling. It is defined by what you are querying. If you query a recipient, your targeting dimension is the recipient. If you query an order, your targeting dimension is the order. In most cases, the targeting dimension is defined by the first query activity and persists across the whole workflow until the end. For example, if you start by querying on recipients, the outbound transition will contain data of type recipient, and the next activity will know that we are working on recipients. The targeting dimension is bound to a data schema, which means you can access any information from the linked schemas. You can change the targeting dimension within a workflow, but only to the targeting dimension of a schema that is linked to your initial targeting dimension. For some complex use cases, you may have two different workflow lanes on the canvas, and each lane can have its own targeting dimension.

The second fundamental is the working table. In the context of multiple data sources, you need to understand what a working table is and how it works. The working table, also known as a temporary table, stores the results of any query in the workflow. It is visible from the outbound transition of each workflow activity: when you look at the outbound transition, the results that are displayed are stored in the working table; it is not directly the data from the main table you are looking at. Working tables are created by default on the same database as the targeting dimension to ensure high performance.
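The two fundamentals above can be sketched in a few lines of Python. This is purely an illustration: the class and attribute names (`Workflow`, `targeting_dimension`, `working_table_db`) are invented for this sketch and are not part of any ÃÛ¶¹ÊÓƵ Campaign API.

```python
class Workflow:
    """Toy model of how a workflow tracks its targeting dimension and working table."""

    def __init__(self):
        self.targeting_dimension = None  # set by the first query activity
        self.working_table_db = None     # database where the temporary table lives

    def query(self, schema, database):
        # The first query activity fixes the targeting dimension for the
        # rest of the workflow; the working table is created on the same
        # database as the targeting dimension for performance.
        if self.targeting_dimension is None:
            self.targeting_dimension = schema
        self.working_table_db = database
        return self


wf = Workflow().query("recipient", "local")
print(wf.targeting_dimension)  # recipient
print(wf.working_table_db)     # local
```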
But when an activity needs to reconcile data with another data source, which is on another database, the working table is first moved to the secondary data source and then reconciled with the additional data. The reconciliation can be done through an enrichment or union activity, for example. ÃÛ¶¹ÊÓƵ Campaign manages the multiple data sources by copying the working table from one place to another, using bulk loading to speed the process up. Once the operation is completed, the result is automatically copied back to the location of the targeting dimension. When the workflow is finished, the working table is deleted, unless of course you have enabled the option to keep it.

Now let's take a look at the three basic use cases, which exemplify what I just explained. In the first use case, we query the recipients table and then enrich it with data from the orders table. The recipients table is located on the local database. The orders table, which is linked to the recipient schema, is stored on the remote database. So what happens when you query the recipients? A working table is created on the local database, where the recipient targeting dimension is located. When the recipient data is enriched with the order data, the recipient working table is first copied to the remote database, where the orders table is located. The enrichment is then performed, and the final result is copied back to the local database, where the targeting dimension, the recipients, is located.

Let's see how it works in the Campaign instance. We see that we have a recipients table on the local database, and a second table, the orders table, located on the remote database, which for this instance is Snowflake. When you look at the example, we have a temporary table, and the targeting dimension is recipients.
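The copy sequence in use case one can be reduced to a small function. This is a hypothetical model, assuming only the rule stated above: the working table is copied to the database that holds the enrichment data, and the result is copied back to the database of the targeting dimension. The function and step strings are invented for illustration, not taken from Campaign's actual SQL logs.

```python
def enrichment_copies(working_db, targeting_db, enrichment_db):
    """Return the list of copy/enrich steps the workflow would perform."""
    steps = []
    if enrichment_db != working_db:
        # Reconciliation happens on the database holding the enrichment data.
        steps.append(f"copy working table: {working_db} -> {enrichment_db}")
    steps.append(f"enrich on {enrichment_db}")
    if enrichment_db != targeting_db:
        # The result always lands back where the targeting dimension lives.
        steps.append(f"copy result: {enrichment_db} -> {targeting_db}")
    return steps


# Use case 1: recipients (local) enriched with orders (remote).
for step in enrichment_copies("local", "local", "remote"):
    print(step)
# copy working table: local -> remote
# enrich on remote
# copy result: remote -> local
```

Passing `"remote"` as both the working and targeting database models the second use case: the same two copies happen, just in the other direction.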
When we enrich the data with data from the orders table, which is remote, and display the schema on the outbound transition, we have another temporary table for the orders. But note that the targeting dimension remains the recipient. Let's take a closer look at the SQL logs to see what happened. We see that the working table was first created on the local database when querying the recipients. The table was then copied from the local database to the remote database. And once the enrichment was completed, the working table was copied back from the remote database to the local database.

In the second use case, we invert the queries. We will get exactly the same result, but this time we start by querying the orders and then enrich them with the recipient data. There is one fundamental difference between the two use cases: because we started by querying the orders, the targeting dimension is now the order. So when the data is enriched with the recipient data, which is on the other database, the working tables are copied back and forth as in the first example, but in the other direction. The data is copied from the remote database to the local database, and the result then flows back to the targeting dimension on the remote database. If we look at the second example, with the query on the orders, we see that we have a temporary table linked to the order, so the targeting dimension is the order. And when we enrich with the recipients, the targeting dimension remains unchanged: it is still the order. So it is really the same thing as in use case one, just the other way around.
And when we look at the logs, we see that we started querying on the external database, the work table was then copied to the local database, and once the enrichment was done, it was copied back to the remote database. The results, as I said, are the same as in the first use case. The bottom line is: if you start querying on Snowflake, the remote database, your results will land on Snowflake, and if you start on the local database, they will land on the local database.

In the last use case, we change the targeting dimension. If you remember, this only works with linked schemas. We start by querying the orders, which are on the remote database. Then we change the targeting dimension from orders to recipients. The work table is copied from the remote database to the local database, and it remains there because the new targeting dimension is located there. Let's go to the instance and take a look. We query the orders, and you can see the targeting dimension is order. But when we change the dimension, you can see we are now on the recipients. And if we look at the logs, we see a single copy from remote to local.

In a nutshell, ÃÛ¶¹ÊÓƵ Campaign is able to handle a data model that is split across multiple data sources. It works automatically, seamlessly, and efficiently, using bulk load to copy the data, which makes it fast. Most of the time, you don't need to care about where the data is stored. Things happen under the hood, and the workflow takes care of it. It's all automated. But it is important for you to understand the fundamentals of working tables and targeting dimensions, and also when the data will be copied from one place to another. This will help you optimize the design of your queries based on the amount of data you are manipulating. You can also optimize the performance of your workflow by changing the targeting dimension to force the workflow to use a database engine that is different from the one of the initial targeting dimension.
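The rule that governs all three use cases can be stated in one line: the working table ends up on the database of the final targeting dimension. A minimal sketch of that rule, with invented names:

```python
def final_table_location(first_query_db, new_dimension_db=None):
    # The working table follows the targeting dimension. If the dimension
    # is changed mid-workflow (only possible via a linked schema), the
    # table moves to the new dimension's database and stays there;
    # otherwise it stays with the database of the first query.
    return new_dimension_db if new_dimension_db is not None else first_query_db


assert final_table_location("local") == "local"            # use case 1
assert final_table_location("remote") == "remote"          # use case 2
assert final_table_location("remote", "local") == "local"  # use case 3
```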
Thank you for watching!