Data Distiller 101
This Data Distiller overview demonstrates how to overcome common Data Distiller challenges, along with the key use cases and best practices for success.
Key Discussion Points
- Data Distiller overview
- Data Distiller FAQs and their solutions
- Key use cases
Hi, everyone. We’re going to give about a minute for folks to join.
So as people are filtering in: the topic of our webinar today is Data Distiller 101. If you've been curious about Data Distiller, the purpose of this webinar is to provide an overview of Data Distiller and the key use cases it supports, and those will cover some of the most common customer solutions for Data Distiller. Some housekeeping items as we get started. This webinar is being recorded, and the recording will be shared with you after the session. All participants are in listen-only mode; however, if you do have questions, go ahead and post those in the meeting chat pod, and we'll have time for Q&A at the end. We'll go through your questions, and if there's anything that we are unable to answer live today, we will do our best to take those away and follow up.
I'll go ahead and go through our agenda for today as well. First and foremost, let me introduce our presenters. We firstly have Russell. Russell is a principal consultant who has been with ÃÛ¶¹ÊÓƵ for 17 years. He is a multi-solution architect focusing on ÃÛ¶¹ÊÓƵ Experience Cloud solutions and ÃÛ¶¹ÊÓƵ Experience Platform app services. And I will be the other presenter today. I'm Brenda Scurlock. I'm a senior consultant who has been with ÃÛ¶¹ÊÓƵ for a little over two and a half years. I'm a field engineer focusing on Experience Platform and related applications.
And our agenda for today is going to include the Data Distiller overview. Within that first section, we're going to discuss the primary business drivers for using Data Distiller.
We'll cover the value proposition of the tool as well as the key capabilities. Then we're going to jump into some use cases: we'll walk you through the primary use case patterns that have been identified for the Data Distiller product, and we have selected five use cases to do a closer walkthrough with you. Then we'll wrap up the session by providing you with some additional resources to learn more about Data Distiller, and we'll do Q&A at the end.
All right. We're just about at five after the hour, so I think we can go ahead and get started. I'm going to pass it over to Russell for the Data Distiller overview.
Thank you very much, Brenda. So we're going to do an overview of what Data Distiller is all about. Let's go to the next slide, and we'll discuss the main drivers for using Data Distiller. We have different personas when it comes to uses of Data Distiller: the data architect, the data engineer, the data scientist, and also the marketing teams. Why is there an opportunity for these personas to leverage Data Distiller? Primarily because, once you have ingested the data into the data lake, there is sometimes additional data massaging that needs to happen. An example here would be an analyst who wants to build the segments that the business needs, which requires curated or even additional contextual data.
And that means that sometimes, after you have ingested the data into the data lake, there are additional insights needed from that data. So what you can do is use Data Distiller, through Query Service, to retrieve those insights from the data you have ingested. An example here would be customer lifetime value: usually this particular data point is not available as part of the data you have ingested. If it's not available, you can leverage Data Distiller to derive that data and use it in your marketing segments. The other aspect here is: "I want to use this data in ways that serve the different use cases of my stakeholders." The idea is that the data comes from multiple areas of the business and you have multiple use cases, so sometimes you need to bring all that data together and derive insights from those multiple data points. How do you deliver multiple use cases from all that data? By curating the data from those datasets. Say, for example, you have ingested five datasets and you want to derive information from them: you use Data Distiller to bring all that data into one dataset, which is called a derived dataset. That then satisfies multiple stakeholders' desire to use the data, as well as the multiple use cases that you have. The other area where we see Data Distiller being used is gaining operational and transparency insights around marketing. Usually, you have your data from ÃÛ¶¹ÊÓƵ Journey Optimizer and your data from ÃÛ¶¹ÊÓƵ Experience Platform in the data lake, and you want to be able to report on that Journey Optimizer data and your segments in AEP.
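To make the derived-dataset idea concrete, here is a minimal sketch of deriving a customer lifetime value that was never ingested directly. The table and column names are invented for illustration, and SQLite stands in for Query Service, so the exact SQL dialect in AEP will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchase_events (customer_id TEXT, amount REAL);
INSERT INTO purchase_events VALUES
  ('c1', 120.0), ('c1', 80.0), ('c2', 40.0);
""")

# Derive a per-customer lifetime value that is not present in the raw events.
rows = conn.execute("""
SELECT customer_id, SUM(amount) AS lifetime_value
FROM purchase_events
GROUP BY customer_id
ORDER BY customer_id
""").fetchall()
print(rows)  # [('c1', 200.0), ('c2', 40.0)]
```

The resulting rows would then be written out as a derived dataset and referenced in segmentation.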
So what you can do is bring that information together and create operational dashboards that allow you to gain insights from information like how many emails were sent, opened, and clicked, and you can apply this to the different segments that were driving that information. You can build these dashboards using Data Distiller. Let's go on to the next slide. So what is the value proposition here? There are three main use cases that we have come across. As I mentioned, we have the derived dataset, where you have multiple datasets that you have ingested; you run your SQL queries, bring all that dataset information together into one derived dataset, and then leverage that dataset for your segmentation.
The other aspect is reporting, as mentioned, around Journey Optimizer and Real-Time CDP, where you are able to bring all that information together and create operational reporting dashboards. In 2023, we also launched the AI and ML feature pipeline, so we will go through that as well. These are the three main use cases, but we have come across others too. What I would like to mention here is that even though you're going to have all those use cases, they will fall into four primary categories.
So you have the cleaning of the data that you need to do. Once you've done that, you can shape the data the way you want, then manipulate the data, and then enrich it. As we go through these use cases, we're going to dive into the cleaning, the shaping, the manipulating, and the enriching of the data using Data Distiller.
Let's go on to the next slide. So where does Data Distiller fit in? As you bring the data into AEP, you have batch and streaming ingestion. And as part of the data ingestion, whether it is streaming or batch, data prep allows you to make some changes: you can manipulate the data as it is streaming into AEP. But once the data lands in the data lake, there is not much transformation that can be done there. As in the example I gave you earlier, you have multiple datasets and you want to derive meaningful data from all the information that you brought into AEP. And this is where — the red line on the slide, from the data lake onward — Data Distiller steps in.
So from the data lake onward, this is where you will be able to do the cleaning of the data, the enriching of the data, and the shaping and manipulation of the data. Once you've done all this, you can create derived datasets, you can create dashboards, and you can send the data across to multiple app services such as Customer Journey Analytics or even ÃÛ¶¹ÊÓƵ Journey Optimizer, and to any BI tools that you may have.
Let's go on to the next slide. I believe this is the slide where we bring up the different capabilities of Data Distiller. It is SQL-based, so we have the SQL-based processing engine, and it's scalable: usage of Data Distiller is measured in compute.
The other aspect here is that you have what we call ÃÛ¶¹ÊÓƵ-defined functions, anonymous blocks, snapshots, incremental processing, and sampling. Anonymous blocks, for example, allow you to chain queries: you start with one query, and the next query runs only after the first query has finished. So let's say you have three datasets and you are running a query on those three datasets. You want to make sure that query finishes before you move on to the next step of your processing. That's what anonymous blocks do.
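The "finish query one before query two starts" guarantee of an anonymous block can be sketched with two sequential statements, where the second reads a table the first creates. This is an illustration of the ordering idea only — all names are invented, and SQLite stands in for Query Service, which uses its own `$$ ... $$` anonymous-block syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_events (id INTEGER, status TEXT);
INSERT INTO raw_events VALUES (1, 'ok'), (2, 'bot'), (3, 'ok');
""")

# Statement 1 must complete before statement 2 can read its output --
# the same sequencing an anonymous block gives a multi-step job.
conn.executescript("""
CREATE TABLE clean_events AS
  SELECT id FROM raw_events WHERE status = 'ok';
""")
count = conn.execute("SELECT COUNT(*) FROM clean_events").fetchone()[0]
print(count)  # 2
```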
In terms of operationalizing data processing workflows, we have easy automation and scheduling, and as part of this you also have monitoring and alerting. In terms of integration workflows for delivering extended insights: as mentioned, you are able to connect, for example, the derived dataset that you have built through these manipulations of your data. You can connect this with your BI tools, you can extend the reporting capabilities that you currently have in AEP using Data Distiller, and you can share all that data with any third-party clients as well.
I'll pass it on to you now, Brenda, I believe.
Yes, thank you, Russell. All right. So now that you have a good understanding of the business purpose of Data Distiller, let's talk about some of those specific use cases. Before we dig into them, let me outline the four use case patterns, or categories, that have been identified for Data Distiller. These patterns were identified and defined based on customer feedback and interactions, as well as actual usage of the product.
So our first pattern is clean. Cleaning allows the user to standardize data and do rule-based filtering, such as filtering noisy bot data out, identity cleansing, or performing data quality checks. The second category is shape. This category allows you to reshape the data: say you want to reformat your incoming data in terms of array manipulation, transposing the data, joining data from an existing dataset, or mapping some additional IDs into it. You can also see some currency standardization. Those are just some of the examples of reshaping the data.
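The clean and shape patterns can be sketched together in one small query: identity cleansing (trim and lowercase an email) plus currency standardization. All names, values, and the exchange rate here are invented for illustration, and SQLite stands in for Query Service.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crm_raw (email TEXT, spend REAL, currency TEXT);
INSERT INTO crm_raw VALUES
  ('  Ana@Example.COM ', 100.0, 'BRL'),
  ('bob@example.com',     50.0, 'USD');
""")

# Clean: normalize identities. Shape: standardize all spend into USD
# (the 0.20 BRL->USD rate is a made-up placeholder).
rows = conn.execute("""
SELECT LOWER(TRIM(email)) AS email,
       ROUND(CASE currency WHEN 'BRL' THEN spend * 0.20 ELSE spend END, 2)
         AS spend_usd
FROM crm_raw
ORDER BY email
""").fetchall()
print(rows)  # [('ana@example.com', 20.0), ('bob@example.com', 50.0)]
```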
Our third use case category is manipulate. This pattern is where we augment the data to get to the granularity that we're looking for. For example, this could be adding a new field that includes some aggregate data to support BI reporting use cases, or using a windowing function to limit the data to only data within a defined reporting period.
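The windowing idea just described can be sketched as a query that first restricts events to a reporting period and then keeps only the latest event per customer with `ROW_NUMBER()`. The schema is invented, and SQLite (3.25+ for window functions) stands in for Query Service.

```python
import sqlite3  # bundled SQLite must be >= 3.25 for window functions

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (customer_id TEXT, event_date TEXT, amount REAL);
INSERT INTO events VALUES
  ('c1', '2024-01-05', 10.0),
  ('c1', '2024-02-10', 25.0),
  ('c2', '2024-02-01', 15.0),
  ('c2', '2023-12-01',  5.0);
""")

# Limit to the 2024 reporting window, then keep the latest event per customer.
rows = conn.execute("""
SELECT customer_id, amount FROM (
  SELECT customer_id, amount,
         ROW_NUMBER() OVER (PARTITION BY customer_id
                            ORDER BY event_date DESC) AS rn
  FROM events
  WHERE event_date >= '2024-01-01'
) WHERE rn = 1
ORDER BY customer_id
""").fetchall()
print(rows)  # [('c1', 25.0), ('c2', 15.0)]
```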
And then finally, our fourth use case category is enrich. This use case pattern allows enrichment of the dataset by deriving additional attributes for downstream audience or campaign activation, or for analysis use cases. So next we're going to describe some specific use cases with you, and you'll see how they fit into these four use case patterns.
All right. So these are the five use cases that we are going to be walking you through today.
And let me start with use case number one. All right. So this first use case is about a South American retail company that wanted to understand how its customers interact across its multiple brands. The problem was their transaction records, browsing history, CRM data, payment and profile data were all defined at the root company level.
Not at the brand level. What this meant was that if a customer purchased items from three different brands, or logged into three different accounts tied to the different brands, the company had no way to see this across those brand silos. They only knew that three items had been purchased and that the user had logged in multiple times. So they only had insight into the totals for the company, but what they wanted was to get more granular, at the brand level. This is where Data Distiller came into play: by using Data Distiller, they were able to take all their data, group it by the individual brands, and generate a brand-centric data model, which was then used within their BI dashboards.
All right. So let's look at the specific solution here, and you can see how Data Distiller was leveraged. On the slide, the source tables are the feeds that they were originally ingesting: browsing history, CRM, online and offline transactions, payment app, and profile attributes. By using Data Distiller, they were able to join all of these datasets together via the customer IDs and device IDs, and regenerate that into a data model defined at the brand level, which is the output that you see to the right of those source tables.
So after the data had been shaped and manipulated, we now have new tables like spend per customer, overall sales, average order value, and number of customers, all rolled up to the brand level, which is what they were looking for in their reporting. So now, within their reporting tool, they can not only view data at the brand level, but they can also compare brands to each other and understand the synergies and interactions between them.
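The brand-level rollup described above can be sketched as one aggregation over company-level transactions — total sales, distinct customers, and average order value per brand. The schema and numbers are invented, and SQLite stands in for Query Service.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (customer_id TEXT, brand TEXT, amount REAL);
INSERT INTO transactions VALUES
  ('c1', 'brandA', 30.0), ('c1', 'brandB', 20.0),
  ('c2', 'brandA', 50.0), ('c2', 'brandA', 10.0);
""")

# Roll company-level transactions up to the brand level:
# total sales, customer count, and average order value per brand.
rows = conn.execute("""
SELECT brand,
       SUM(amount)                 AS total_sales,
       COUNT(DISTINCT customer_id) AS customers,
       ROUND(AVG(amount), 2)       AS avg_order_value
FROM transactions
GROUP BY brand
ORDER BY brand
""").fetchall()
print(rows)  # [('brandA', 90.0, 2, 30.0), ('brandB', 20.0, 1, 20.0)]
```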
So just to summarize the key benefits of Data Distiller in this use case: first, this customer was able to create their own data model optimized for their reporting needs. Secondly, the resulting reports allowed them to understand the engagement level of customers by brand, as well as to analyze the demographics of more engaged customers. And finally, Data Distiller allowed them to view brand performance visually, as well as compare those brands to each other.
All right, let's look at our second use case. This use case is for a telecommunications company that needed to enrich their next-best-offer emails with more personalized data. They wanted to understand what products the customer was browsing but not ultimately purchasing, and then use this information in an email sent out of ÃÛ¶¹ÊÓƵ Campaign. Using segmentation in AEP, they were able to find the customers that met that criteria, but segmentation wasn't allowing them to extract the details of the products that the customers were viewing but not purchasing. So this is where Data Distiller comes into play again. Using Data Distiller, they were able to derive the products that were part of the abandoned cart, and they did this by joining together browsing history, product pricing, and customer information. Let's take a look at what that specific solution looked like. In this case, they used Data Distiller to schedule a query to run hourly that joined their analytics data — the customer browsing behavior — to their profile attributes. Within this, they selected only existing customers, so they were able to filter out any prospects, and joined this data to their product pricing table to select only the most expensive SKUs browsed. Then they joined this to the identity mapping, which allowed them to link anonymous browsing sessions back to a customer.
Then, using Data Distiller, they took this joined data and derived the products viewed that were not purchased. They were then able to save those unpurchased products as profile-level attributes within a new dataset. So they're using all four of the use case patterns: cleaning the data, reshaping, manipulating, and enriching.
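The "viewed but not purchased" derivation is a classic anti-join, which can be sketched as follows. Table and SKU names are invented, and SQLite stands in for Query Service.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE views     (customer_id TEXT, sku TEXT);
CREATE TABLE purchases (customer_id TEXT, sku TEXT);
INSERT INTO views     VALUES ('c1','sku1'), ('c1','sku2'), ('c2','sku3');
INSERT INTO purchases VALUES ('c1','sku1');
""")

# Anti-join: products a customer viewed but never bought.
rows = conn.execute("""
SELECT v.customer_id, v.sku
FROM views v
LEFT JOIN purchases p
  ON p.customer_id = v.customer_id AND p.sku = v.sku
WHERE p.sku IS NULL
ORDER BY v.customer_id, v.sku
""").fetchall()
print(rows)  # [('c1', 'sku2'), ('c2', 'sku3')]
```

The surviving rows are what would be written back as profile-level "abandoned product" attributes.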
So now they know which profiles abandoned a product, and they're able to create a segment built to select only those profiles which had unpurchased products, and then send that segment to ÃÛ¶¹ÊÓƵ Campaign. Additionally, because they now have the actual product SKUs that were not purchased, they were also able to export that list of derived products and use it within their retargeting email as personalization attributes.
Okay. So the key benefits of Data Distiller in this use case are the ability to derive attributes on a scheduled basis, the ability to use those derived attributes as part of segmentation, and then the ability to use those same attributes to power personalization downstream.
And then our third use case. This use case is for a luxury retailer who was looking to optimize their data for reporting and attribution modeling. Their goals were to have more advanced data-driven attribution models, to be able to explore customer journeys from email click to store or site purchase, and to unify that data under a single identity.
They were able to use Data Distiller to achieve this, as well as to enrich their transaction data with details about the store in which purchases were made and the loyalty status of the customer at the time of purchase.
Let's look at the specific solution. The key source tables the customer started with are their transaction details, product information, store information, and acquisition and loyalty member information. These tables were not directly used in CJA, but they were used as inputs to generate datasets specific to their analysis use cases. Data Distiller was leveraged in a couple of different ways. First, they used Data Distiller to shape those source datasets, including the analytics dataset, by creating a new common customer ID across the datasets. They could then use that within CJA as the single identifier across all datasets.
Secondly, they used Data Distiller to shape the transaction data by adding new columns to it for region and loyalty segment. They did this in a couple of different ways: firstly, by joining transaction records to the store table to find the region where the transaction took place, and secondly, by joining transactions to customer segments to retrieve the loyalty status at the time of purchase. This was important because a customer's loyalty status could change weekly, so having accurate attribution to a loyalty status mattered in this case.
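The "loyalty status at the time of purchase" step is a point-in-time join: match each transaction to the status row whose validity window contains the transaction date. Here is a minimal sketch with invented names and dates; SQLite stands in for Query Service.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (customer_id TEXT, tx_date TEXT, amount REAL);
CREATE TABLE loyalty_history
  (customer_id TEXT, status TEXT, valid_from TEXT, valid_to TEXT);
INSERT INTO transactions VALUES ('c1','2024-03-15', 200.0);
INSERT INTO loyalty_history VALUES
  ('c1','silver','2024-01-01','2024-02-29'),
  ('c1','gold',  '2024-03-01','2024-12-31');
""")

# Point-in-time join: attribute each transaction to the loyalty status
# in effect on the transaction date, not the customer's current status.
rows = conn.execute("""
SELECT t.customer_id, t.amount, l.status
FROM transactions t
JOIN loyalty_history l
  ON l.customer_id = t.customer_id
 AND t.tx_date BETWEEN l.valid_from AND l.valid_to
""").fetchall()
print(rows)  # [('c1', 200.0, 'gold')]
```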
Finally, they used Data Distiller to create new datasets that could be used within CJA. Those new datasets included a repeat purchasers table, which only included customers from the transactions table who had made more than a single purchase in a calendar year.
As well as a gross spend per customer table, which included transaction data aggregated up to the master customer ID level. So the benefits they got out of this particular use case: they now have the ability to analyze how online touchpoints and engagements impact in-store purchases, and the ability to analyze marketing performance across channels and regions, as well as across loyalty statuses.
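Both derived tables just mentioned — repeat purchasers and gross spend per customer — can be sketched as two short aggregations. Names and figures are invented, and SQLite stands in for Query Service.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (master_customer_id TEXT, year INTEGER, amount REAL);
INSERT INTO transactions VALUES
  ('c1', 2024, 100.0), ('c1', 2024, 60.0), ('c2', 2024, 40.0);
""")

# Repeat purchasers: more than one purchase in a calendar year.
repeat = conn.execute("""
SELECT master_customer_id
FROM transactions
GROUP BY master_customer_id, year
HAVING COUNT(*) > 1
""").fetchall()

# Gross spend rolled up to the master customer ID.
spend = conn.execute("""
SELECT master_customer_id, SUM(amount)
FROM transactions
GROUP BY master_customer_id
ORDER BY master_customer_id
""").fetchall()
print(repeat, spend)  # [('c1',)] [('c1', 160.0), ('c2', 40.0)]
```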
And they are able to join their source data together as inputs to generate overall richer output datasets, to power increasingly meaningful reporting.
All right, and now I will pass it back over to Russell to walk you through the last two use cases. Thank you. Thank you. So, for this particular use case, we're going to look at how you can leverage Data Distiller to customize the different insights for operational dashboarding.
In this use case, we will be shaping the data, manipulating the data, and enriching the data as we go through.
Why did our customer want to do this particular use case? Primarily to go beyond the out-of-the-box metrics available in AEP: they needed more metrics, and you can use Data Distiller to achieve that. The customer objective here was to harness the key capabilities of Data Distiller, along with the out-of-the-box reporting data models in AEP, so that they are able to enrich and reshape the data for their unique reporting needs.
Let's go to the next slide. As I mentioned earlier, you are able to customize the metrics from ÃÛ¶¹ÊÓƵ Journey Optimizer and Real-Time CDP. What you can do is bring in the data related to your profiles, your segments, and your destinations — all those metrics are available in AEP — and customize all of these using Data Distiller, for the journeys and campaign dashboards as well.
Bringing all those metrics together, you are able to bundle more than 60 metrics and create more than ten different dashboards.
In AEP, how can you do that? You can quickly do this by building custom charts and dashboards on top of Data Distiller, which enables the reporting that more and more of our customers want: going beyond the foundation metrics, based on their needs and unique use cases. Let's go on to the next slide. So what does Data Distiller provide? A variety of data visualizations. There will be a slide after this one where I will show you the complexity of dashboards that you can create. In terms of data visualization, you can create these according to your requirements, and they will be in real time. With dashboard authoring, the data also comes in in real time: you can create a dashboard based on the visualizations that you have created, and it still remains real time.
You also have the SQL data modeling. You can think of it as Brenda mentioned in the previous use cases.
You have to shape the data and you have to manipulate the data, and for you to do that, you need an understanding of how the datasets are linked to each other. How do you do that? You do data modeling, and Data Distiller provides you the capability of doing that. Then there is the Accelerated Store. For the previous two features I talked about — the data visualization and the dashboard authoring — I mentioned that they are real time. Why are they real time? Because the Accelerated Store is the SQL engine that runs in real time: while you are dragging those visualizations into place and building them, and the data comes in, the Accelerated Store is running the SQL in real time, and you are able to see the result in your dashboard in real time. On top of that, a key capability behind those operational dashboards is the ability to connect to the BI tools that you use internally, and you can view and display all these capabilities there. Let's go to the next slide.
So the idea here is that, as the data comes into the data lake, you are able to generate snapshots of the data as it comes into AEP — into the Real-Time Customer Profile, whether it is B2B or B2C. All those metrics come in, and you have the data model. Remember we were talking about how Data Distiller allows you to understand the data links between each of the datasets.
So you have the profile snapshot, which comes in from your unified profile in AEP.
Then you can bring this directly into your own dashboard, or even into the dashboards in AEP. But as I mentioned, our customers need additional metrics that will provide more information to their business. How do they do that? By providing that data to Data Distiller: Data Distiller will shape the data, clean the data, enrich the data, and then write it into a dataset. Then you have those additional metrics — like the customer lifetime value we talked about, for example — on top of that. And now that this is available, you push it into your own dashboard in your BI tool, or even a dashboard that you create inside AEP.
Let's go to the next slide. This is an example of a dashboard that you can create using Data Distiller. As you can see here, you have the funnel, you have the different trend lines, you have the graphs and the charts, and you have the geolocation map there as well. So the possibilities are out there for you to leverage Data Distiller with all the different capabilities, especially around the Accelerated Store: the dashboard is driven in real time, using the data as it comes through in AEP.
Let’s go on the next slide.
Now we go into use case five, where we have the capability within AEP to build an AI/ML feature pipeline with Data Distiller.
What do our customers want to do here? They want to be able to train their models with ÃÛ¶¹ÊÓƵ data. We are talking about taking the data out of AEP into their environment, into their landscape, and then leveraging the predictions for use cases in ÃÛ¶¹ÊÓƵ Experience Platform. What that means is that you are able to take the data out of AEP, run your model, and then, once you've done your prediction and all your modeling, bring this back into AEP. The objective here is to leverage Data Distiller to explore and optimize the data in order to share it with your AI and ML environments for training and scoring. Once you've done the training and the scoring, you bring that data back into AEP. That's what this AI and ML pipeline allows you to do: you are able to shape the data, manipulate the data, and then bring it back into ÃÛ¶¹ÊÓƵ Experience Platform.
Next slide, please. So we have these two sides. You have Data Distiller on the left, where you are able to explore the data and engineer the different features, which you share with your AI and ML environment. Once you've done that, you train and score the models, and then you bring the results back into the various platform apps.
Next slide, please. Okay. So the challenge that our customers have come across is how to train AI/ML models with ÃÛ¶¹ÊÓƵ data, and that's what this pipeline allows you to do. We previously didn't have this pipeline — the question mark on the slide refers to the fact that we didn't have it before — but now you are able to leverage Data Distiller to share the ÃÛ¶¹ÊÓƵ Experience Platform data with your AI/ML environments such as Databricks, SageMaker, etc. You train your model, and then you are able to bring the resulting features — such as propensity scores from propensity models — back into the profiles of your customers. Next slide, please. And we have templates available as well for quick starts. If you want to learn more about the Python notebook templates, we're going to share the links with you. These are templates which allow you to, for example, train a propensity model and then ingest and activate the model predictions. So you are able to explore, train, and activate — you can do all of this with the ML pipeline, which allows you to bring your scoring data back into AEP. Let's go on to the next page.
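The explore → engineer features → score → bring back round trip can be sketched schematically. Everything here is a stand-in: in practice the features would be exported to Databricks or SageMaker, a real model would be trained there, and the scores would be ingested back into Experience Platform as profile attributes. The scoring function below is a toy formula, not a trained model.

```python
# Schematic AI/ML feature-pipeline round trip (all names invented).

def engineer_features(events):
    """Shape raw events into one feature row per customer."""
    features = {}
    for customer_id, visits, purchases in events:
        features[customer_id] = {"visits": visits, "purchases": purchases}
    return features

def score_propensity(feats):
    """Toy stand-in for an externally trained propensity model."""
    return round(min(1.0, 0.1 * feats["visits"] + 0.3 * feats["purchases"]), 2)

# Export features, score them "externally", keep scores keyed by customer,
# ready to be ingested back as profile attributes.
events = [("c1", 5, 1), ("c2", 2, 0)]
scores = {cid: score_propensity(f)
          for cid, f in engineer_features(events).items()}
print(scores)  # {'c1': 0.8, 'c2': 0.2}
```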
So our product managers at ÃÛ¶¹ÊÓƵ have built really concise documentation around all the capabilities that are available. On top of that, they have provided solutions with example scripts that you can leverage, whether it is for dashboarding or for SQL queries that you can run. You can click on this link when we send you the slides and explore these. It's really powerful, and there is a lot to learn from there. That's it from me.
Excellent. Thank you, Russell. That is all of the content that we have for you today on Data Distiller. Hopefully you now have a good idea of the main business drivers for using Data Distiller, as well as the key use cases that we walked you through. Now we can open up for Q&A, so if you do have any questions, feel free to post those in either the Q&A or the chat pod — we have access to both of them. We'll give a few minutes for questions to come through, and while we do that, I'm actually going to launch a quick poll. This is just to give us some high-level feedback on the webinar session today, as well as to gain ideas for future webinars.
So I'm going to go ahead and launch that poll — just two questions there — and let's take a look at the questions. I see that we do have one from Abhishek.
Yes — so the question is about combining experience event data with profile data using Data Distiller. I think what we are seeing here, Abhishek, is that you can combine that data so you are able to have a derived dataset out of those datasets. If you are referring to use case two: yes, by leveraging an event dataset and a profile dataset, you can derive additional information and then bring this into another dataset.
Yep, exactly. And I think in this use case, they were not just deriving; they were also enriching from a lookup dataset with the product pricing information as well.
Yeah.
Any other questions? All right, great. Well, thank you, everyone, for attending today. We will be distributing the recording and the slides from today's webinar, and we look forward to future webinars in the series.
Thank you everyone. Thank you. Cheers. Bye bye.
Key takeaways
Overview and Purpose of Data Distiller
Data Distiller supports data architects, data engineers, data scientists, and marketing teams by enabling data segmentation, curation, and the addition of contextual data on top of ingested datasets.
Primary Use Cases
The webinar highlighted five primary use cases for Data Distiller:
- Creating brand-centric data models for a South American retail company.
- Enriching next-best-offer emails with personalized data for a telecommunications company.
- Optimizing data for reporting and attribution modeling for a luxury retailer.
- Customizing insights for operational dashboarding.
- Leveraging AI and ML feature pipelines for training and scoring models.
Key Capabilities
Data Distiller offers SQL-based processing, scalable data management, ÃÛ¶¹ÊÓƵ-defined functions, automation and scheduling, monitoring and alerting, and integration with third-party tools for extended insights.
Data Transformation and Enrichment
Data Distiller allows for cleaning, shaping, manipulating, and enriching data. This includes standardizing data, reshaping data formats, augmenting data for granularity, and deriving additional attributes for downstream use.
Operational Dashboards and AI/ML Integration
Data Distiller enables the creation of real-time operational dashboards and supports AI/ML feature pipelines. This allows users to train models with ÃÛ¶¹ÊÓƵ data, score models, and integrate predictions back into ÃÛ¶¹ÊÓƵ Experience Platform for enhanced data-driven decision-making.