Create the Luma propensity model schemas and datasets

NOTE
Data Science Workspace is no longer available for purchase.
This documentation is intended for existing customers with prior entitlements to Data Science Workspace.

This tutorial provides you with the prerequisites and assets required for all other Adobe Experience Platform Data Science Workspace tutorials. Once complete, the following schemas and datasets will be available to you and your organization.

Schemas:

  • Luma web data schema
  • Propensity model scoring results schema

Datasets:

  • Luma web dataset
  • Propensity model training dataset
  • Propensity model scoring dataset
  • Propensity model scoring results dataset

Download the assets

The following tutorial uses a custom Luma purchase propensity model. Before proceeding, download the required assets zip folder. This folder contains:

  • The purchase propensity model notebook
  • A notebook used to ingest data to a training and scoring dataset (a subset of the Luma web data)
  • A demo JSON file containing the web data of 730,000 Luma users
  • An optional Python 3 EDA (exploratory data analysis) notebook that can help you understand the web data and model
NOTE
You can use your own schema and data for any of the tutorials. However, the demo model provided in the assets does not work unless it’s provided the proper configuration files and requirements file. This demo propensity model was designed to work with Luma web data.
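Before ingesting the demo JSON file, it can help to do a quick sanity pass over the raw records, which is the kind of check the optional EDA notebook performs. The sketch below is illustrative only: the field names (`ecid`, `eventType`) and the inline sample records are assumptions, not the actual Luma schema paths, and in practice you would parse the downloaded JSON file instead.

```python
# A quick, hypothetical look at what a pre-ingestion sanity check on the
# Luma web data might involve. Field names ("ecid", "eventType") are
# illustrative assumptions, not the actual Luma schema paths.
from collections import Counter

# Stand-in for a few parsed records from the demo JSON file
events = [
    {"ecid": "u1", "eventType": "web.webpagedetails.pageViews"},
    {"ecid": "u1", "eventType": "commerce.purchases"},
    {"ecid": "u2", "eventType": "web.webpagedetails.pageViews"},
]

# Events per user: a typical first sanity check before ingestion
events_per_user = Counter(e["ecid"] for e in events)

# Which users show purchase behavior (the propensity target)
purchasers = {e["ecid"] for e in events if e["eventType"] == "commerce.purchases"}

print(events_per_user)  # Counter({'u1': 2, 'u2': 1})
print(purchasers)       # {'u1'}
```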

Create the Luma web data schema and ingest the data

In order to create a model, you must have a dataset in Experience Platform which is used to train and score your model. The following video tutorial from the Data Science Workspace course walks you through creating the Luma schema and ingesting the data used by the purchase propensity model.


Transcript
Hello, in this video we are going to show you how to ingest the course Luma demo dataset into Adobe Experience Platform. We will be using this data for the remainder of the course lessons to help showcase Data Science Workspace. If you are following along, let’s start by making sure we are in the sandbox we want to ingest the data in. In my case, this is the development sandbox called Data Science Workspace Course. In order to ingest the demo data, we need to start by creating a schema to define the data structure. Navigate to the Data Management section in Experience Platform and select Schemas. Next, we want to select the Create Schema button, then select the XDM Experience Event option. The data we are using is Luma web data, and we want this schema to capture the time-stamped behaviors of our web users, so the Experience Event option makes the most sense here. After naming our schema and providing a description, we need to add some schema field groups, previously known as mixins, to capture our data. Let’s start by adding the Consumer Experience Event field group. Once the Consumer Experience Event field group has been added, you will see a number of additional fields highlighted in the canvas. These are the fields that were added to our schema. Our web data also has some IDs associated with it, so we want to add an additional field group to capture our end user IDs. Let’s add the End User ID Details field group as well. Now that we have a schema to capture our data, we need to set a primary identity. To do this, let’s select our End User ID Details field group and navigate our schema to find an MCID column. Once we open the MCID column, select the ID field from within the column. The Field Properties sidebar opens. Scroll down and select the Identity checkbox, then select the Primary Identity checkbox, followed by selecting the ECID namespace from the Identity Namespace dropdown. Then select Apply, followed by saving your changes. There you have it.
We now have our schema and are ready to create a new dataset. To create a new dataset, select Datasets from within the Data Management tab. Then in the top right corner, select Create Dataset. Two options are provided. We want to select the Create Dataset from Schema option. Once you select Create Dataset from Schema, a list of schemas is provided. Select the schema we just made, then select Next. You will be asked to name the dataset and provide a description. Once complete, we are redirected to the Dataset Activity tab for our dataset. If you have not already, download the LumaDemo dataset from this module. In this demo, we are going to ingest our data by dragging and dropping the demo files into the Add Data section. Upon doing so, a batch ID is created. You’ll need to wait a couple minutes for the batch to finish processing. Going back to our architecture diagram, we can see that we used batch mode and our data is going to be mapped to our XDM schema, then added to the data lake where we can query and transform the data using Data Science Workspace. Once complete, a successful status is displayed. Before we finish, let’s quickly preview the data by selecting the Preview button from the Dataset Activity tab. We can see that our data was added to our schema and take a quick look at the course data that’s going to be used. With that, we have our course data in the data lake and we are ready to proceed with the remainder of the course. Thanks for watching!
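The transcript above creates the dataset through the UI's "Create Dataset from Schema" flow. The same step can be done programmatically by POSTing a dataset definition to the Catalog Service API. The sketch below only constructs the request body; the schema `$id` and tenant placeholder are assumptions you would replace with the values from your own organization, and actual creation requires your Platform authentication headers.

```python
# A sketch of the Catalog API request body corresponding to the
# "Create Dataset from Schema" step. The schema $id below is a
# placeholder -- substitute your own tenant's schema URI.
import json

dataset_payload = {
    "name": "Luma Web Dataset",
    "description": "Time-stamped Luma web events mapped to the XDM schema",
    "schemaRef": {
        # Placeholder schema $id; use your schema's actual URI
        "id": "https://ns.adobe.com/{TENANT_ID}/schemas/lumawebdata",
        "contentType": "application/vnd.adobe.xed-full+json;version=1",
    },
}

# POST this body to /data/foundation/catalog/dataSets with your
# Platform auth headers to create the dataset programmatically.
print(json.dumps(dataset_payload, indent=2))
```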

Create the training, scoring, and scoring results datasets

In order to run the recipe builder notebook or use the API to train and score a model, you need to specify the dataset(s) and schema(s) that are used for training/scoring. The following video tutorial walks you through setting up the training, scoring, and scoring results datasets, as well as the scoring results schema used in the Luma purchase propensity model.


Transcript
Hello, in this video we are going to show you how to create the course training, scoring, and scoring results datasets in Adobe Experience Platform. We will be using this data to train and score a model in a subsequent tutorial. We will then store the model results in a scoring results dataset that is enabled for real-time customer profile. If you are following along, let’s start by navigating to the dataset tab and selecting Create New Dataset. We want to create our dataset from a schema. The training and scoring datasets use the same schema as our Luma web data. All we have to do is select the schema we made in the course data ingestion tutorial and provide a name and description for the new dataset. I’ll name mine Data Science Course Training Dataset, followed by doing the same steps to create a Data Science Course Scoring dataset. Next, we want to select the notebooks tab and open up JupyterLab. We have provided a notebook named Data Science Training and Scoring Ingestion. Import and open the provided Spark notebook. This notebook is going to split our Luma web data into a training and scoring dataset and then write the split data to the two datasets we just made. In order for the notebook to run, you need to provide the dataset IDs for your training, scoring, and Luma datasets. Once you have run all the cells, check to make sure your training and scoring datasets contain the new data. The training dataset should be considerably larger than the scoring dataset. Next we want to create a new schema for our scoring results dataset. We will use this schema and dataset to store the model output. Select the Create Schema button, then select XDM Individual Profile as the option. This schema will need a custom field group, so we want to select the Create New Field Group option. Provide a display name and description for the new field group, then select Add Field Groups. After adding the field group, select the Add icon to create a new field and open the Field Properties menu.
Our propensity model is going to output two values that I want to capture, a primary identifier and a boolean value for will order. Let’s start by creating the boolean value for will order. The field name should use camel case. The display name is used in the UI, such as the Segment Builder. We want the type to be set to Boolean and the default value for our boolean to be set to false. Select Apply and the new field is created, but we are not done yet. Next we need to capture our customer identifier. Select the Add option and create the Experience Cloud ID field, or ECID for short. Set the type to String, then scroll down and check the required box. We must have this value in order to create a profile and attach our will order prediction value. Because this is our identifier, we also need to select the Identity checkbox, followed by selecting the Primary Identity checkbox, and then select the ECID namespace from the Identity Namespace dropdown. After finishing, select Apply. Now that we have our custom field group to capture the model output data, we can name the schema and provide a description. But before we save the schema, we want to enable it for profile by selecting the Profile toggle. In order for data to be ingested into profile, both the schema and dataset must be enabled for profile. Once complete, select Save to finish the schema. We are now ready to create the Profile Prediction dataset. Following the same steps we used to create the training and scoring datasets, we want to create the scoring results dataset with our new schema. I’ll name mine Data Science Course Model Prediction dataset. Once the dataset has been created, select the Profile toggle. This dataset will store all our model output scoring results, and because the schema and dataset are enabled for profile, the results will also be sent to real-time customer profile. With that, we have the required datasets to create the course propensity model. Thanks for watching.
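The provided Spark notebook handles the split of the Luma web data into the training and scoring datasets. As a minimal sketch of that operation, assuming an 80/20 split ratio and simplified record shapes (both are illustrative assumptions, not the notebook's actual parameters):

```python
# A minimal sketch of the partitioning step the provided Spark notebook
# performs: splitting the Luma web data into training and scoring sets.
# The 80/20 ratio and record shape are assumptions for illustration.
import random

random.seed(42)  # reproducible split

# Stand-in for rows read from the Luma web dataset
records = [{"ecid": f"user{i}"} for i in range(1000)]

random.shuffle(records)
cut = int(len(records) * 0.8)
training, scoring = records[:cut], records[cut:]

print(len(training), len(scoring))  # 800 200
```

As the transcript notes, the training set should come out considerably larger than the scoring set; each split would then be written back to its corresponding Platform dataset.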

Next steps

By following this tutorial, you have successfully created the required schemas and datasets for the Luma propensity model. You’re now ready to continue to the next tutorial and create the model using the recipe builder notebook.

Additionally, you can explore the data using the provided Exploratory Data Analysis (EDA) notebook. This notebook can help you understand patterns in the Luma data, check data sanity, and summarize the data relevant to the predictive propensity model. To learn more about Exploratory Data Analysis, visit the EDA documentation.
