
Explore data

Learn how to validate ingested data, preview data, and explore statistical and analytical properties of data using SQL functions. For more information, please visit the Query Service documentation.


Transcript
Hi there. Adobe Experience Platform retrieves data from a wide variety of sources. An immediate challenge for marketers is making sense of this data to gain insights about their customers. Adobe Experience Platform Query Service facilitates that by allowing you to use standard SQL to query data in Platform using a user interface. In this video, let me show you how data engineers can quickly explore and validate data ingested into datasets.

From your Experience Platform homepage, navigate to Datasets under Data Management. For this video, we're using a fictional retail brand called Luma. The Luma Loyalty dataset contains information about customers, loyalty details, geographic details, et cetera. The Luma web data contains web traffic information about customers' interactions on the Luma site, including products viewed by a customer, visited pages, products purchased, et cetera.

Open the Luma Loyalty dataset; the last successful batch run ingested over 5,000 customer records. Now let's verify the ingested records by clicking on the Preview dataset option. Take a closer look at the records and their structure. As a data engineer, you can verify what's stored in a dataset using the Preview dataset option. However, the preview displays only a sample of records, not the entire dataset, and some records might have incorrect values or missing data. So, to make sure that the data ingested into a dataset contains the expected number of records, and to explore the data type and format of each column, we can use the Experience Platform Query Service UI, or any PostgreSQL-compliant tool of your choice.

From the left navigation bar, click Queries under Data Management. Click Create Query, and a query editor window opens up. Let's make sure that the number of records ingested into the dataset is the same as in the source file.
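A record-count check like the one described above might look like the following. This is a minimal sketch: `luma_loyalty` is a hypothetical table name, so substitute the dataset table name shown for your dataset in the Query Service UI.

```sql
-- Hypothetical table name; replace luma_loyalty with the table name
-- listed for your dataset in the Query Service UI.
SELECT COUNT(*) AS total_records
FROM luma_loyalty;
```

If `total_records` matches the row count of the source file, no records were dropped during ingestion.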
In the result tab, you can see that we have 5,000 records in the dataset, which matches the record count of the source file. So we made sure that no data is missing. Now let's query the dataset to make sure there are no duplicate customer loyalty records. To do that, let's write a query to output the unique records in the Luma Loyalty dataset. Comparing the result against the 5,000 ingested records, we can confirm that there are a few duplicate records in the dataset. Let's run a query to output the duplicate loyalty numbers in the dataset. Using the result generated, you can then remove the duplicate records from the dataset, or write a query to output the dataset without any duplicate records.

To perform additional validation, Adobe Experience Platform Query Service provides several built-in Spark SQL functions to extend SQL functionality. Let's see how we can use Spark SQL functions to write a query to check that there are no loyalty members who have a birthdate before 1900. To do this, I will be using the year function under the date and time functions. The year function returns the year when given a date. Let's finish the query in the Query Editor and execute it. After the query executes, you can see that no records are returned for customers with a birth year earlier than 1900. This means the data ingested into the dataset is valid and ready to be used.

Similarly, you can explore the Platform Query Service documentation for other Spark SQL functions that you could use to validate data. You can also connect to Query Service from various desktop applications and tools that support the PostgreSQL protocol to write and run your queries. Using Query Service in Adobe Experience Platform, data engineers can quickly explore and validate data ingested into datasets.
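The duplicate check and the birth-year check described above could be sketched as follows. Assumptions: `luma_loyalty` is a hypothetical table name, and `loyaltyId` and `birthDate` are hypothetical column names, so use the names from your dataset's schema. The `year()` date-time function is available because Query Service supports built-in Spark SQL functions.

```sql
-- Hypothetical table and column names (luma_loyalty, loyaltyId, birthDate);
-- substitute the names from your dataset's schema.

-- Loyalty numbers that appear more than once (duplicate records):
SELECT loyaltyId, COUNT(*) AS occurrences
FROM luma_loyalty
GROUP BY loyaltyId
HAVING COUNT(*) > 1;

-- Records with an implausible birth year, using the Spark SQL year() function;
-- an empty result means every birthdate is 1900 or later:
SELECT loyaltyId, birthDate
FROM luma_loyalty
WHERE year(birthDate) < 1900;
```

The `GROUP BY ... HAVING` pattern surfaces only the offending keys, which you can then use to remove or filter out the duplicate records.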