General guidance for query execution in Query Service
This document details important details to know when writing queries in ÃÛ¶¹ÊÓƵ Experience Platform Query Service.
For detailed information on the SQL syntax used in Query Service, please read the SQL syntax documentation.
Query execution models
ÃÛ¶¹ÊÓƵ Experience Platform Query Service has two models of query execution: interactive and non-interactive. Interactive execution is used for query development and report generation in business intelligence tools, while non-interactive is used for larger jobs and operational queries as a part of a data processing workflow.
Interactive query execution
Queries can be executed interactively by submitting them through the Query Service UI or through a connected client. When running Query Service through a connected client, an active session runs between the client and Query Service until either the submitted query returns or times out.
Interactive query execution has the following limitations:
LIMIT 0
in your query. The query timeout of 10 minutes still applies.By default, the results of interactive queries are returned to the client and are not persisted. In order to persist the results as a dataset in Experience Platform, the query must use the CREATE TABLE AS SELECT
syntax.
Non-interactive query execution
Queries submitted through the Query Service API are run non-interactively. Non-interactive execution means that Query Service receives the API call and executes the query in the order it is received. Non-interactive queries always result in either the generation of a new dataset in Experience Platform to receive the results, or the insertion of new rows into an existing dataset.
Accessing a specific field within an object
To access a field within an object in your query, you can use either dot notation (.
) or bracket notation ([]
). The following SQL statement uses dot notation to traverse the endUserIds
object down to the mcid
object.
SELECT endUserIds._experience.mcid
FROM {ANALYTICS_TABLE_NAME}
WHERE endUserIds._experience.mcid IS NOT NULL
AND TIMESTAMP = to_timestamp('{TARGET_YEAR}-{TARGET_MONTH}-{TARGET_DAY}')
LIMIT 1
{ANALYTICS_TABLE_NAME}
The following SQL statement uses bracket notation to traverse the endUserIds
object down to the mcid
object.
SELECT endUserIds['_experience']['mcid']
FROM {ANALYTICS_TABLE_NAME}
WHERE endUserIds._experience.mcid IS NOT NULL
AND TIMESTAMP = to_timestamp('{TARGET_YEAR}-{TARGET_MONTH}-{TARGET_DAY}')
LIMIT 1
{ANALYTICS_TABLE_NAME}
Both of the example queries above return a flattened object, rather than a single value:
endUserIds._experience.mcid
--------------------------------------------------------
(48168239533518554367684086979667672499,"(ECID)",true)
(1 row)
The returned endUserIds._experience.mcid
object contains the corresponding values for the following parameters:
id
namespace
primary
When the column is only declared down to the object, it returns the entire object as a string. To view only the ID, use:
SELECT endUserIds._experience.mcid.id
FROM {ANALYTICS_TABLE_NAME}
WHERE endUserIds._experience.mcid IS NOT NULL
AND TIMESTAMP = to_timestamp('{TARGET_YEAR}-{TARGET_MONTH}-{TARGET_DAY}')
LIMIT 1
endUserIds._experience.mcid.id
----------------------------------------
48168239533518554367684086979667672499
(1 row)
Quotes
Single quotes, double quotes, and back quotes have different usages within Query Service queries.
Single quotes
The single quote ('
) is used to create text strings. For example, it can be used in the SELECT
statement to return a static text value in the result, and in the WHERE
clause to evaluate the content of a column.
The following query declares a static text value ('datasetA'
) for a column:
SELECT
'datasetA',
timestamp,
web.webPageDetails.name
FROM {ANALYTICS_TABLE_NAME}
WHERE TIMESTAMP = to_timestamp('{TARGET_YEAR}-{TARGET_MONTH}-{TARGET_DAY}')
LIMIT 10
The following query uses a single-quoted string ('homepage'
) in its WHERE clause to return events for a specific page.
SELECT
timestamp,
endUserIds._experience.mcid.id
FROM {ANALYTICS_TABLE_NAME}
WHERE web.webPageDetails.name = 'homepage'
AND TIMESTAMP = to_timestamp('{TARGET_YEAR}-{TARGET_MONTH}-{TARGET_DAY}')
LIMIT 10
Double quotes
The double quote ("
) is used to declare an identifier with spaces.
The following query uses double quotes to return values from specified columns when one column contains a space in its identifier:
SELECT
no_space_column,
"space column"
FROM
( SELECT
'column1' as no_space_column,
'column2' as "space column"
)
Back quotes
The back quote `
is used to escape reserved column names only when using dot notation syntax. For example, since order
is a reserved word in SQL, you must use back quotes to access the field commerce.order
:
SELECT
commerce.`order`
FROM {ANALYTICS_TABLE_NAME}
WHERE TIMESTAMP = to_timestamp('{TARGET_YEAR}-{TARGET_MONTH}-{TARGET_DAY}')
LIMIT 10
Back quotes are also used to access a field that starts with a number. For example, to access the field 30_day_value
, you would need to use back quote notation.
SELECT
commerce.`30_day_value`
FROM {ANALYTICS_TABLE_NAME}
WHERE TIMESTAMP = to_timestamp('{TARGET_YEAR}-{TARGET_MONTH}-{TARGET_DAY}')
LIMIT 10
Back quotes are not needed if you are using bracket-notation.
SELECT
commerce['order']
FROM {ANALYTICS_TABLE_NAME}
WHERE TIMESTAMP = to_timestamp('{TARGET_YEAR}-{TARGET_MONTH}-{TARGET_DAY}')
LIMIT 10
Viewing table information
After connecting to Query Service, you can see all your available tables on Platform by using either the \d
or SHOW TABLES
commands.
Standard table view
The \d
command shows the standard PostgreSQL view for listing tables. An example of this command’s output can be seen below:
List of relations
Schema | Name | Type | Owner
--------+-----------------+-------+----------
public | luma_midvalues | table | postgres
public | luma_postvalues | table | postgres
(2 rows)
Detailed table view
SHOW TABLES
command is a custom command that provides more detailed information about the tables. An example of this command’s output can be seen below:
name | dataSetId | dataSet | description | resolved
-----------------+--------------------------+----------------+-------------+----------
luma_midvalues | 5bac030c29bb8d12fa992e58 | Luma midValues | | false
luma_postvalues | 5c86b896b3c162151785b43c | Luma midValues | | false
(2 rows)
Schema information
To view more detailed information about the schemas within the table, you can use the \d {TABLE_NAME}
command, where {TABLE_NAME}
is the name of the table whose schema information you want to view.
The following example shows the schema information for the luma_midvalues
table, which would be seen by using \d luma_midvalues
:
Table "public.luma_midvalues"
Column | Type | Collation | Nullable | Default
-------------------+-----------------------------+-----------+----------+---------
timestamp | timestamp | | |
_id | text | | |
productlistitems | anyarray | | |
commerce | luma_midvalues_commerce | | |
receivedtimestamp | timestamp | | |
enduserids | luma_midvalues_enduserids | | |
datasource | datasource | | |
web | luma_midvalues_web | | |
placecontext | luma_midvalues_placecontext | | |
identitymap | anymap | | |
marketing | marketing | | |
environment | luma_midvalues_environment | | |
_experience | luma_midvalues__experience | | |
device | device | | |
search | search | | |
Additionally, you can get further information about a particular column by appending the name of the column to the table name. This would be written in the format \d {TABLE_NAME}_{COLUMN}
.
The following example shows additional information for the web
column, and would be invoked by using the following command: \d luma_midvalues_web
:
Composite type "public.luma_midvalues_web"
Column | Type | Collation | Nullable | Default
----------------+-----------------------------------+-----------+----------+---------
webpagedetails | luma_midvalues_web_webpagedetails | | |
webreferrer | web_webreferrer | | |
Joining datasets
You can join multiple datasets together to include data from other datasets in your query.
The following example would join the following two datasets (your_analytics_table
and custom_operating_system_lookup
) and creates a SELECT
statement for the top 50 operating systems by number of page views.
Query
SELECT
b.operatingsystem AS OperatingSystem,
SUM(a.web.webPageDetails.pageviews.value) AS PageViews
FROM your_analytics_table a
JOIN custom_operating_system_lookup b
ON a._experience.analytics.environment.operatingsystemID = b.operatingsystemid
WHERE TIMESTAMP >= TO_TIMESTAMP('2018-01-01') AND TIMESTAMP <= TO_TIMESTAMP('2018-12-31')
GROUP BY OperatingSystem
ORDER BY PageViews DESC
LIMIT 50;
Results
Deduplication
Query Service supports data deduplication, or the removal of duplicate rows from data. For more information on deduplication, please read the Query Service deduplication guide.
Time zone computations in Query Service
Query Service standardizes persisted data in ÃÛ¶¹ÊÓƵ Experience Platform using the UTC timestamp format. For more information on how to translate your time zone requirement to and from a UTC timestamp, please see the FAQ section on how to change the time zone to and from a UTC Timestamp.
Next steps
By reading this document, you have been introduced to some important considerations when writing queries using Query Service. For more information on how to use the SQL syntax to write your own queries, please read the SQL syntax documentation.
For more samples of queries that can be used within Query Service, please read the following use case documentation: