Skip to content

SQL Reference

SHOW TABLES statement

SHOW TABLES can be used to list the tables on your account. The table name is the name you specified as dataset when configuring the workers binding (refer to Get started with Workers Analytics Engine, for more information). The table is automatically created when you write event data in your worker.

SHOW TABLES
[FORMAT <format>]

Refer to FORMAT clause for the available FORMAT options.

SHOW TIMEZONES statement

SHOW TIMEZONES can be used to list all of the timezones supported by the SQL API. Most common timezones are supported.

SHOW TIMEZONES
[FORMAT <format>]

SHOW TIMEZONE statement

SHOW TIMEZONE responds with the current default timezone in use by SQL API. This should always be Etc/UTC.

SHOW TIMEZONE
[FORMAT <format>]

SELECT statement

SELECT is used to query tables.

Usage:

SELECT <expression_list>
[FROM <table>|(<subquery>)]
[WHERE <expression>]
[GROUP BY <expression>, ...]
[ORDER BY <expression_list>]
[LIMIT <n>|ALL]
[FORMAT <format>]

Below you can find the syntax of each clause. Refer to the SQL API docs for some example queries.

SELECT clause

The SELECT clause specifies the list of columns to be included in the result. Columns can be aliased using the AS keyword.

Usage:

SELECT <expression> [AS <alias>], ...

Examples:

-- return the named columns
SELECT blob2, double3
-- return all columns
SELECT *
-- alias columns to more descriptive names
SELECT
blob2 AS probe_name,
double3 AS temperature

Additionally, expressions using supported functions and operators can be used in place of column names:

SELECT
blob2 AS probe_name,
double3 AS temp_c,
double3*1.8+32 AS temp_f -- compute a value
SELECT
blob2 AS probe_name,
if(double3 <= 0, 'FREEZING', 'NOT FREEZING') AS description -- use of functions
SELECT
blob2 AS probe_name,
avg(double3) AS avg_temp -- aggregation function

FROM clause

FROM is used to specify the source of the data for the query.

Usage:

FROM <table_name>|(subquery)

Examples:

-- query data written to a workers dataset called "temperatures"
FROM temperatures
-- use a subquery to manipulate the table
FROM (
SELECT
blob1 AS probe_name,
count() as num_readings
FROM
temperatures
GROUP BY
probe_name
)

Note that queries can only operate on a single table. UNION, JOIN etc. are not currently supported.

WHERE clause

WHERE is used to filter the rows returned by a query.

Usage:

WHERE <condition>

<condition> can be any expression that evaluates to a boolean.

Comparison operators can be used to compare values and boolean operators can be used to combine conditions.

Expressions containing functions and operators are supported.

Examples:

-- simple comparisons
WHERE blob1 = 'test'
WHERE double1 = 4
-- inequalities
WHERE double1 > 4
-- use of operators (see below for supported operator list)
WHERE double1 + double2 > 4
WHERE blob1 = 'test1' OR blob2 = 'test2'
-- expression using inequalities, functions and operators
WHERE if(unit = 'f', (temp-32)/1.8, temp) <= 0

GROUP BY clause

When using aggregate functions, GROUP BY specifies the groups over which the aggregation is run.

Usage:

GROUP BY <expression>, ...

For example. If you had a table of temperature readings:

-- return the average temperature for each probe
SELECT
blob1 AS probe_name,
avg(double1) AS average_temp
FROM temperature_readings
GROUP BY probe_name

In the usual case the <expression> can just be a column name but it is also possible to supply a complex expression here. Multiple expressions or column names can be supplied separated by commas.

ORDER BY clause

ORDER BY can be used to control the order in which rows are returned.

Usage:

ORDER BY <expression> [ASC|DESC], ...

<expression> can just be a column name.

ASC or DESC determines if the ordering is ascending or descending. ASC is the default, and can be omitted.

Examples:

-- order by double2 then double3, both in ascending order
ORDER BY double2, double3
-- order by double2 in ascending order then double3 is descending order
ORDER BY double2, double3 DESC

LIMIT clause

LIMIT specifies a maximum number of rows to return.

Usage:

LIMIT <n>|ALL

Supply the maximum number of rows to return or ALL for no restriction.

For example:

LIMIT 10 -- return at most 10 rows

FORMAT clause

FORMAT controls how to the returned data is encoded.

Usage:

FORMAT [JSON|JSONEachRow|TabSeparated]

If no format clause is included then the default format of JSON will be used.

Override the default by setting a format. For example:

FORMAT JSONEachRow

The following formats are supported:

JSON

Data is returned as a single JSON object with schema data included:

{
"meta": [
{
"name": "<column 1 name>",
"type": "<column 1 type>"
},
{
"name": "<column 2 name>",
"type": "<column 2 type>"
},
...
],
"data": [
{
"<column 1 name>": "<column 1 value>",
"<column 2 name>": "<column 2 value>",
...
},
{
"<column 1 name>": "<column 1 value>",
"<column 2 name>": "<column 2 value>",
...
},
...
],
"rows": 10
}

JSONEachRow

Data is returned with a separate JSON object per row. Rows are newline separated and there is no header line or schema data:

{"<column 1 name>": "<column 1 value>", "<column 2 name>": "<column 2 value>"}
{"<column 1 name>": "<column 1 value>", "<column 2 name>": "<column 2 value>"}
...

TabSeparated

Data is returned with newline separated rows. Columns are separated with tabs. There is no header.

column 1 value column 2 value
column 1 value column 2 value
...

Supported functions

count

Usage:

count()
count(DISTINCT column_name)

Count is an aggregation function that returns the number of rows in each group or results set.

Count can also be used to count the number of distinct (unique) values in each column:

Example:

-- return the total number of rows
count()
-- return the number of different values in the column
count(DISTINCT column_name)

sum

Usage:

sum([DISTINCT] column_name)

Sum is an aggregation function that returns the sum of column values across all rows in each group or results set. Sum also supports DISTINCT, but in this case it will only sum the unique values in the column.

Example:

-- return the total cost of all items
sum(item_cost)
-- return the total of all unique item costs
sum(DISTINCT item_cost)

avg

Usage:

avg([DISTINCT] column_name)

Avg is an aggregation function that returns the mean of column values across all rows in each group or results set. Avg also supports DISTINCT, but in this case it will only average the unique values in the column.

Example:

-- return the mean item cost
avg(item_cost)
-- return the mean of unique item costs
avg(DISTINCT item_cost)

min

Usage:

min(column_name)

Min is an aggregation function that returns the minimum value of a column across all rows.

Example:

-- return the minimum item cost
min(item_cost)

max

Usage:

max(column_name)

Max is an aggregation function that returns the maximum value of a column across all rows.

Example:

-- return the maximum item cost
max(item_cost)

quantileWeighted

Usage:

quantileWeighted(q, column_name, weight_column_name)

quantileWeighted is an aggregation function that returns the value at the qth quantile in the named column across all rows in each group or results set. Each row will be weighted by the value in weight_column_name. Typically this would be _sample_interval (refer to how sampling works, for more information).

Example:

-- estimate the median value of <double1>
quantileWeighted(0.5, double1, _sample_interval)
-- in a table of query times, estimate the 95th centile query time
quantileWeighted(0.95, query_time, _sample_interval)

if

Usage:

if(<condition>, <true_expression>, <false_expression>)

Returns <true_expression> if <condition> evaluates to true, else returns <false_expression>.

Example:

if(temp > 20, 'It is warm', 'Bring a jumper')

intDiv

Usage:

intDiv(a, b)

Divide a by b, rounding the answer down to the nearest whole number.

toUInt32

Usage:

toUInt32(<expression>)

Converts any numeric expression, or expression resulting in a string representation of a decimal, into an unsigned 32 bit integer.

Behaviour for negative numbers is undefined.

length

Usage:

length({string})

Returns the length of a string. This function is UTF-8 compatible.

Examples:

SELECT length('a string') AS s;
SELECT length(blob1) AS s FROM your_dataset;

isEmpty

Usage:

isEmpty({string})

Returns a boolean saying whether the string was empty. This computation can also be done as a binary operation: {string} = ''.

Examples:

SELECT isEmpty('a string') AS b;
SELECT isEmpty(blob1) AS b FROM your_dataset;

toLower

Usage:

toLower({string})

Returns the string converted to lowercase. This function is Unicode compatible. This may not be perfect for all languages and users with stringent needs, should do the operation in their own code.

Examples:

SELECT toLower('STRING TO DOWNCASE') AS s;
SELECT toLower(blob1) AS s FROM your_dataset;

toUpper

Usage:

toUpper({string})

Returns the string converted to uppercase. This function is Unicode compatible. The results may not be perfect for all languages and users with strict needs. These users should do the operation in their own code.

Examples:

SELECT toUpper('string to uppercase') AS s;
SELECT toUpper(blob1) AS s FROM your_dataset;

startsWith

Usage:

startsWith({string}, {string})

Returns a boolean of whether the first string has the second string at its start.

Examples:

SELECT startsWith('prefix ...', 'prefix') AS b;
SELECT startsWith(blob1, 'prefix') AS b FROM your_dataset;

endsWith

Usage:

endsWith({string}, {string})

Returns a boolean of whether the first string contains the second string at its end.

Examples:

SELECT endsWith('prefix suffix', 'suffix') AS b;
SELECT endsWith(blob1, 'suffix') AS b FROM your_dataset;

position

Usage:

position({needle:string} IN {haystack:string})

Returns the position of one string, needle, in another, haystack. In SQL, indexes are usually 1-based. That means that position returns 1 if your needle is at the start of the haystack. It only returns 0 if your string is not found.

Examples:

SELECT position(':' IN 'hello: world') AS p;
SELECT position(':' IN blob1) AS p FROM your_dataset;

substring

Usage:

substring({string}, {offset:integer}[. {length:integer}])

Extracts part of a string, starting at the Unicode code point indicated by the offset and returning the number of code points requested by the length. As previously mentioned, in SQL, indexes are usually 1-based. That means that the offset provided to substring should be at least 1.

Examples:

SELECT substring('hello world', 6) AS s;
SELECT substring('hello: world', 1, position(':' IN 'hello: world')-1) AS s;

format

Usage:

format({string}[, ...])

This function supports formatting strings, integers, floats, datetimes, intervals, etc, except NULL. The function does not support literal { and } characters in the format string.

Examples:

SELECT format('blob1: {}', blob1) AS s FROM dataset;

See also: formatDateTime

toDateTime

Usage:

toDateTime(<expression>[, 'timezone string'])

toDateTime converts an expression to a datetime. This function does not support ISO 8601-style timezones; if your time is not in UTC then you must provide the timezone using the second optional argument.

Examples:

-- double1 contains a unix timestamp in seconds
toDateTime(double1)
-- blob1 contains an datetime in the format 'YYYY-MM-DD hh:mm:ss'
toDateTime(blob1)
-- literal values:
toDateTime(355924804) -- unix timestamp
toDateTime('355924804') -- string containing unix timestamp
toDateTime('1981-04-12 12:00:04') -- string with datetime in 'YYYY-MM-DD hh:mm:ss' format
-- interpret a date relative to New York time
toDateTime('2022-12-01 16:17:00', 'America/New_York')

now

Usage:

now()

Returns the current time as a DateTime.

toUnixTimestamp

Usage:

toUnixTimestamp(<datetime>)

toUnixTimestamp converts a datetime into an integer unix timestamp.

Examples:

-- get the current unix timestamp
toUnixTimestamp(now())

formatDateTime

Usage:

formatDateTime(<datetime expression>, <format string>[, <timezone string>])

formatDateTime prints a datetime as a string according to a provided format string. See ClickHouse's docs for a list of supported formatting options.

Examples:

-- prints the current YYYY-MM-DD in UTC
formatDateTime(now(), '%Y-%m-%d')
-- prints YYYY-MM-DD in the datetime's timezone
formatDateTime(<a datetime with a timezone>, '%Y-%m-%d')
formatDateTime(toDateTime('2022-12-01 16:17:00', 'America/New_York'), '%Y-%m-%d')
-- prints YYYY-MM-DD in UTC
formatDateTime(<a datetime with a timezone>, '%Y-%m-%d', 'Etc/UTC')
formatDateTime(toDateTime('2022-12-01 16:17:00', 'America/New_York'), '%Y-%m-%d', 'Etc/UTC')

toStartOfInterval

Usage:

toStartOfInterval(<datetime>, INTERVAL '<n>' <unit>[, <timezone string>])

toStartOfInterval rounds down a datetime to the nearest offset of a provided interval. This can be useful for grouping data into equal-sized time ranges.

Examples:

-- round the current time down to the nearest 15 minutes
toStartOfInterval(now(), INTERVAL '15' MINUTE)
-- round a timestamp down to the day
toStartOfInterval(timestamp, INTERVAL '1' DAY)
-- count the number of datapoints filed in each hourly window
SELECT
toStartOfInterval(timestamp, INTERVAL '1' HOUR) AS hour,
sum(_sample_interval) AS count
FROM your_dataset
GROUP BY hour
ORDER BY hour ASC

extract

Usage:

extract(<time unit> from <datetime>)

extract returns an integer number of time units from a datetime. It supports YEAR, MONTH, DAY, HOUR, MINUTE and SECOND.

Examples:

-- extract the number of seconds from a timestamp (returns 15 in this example)
extract(SECOND from toDateTime('2022-06-06 11:30:15'))

Supported operators

The following operators are supported:

Arithmetic operators

OperatorDescription
+addition
-subtraction
*multiplication
/division
%modulus

Comparison operators

OperatorDescription
=equals
<less than
>greater than
<=less than or equal to
>=greater than or equal to
<> or !=not equal
INtrue if the preceding expression's value is in the list
column IN ('a', 'list', 'of', 'values')
NOT INtrue if the preceding expression's value is not in the list
column NOT IN ('a', 'list', 'of', 'values')

We also support the BETWEEN operator for checking a value is in an inclusive range: a [NOT] BETWEEN b AND c.

Boolean operators

OperatorDescription
ANDboolean "AND" (true if both sides are true)
ORboolean "OR" (true if either side or both sides are true)
NOTboolean "NOT" (true if following expression is false and visa-versa)

Unary operators

OperatorDescription
-negation operator (for example, -42)

Literals

TypeSyntax
integer42, -42
double4.2, -4.2
string'so long and thanks for all the fish'
booleantrue or false
time intervalINTERVAL '42' DAY
Intervals of YEAR, MONTH, DAY, HOUR, MINUTE and SECOND are supported