Workers Analytics Engine SQL API
The Workers Analytics Engine SQL API is an HTTP API that allows you to execute SQL queries against your Workers Analytics Engine datasets.
The API is hosted at https://api.cloudflare.com/client/v4/accounts/<account_id>/analytics_engine/sql.
Authentication is done via bearer token: an
Authorization: Bearer <token> header must be supplied with every request to the API.
Use the dashboard to create a token with permission to read analytics data on your account:
- Visit the API Tokens page in the Cloudflare dashboard.
- Select Create Token.
- Select Create Custom Token.
- Complete the Create Custom Token form as follows:
- Give your token a descriptive name.
- For Permissions, select Account | Account Analytics | Read.
- Optionally configure account and IP restrictions and TTL.
- Submit and confirm the form to create the token.
- Make a note of the token string.
Querying the API
You can use cURL to test the API as follows, replacing <account_id> with your 32-character account ID (available in the dashboard) and <token> with the token string you generated above.
curl -X POST "https://api.cloudflare.com/client/v4/accounts/<account_id>/analytics_engine/sql" -H "Authorization: Bearer <token>" -d "SELECT 'Hello Workers Analytics Engine' AS message"
If you have already written some data, you can execute the following to confirm that the dataset has been created in the database.
curl -X POST "https://api.cloudflare.com/client/v4/accounts/<account_id>/analytics_engine/sql" -H "Authorization: Bearer <token>" -d "SHOW TABLES"
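The same queries can also be issued from a script. Below is a minimal sketch in Python using only the standard library; the run_query helper and its name are our own for illustration, not part of the API.

```python
from urllib import request

API_URL = "https://api.cloudflare.com/client/v4/accounts/{account_id}/analytics_engine/sql"

def run_query(account_id: str, token: str, sql: str) -> str:
    """POST a SQL query to the Analytics Engine SQL API and return the raw response body."""
    req = request.Request(
        API_URL.format(account_id=account_id),
        data=sql.encode("utf-8"),
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# Example (requires a real account ID and token):
# print(run_query("<account_id>", "<token>", "SHOW TABLES"))
```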
A new table will automatically be created for each dataset once you start writing events to it from your Worker.
The table will have the following columns: timestamp, _sample_interval, index1, the blob columns (blob1 through blob20), and the double columns (double1 through double20).
At very high volumes of data, Analytics Engine will downsample data in order to maintain performance. Sampling can occur on write and on read. Sampling is based on the index of your dataset, so only indexes that receive large numbers of events will be sampled. For example, if your Worker serves multiple customers, you might consider making customer ID the index field. This would mean that if one customer starts making a high rate of requests, events from that customer could be sampled while other customers' data remains unsampled.
We have tested this system of sampling over a number of years at Cloudflare and it has enabled us to scale our web analytics systems to very high throughput, while still providing statistically meaningful results irrespective of the amount of traffic a website receives.
The rate at which the data is sampled is exposed via the _sample_interval column. A _sample_interval of n means that the stored row represents n original events, so if you are doing statistical analysis of your data, you may need to take this column into account. For example:

| Original query | Query taking into account sampling |
| --- | --- |
| COUNT() | SUM(_sample_interval) |
| AVG(double1) | SUM(_sample_interval * double1) / SUM(_sample_interval) |
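To see why the weighting matters, here is a small illustration in Python with made-up sampled rows; each row carries a sample interval saying how many original events it stands for.

```python
# Each tuple is (sample_interval, double1): one stored row representing
# `sample_interval` original events with (approximately) this value.
rows = [
    (10, 20.0),  # represents 10 original events with value ~20.0
    (1, 30.0),   # an unsampled event
    (1, 30.0),   # an unsampled event
]

# A naive average ignores sampling and over-weights the unsampled rows.
naive_avg = sum(v for _, v in rows) / len(rows)          # ~26.7

# Weighting by the sample interval mirrors
# SUM(_sample_interval * double1) / SUM(_sample_interval).
weighted_avg = (
    sum(si * v for si, v in rows) / sum(si for si, _ in rows)
)                                                        # ~21.7

print(naive_avg, weighted_avg)
```

The weighted form recovers the average over the original events, not over the stored rows.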
Select data with column aliases
Column aliases can be used in queries to give names to the blobs and doubles in your dataset:
SELECT
  timestamp,
  blob1 AS location_id,
  double1 AS inside_temp,
  double2 AS outside_temp
FROM temperatures
WHERE timestamp > NOW() - INTERVAL '1' DAY
Aggregation taking into account sample interval
Calculate the number of readings taken at each location in the last 7 days. In this case, we are grouping by the index field, so an exact count can be calculated even if the data has been sampled:
SELECT
  index1 AS location_id,
  SUM(_sample_interval) AS n_readings
FROM temperatures
WHERE timestamp > NOW() - INTERVAL '7' DAY
GROUP BY index1
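As a sanity check of the count reconstruction, here is a small Python simulation of on-write sampling; the index names and the 1-in-10 rate are invented for illustration (real sampling rates are chosen by Analytics Engine):

```python
# Simulate on-write sampling: a busy index keeps 1 row in 10, each stored
# with _sample_interval = 10; a quiet index is stored unsampled.
stored_rows = []
for i in range(1000):              # 1000 original "busy" events
    if i % 10 == 0:                # keep every 10th event
        stored_rows.append(("busy", 10))
for _ in range(5):                 # 5 original "quiet" events, all kept
    stored_rows.append(("quiet", 1))

# SUM(_sample_interval) ... GROUP BY index1, in plain Python:
counts = {}
for index1, sample_interval in stored_rows:
    counts[index1] = counts.get(index1, 0) + sample_interval

print(counts)  # {'busy': 1000, 'quiet': 5}
```

Summing the sample intervals recovers the original event counts for both the sampled and the unsampled index.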
Calculate the average temperature at each location over the last 7 days, taking the sample interval into account:
SELECT
  index1 AS location_id,
  SUM(_sample_interval * double1) / SUM(_sample_interval) AS average_temp
FROM temperatures
WHERE timestamp > NOW() - INTERVAL '7' DAY
GROUP BY index1