---
title: R2 SQL
description: A distributed SQL engine for R2 Data Catalog
image: https://developers.cloudflare.com/dev-products-preview.png
---

> Documentation Index  
> Fetch the complete documentation index at: https://developers.cloudflare.com/r2-sql/llms.txt  
> Use this file to discover all available pages before exploring further.

[Skip to content](#%5Ftop) 

# R2 SQL

Note

R2 SQL is in **open beta**, and any developer with an [R2 subscription](https://developers.cloudflare.com/r2/pricing/) can start using it. Currently, outside of standard R2 storage and operations, you will not be billed for your use of R2 SQL. We will update [the pricing page](https://developers.cloudflare.com/r2-sql/platform/pricing) and provide at least 30 days notice before enabling billing.

Query Apache Iceberg tables managed by R2 Data Catalog using SQL.

R2 SQL is Cloudflare's serverless, distributed, analytics query engine for querying [Apache Iceberg ↗](https://iceberg.apache.org/) tables stored in [R2 Data Catalog](https://developers.cloudflare.com/r2/data-catalog/). R2 SQL is designed to efficiently query large amounts of data by automatically utilizing file pruning, Cloudflare's distributed compute, and R2 object storage.

Terminal window

```

❯ npx wrangler r2 sql query "3373912de3f5202317188ae01300bd6_data-catalog" \

"SELECT * FROM default.transactions LIMIT 10"


 ⛅️ wrangler 4.38.0

────────────────────────────────────────────────────────────────────────────

▲ [WARNING] 🚧 `wrangler r2 sql query` is an open-beta command. Please report any issues to https://github.com/cloudflare/workers-sdk/issues/new/choose


┌─────────────────────────────┬──────────────────────────────────────┬─────────┬──────────┬──────────────────────────────────┬───────────────┬───────────────────┬──────────┐

│ __ingest_ts                 │ transaction_id                       │ user_id │ amount   │ transaction_timestamp            │ location      │ merchant_category │ is_fraud │

├─────────────────────────────┼──────────────────────────────────────┼─────────┼──────────┼──────────────────────────────────┼───────────────┼───────────────────┼──────────┤

│ 2025-09-20T22:30:11.872554Z │ fdc1beed-157c-4d2d-90cf-630fdea58051 │ 1679    │ 13241.59 │ 2025-09-20T02:23:04.269988+00:00 │ NEW_YORK      │ RESTAURANT        │ false    │

├─────────────────────────────┼──────────────────────────────────────┼─────────┼──────────┼──────────────────────────────────┼───────────────┼───────────────────┼──────────┤

│ 2025-09-20T22:30:11.724378Z │ ea7ef106-8284-4d08-9348-ad33989b6381 │ 1279    │ 17615.79 │ 2025-09-20T02:23:04.271090+00:00 │ MIAMI         │ GAS_STATION       │ true     │

├─────────────────────────────┼──────────────────────────────────────┼─────────┼──────────┼──────────────────────────────────┼───────────────┼───────────────────┼──────────┤

│ 2025-09-20T22:30:11.724330Z │ afcdee4d-5c71-42be-97ec-e282b6937a8c │ 1843    │ 7311.65  │ 2025-09-20T06:23:04.267890+00:00 │ SEATTLE       │ GROCERY           │ true     │

├─────────────────────────────┼──────────────────────────────────────┼─────────┼──────────┼──────────────────────────────────┼───────────────┼───────────────────┼──────────┤

│ 2025-09-20T22:30:11.657007Z │ b99d14e0-dbe0-49bc-a417-0ee57f8bed99 │ 1976    │ 15228.21 │ 2025-09-16T23:23:04.269426+00:00 │ NEW_YORK      │ RETAIL            │ false    │

├─────────────────────────────┼──────────────────────────────────────┼─────────┼──────────┼──────────────────────────────────┼───────────────┼───────────────────┼──────────┤

│ 2025-09-20T22:30:11.656992Z │ 712cd094-ad4c-4d24-819a-0d3daaaceea1 │ 1184    │ 7570.89  │ 2025-09-20T00:23:04.269163+00:00 │ LOS_ANGELES   │ RESTAURANT        │ true     │

├─────────────────────────────┼──────────────────────────────────────┼─────────┼──────────┼──────────────────────────────────┼───────────────┼───────────────────┼──────────┤

│ 2025-09-20T22:30:11.656912Z │ b5a1aab3-676d-4492-92b8-aabcde6db261 │ 1196    │ 46611.25 │ 2025-09-20T16:23:04.268693+00:00 │ NEW_YORK      │ RETAIL            │ true     │

├─────────────────────────────┼──────────────────────────────────────┼─────────┼──────────┼──────────────────────────────────┼───────────────┼───────────────────┼──────────┤

│ 2025-09-20T22:30:11.613740Z │ 432d3976-8d89-4813-9099-ea2afa2c0e70 │ 1720    │ 21547.9  │ 2025-09-20T05:23:04.273681+00:00 │ SAN FRANCISCO │ GROCERY           │ true     │

├─────────────────────────────┼──────────────────────────────────────┼─────────┼──────────┼──────────────────────────────────┼───────────────┼───────────────────┼──────────┤

│ 2025-09-20T22:30:11.532068Z │ 25e0b851-3092-4ade-842f-e3189e07d4ee │ 1562    │ 29311.54 │ 2025-09-20T05:23:04.277405+00:00 │ NEW_YORK      │ RETAIL            │ false    │

├─────────────────────────────┼──────────────────────────────────────┼─────────┼──────────┼──────────────────────────────────┼───────────────┼───────────────────┼──────────┤

│ 2025-09-20T22:30:11.526037Z │ 8001746d-05fe-42fe-a189-40caf81d7aa2 │ 1817    │ 15976.5  │ 2025-09-15T16:23:04.266632+00:00 │ SEATTLE       │ RESTAURANT        │ true     │

└─────────────────────────────┴──────────────────────────────────────┴─────────┴──────────┴──────────────────────────────────┴───────────────┴───────────────────┴──────────┘

Read 11.3 kB across 4 files from R2

On average, 3.36 kB / s


```

Create an end-to-end data pipeline by following [this step by step guide](https://developers.cloudflare.com/r2-sql/get-started/), which shows you how to stream events into an Apache Iceberg table and query it with R2 SQL.

```json
{"@context":"https://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"/directory/","name":"Directory"}},{"@type":"ListItem","position":2,"item":{"@id":"/r2-sql/","name":"R2 SQL"}}]}
```

---

---
title: Getting started
description: Create your first pipeline to ingest streaming data and write to R2 Data Catalog as an Apache Iceberg table.
image: https://developers.cloudflare.com/dev-products-preview.png
---

> Documentation Index  
> Fetch the complete documentation index at: https://developers.cloudflare.com/r2-sql/llms.txt  
> Use this file to discover all available pages before exploring further.

[Skip to content](#%5Ftop) 

# Getting started

This guide will instruct you through:

* Creating your first [R2 bucket](https://developers.cloudflare.com/r2/buckets/) and enabling its [data catalog](https://developers.cloudflare.com/r2/data-catalog/).
* Creating an [API token](https://developers.cloudflare.com/r2/api/tokens/) needed for pipelines to authenticate with your data catalog.
* Creating your first pipeline with a simple ecommerce schema that writes to an [Apache Iceberg ↗](https://iceberg.apache.org/) table managed by R2 Data Catalog.
* Sending sample ecommerce data via HTTP endpoint.
* Validating data in your bucket and querying it with R2 SQL.

## Prerequisites

1. Sign up for a [Cloudflare account ↗](https://dash.cloudflare.com/sign-up/workers-and-pages).
2. Install [Node.js ↗](https://docs.npmjs.com/downloading-and-installing-node-js-and-npm).

Node.js version manager

Use a Node version manager like [Volta ↗](https://volta.sh/) or [nvm ↗](https://github.com/nvm-sh/nvm) to avoid permission issues and change Node.js versions. [Wrangler](https://developers.cloudflare.com/workers/wrangler/install-and-update/), discussed later in this guide, requires a Node version of `16.17.0` or later.

## 1\. Create an R2 bucket

* [ Wrangler CLI ](#tab-panel-9552)
* [ Dashboard ](#tab-panel-9553)

1. If not already logged in, run:  
Terminal window  
```  
npx wrangler login  
```
2. Create an R2 bucket:  
Terminal window  
```  
npx wrangler r2 bucket create pipelines-tutorial  
```

1. In the Cloudflare dashboard, go to the **R2 object storage** page.  
[ Go to **Overview** ](https://dash.cloudflare.com/?to=/:account/r2/overview)
2. Select **Create bucket**.
3. Enter the bucket name: pipelines-tutorial
4. Select **Create bucket**.

## 2\. Enable R2 Data Catalog

* [ Wrangler CLI ](#tab-panel-9554)
* [ Dashboard ](#tab-panel-9555)

Enable the catalog on your R2 bucket:

Terminal window

```

npx wrangler r2 bucket catalog enable pipelines-tutorial


```

When you run this command, take note of the "Warehouse" and "Catalog URI". You will need these later.

1. In the Cloudflare dashboard, go to the **R2 object storage** page.  
[ Go to **Overview** ](https://dash.cloudflare.com/?to=/:account/r2/overview)
2. Select the bucket: pipelines-tutorial.
3. Switch to the **Settings** tab, scroll down to **R2 Data Catalog**, and select **Enable**.
4. Once enabled, note the **Catalog URI** and **Warehouse name**.

## 3\. Create an API token

Pipelines must authenticate to R2 Data Catalog with an [R2 API token](https://developers.cloudflare.com/r2/api/tokens/) that has catalog and R2 permissions.

1. In the Cloudflare dashboard, go to the **R2 object storage** page.  
[ Go to **Overview** ](https://dash.cloudflare.com/?to=/:account/r2/overview)
2. Select **Manage API tokens**.
3. Select **Create Account API token**.
4. Give your API token a name.
5. Under **Permissions**, choose the **Admin Read & Write** permission.
6. Select **Create Account API Token**.
7. Note the **Token value**.

Note

This token also includes the R2 SQL Read permission, which allows you to query your data with R2 SQL.

## 4\. Create a pipeline

* [ Wrangler CLI ](#tab-panel-9556)
* [ Dashboard ](#tab-panel-9557)

First, create a schema file that defines your ecommerce data structure:

**Create `schema.json`:**

```

{

  "fields": [

    {

      "name": "user_id",

      "type": "string",

      "required": true

    },

    {

      "name": "event_type",

      "type": "string",

      "required": true

    },

    {

      "name": "product_id",

      "type": "string",

      "required": false

    },

    {

      "name": "amount",

      "type": "float64",

      "required": false

    }

  ]

}


```

Use the interactive setup to create a pipeline that writes to R2 Data Catalog:

Terminal window

```

npx wrangler pipelines setup


```

Follow the prompts:

1. **Pipeline name**: Enter `ecommerce`
2. **Stream configuration**:  
   * Enable HTTP endpoint: `yes`  
   * Require authentication: `no` (for simplicity)  
   * Configure custom CORS origins: `no`  
   * Schema definition: `Load from file`  
   * Schema file path: `schema.json` (or your file path)
3. **Sink configuration**:  
   * Destination type: `Data Catalog Table`  
   * R2 bucket name: `pipelines-tutorial`  
   * Namespace: `default`  
   * Table name: `ecommerce`  
   * Catalog API token: Enter your token from step 3  
   * Compression: `zstd`  
   * Roll file when size reaches (MB): `100`  
   * Roll file when time reaches (seconds): `10` (for faster data visibility in this tutorial)
4. **SQL transformation**: Choose `Use simple ingestion query` to use:  
```  
INSERT INTO ecommerce_sink SELECT * FROM ecommerce_stream  
```

After setup completes, note the HTTP endpoint URL displayed in the final output.

1. In the Cloudflare dashboard, go to **Pipelines** \> **Pipelines**.  
[ Go to **Pipelines** ](https://dash.cloudflare.com/?to=/:account/pipelines/overview)
2. Select **Create Pipeline**.
3. **Connect to a Stream**:  
   * Pipeline name: `ecommerce`  
   * Enable HTTP endpoint for sending data: Enabled  
   * HTTP authentication: Disabled (default)  
   * Select **Next**
4. **Define Input Schema**:  
   * Select **JSON editor**  
   * Copy in the schema:  
   ```  
   {  
     "fields": [  
       {  
         "name": "user_id",  
         "type": "string",  
         "required": true  
       },  
       {  
         "name": "event_type",  
         "type": "string",  
         "required": true  
       },  
       {  
         "name": "product_id",  
         "type": "string",  
         "required": false  
       },  
       {  
         "name": "amount",  
         "type": "f64",  
         "required": false  
       }  
     ]  
   }  
   ```  
   * Select **Next**
5. **Define Sink**:  
   * Select your R2 bucket: `pipelines-tutorial`  
   * Storage type: **R2 Data Catalog**  
   * Namespace: `default`  
   * Table name: `ecommerce`  
   * **Advanced Settings**: Change **Maximum Time Interval** to `10 seconds`  
   * Select **Next**
6. **Credentials**:  
   * Disable **Automatically create an Account API token for your sink**  
   * Enter **Catalog Token** from step 3  
   * Select **Next**
7. **Pipeline Definition**:  
   * Leave the default SQL query:  
   ```  
   INSERT INTO ecommerce_sink SELECT * FROM ecommerce_stream;  
   ```  
   * Select **Create Pipeline**
8. After pipeline creation, note the **Stream ID** for the next step.

## 5\. Send sample data

Send ecommerce events to your pipeline's HTTP endpoint:

Terminal window

```

curl -X POST https://{stream-id}.ingest.cloudflare.com \

  -H "Content-Type: application/json" \

  -d '[

    {

      "user_id": "user_12345",

      "event_type": "purchase",

      "product_id": "widget-001",

      "amount": 29.99

    },

    {

      "user_id": "user_67890",

      "event_type": "view_product",

      "product_id": "widget-002"

    },

    {

      "user_id": "user_12345",

      "event_type": "add_to_cart",

      "product_id": "widget-003",

      "amount": 15.50

    }

  ]'


```

Replace `{stream-id}` with your actual stream endpoint from the pipeline setup.

## 6\. Validate data in your bucket

1. In the Cloudflare dashboard, go to the **R2 object storage** page.
2. Select your bucket: `pipelines-tutorial`.
3. You should see Iceberg metadata files and data files created by your pipeline. Note: If you aren't seeing any files in your bucket, try waiting a couple of minutes and trying again.
4. The data is organized in the Apache Iceberg format with metadata tracking table versions.

## 7\. Query your data using R2 SQL

Set up your environment to use R2 SQL:

Terminal window

```

export WRANGLER_R2_SQL_AUTH_TOKEN=YOUR_API_TOKEN


```

Or create a `.env` file with:

```

WRANGLER_R2_SQL_AUTH_TOKEN=YOUR_API_TOKEN


```

Where `YOUR_API_TOKEN` is the token you created in step 3\. For more information on setting environment variables, refer to [Wrangler system environment variables](https://developers.cloudflare.com/workers/wrangler/system-environment-variables/).

Query your data:

Terminal window

```

npx wrangler r2 sql query "YOUR_WAREHOUSE_NAME" "

SELECT

    user_id,

    event_type,

    product_id,

    amount

FROM default.ecommerce

WHERE event_type = 'purchase'

LIMIT 10"


```

Replace `YOUR_WAREHOUSE_NAME` with the warehouse name from step 2.

You can also query this table with any engine that supports Apache Iceberg. To learn more about connecting other engines to R2 Data Catalog, refer to [Connect to Iceberg engines](https://developers.cloudflare.com/r2/data-catalog/config-examples/).

## Learn more

[ Managing R2 Data Catalogs ](https://developers.cloudflare.com/r2/data-catalog/manage-catalogs/) Enable or disable R2 Data Catalog on your bucket, retrieve configuration details, and authenticate your Iceberg engine. 

[ Try another example ](https://developers.cloudflare.com/r2-sql/tutorials/end-to-end-pipeline) Detailed tutorial for setting up a simple fraud detection data pipeline, and generate events for it in Python. 

[ Pipelines ](https://developers.cloudflare.com/pipelines/) Understand SQL transformations and pipeline configuration. 

```json
{"@context":"https://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"/directory/","name":"Directory"}},{"@type":"ListItem","position":2,"item":{"@id":"/r2-sql/","name":"R2 SQL"}},{"@type":"ListItem","position":3,"item":{"@id":"/r2-sql/get-started/","name":"Getting started"}}]}
```

---

---
title: Query data
description: Understand how to query data with R2 SQL
image: https://developers.cloudflare.com/dev-products-preview.png
---

> Documentation Index  
> Fetch the complete documentation index at: https://developers.cloudflare.com/r2-sql/llms.txt  
> Use this file to discover all available pages before exploring further.

[Skip to content](#%5Ftop) 

# Query data

Query [Apache Iceberg ↗](https://iceberg.apache.org/) tables managed by [R2 Data Catalog](https://developers.cloudflare.com/r2/data-catalog/). R2 SQL queries can be made via [Wrangler](https://developers.cloudflare.com/workers/wrangler/) or HTTP API.

## Get your warehouse name

To query data with R2 SQL, you'll need your warehouse name associated with your [catalog](https://developers.cloudflare.com/r2/data-catalog/manage-catalogs/). To retrieve it, you can run the [r2 bucket catalog get command](https://developers.cloudflare.com/workers/wrangler/commands/r2/#r2-bucket-catalog-get):

Terminal window

```

npx wrangler r2 bucket catalog get <BUCKET_NAME>


```

Alternatively, you can find it in the dashboard by going to the **R2 object storage** page, selecting the bucket, switching to the **Settings** tab, scrolling to **R2 Data Catalog**, and finding **Warehouse name**.

## Query via Wrangler

To begin, install [npm ↗](https://docs.npmjs.com/getting-started). Then [install Wrangler, the Developer Platform CLI](https://developers.cloudflare.com/workers/wrangler/install-and-update/).

Wrangler needs an API token with permissions to access R2 Data Catalog, R2 storage, and R2 SQL to execute queries. The `r2 sql query` command looks for the token in the `WRANGLER_R2_SQL_AUTH_TOKEN` environment variable.

Set up your environment:

Terminal window

```

export WRANGLER_R2_SQL_AUTH_TOKEN=YOUR_API_TOKEN


```

Or create a `.env` file with:

```

WRANGLER_R2_SQL_AUTH_TOKEN=YOUR_API_TOKEN


```

Where `YOUR_API_TOKEN` is the token you created with the [required permissions](#authentication). For more information on setting environment variables, refer to [Wrangler system environment variables](https://developers.cloudflare.com/workers/wrangler/system-environment-variables/).

To run a SQL query, run the [r2 sql query command](https://developers.cloudflare.com/workers/wrangler/commands/r2/#r2-sql-query):

Terminal window

```

npx wrangler r2 sql query <WAREHOUSE> "SELECT * FROM namespace.table_name limit 10;"


```

For a full list of supported SQL commands, refer to the [R2 SQL reference](https://developers.cloudflare.com/r2-sql/sql-reference/).

## Query via API

Below is an example of using R2 SQL via the REST endpoint:

Terminal window

```

curl -X POST \

  "https://api.sql.cloudflarestorage.com/api/v1/accounts/{ACCOUNT_ID}/r2-sql/query/{BUCKET_NAME}" \

  -H "Authorization: Bearer ${WRANGLER_R2_SQL_AUTH_TOKEN}" \

  -H "Content-Type: application/json" \

  -d '{

    "query": "SELECT * FROM namespace.table_name limit 10;"

  }'


```

The API requires an API token with the appropriate permissions in the Authorization header. Refer to [Authentication](#authentication) for details on creating a token.

For a full list of supported SQL commands, refer to the [R2 SQL reference](https://developers.cloudflare.com/r2-sql/sql-reference/).

## Authentication

To query data with R2 SQL, you must provide a Cloudflare API token with R2 SQL, R2 Data Catalog, and R2 storage permissions. R2 SQL requires these permissions to access catalog metadata and read the underlying data files stored in R2.

### Create API token in the dashboard

Create an [R2 API token](https://developers.cloudflare.com/r2/api/tokens/#permissions) with the following permissions:

* Access to R2 Data Catalog (read-only)
* Access to R2 storage (Admin read/write)
* Access to R2 SQL (read-only)

Use this token value for the `WRANGLER_R2_SQL_AUTH_TOKEN` environment variable when querying with Wrangler, or in the Authorization header when using the REST API.

### Create API token via API

To create an API token programmatically for use with R2 SQL, you'll need to specify R2 SQL, R2 Data Catalog, and R2 storage permission groups in your [Access Policy](https://developers.cloudflare.com/r2/api/tokens/#access-policy).

#### Example Access Policy

```

[

  {

    "id": "f267e341f3dd4697bd3b9f71dd96247f",

    "effect": "allow",

    "resources": {

      "com.cloudflare.edge.r2.bucket.4793d734c0b8e484dfc37ec392b5fa8a_default_my-bucket": "*",

      "com.cloudflare.edge.r2.bucket.4793d734c0b8e484dfc37ec392b5fa8a_eu_my-eu-bucket": "*"

    },

    "permission_groups": [

      {

        "id": "f45430d92e2b4a6cb9f94f2594c141b8",

        "name": "Workers R2 SQL Read"

      },

      {

        "id": "d229766a2f7f4d299f20eaa8c9b1fde9",

        "name": "Workers R2 Data Catalog Write"

      },

      {

        "id": "bf7481a1826f439697cb59a20b22293e",

        "name": "Workers R2 Storage Write"

      }

    ]

  }

]


```

To learn more about how to create API tokens for R2 SQL using the API, including required permission groups and usage examples, refer to the [Create API tokens via API documentation](https://developers.cloudflare.com/r2/api/tokens/#create-api-tokens-via-api).

## Additional resources

[ Manage R2 Data Catalogs ](https://developers.cloudflare.com/r2/data-catalog/manage-catalogs/) Enable or disable R2 Data Catalog on your bucket, retrieve configuration details, and authenticate your Iceberg engine. 

[ Build an end to end data pipeline ](https://developers.cloudflare.com/r2-sql/tutorials/end-to-end-pipeline) Detailed tutorial for setting up a simple fraud detection data pipeline, and generate events for it in Python. 

```json
{"@context":"https://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"/directory/","name":"Directory"}},{"@type":"ListItem","position":2,"item":{"@id":"/r2-sql/","name":"R2 SQL"}},{"@type":"ListItem","position":3,"item":{"@id":"/r2-sql/query-data/","name":"Query data"}}]}
```

---

---
title: SQL reference
description: Comprehensive reference for SQL syntax, functions, and data types supported in R2 SQL.
image: https://developers.cloudflare.com/dev-products-preview.png
---

> Documentation Index  
> Fetch the complete documentation index at: https://developers.cloudflare.com/r2-sql/llms.txt  
> Use this file to discover all available pages before exploring further.

[Skip to content](#%5Ftop) 

# SQL reference

Note

R2 SQL is in public beta. Supported SQL grammar may change over time.

R2 SQL is Cloudflare's serverless, distributed, analytics query engine for querying [Apache Iceberg ↗](https://iceberg.apache.org/) tables stored in [R2 Data Catalog](https://developers.cloudflare.com/r2/data-catalog/). This page documents the supported SQL syntax.

---

## Query syntax

```

SELECT column_list | expression | aggregation_function

FROM namespace_name.table_name

[JOIN namespace_name.table_name ON condition]

[WHERE conditions]

[GROUP BY column_list]

[HAVING conditions]

[ORDER BY expression [ASC | DESC]]

[LIMIT number]


```

---

## Schema discovery commands

### SHOW DATABASES

Lists all available namespaces.

```

SHOW DATABASES;


```

### SHOW NAMESPACES

Alias for `SHOW DATABASES`. Lists all available namespaces.

```

SHOW NAMESPACES;


```

### SHOW TABLES

Lists all tables within a specific namespace.

```

SHOW TABLES IN namespace_name;


```

### DESCRIBE

Describes the structure of a table, showing column names and data types.

```

DESCRIBE namespace_name.table_name;


```

---

## SELECT clause

### Syntax

```

SELECT [DISTINCT] column_specification [, column_specification, ...]


```

### Column specification

* **Column name**: `column_name`
* **All columns**: `*`
* **Qualified wildcard**: `table_name.*`
* **Column alias**: `column_name AS alias`
* **Expressions**: arithmetic, function calls, CASE expressions, and casts

### Select distinct

Use `DISTINCT` to eliminate duplicate rows from the result set:

```

SELECT DISTINCT region, department

FROM my_namespace.sales_data

WHERE total_amount > 1000

ORDER BY region, department

LIMIT 100


```

For large datasets where approximate results are acceptable, `approx_distinct()` is a faster alternative for counting unique values.

### Examples

```

SELECT * FROM my_namespace.sales_data LIMIT 10

SELECT customer_id, region, total_amount FROM my_namespace.sales_data LIMIT 10

SELECT region, total_amount * 1.1 AS total_with_tax FROM my_namespace.sales_data LIMIT 10


```

---

## Common table expressions (CTEs)

CTEs let you define named temporary result sets using `WITH` that you can reference in the main query. CTEs can reference different tables and can include JOINs. A CTE can also be joined with other CTEs or regular tables in the main query.

### Syntax

```

WITH cte_name AS (

    SELECT ...

    FROM namespace_name.table_name

    [WHERE ...]

)

SELECT ... FROM cte_name


```

### Chained CTEs

A CTE can reference a previously defined CTE.

```

WITH filtered AS (

    SELECT customer_id, department, total_amount

    FROM my_namespace.sales_data

    WHERE total_amount > 0

),

summary AS (

    SELECT department,

           COUNT(*) AS order_count,

           round(AVG(total_amount), 2) AS avg_amount

    FROM filtered

    GROUP BY department

)

SELECT *

FROM summary

WHERE order_count > 100

ORDER BY avg_amount DESC


```

### CTE joined with another table

```

WITH enterprise_zones AS (

    SELECT zone_id, domain, plan

    FROM my_namespace.zones

    WHERE plan = 'enterprise'

)

SELECT ez.domain, f.action, COUNT(*) AS cnt

FROM enterprise_zones ez

INNER JOIN my_namespace.firewall_events f ON ez.zone_id = f.zone_id

GROUP BY ez.domain, f.action

ORDER BY cnt DESC

LIMIT 20


```

### Two CTEs joined together

```

WITH top_zones AS (

    SELECT zone_id, COUNT(*) AS req_count

    FROM my_namespace.http_requests

    GROUP BY zone_id

    ORDER BY req_count DESC

    LIMIT 50

),

zone_threats AS (

    SELECT zone_id, COUNT(*) AS threat_count

    FROM my_namespace.firewall_events

    WHERE risk_score > 0.5

    GROUP BY zone_id

)

SELECT tz.zone_id, tz.req_count, COALESCE(zt.threat_count, 0) AS threat_count

FROM top_zones tz

LEFT JOIN zone_threats zt ON tz.zone_id = zt.zone_id

ORDER BY tz.req_count DESC

LIMIT 20


```

---

## FROM clause

### Syntax

```

SELECT * FROM namespace_name.table_name


```

R2 SQL queries can reference one or more tables. Tables are specified as `namespace_name.table_name`. Multiple tables can be combined using JOINs or comma-separated syntax. Refer to the [JOIN clause](#join-clause) section for details.

---

## JOIN clause

R2 SQL supports joining multiple Iceberg tables in a single query. All join types use standard SQL syntax.

### Supported join types

| Join type        | Syntax                          | Description                                                        |
| ---------------- | ------------------------------- | ------------------------------------------------------------------ |
| Inner join       | INNER JOIN ... ON               | Returns rows that match in both tables                             |
| Left outer join  | LEFT JOIN ... ON                | Returns all rows from the left table, NULLs for non-matching right |
| Right outer join | RIGHT JOIN ... ON               | Returns all rows from the right table, NULLs for non-matching left |
| Full outer join  | FULL OUTER JOIN ... ON          | Returns all rows from both tables, NULLs where no match            |
| Cross join       | CROSS JOIN                      | Cartesian product of both tables                                   |
| Implicit join    | FROM t1, t2 WHERE t1.id = t2.id | Comma-separated tables with join condition in WHERE                |

### Syntax

```

-- Explicit JOIN

SELECT columns

FROM namespace.table1 alias1

[INNER | LEFT | RIGHT | FULL OUTER | CROSS] JOIN namespace.table2 alias2

  ON alias1.column = alias2.column

[WHERE conditions]


-- Implicit join

SELECT columns

FROM namespace.table1 alias1, namespace.table2 alias2

WHERE alias1.column = alias2.column


```

### Multi-way joins

You can join three or more tables in a single query:

```

SELECT z.domain, h.method, f.action, COUNT(*) AS cnt

FROM my_namespace.zones z

INNER JOIN my_namespace.http_requests h ON z.zone_id = h.zone_id

INNER JOIN my_namespace.firewall_events f ON z.zone_id = f.zone_id

WHERE h.status_code >= 400

GROUP BY z.domain, h.method, f.action

ORDER BY cnt DESC

LIMIT 20


```

### Self-joins

A table can be joined with itself using different aliases:

```

SELECT f1.source_ip, f1.zone_id AS zone1, f2.zone_id AS zone2

FROM my_namespace.firewall_events f1

INNER JOIN my_namespace.firewall_events f2

  ON f1.source_ip = f2.source_ip

  AND f1.zone_id < f2.zone_id

WHERE f1.action = 'block'

LIMIT 20


```

### Join conditions

* Join conditions use the `ON` clause with equality (`=`) or expression-based predicates.
* Functions are supported in join predicates (for example, `ON LOWER(a.col) = LOWER(b.col)`).
* Multiple conditions can be combined with `AND`.

Note

Nested (parenthesized) joins are not supported. Write multi-way joins as a flat sequence of `JOIN` clauses instead of grouping them with parentheses.

```

-- Not supported

SELECT * FROM (t1 JOIN t2 ON t1.id = t2.id) JOIN t3 ON t2.id = t3.id


-- Supported

SELECT * FROM t1 JOIN t2 ON t1.id = t2.id JOIN t3 ON t2.id = t3.id


```

### Best practices for joins

* Include `WHERE` filters to reduce intermediate result sizes, especially for multi-way joins.
* Join large fact tables through a shared dimension table rather than directly cross-joining two large tables.
* Use `LIMIT` to cap result sizes.

---

## Subqueries

R2 SQL supports subqueries in multiple positions within a query.

### Subqueries in FROM (derived tables)

A subquery in the `FROM` clause creates a derived table that can be referenced in the outer query:

```

SELECT sub.domain, sub.total_requests

FROM (

    SELECT z.domain, COUNT(*) AS total_requests

    FROM my_namespace.zones z

    INNER JOIN my_namespace.http_requests h ON z.zone_id = h.zone_id

    GROUP BY z.domain

) sub

WHERE sub.total_requests > 1000

ORDER BY sub.total_requests DESC

LIMIT 20


```

Note

`LATERAL` derived tables are not supported. Subqueries in `FROM` cannot reference columns from other tables in the same `FROM` clause.

Derived tables can be joined with other derived tables or regular tables:

```

SELECT req.domain, req.total_reqs, fw.total_events

FROM (

    SELECT zone_id, domain, COUNT(*) AS total_reqs

    FROM my_namespace.zones z

    INNER JOIN my_namespace.http_requests h ON z.zone_id = h.zone_id

    GROUP BY zone_id, domain

) req

INNER JOIN (

    SELECT zone_id, COUNT(*) AS total_events

    FROM my_namespace.firewall_events

    GROUP BY zone_id

) fw ON req.zone_id = fw.zone_id

ORDER BY fw.total_events DESC

LIMIT 20


```

### `IN` / `NOT IN` subqueries

Filter rows based on whether a value exists in the result of a subquery:

```

-- Find requests from enterprise zones

SELECT method, status_code, COUNT(*) AS cnt

FROM my_namespace.http_requests

WHERE zone_id IN (

    SELECT zone_id FROM my_namespace.zones WHERE plan = 'enterprise'

)

GROUP BY method, status_code

ORDER BY cnt DESC

LIMIT 20


```

```

-- NOT IN example

SELECT zone_id, COUNT(*) AS cnt

FROM my_namespace.http_requests

WHERE zone_id NOT IN (

    SELECT zone_id FROM my_namespace.firewall_events WHERE action = 'block'

)

GROUP BY zone_id

LIMIT 10


```

Warning

`NOT IN` subqueries are not supported on nullable columns. If the subquery column can contain `NULL` values, use `NOT EXISTS` instead. `SELECT DISTINCT` is also not supported inside subqueries — omit the `DISTINCT` keyword or use `NOT EXISTS`.

```

-- Instead of NOT IN on a nullable column:

SELECT z.domain

FROM my_namespace.zones z

WHERE NOT EXISTS (

    SELECT 1 FROM my_namespace.firewall_events f

    WHERE f.zone_id = z.zone_id

)

LIMIT 20


```

### `EXISTS` / `NOT EXISTS` subqueries

Test for the existence of rows matching a correlated condition:

```

-- Find zones with blocked firewall events (semi-join)

SELECT z.domain, z.plan

FROM my_namespace.zones z

WHERE EXISTS (

    SELECT 1 FROM my_namespace.firewall_events f

    WHERE f.zone_id = z.zone_id AND f.action = 'block'

)

ORDER BY z.domain

LIMIT 20


```

```

-- Find zones with NO firewall events (anti-join)

SELECT z.domain, z.plan

FROM my_namespace.zones z

WHERE NOT EXISTS (

    SELECT 1 FROM my_namespace.firewall_events f

    WHERE f.zone_id = z.zone_id

)

ORDER BY z.domain

LIMIT 20


```

### Scalar subqueries

A subquery that returns a single value can be used in `SELECT`, `WHERE`, or `HAVING`:

```

-- In SELECT (constant value per row)

SELECT z.domain, z.plan,

       (SELECT COUNT(*) FROM my_namespace.zones) AS total_zones

FROM my_namespace.zones z

WHERE z.plan = 'enterprise'

LIMIT 10


```

```

-- In WHERE (comparison)

SELECT z.domain, z.plan, z.requests_30d

FROM my_namespace.zones z

WHERE z.requests_30d > (

    SELECT AVG(requests_30d) FROM my_namespace.zones

)

ORDER BY z.requests_30d DESC

LIMIT 20


```

---

## WHERE clause

### Syntax

```

SELECT * FROM namespace_name.table_name WHERE condition [AND | OR condition ...]


```

### Conditions

#### Comparison operators

`=`, `!=`, `<>`, `<`, `>`, `<=`, `>=`

#### Null checks

* `column_name IS NULL`
* `column_name IS NOT NULL`

#### Boolean checks

* `IS TRUE`, `IS FALSE`, `IS NOT TRUE`, `IS NOT FALSE`
* `IS UNKNOWN`, `IS NOT UNKNOWN`

#### Range

* `column_name BETWEEN value1 AND value2`
* `column_name NOT BETWEEN value1 AND value2`

#### List membership

* `column_name IN ('value1', 'value2')`
* `column_name NOT IN ('value1', 'value2')`

#### Pattern matching

* `column_name LIKE 'pattern'`
* `column_name NOT LIKE 'pattern'`
* `column_name ILIKE 'pattern'` (case-insensitive)
* `column_name NOT ILIKE 'pattern'`
* `column_name SIMILAR TO 'regex_pattern'`

#### Logical operators

* `AND`
* `OR`
* `NOT`

### Examples

```

SELECT * FROM my_namespace.sales_data

WHERE timestamp BETWEEN '2025-09-24T01:00:00Z' AND '2025-09-25T01:00:00Z'


SELECT * FROM my_namespace.sales_data

WHERE status = 200 AND response_time > 1000


SELECT * FROM my_namespace.sales_data

WHERE (region = 'North' OR region = 'South')

  AND total_amount IS NOT NULL


SELECT * FROM my_namespace.sales_data

WHERE department ILIKE '%eng%'


```

---

## GROUP BY clause

### Syntax

```

SELECT column_list, aggregation_function(column)

FROM namespace_name.table_name

[WHERE conditions]

GROUP BY column_list


```

### Examples

```

SELECT department, COUNT(*) AS dept_count

FROM my_namespace.sales_data

GROUP BY department


SELECT department, category, SUM(total_amount) AS total

FROM my_namespace.sales_data

GROUP BY department, category


```

---

## HAVING clause

### Syntax

```

SELECT column_list, aggregation_function(column) AS alias

FROM namespace_name.table_name

GROUP BY column_list

HAVING aggregation_function(column) comparison_operator value


```

### Examples

```

SELECT department, COUNT(*) AS dept_count

FROM my_namespace.sales_data

GROUP BY department

HAVING COUNT(*) > 1000


SELECT region, SUM(total_amount) AS total

FROM my_namespace.sales_data

GROUP BY region

HAVING SUM(total_amount) > 1000000


```

---

## ORDER BY clause

### Syntax

```

ORDER BY expression [ASC | DESC] [, expression [ASC | DESC], ...]


```

* **ASC**: Ascending order (default)
* **DESC**: Descending order
* Multi-column ordering is supported

### Examples

```

SELECT customer_id, total_amount

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL

ORDER BY total_amount DESC

LIMIT 50


SELECT department, COUNT(*) AS dept_count

FROM my_namespace.sales_data

GROUP BY department

ORDER BY dept_count DESC, department ASC


```

---

## LIMIT clause

### Syntax

```

LIMIT number


```

* **Type**: Integer only
* **Default**: 500

### Examples

```

SELECT * FROM my_namespace.sales_data LIMIT 100


```

---

## Set operations

Set operations combine the results of two or more `SELECT` statements.

### Syntax

```

SELECT ... FROM table1

UNION | UNION ALL | INTERSECT | EXCEPT

SELECT ... FROM table2


```

### Supported operations

| Operation | Description                                                        |
| --------- | ------------------------------------------------------------------ |
| UNION     | Returns all rows from both queries, removing duplicates            |
| UNION ALL | Returns all rows from both queries, including duplicates           |
| INTERSECT | Returns only rows that appear in both query results                |
| EXCEPT    | Returns rows from the first query that do not appear in the second |

### Examples

#### Union

```

-- Find zones that had either firewall blocks OR high-risk requests

SELECT zone_id FROM my_namespace.firewall_events WHERE action = 'block'

UNION

SELECT zone_id FROM my_namespace.http_requests WHERE risk_score > 0.8


```

#### Intersect

```

-- Find zones with both firewall blocks AND entries in the zones table

SELECT zone_id FROM my_namespace.firewall_events WHERE action = 'block'

INTERSECT

SELECT zone_id FROM my_namespace.zones WHERE plan = 'enterprise'


```

#### Except

```

-- Find enterprise zones that have no firewall events

SELECT zone_id FROM my_namespace.zones WHERE plan = 'enterprise'

EXCEPT

SELECT zone_id FROM my_namespace.firewall_events


```

### Requirements

* All queries in a set operation must return the same number of columns.
* Corresponding columns must have compatible data types.
* Column names in the result are taken from the first query.

---

## EXPLAIN

Returns the execution plan for a query without running it.

```

EXPLAIN SELECT department, COUNT(*) AS dept_count

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL

GROUP BY department;


```

### EXPLAIN FORMAT JSON

Returns the execution plan as structured JSON for programmatic analysis.

```

EXPLAIN FORMAT JSON SELECT * FROM my_namespace.sales_data LIMIT 10;


```

---

## Expressions

Expressions can be used in `SELECT`, `WHERE`, `GROUP BY`, `HAVING`, and `ORDER BY` clauses.

### Literals

```

SELECT 42 AS int_val, 3.14 AS float_val, 'hello' AS str_val, TRUE AS bool_val, NULL AS null_val

FROM my_namespace.sales_data LIMIT 1


```

### Arithmetic operators

`+`, `-`, `*`, `/`, `%`

```

SELECT customer_id, total_amount * 1.1 AS total_with_tax, total_amount % 10 AS remainder

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL

LIMIT 5


```

### String concatenation

```

SELECT customer_id || ' - ' || region AS label

FROM my_namespace.sales_data

LIMIT 5


```

### CASE expressions

Searched form:

```

SELECT customer_id,

    CASE

        WHEN total_amount > 1000 THEN 'high'

        WHEN total_amount > 100 THEN 'medium'

        ELSE 'low'

    END AS tier

FROM my_namespace.sales_data

LIMIT 10


```

Simple form:

```

SELECT customer_id,

    CASE region

        WHEN 'North' THEN 'N'

        WHEN 'South' THEN 'S'

        ELSE 'Other'

    END AS region_code

FROM my_namespace.sales_data

LIMIT 10


```

### Type casting

```

-- CAST

SELECT CAST(total_amount AS INT) AS amount_int FROM my_namespace.sales_data LIMIT 5


-- TRY_CAST (returns NULL on failure instead of error)

SELECT TRY_CAST(customer_id AS INT) AS id_int FROM my_namespace.sales_data LIMIT 5


-- Shorthand (::)

SELECT total_amount::INT AS amount_int FROM my_namespace.sales_data LIMIT 5


```

### EXTRACT

```

SELECT EXTRACT(YEAR FROM timestamp) AS yr,

       EXTRACT(MONTH FROM timestamp) AS mo,

       EXTRACT(DAY FROM timestamp) AS dy

FROM my_namespace.sales_data

LIMIT 1


```

---

## Data type reference

| Type      | Description     | Example Values               |
| --------- | --------------- | ---------------------------- |
| integer   | Whole numbers   | 1, 42, \-10, 0               |
| float     | Decimal numbers | 1.5, 3.14, \-2.7, 0.0        |
| string    | Text values     | 'hello', 'GET', '2024-01-01' |
| boolean   | Boolean values  | true, false                  |
| timestamp | RFC3339         | '2025-09-24T01:00:00Z'       |
| date      | Date values     | '2025-09-24'                 |
| struct    | Named fields    | struct\_col\['field\_name'\] |
| array     | Ordered list    | array\_col\[1\] (1-indexed)  |
| map       | Key-value pairs | map\_keys(map\_col)          |

---

## Operator precedence

1. **Comparison operators**: `=`, `!=`, `<`, `<=`, `>`, `>=`, `LIKE`, `BETWEEN`, `IS NULL`, `IS NOT NULL`
2. **AND** (higher precedence)
3. **OR** (lower precedence)

Use parentheses to override default precedence:

```

SELECT * FROM my_namespace.sales_data WHERE (status = 404 OR status = 500) AND region = 'North'


```

---

## Complete query examples

### Basic query

```

SELECT *

FROM my_namespace.sales_data

WHERE timestamp BETWEEN '2025-09-24T01:00:00Z' AND '2025-09-25T01:00:00Z'

LIMIT 100


```

### Filtered query with sorting

```

SELECT customer_id, timestamp, status, total_amount

FROM my_namespace.sales_data

WHERE status >= 400 AND total_amount > 5000

ORDER BY total_amount DESC

LIMIT 50


```

### Aggregation with HAVING

```

SELECT region, COUNT(*) AS region_count, AVG(total_amount) AS avg_amount

FROM my_namespace.sales_data

WHERE status = 'completed'

GROUP BY region

HAVING COUNT(*) > 1000

ORDER BY avg_amount DESC

LIMIT 20


```

### Conditional categorization

```

SELECT customer_id,

    CASE

        WHEN total_amount >= 1000 THEN 'Premium'

        WHEN total_amount >= 100 THEN 'Standard'

        ELSE 'Basic'

    END AS tier,

    total_amount

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL

ORDER BY total_amount DESC

LIMIT 20


```

```json
{"@context":"https://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"/directory/","name":"Directory"}},{"@type":"ListItem","position":2,"item":{"@id":"/r2-sql/","name":"R2 SQL"}},{"@type":"ListItem","position":3,"item":{"@id":"/r2-sql/sql-reference/","name":"SQL reference"}}]}
```

---

---
title: Aggregate functions
description: Reference for all 33 aggregate functions supported in R2 SQL, organized by category.
image: https://developers.cloudflare.com/dev-products-preview.png
---

> Documentation Index  
> Fetch the complete documentation index at: https://developers.cloudflare.com/r2-sql/llms.txt  
> Use this file to discover all available pages before exploring further.

[Skip to content](#%5Ftop) 

# Aggregate functions

Aggregate functions collapse multiple rows into a single result. They are used with `GROUP BY` to compute summaries per group, or without `GROUP BY` to compute a single result across all rows.

Note

The following aggregates are not supported: `PERCENTILE_CONT`, `MEDIAN`, `ARRAY_AGG`, `STRING_AGG`, and any `func(DISTINCT ...)`. Use the approximate alternatives where available.

---

## Basic aggregates

### COUNT

Counts rows. `COUNT(*)` counts all rows. `COUNT(column)` counts non-NULL values.

```

SELECT COUNT(*) AS total_rows

FROM my_namespace.sales_data


SELECT department, COUNT(*) AS dept_count

FROM my_namespace.sales_data

GROUP BY department

ORDER BY dept_count DESC


```

### SUM

Returns the sum of values in a column.

```

SELECT SUM(total_amount) AS grand_total

FROM my_namespace.sales_data


SELECT department, SUM(total_amount) AS dept_total

FROM my_namespace.sales_data

GROUP BY department

ORDER BY dept_total DESC


```

### AVG

Returns the average of values in a column. Alias: `mean`.

```

SELECT AVG(total_amount) AS avg_amount

FROM my_namespace.sales_data


SELECT department, AVG(total_amount) AS avg_amount

FROM my_namespace.sales_data

GROUP BY department

ORDER BY avg_amount DESC


```

### MIN

Returns the minimum value. Works on numeric and string columns.

```

SELECT MIN(total_amount) AS min_amount, MIN(customer_id) AS first_customer

FROM my_namespace.sales_data


SELECT department, MIN(total_amount) AS min_amount

FROM my_namespace.sales_data

GROUP BY department


```

### MAX

Returns the maximum value. Works on numeric and string columns.

```

SELECT MAX(total_amount) AS max_amount, MAX(customer_id) AS last_customer

FROM my_namespace.sales_data


SELECT department, MAX(total_amount) AS max_amount

FROM my_namespace.sales_data

GROUP BY department


```

---

## Approximate aggregates

Approximate aggregation functions produce statistically estimated results while using significantly less memory and compute than their exact counterparts. Use them when analyzing large datasets and an approximate result is acceptable.

### approx\_percentile\_cont

Returns the approximate value at a given percentile using a T-Digest algorithm. The percentile parameter must be between `0.0` and `1.0` inclusive.

```

SELECT approx_percentile_cont(total_amount, 0.5) AS median,

       approx_percentile_cont(total_amount, 0.95) AS p95

FROM my_namespace.sales_data


SELECT department,

       approx_percentile_cont(total_amount, 0.5) AS median

FROM my_namespace.sales_data

GROUP BY department

ORDER BY median DESC


```

### approx\_percentile\_cont\_with\_weight

Returns the approximate weighted percentile. Rows are weighted by the `weight` column.

```

SELECT approx_percentile_cont_with_weight(unit_price, quantity, 0.5) AS weighted_median

FROM my_namespace.sales_data

WHERE unit_price IS NOT NULL AND quantity IS NOT NULL


```

### approx\_median

Returns the approximate median. Equivalent to `approx_percentile_cont(column, 0.5)`.

```

SELECT approx_median(total_amount) AS median_amount

FROM my_namespace.sales_data


SELECT department, approx_median(total_amount) AS median

FROM my_namespace.sales_data

GROUP BY department


```

### approx\_distinct

Returns the approximate count of distinct values using HyperLogLog.

```

SELECT approx_distinct(customer_id) AS unique_customers

FROM my_namespace.sales_data


SELECT department, approx_distinct(customer_id) AS unique_customers

FROM my_namespace.sales_data

GROUP BY department


```

### approx\_top\_k

Returns the _k_ most frequent values with their approximate counts.

```

SELECT approx_top_k(department, 5) AS top_departments

FROM my_namespace.sales_data


```

---

## Statistical aggregates

### var / var\_samp

Returns the sample variance.

```

SELECT var(total_amount) AS variance

FROM my_namespace.sales_data


SELECT department, var(total_amount) AS variance

FROM my_namespace.sales_data

GROUP BY department


```

### var\_pop

Returns the population variance.

```

SELECT var_pop(total_amount) AS pop_variance

FROM my_namespace.sales_data


```

### stddev / stddev\_samp

Returns the sample standard deviation.

```

SELECT stddev(total_amount) AS std_dev

FROM my_namespace.sales_data


SELECT department, stddev(total_amount) AS std_dev

FROM my_namespace.sales_data

GROUP BY department


```

### stddev\_pop

Returns the population standard deviation.

```

SELECT stddev_pop(total_amount) AS pop_std_dev

FROM my_namespace.sales_data


```

### covar\_samp

Returns the sample covariance. Alias: `covar`.

```

SELECT covar_samp(total_amount, CAST(quantity AS DOUBLE)) AS covariance

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL AND quantity IS NOT NULL


```

### covar\_pop

Returns the population covariance.

```

SELECT covar_pop(total_amount, CAST(quantity AS DOUBLE)) AS pop_covariance

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL AND quantity IS NOT NULL


```

### corr

Returns the Pearson correlation coefficient between two columns.

```

SELECT corr(total_amount, CAST(quantity AS DOUBLE)) AS correlation

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL AND quantity IS NOT NULL


```

### regr\_slope

Returns the slope of the linear regression line.

```

SELECT regr_slope(total_amount, CAST(quantity AS DOUBLE)) AS slope

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL AND quantity IS NOT NULL


```

### regr\_intercept

Returns the y-intercept of the linear regression line.

```

SELECT regr_intercept(total_amount, CAST(quantity AS DOUBLE)) AS intercept

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL AND quantity IS NOT NULL


```

### regr\_count

Returns the count of non-NULL pairs.

```

SELECT regr_count(total_amount, CAST(quantity AS DOUBLE)) AS pair_count

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL AND quantity IS NOT NULL


```

### regr\_r2

Returns the coefficient of determination (R-squared).

```

SELECT regr_r2(total_amount, CAST(quantity AS DOUBLE)) AS r_squared

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL AND quantity IS NOT NULL


```

### regr\_avgx

Returns the average of the independent variable (x) for non-NULL pairs.

```

SELECT regr_avgx(total_amount, CAST(quantity AS DOUBLE)) AS avg_qty

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL AND quantity IS NOT NULL


```

### regr\_avgy

Returns the average of the dependent variable (y) for non-NULL pairs.

```

SELECT regr_avgy(total_amount, CAST(quantity AS DOUBLE)) AS avg_amount

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL AND quantity IS NOT NULL


```

### regr\_sxx

Returns the sum of squares of the independent variable.

```

SELECT regr_sxx(total_amount, CAST(quantity AS DOUBLE)) AS sxx

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL AND quantity IS NOT NULL


```

### regr\_syy

Returns the sum of squares of the dependent variable.

```

SELECT regr_syy(total_amount, CAST(quantity AS DOUBLE)) AS syy

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL AND quantity IS NOT NULL


```

### regr\_sxy

Returns the sum of products of the paired variables.

```

SELECT regr_sxy(total_amount, CAST(quantity AS DOUBLE)) AS sxy

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL AND quantity IS NOT NULL


```

---

## Bitwise aggregates

### bit\_and

Returns the bitwise AND of all values in a group.

```

SELECT department, bit_and(quantity) AS and_result

FROM my_namespace.sales_data

WHERE quantity IS NOT NULL

GROUP BY department


```

### bit\_or

Returns the bitwise OR of all values in a group.

```

SELECT department, bit_or(quantity) AS or_result

FROM my_namespace.sales_data

WHERE quantity IS NOT NULL

GROUP BY department


```

### bit\_xor

Returns the bitwise XOR of all values in a group.

```

SELECT department, bit_xor(quantity) AS xor_result

FROM my_namespace.sales_data

WHERE quantity IS NOT NULL

GROUP BY department


```

---

## Boolean aggregates

### bool\_and

Returns true if all values in a group are true.

```

SELECT department, bool_and(is_completed) AS all_completed

FROM my_namespace.sales_data

WHERE is_completed IS NOT NULL

GROUP BY department


```

### bool\_or

Returns true if any value in a group is true.

```

SELECT department, bool_or(is_completed) AS any_completed

FROM my_namespace.sales_data

WHERE is_completed IS NOT NULL

GROUP BY department


```

---

## Positional aggregates

### first\_value

Returns the first value in a group according to the specified ordering.

```

SELECT department,

       first_value(customer_id ORDER BY total_amount ASC) AS lowest_spender

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL

GROUP BY department


```

### last\_value

Returns the last value in a group according to the specified ordering.

```

SELECT department,

       last_value(customer_id ORDER BY total_amount ASC) AS highest_spender

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL

GROUP BY department


```

```json
{"@context":"https://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"/directory/","name":"Directory"}},{"@type":"ListItem","position":2,"item":{"@id":"/r2-sql/","name":"R2 SQL"}},{"@type":"ListItem","position":3,"item":{"@id":"/r2-sql/sql-reference/","name":"SQL reference"}},{"@type":"ListItem","position":4,"item":{"@id":"/r2-sql/sql-reference/aggregate-functions/","name":"Aggregate functions"}}]}
```

---

---
title: Complex types
description: Reference for querying struct, array, and map column types in R2 SQL.
image: https://developers.cloudflare.com/dev-products-preview.png
---

> Documentation Index  
> Fetch the complete documentation index at: https://developers.cloudflare.com/r2-sql/llms.txt  
> Use this file to discover all available pages before exploring further.

[Skip to content](#%5Ftop) 

# Complex types

R2 SQL supports querying struct, array, and map column types stored in Iceberg tables. This page covers access patterns, supported functions, and examples for each type.

---

## Structs

Struct columns contain named fields. Access fields using bracket notation or the `get_field()` function.

### Bracket notation

```

SELECT pricing['price'] AS price,

       pricing['discount_percent'] AS discount

FROM my_namespace.products

LIMIT 5


```

### get\_field function

```

SELECT get_field(pricing, 'price') AS price,

       get_field(pricing, 'discount_percent') AS discount

FROM my_namespace.products

LIMIT 5


```

### Struct fields in WHERE

```

SELECT customer_id, pricing['price'] AS price

FROM my_namespace.products

WHERE pricing['price'] > 50

LIMIT 10


```

### Struct fields in ORDER BY

```

SELECT customer_id, pricing['price'] AS price

FROM my_namespace.products

WHERE pricing['price'] IS NOT NULL

ORDER BY pricing['price'] DESC

LIMIT 10


```

### Struct fields in GROUP BY

```

SELECT platforms['windows'] AS windows_support,

       COUNT(*) AS product_count,

       AVG(pricing['price']) AS avg_price

FROM my_namespace.products

WHERE pricing['price'] IS NOT NULL

GROUP BY platforms['windows']


```

### Creating structs inline

```

-- named_struct creates a struct with named fields

SELECT named_struct('id', customer_id, 'amount', total_amount) AS info

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL

LIMIT 1


-- struct creates a struct with positional fields

SELECT struct(customer_id, total_amount, region) AS info

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL

LIMIT 1


```

---

## Arrays

Array columns contain ordered lists of values. Array indexing is **1-based**.

### Index access

```

SELECT customer_id, tags[1] AS first_tag, tags[2] AS second_tag

FROM my_namespace.products

LIMIT 5


```

### Create arrays

#### make\_array

Creates an array from a list of values.

```

SELECT make_array(1, 2, 3) AS nums

FROM my_namespace.sales_data

LIMIT 1


```

#### string\_to\_array

Splits a string into an array by a delimiter.

```

SELECT string_to_array(categories, ',') AS cat_array

FROM my_namespace.products

WHERE categories IS NOT NULL

LIMIT 5


```

#### range

Generates an array of integers from start (inclusive) to stop (exclusive).

```

SELECT range(0, 5) AS nums

FROM my_namespace.sales_data

LIMIT 1


```

#### generate\_series

Generates an array of integers from start to stop (inclusive).

```

SELECT generate_series(1, 5) AS nums

FROM my_namespace.sales_data

LIMIT 1


```

### Inspect arrays

#### array\_length

Returns the number of elements in an array.

```

SELECT customer_id, array_length(tags) AS tag_count

FROM my_namespace.products

LIMIT 5


```

#### cardinality

Returns the total number of elements in an array. Alias for `array_length`.

```

SELECT customer_id, cardinality(tags) AS tag_count

FROM my_namespace.products

LIMIT 5


```

#### empty

Returns true if an array has zero elements.

```

SELECT customer_id, empty(tags) AS has_no_tags

FROM my_namespace.products

LIMIT 5


```

#### array\_ndims

Returns the number of dimensions of an array.

```

SELECT array_ndims(make_array(1, 2, 3)) AS ndims

FROM my_namespace.sales_data

LIMIT 1


```

#### array\_dims

Returns the dimensions of an array.

```

SELECT array_dims(make_array(1, 2, 3)) AS dims

FROM my_namespace.sales_data

LIMIT 1


```

### Search arrays

#### array\_has

Returns true if an array contains a value.

```

SELECT customer_id, array_has(tags, 'premium') AS is_premium

FROM my_namespace.products

LIMIT 5


```

#### array\_has\_all

Returns true if the first array contains all elements of the second.

```

SELECT array_has_all(make_array(1, 2, 3, 4), make_array(2, 3)) AS has_all

FROM my_namespace.sales_data

LIMIT 1


```

#### array\_has\_any

Returns true if the first array contains any element of the second.

```

SELECT array_has_any(make_array(1, 2, 3), make_array(3, 4, 5)) AS has_any

FROM my_namespace.sales_data

LIMIT 1


```

#### array\_position

Returns the position of the first occurrence of a value (1-indexed). Returns 0 if not found.

```

SELECT array_position(make_array('a', 'b', 'c', 'b'), 'b') AS pos

FROM my_namespace.sales_data

LIMIT 1


```

#### array\_positions

Returns all positions of a value as an array.

```

SELECT array_positions(make_array(1, 2, 1, 3, 1), 1) AS positions

FROM my_namespace.sales_data

LIMIT 1


```

### Transform arrays

#### array\_sort

Sorts array elements.

```

SELECT array_sort(make_array(3, 1, 2)) AS sorted

FROM my_namespace.sales_data

LIMIT 1


```

#### array\_reverse

Reverses the order of array elements.

```

SELECT array_reverse(make_array(1, 2, 3)) AS reversed

FROM my_namespace.sales_data

LIMIT 1


```

#### array\_distinct

Removes duplicate elements from an array.

```

SELECT array_distinct(make_array(1, 2, 2, 3, 3, 3)) AS unique_vals

FROM my_namespace.sales_data

LIMIT 1


```

#### flatten

Flattens a nested array by one level.

```

SELECT flatten(make_array(make_array(1, 2), make_array(3, 4))) AS flat

FROM my_namespace.sales_data

LIMIT 1


```

#### array\_slice

Returns a slice of an array from a start index to an end index (both inclusive, 1-indexed).

```

SELECT array_slice(make_array(10, 20, 30, 40, 50), 2, 4) AS sliced

FROM my_namespace.sales_data

LIMIT 1


```

### Modify arrays

#### array\_append

Appends a value to the end of an array.

```

SELECT array_append(make_array(1, 2, 3), 4) AS appended

FROM my_namespace.sales_data

LIMIT 1


```

#### array\_prepend

Prepends a value to the beginning of an array.

```

SELECT array_prepend(0, make_array(1, 2, 3)) AS prepended

FROM my_namespace.sales_data

LIMIT 1


```

#### array\_concat

Concatenates two or more arrays.

```

SELECT array_concat(make_array(1, 2), make_array(3, 4)) AS merged

FROM my_namespace.sales_data

LIMIT 1


```

#### array\_remove

Removes the first occurrence of a value from an array.

```

SELECT array_remove(make_array(1, 2, 3, 2), 2) AS result

FROM my_namespace.sales_data

LIMIT 1


```

#### array\_remove\_all

Removes all occurrences of a value from an array.

```

SELECT array_remove_all(make_array(1, 2, 3, 2, 2), 2) AS result

FROM my_namespace.sales_data

LIMIT 1


```

#### array\_remove\_n

Removes the first _n_ occurrences of a value from an array.

```

SELECT array_remove_n(make_array(1, 2, 2, 2, 3), 2, 2) AS result

FROM my_namespace.sales_data

LIMIT 1


```

#### array\_replace

Replaces the first occurrence of a value in an array.

```

SELECT array_replace(make_array(1, 2, 3), 2, 99) AS result

FROM my_namespace.sales_data

LIMIT 1


```

#### array\_replace\_n

Replaces the first _n_ occurrences of a value in an array.

```

SELECT array_replace_n(make_array(1, 2, 2, 2, 3), 2, 99, 2) AS result

FROM my_namespace.sales_data

LIMIT 1


```

#### array\_replace\_all

Replaces all occurrences of a value in an array.

```

SELECT array_replace_all(make_array(1, 2, 3, 2), 2, 99) AS result

FROM my_namespace.sales_data

LIMIT 1


```

#### array\_pop\_back

Removes the last element from an array.

```

SELECT array_pop_back(make_array(1, 2, 3)) AS result

FROM my_namespace.sales_data

LIMIT 1


```

#### array\_pop\_front

Removes the first element from an array.

```

SELECT array_pop_front(make_array(1, 2, 3)) AS result

FROM my_namespace.sales_data

LIMIT 1


```

#### array\_repeat

Repeats a value a given number of times as an array.

```

SELECT array_repeat(region, 3) AS repeated

FROM my_namespace.sales_data

LIMIT 1


```

#### array\_resize

Resizes an array to a given length, filling with a default value.

```

SELECT array_resize(make_array(1, 2), 5, 0) AS resized

FROM my_namespace.sales_data

LIMIT 1


```

### Set operations on arrays

#### array\_intersect

Returns elements common to both arrays.

```

SELECT array_intersect(make_array(1, 2, 3), make_array(2, 3, 4)) AS common

FROM my_namespace.sales_data

LIMIT 1


```

#### array\_union

Returns all unique elements from both arrays.

```

SELECT array_union(make_array(1, 2, 3), make_array(3, 4, 5)) AS merged

FROM my_namespace.sales_data

LIMIT 1


```

#### array\_except

Returns elements in the first array that are not in the second.

```

SELECT array_except(make_array(1, 2, 3, 4), make_array(2, 4)) AS diff

FROM my_namespace.sales_data

LIMIT 1


```

### Aggregate array values

#### array\_max

Returns the maximum value in an array.

```

SELECT customer_id, array_max(scores) AS max_score

FROM my_namespace.products

LIMIT 5


```

#### array\_min

Returns the minimum value in an array.

```

SELECT customer_id, array_min(scores) AS min_score

FROM my_namespace.products

LIMIT 5


```

#### array\_any\_value

Returns the first non-NULL value in an array.

```

SELECT array_any_value(make_array(NULL, 42, NULL)) AS first_val

FROM my_namespace.sales_data

LIMIT 1


```

#### array\_element

Returns the element at a given index (1-indexed). Equivalent to bracket-notation access (`arr[idx]`).

```

SELECT array_element(make_array(10, 20, 30), 2) AS second_val

FROM my_namespace.sales_data

LIMIT 1


```

### Convert arrays

#### array\_to\_string

Joins array elements into a string with a separator.

```

SELECT customer_id, array_to_string(tags, ', ') AS tag_list

FROM my_namespace.products

LIMIT 5


```

---

## Maps

Map columns store key-value pairs. Use `map_keys`, `map_values`, and `map_extract` to query them.

### map\_keys

Returns all keys from a map as an array.

```

SELECT map_keys(metadata) AS keys

FROM my_namespace.products

LIMIT 5


```

### map\_values

Returns all values from a map as an array.

```

SELECT map_values(metadata) AS vals

FROM my_namespace.products

LIMIT 5


```

### map\_extract

Returns the value for a specific key.

```

SELECT map_extract(metadata, 'source') AS source,

       map_extract(metadata, 'store_name') AS store

FROM my_namespace.products

LIMIT 5


```

### Creating maps inline

```

SELECT map(make_array('a', 'b'), make_array(1, 2)) AS m

FROM my_namespace.sales_data

LIMIT 1


```

---

## Complete function index

### Struct functions

| Function                    | Description                          |
| --------------------------- | ------------------------------------ |
| struct\_col\['field'\]      | Bracket notation field access        |
| get\_field(struct, 'field') | Function-based field access          |
| named\_struct(k1, v1, ...)  | Create struct with named fields      |
| struct(v1, v2, ...)         | Create struct with positional fields |

### Array functions

| Function                            | Description                              |
| ----------------------------------- | ---------------------------------------- |
| make\_array(v1, v2, ...)            | Create array from values                 |
| string\_to\_array(str, delim)       | Split string into array                  |
| range(start, stop)                  | Generate integer range (exclusive stop)  |
| generate\_series(start, stop)       | Generate integer series (inclusive stop) |
| array\_length(arr)                  | Number of elements                       |
| cardinality(arr)                    | Number of elements                       |
| empty(arr)                          | True if empty                            |
| array\_ndims(arr)                   | Number of dimensions                     |
| array\_dims(arr)                    | Dimension information                    |
| array\_has(arr, val)                | Contains check                           |
| array\_has\_all(arr, arr2)          | Contains all check                       |
| array\_has\_any(arr, arr2)          | Contains any check                       |
| array\_position(arr, val)           | First position of value                  |
| array\_positions(arr, val)          | All positions of value                   |
| array\_sort(arr)                    | Sort elements                            |
| array\_reverse(arr)                 | Reverse order                            |
| array\_distinct(arr)                | Remove duplicates                        |
| flatten(arr)                        | Flatten one level                        |
| array\_slice(arr, start, end)       | Extract sub-array                        |
| array\_append(arr, val)             | Append to end                            |
| array\_prepend(val, arr)            | Prepend to start                         |
| array\_concat(arr1, arr2)           | Concatenate arrays                       |
| array\_remove(arr, val)             | Remove first occurrence                  |
| array\_remove\_all(arr, val)        | Remove all occurrences                   |
| array\_remove\_n(arr, val, n)       | Remove first _n_ occurrences             |
| array\_replace(arr, old, new)       | Replace first occurrence                 |
| array\_replace\_n(arr, old, new, n) | Replace first _n_ occurrences            |
| array\_replace\_all(arr, old, new)  | Replace all occurrences                  |
| array\_pop\_back(arr)               | Remove last element                      |
| array\_pop\_front(arr)              | Remove first element                     |
| array\_repeat(val, n)               | Repeat value _n_ times                   |
| array\_resize(arr, size, default)   | Resize with default fill                 |
| array\_intersect(arr1, arr2)        | Common elements                          |
| array\_union(arr1, arr2)            | Union of elements                        |
| array\_except(arr1, arr2)           | Difference of elements                   |
| array\_max(arr)                     | Maximum value                            |
| array\_min(arr)                     | Minimum value                            |
| array\_any\_value(arr)              | First non-NULL value                     |
| array\_to\_string(arr, delim)       | Join elements as string                  |
| array\_element(arr, idx)            | Element at index                         |

### Map functions

| Function                  | Description                          |
| ------------------------- | ------------------------------------ |
| map(keys\_arr, vals\_arr) | Create map from key and value arrays |
| map\_keys(map)            | All keys as array                    |
| map\_values(map)          | All values as array                  |
| map\_extract(map, key)    | Value for a specific key             |

```json
{"@context":"https://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"/directory/","name":"Directory"}},{"@type":"ListItem","position":2,"item":{"@id":"/r2-sql/","name":"R2 SQL"}},{"@type":"ListItem","position":3,"item":{"@id":"/r2-sql/sql-reference/","name":"SQL reference"}},{"@type":"ListItem","position":4,"item":{"@id":"/r2-sql/sql-reference/complex-types/","name":"Complex types"}}]}
```

---

---
title: Scalar functions
description: Reference for all 173 scalar functions supported in R2 SQL, organized by category.
image: https://developers.cloudflare.com/dev-products-preview.png
---

> Documentation Index  
> Fetch the complete documentation index at: https://developers.cloudflare.com/r2-sql/llms.txt  
> Use this file to discover all available pages before exploring further.

[Skip to content](#%5Ftop) 

# Scalar functions

Scalar functions transform individual values and can be used in `SELECT`, `WHERE`, `GROUP BY`, `HAVING`, and `ORDER BY` clauses.

---

## Core functions

### arrow\_cast

Casts an expression to a specific Arrow data type by string name.

```

SELECT arrow_cast(total_amount, 'Float32') AS amount_f32

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL

LIMIT 1


```

### arrow\_typeof

Returns the Arrow data type name of an expression.

```

SELECT arrow_typeof(total_amount) AS amount_type,

       arrow_typeof(customer_id) AS id_type

FROM my_namespace.sales_data

LIMIT 1


```

### coalesce

Returns the first non-NULL argument.

```

SELECT coalesce(department, region, 'unknown') AS first_val

FROM my_namespace.sales_data

LIMIT 5


```

### get\_field

Extracts a field from a struct by name.

```

SELECT get_field(named_struct('customer', customer_id, 'amount', total_amount), 'amount') AS amt

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL

LIMIT 1


```

### greatest

Returns the largest value from a list of arguments.

```

SELECT greatest(total_amount, unit_price, quantity) AS max_val

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL

LIMIT 5


```

### least

Returns the smallest value from a list of arguments.

```

SELECT least(total_amount, unit_price, quantity) AS min_val

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL

LIMIT 5


```

### named\_struct

Creates a struct with named fields from key-value pairs.

```

SELECT named_struct('customer', customer_id, 'amount', total_amount) AS info

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL

LIMIT 1


```

### nullif

Returns NULL if both arguments are equal, otherwise returns the first argument.

```

SELECT nullif(department, 'Unknown') AS dept

FROM my_namespace.sales_data

LIMIT 5


```

### nvl

Returns the second argument if the first is NULL. Alias: `ifnull`.

```

SELECT nvl(department, 'N/A') AS dept

FROM my_namespace.sales_data

LIMIT 5


```

### nvl2

Returns the second argument if the first is not NULL, otherwise returns the third.

```

SELECT nvl2(department, 'has_dept', 'no_dept') AS dept_status

FROM my_namespace.sales_data

LIMIT 5


```

### overlay

Replaces a substring at a given position.

```

SELECT customer_id,

       overlay(customer_id PLACING 'XX' FROM 1 FOR 2) AS masked

FROM my_namespace.sales_data

LIMIT 3


```

### struct

Creates a struct with positional fields. Alias: `row`.

```

SELECT struct(customer_id, total_amount, region) AS info

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL

LIMIT 1


```

---

## Crypto functions

### digest

Returns a hash of a string using a specified algorithm. Supported algorithms: `md5`, `sha224`, `sha256`, `sha384`, `sha512`.

```

SELECT customer_id, digest(customer_id, 'sha256') AS hash

FROM my_namespace.sales_data

LIMIT 1


```

### md5

Returns the MD5 hash of a string.

```

SELECT customer_id, md5(customer_id) AS hash

FROM my_namespace.sales_data

LIMIT 1


```

### sha224

Returns the SHA-224 hash of a string.

```

SELECT sha224(customer_id) AS hash

FROM my_namespace.sales_data

LIMIT 1


```

### sha256

Returns the SHA-256 hash of a string.

```

SELECT customer_id, sha256(customer_id) AS hash

FROM my_namespace.sales_data

LIMIT 1


```

### sha384

Returns the SHA-384 hash of a string.

```

SELECT sha384(customer_id) AS hash

FROM my_namespace.sales_data

LIMIT 1


```

### sha512

Returns the SHA-512 hash of a string.

```

SELECT sha512(customer_id) AS hash

FROM my_namespace.sales_data

LIMIT 1


```

---

## Datetime functions

### current\_date

Returns today's date. Alias: `today`.

```

SELECT current_date() AS today_date

FROM my_namespace.sales_data

LIMIT 1


```

### current\_time

Returns the current time. Precision is quantized to 10ms boundaries.

```

SELECT current_time() AS now_time

FROM my_namespace.sales_data

LIMIT 1


```

### date\_bin

Bins a timestamp into fixed-size intervals aligned to an origin.

```

SELECT date_bin(INTERVAL '1 hour', timestamp, '2025-01-01T00:00:00Z') AS hour_bin,

       COUNT(*) AS cnt

FROM my_namespace.sales_data

GROUP BY date_bin(INTERVAL '1 hour', timestamp, '2025-01-01T00:00:00Z')

ORDER BY hour_bin

LIMIT 5


```

### date\_part

Extracts a component from a timestamp. Alias: `datepart`.

Supported fields: `year`, `month`, `day`, `hour`, `minute`, `second`, `millisecond`, `microsecond`, `week`, `dow`, `doy`, `quarter`, `epoch`.

```

SELECT date_part('hour', timestamp) AS hr,

       date_part('minute', timestamp) AS mn

FROM my_namespace.sales_data

LIMIT 1


```

### date\_trunc

Truncates a timestamp to a specified unit. Alias: `datetrunc`.

Supported units: `year`, `month`, `week`, `day`, `hour`, `minute`, `second`.

```

SELECT date_trunc('day', timestamp) AS day_trunc, COUNT(*) AS cnt

FROM my_namespace.sales_data

GROUP BY date_trunc('day', timestamp)

ORDER BY day_trunc

LIMIT 5


```

### from\_unixtime

Converts a Unix epoch (seconds) to a timestamp.

```

SELECT from_unixtime(1770000000) AS ts

FROM my_namespace.sales_data

LIMIT 1


```

### make\_date

Constructs a date from year, month, and day components.

```

SELECT make_date(2026, 3, 1) AS d

FROM my_namespace.sales_data

LIMIT 1


```

### make\_time

Constructs a time from hour, minute, and second components.

```

SELECT make_time(14, 30, 0) AS t

FROM my_namespace.sales_data

LIMIT 1


```

### now

Returns the current timestamp. Aliases: `current_timestamp`.

Precision is quantized to 10ms boundaries.

```

SELECT now() AS current_ts

FROM my_namespace.sales_data

LIMIT 1


```

### to\_char

Formats a timestamp as a string using strftime format. Alias: `date_format`.

```

SELECT to_char(timestamp, '%Y-%m-%d %H:%M') AS formatted

FROM my_namespace.sales_data

LIMIT 1


```

### to\_date

Parses a date from a string using a format pattern.

```

SELECT to_date('2026-03-01', '%Y-%m-%d') AS d

FROM my_namespace.sales_data

LIMIT 1


```

### to\_local\_time

Strips timezone information from a timestamp.

```

SELECT to_local_time(timestamp) AS local_ts

FROM my_namespace.sales_data

LIMIT 1


```

### to\_time

Parses a time from a string using a format pattern.

```

SELECT to_time('14:30:00', '%H:%M:%S') AS t

FROM my_namespace.sales_data

LIMIT 1


```

### to\_timestamp

Parses a timestamp from a string using a format pattern.

```

SELECT to_timestamp('2026-03-01 12:00:00', '%Y-%m-%d %H:%M:%S') AS ts

FROM my_namespace.sales_data

LIMIT 1


```

### to\_timestamp\_micros

Converts microseconds since Unix epoch to a timestamp.

```

SELECT to_timestamp_micros(1770000000000000) AS ts

FROM my_namespace.sales_data

LIMIT 1


```

### to\_timestamp\_millis

Converts milliseconds since Unix epoch to a timestamp.

```

SELECT to_timestamp_millis(1770000000000) AS ts

FROM my_namespace.sales_data

LIMIT 1


```

### to\_timestamp\_nanos

Converts nanoseconds since Unix epoch to a timestamp. Large values may overflow.

```

SELECT to_timestamp_nanos(1770000000000000000) AS ts

FROM my_namespace.sales_data

LIMIT 1


```

### to\_timestamp\_seconds

Converts seconds since Unix epoch to a timestamp.

```

SELECT to_timestamp_seconds(1770000000) AS ts

FROM my_namespace.sales_data

LIMIT 1


```

### to\_unixtime

Converts a timestamp to a Unix epoch (seconds).

```

SELECT to_unixtime(timestamp) AS epoch

FROM my_namespace.sales_data

LIMIT 1


```

---

## Encoding functions

### decode

Decodes a string to binary data. Supported encoding: `base64`.

```

SELECT decode('aGVsbG8=', 'base64') AS raw

FROM my_namespace.sales_data

LIMIT 1


```

### encode

Encodes binary data to a string. Supported encoding: `base64`.

```

SELECT encode(CAST('hello' AS BYTEA), 'base64') AS b64

FROM my_namespace.sales_data

LIMIT 1


```

---

## JSON functions

### json\_as\_text

Returns any JSON value as unquoted text.

```

SELECT json_as_text(doc, 'description') AS description

FROM my_namespace.sales_data

LIMIT 5


```

### json\_contains

Returns true if the specified key path exists in the JSON.

```

SELECT customer_id, json_contains(doc, 'email') AS has_email

FROM my_namespace.sales_data

LIMIT 5


```

### json\_get

Extracts a value by key path. Returns a union type — use the typed variants (`json_get_str`, `json_get_int`, etc.) for predictable results.

```

SELECT json_get(doc, 'name') AS name

FROM my_namespace.sales_data

LIMIT 5


```

### json\_get\_array

Returns a JSON array as a list of strings.

```

SELECT json_get_array(doc, 'tags') AS tags

FROM my_namespace.sales_data

LIMIT 5


```

### json\_get\_bool

Returns a boolean value from a JSON column by key path.

```

SELECT json_get_bool(doc, 'active') AS is_active

FROM my_namespace.sales_data

LIMIT 5


```

### json\_get\_float

Returns a float value from a JSON column by key path.

```

SELECT json_get_float(doc, 'price') AS price

FROM my_namespace.sales_data

LIMIT 5


```

### json\_get\_int

Returns an integer value from a JSON column by key path.

```

SELECT json_get_int(doc, 'age') AS age

FROM my_namespace.sales_data

LIMIT 5


```

### json\_get\_json

Returns nested JSON as a raw JSON string.

```

SELECT json_get_json(doc, 'metadata') AS metadata

FROM my_namespace.sales_data

LIMIT 5


```

### json\_get\_str

Returns a string value from a JSON column by key path.

```

SELECT json_get_str(doc, 'name') AS name

FROM my_namespace.sales_data

LIMIT 5


```

### json\_length

Returns the length of a JSON array or object.

```

SELECT json_length(doc, 'items') AS item_count

FROM my_namespace.sales_data

LIMIT 5


```

---

## Math functions

### abs

Returns the absolute value of a number.

```

SELECT abs(total_amount - 500) AS distance_from_500

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL

LIMIT 5


```

### cbrt

Returns the cube root of a number.

```

SELECT cbrt(CAST(quantity AS DOUBLE)) AS cbrt_qty

FROM my_namespace.sales_data

WHERE quantity IS NOT NULL

LIMIT 5


```

### ceil

Returns the smallest integer greater than or equal to a number.

```

SELECT ceil(total_amount) AS rounded_up

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL

LIMIT 5


```

### degrees

Converts radians to degrees.

```

SELECT degrees(pi()) AS full_circle

FROM my_namespace.sales_data

LIMIT 1


```

### exp

Returns _e_ raised to the given power.

```

SELECT exp(total_amount / 1000.0) AS exp_val

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL

LIMIT 5


```

### factorial

Returns the factorial of a non-negative integer.

```

SELECT factorial(5) AS fact5

FROM my_namespace.sales_data

LIMIT 1


```

### floor

Returns the largest integer less than or equal to a number.

```

SELECT floor(total_amount) AS rounded_down

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL

LIMIT 5


```

### gcd

Returns the greatest common divisor of two integers.

```

SELECT gcd(12, 8) AS gcd_val

FROM my_namespace.sales_data

LIMIT 1


```

### Hyperbolic functions

`sinh`, `cosh`, `tanh`, `asinh`, `acosh`, `atanh`

```

SELECT sinh(1.0) AS sh, cosh(1.0) AS ch, tanh(1.0) AS th

FROM my_namespace.sales_data

LIMIT 1


```

### isnan

Returns true if the value is NaN.

```

SELECT isnan(0.0 / 0.0) AS is_nan

FROM my_namespace.sales_data

LIMIT 1


```

### iszero

Returns true if the value is zero.

```

SELECT iszero(0.0) AS is_zero

FROM my_namespace.sales_data

LIMIT 1


```

### lcm

Returns the least common multiple of two integers.

```

SELECT lcm(4, 6) AS lcm_val

FROM my_namespace.sales_data

LIMIT 1


```

### ln

Returns the natural logarithm.

```

SELECT ln(total_amount) AS ln_val

FROM my_namespace.sales_data

WHERE total_amount > 0

LIMIT 5


```

### log

Returns the logarithm of a value for a given base.

```

SELECT log(10.0, total_amount) AS log10_val

FROM my_namespace.sales_data

WHERE total_amount > 0

LIMIT 5


```

### log2

Returns the base-2 logarithm.

```

SELECT log2(total_amount) AS log2_val

FROM my_namespace.sales_data

WHERE total_amount > 0

LIMIT 5


```

### log10

Returns the base-10 logarithm.

```

SELECT log10(total_amount) AS log10_val

FROM my_namespace.sales_data

WHERE total_amount > 0

LIMIT 5


```

### nanvl

Returns the first argument if it is not NaN, otherwise returns the second.

```

SELECT nanvl(0.0 / 0.0, -1.0) AS safe_val

FROM my_namespace.sales_data

LIMIT 1


```

### pi

Returns the value of pi.

```

SELECT pi() AS pi_val

FROM my_namespace.sales_data

LIMIT 1


```

### power

Raises a number to a power. Alias: `pow`.

```

SELECT power(total_amount, 2.0) AS amount_squared

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL

LIMIT 5


```

### radians

Converts degrees to radians.

```

SELECT radians(180.0) AS pi_val

FROM my_namespace.sales_data

LIMIT 1


```

### random

Returns a random float between 0 and 1.

```

SELECT random() AS rnd

FROM my_namespace.sales_data

LIMIT 1


```

### round

Rounds a number to a specified number of decimal places.

```

SELECT round(total_amount, 2) AS rounded

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL

LIMIT 5


```

### signum

Returns the sign of a number: -1, 0, or 1.

```

SELECT signum(total_amount - 500) AS sign_val

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL

LIMIT 5


```

### sqrt

Returns the square root of a number.

```

SELECT sqrt(CAST(quantity AS DOUBLE)) AS sqrt_qty

FROM my_namespace.sales_data

WHERE quantity IS NOT NULL

LIMIT 5


```

### Trigonometric functions

`sin`, `cos`, `tan`, `asin`, `acos`, `atan`, `atan2`, `cot`

```

SELECT sin(1.0) AS s, cos(1.0) AS c, tan(1.0) AS t,

       asin(0.5) AS as_val, acos(0.5) AS ac_val, atan(1.0) AS at_val

FROM my_namespace.sales_data

LIMIT 1


```

### trunc

Truncates a number to a specified number of decimal places.

```

SELECT trunc(total_amount, 0) AS truncated

FROM my_namespace.sales_data

WHERE total_amount IS NOT NULL

LIMIT 5


```

---

## Regex functions

### regexp\_count

Returns the number of matches of a pattern in a string.

```

SELECT department, regexp_count(department, '[aeiou]') AS vowels

FROM my_namespace.sales_data

WHERE department IS NOT NULL

LIMIT 5


```

### regexp\_instr

Returns the position of the first match of a pattern.

```

SELECT department, regexp_instr(department, '[0-9]') AS digit_pos

FROM my_namespace.sales_data

WHERE department IS NOT NULL

LIMIT 5


```

### regexp\_like

Returns true if a string matches a regular expression pattern.

```

SELECT department, regexp_like(department, '^[A-Z]{2}') AS starts_two_caps

FROM my_namespace.sales_data

WHERE department IS NOT NULL

LIMIT 5


```

### regexp\_match

Returns the first match of a pattern as an array.

```

SELECT department, regexp_match(department, '([A-Z][a-z]+)') AS first_word

FROM my_namespace.sales_data

WHERE department IS NOT NULL

LIMIT 3


```

### regexp\_replace

Replaces matches of a pattern with a replacement string.

```

SELECT department, regexp_replace(department, '[0-9]', '#') AS no_digits

FROM my_namespace.sales_data

WHERE department IS NOT NULL

LIMIT 5


```

---

## String functions

### ascii

Returns the ASCII code of the first character.

```

SELECT customer_id, ascii(customer_id) AS first_code

FROM my_namespace.sales_data

LIMIT 3


```

### bit\_length

Returns the length of a string in bits.

```

SELECT customer_id, bit_length(customer_id) AS bits

FROM my_namespace.sales_data

LIMIT 3


```

### btrim

Trims characters from both sides of a string. Alias: `trim`.

```

SELECT btrim('  hello  ') AS trimmed

FROM my_namespace.sales_data

LIMIT 1


```

### chr

Returns the character for a given ASCII code.

```

SELECT chr(65) AS letter

FROM my_namespace.sales_data

LIMIT 1


```

### concat

Concatenates two or more strings.

```

SELECT concat(department, ' - ', region) AS label

FROM my_namespace.sales_data

WHERE department IS NOT NULL

LIMIT 5


```

### concat\_ws

Concatenates strings with a separator.

```

SELECT concat_ws('/', region, department) AS path

FROM my_namespace.sales_data

WHERE department IS NOT NULL

LIMIT 5


```

### contains

Returns true if a string contains a substring.

```

SELECT customer_id, contains(department, 'Sales') AS is_sales

FROM my_namespace.sales_data

WHERE department IS NOT NULL

LIMIT 5


```

### ends\_with

Returns true if a string ends with a suffix.

```

SELECT customer_id, ends_with(department, 'ing') AS ends_ing

FROM my_namespace.sales_data

WHERE department IS NOT NULL

LIMIT 5


```

### levenshtein

Returns the Levenshtein edit distance between two strings.

```

SELECT department, levenshtein(department, 'Engineering') AS dist

FROM my_namespace.sales_data

WHERE department IS NOT NULL

LIMIT 5


```

### lower

Converts a string to lowercase.

```

SELECT lower(department) AS dept_lower

FROM my_namespace.sales_data

LIMIT 5


```

### ltrim

Trims characters from the left side of a string.

```

SELECT ltrim('  hello') AS trimmed

FROM my_namespace.sales_data

LIMIT 1


```

### octet\_length

Returns the length of a string in bytes.

```

SELECT customer_id, octet_length(customer_id) AS bytes

FROM my_namespace.sales_data

LIMIT 3


```

### repeat

Repeats a string a given number of times.

```

SELECT repeat(region, 2) AS doubled

FROM my_namespace.sales_data

LIMIT 3


```

### replace

Replaces all occurrences of a substring.

```

SELECT department, replace(department, ' ', '_') AS underscored

FROM my_namespace.sales_data

WHERE department IS NOT NULL

LIMIT 5


```

### rtrim

Trims characters from the right side of a string.

```

SELECT rtrim('hello  ') AS trimmed

FROM my_namespace.sales_data

LIMIT 1


```

### split\_part

Splits a string by a delimiter and returns the specified part (1-indexed).

```

SELECT customer_id, split_part(customer_id, '-', 1) AS first_part

FROM my_namespace.sales_data

WHERE customer_id IS NOT NULL

LIMIT 5


```

### starts\_with

Returns true if a string starts with a prefix.

```

SELECT customer_id, starts_with(department, 'Eng') AS is_eng

FROM my_namespace.sales_data

WHERE department IS NOT NULL

LIMIT 5


```

### to\_hex

Converts an integer to a hexadecimal string.

```

SELECT to_hex(255) AS hex_ff

FROM my_namespace.sales_data

LIMIT 1


```

### upper

Converts a string to uppercase.

```

SELECT upper(region) AS region_upper

FROM my_namespace.sales_data

LIMIT 5


```

### uuid

Generates a random UUID.

```

SELECT uuid() AS new_id

FROM my_namespace.sales_data

LIMIT 1


```

---

## Unicode functions

### character\_length

Returns the number of characters in a string. Aliases: `length`, `char_length`.

```

SELECT department, character_length(department) AS len

FROM my_namespace.sales_data

WHERE department IS NOT NULL

LIMIT 5


```

### find\_in\_set

Returns the position of a string within a comma-separated list.

```

SELECT find_in_set('North', 'South,North,East,West') AS pos

FROM my_namespace.sales_data

LIMIT 1


```

### initcap

Capitalizes the first letter of each word.

```

SELECT initcap('hello world') AS capped

FROM my_namespace.sales_data

LIMIT 1


```

### left

Returns the leftmost _n_ characters of a string.

```

SELECT department, left(department, 5) AS prefix

FROM my_namespace.sales_data

WHERE department IS NOT NULL

LIMIT 5


```

### lpad

Left-pads a string to a specified length.

```

SELECT region, lpad(region, 15, '.') AS padded

FROM my_namespace.sales_data

LIMIT 5


```

### reverse

Reverses a string.

```

SELECT department, reverse(department) AS rev

FROM my_namespace.sales_data

WHERE department IS NOT NULL

LIMIT 5


```

### right

Returns the rightmost _n_ characters of a string.

```

SELECT department, right(department, 3) AS suffix

FROM my_namespace.sales_data

WHERE department IS NOT NULL

LIMIT 5


```

### rpad

Right-pads a string to a specified length.

```

SELECT region, rpad(region, 15, '.') AS padded

FROM my_namespace.sales_data

LIMIT 5


```

### strpos

Returns the position of a substring (1-indexed). Aliases: `instr`, `position`.

```

SELECT department, strpos(department, 'a') AS a_pos

FROM my_namespace.sales_data

WHERE department IS NOT NULL

LIMIT 5


```

### substr

Returns a substring starting at a position for a given length. Alias: `substring`.

```

SELECT department, substr(department, 1, 8) AS first_eight

FROM my_namespace.sales_data

WHERE department IS NOT NULL

LIMIT 5


```

### substr\_index

Returns the substring before the _n_\-th occurrence of a delimiter. Alias: `substring_index`.

```

SELECT customer_id, substr_index(customer_id, '-', 1) AS first_segment

FROM my_namespace.sales_data

WHERE customer_id IS NOT NULL

LIMIT 5


```

### translate

Replaces characters in a string based on a mapping.

```

SELECT department, translate(department, 'aeiou', '12345') AS coded

FROM my_namespace.sales_data

WHERE department IS NOT NULL

LIMIT 5


```

```json
{"@context":"https://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"/directory/","name":"Directory"}},{"@type":"ListItem","position":2,"item":{"@id":"/r2-sql/","name":"R2 SQL"}},{"@type":"ListItem","position":3,"item":{"@id":"/r2-sql/sql-reference/","name":"SQL reference"}},{"@type":"ListItem","position":4,"item":{"@id":"/r2-sql/sql-reference/scalar-functions/","name":"Scalar functions"}}]}
```

---

---
title: Troubleshooting guide
description: Troubleshoot common R2 SQL errors including query structure, type, and timeout issues.
image: https://developers.cloudflare.com/dev-products-preview.png
---

> Documentation Index  
> Fetch the complete documentation index at: https://developers.cloudflare.com/r2-sql/llms.txt  
> Use this file to discover all available pages before exploring further.

[Skip to content](#%5Ftop) 

# Troubleshooting guide

This guide covers potential errors and limitations you may encounter when using R2 SQL. R2 SQL is in open beta, and supported functionality will evolve and change over time.

## Query structure errors

### Missing required clauses

**Error**: `expected exactly 1 table in FROM clause`

**Problem**: R2 SQL requires a `FROM` clause in your query.

```

-- Invalid - Missing FROM clause

SELECT user_id WHERE status = 200;


-- Valid

SELECT user_id

FROM my_namespace.http_requests

WHERE status = 200 AND timestamp BETWEEN '2025-09-24T01:00:00Z' AND '2025-09-25T01:00:00Z';


```

**Solution**: Always include `FROM` with a fully qualified table name (`namespace_name.table_name`).

---

## FROM clause issues

### Join performance issues

**Symptom**: Query returns 502 Bad Gateway or times out.

**Problem**: Multi-way joins across large tables can exceed resource limits, especially with `COUNT(DISTINCT)` or other memory-intensive aggregations.

```

-- May timeout: cross-joining two large fact tables

SELECT COUNT(DISTINCT h.ray_id), COUNT(DISTINCT f.event_id)

FROM my_namespace.http_requests h

INNER JOIN my_namespace.firewall_events f ON h.zone_id = f.zone_id


```

**Solution**:

* Add `WHERE` filters to reduce intermediate result sizes.
* Join through dimension tables instead of directly joining fact tables.
* Use `approx_distinct()` instead of `COUNT(DISTINCT)` for approximate counts.
* Break complex multi-way joins into smaller queries using CTEs or sequential queries.

```

-- Better: filter both sides and use approx_distinct

SELECT z.plan,

       approx_distinct(h.ray_id) AS unique_requests

FROM my_namespace.zones z

INNER JOIN my_namespace.http_requests h ON z.zone_id = h.zone_id

WHERE z.plan = 'enterprise'

  AND h.status_code >= 400

GROUP BY z.plan


```

### `NOT IN` on nullable columns

**Symptom**: `NOT IN` subquery returns unexpected results or errors.

**Problem**: `NOT IN` subqueries are not supported when the subquery column can contain `NULL` values.

```

-- Fails: nullable_col may contain NULLs

SELECT zone_id

FROM my_namespace.http_requests

WHERE zone_id NOT IN (

    SELECT nullable_col FROM my_namespace.other_table

)

LIMIT 20


```

**Solution**: Use `NOT EXISTS` with a correlated subquery instead.

```

-- Works: NOT EXISTS handles NULLs correctly

SELECT h.zone_id

FROM my_namespace.http_requests h

WHERE NOT EXISTS (

    SELECT 1 FROM my_namespace.other_table o

    WHERE o.nullable_col = h.zone_id

)

LIMIT 20


```

### Correlated subquery performance

**Symptom**: `EXISTS` or `NOT EXISTS` subquery runs slowly.

**Problem**: Correlated subqueries with complex conditions can be slow because the inner query is evaluated for each row of the outer query.

```

-- Slower: multiple filter conditions in correlated subquery

SELECT z.domain

FROM my_namespace.zones z

WHERE EXISTS (

    SELECT 1 FROM my_namespace.firewall_events f

    WHERE f.zone_id = z.zone_id

      AND f.risk_score > 0.9

      AND f.colo = 'SJC'

)

LIMIT 20


```

**Solution**:

* Simplify correlated conditions where possible.
* Consider rewriting as a `JOIN` with `GROUP BY` instead of `EXISTS`.
* Use an `IN` subquery with pre-aggregated results instead of `EXISTS`.

---

## WHERE clause issues

### JSON object filtering

**Error**: `unsupported binary operator` or `Error during planning: could not parse compound`

**Problem**: JSON functions are not yet implemented. You cannot filter on fields inside JSON objects using JSON path operators.

```

-- Invalid - JSON path operators not supported

SELECT * FROM my_namespace.requests WHERE json_data->>'level' = 'error'


-- Valid - Filter on the entire JSON column

SELECT * FROM my_namespace.logs WHERE json_data IS NOT NULL LIMIT 100


```

**Solution**:

* Denormalize frequently queried JSON fields into separate columns.
* Filter on the entire JSON field, and handle parsing in your application.

Note

Struct columns are supported and can be filtered using bracket notation. Refer to [Complex types](https://developers.cloudflare.com/r2-sql/sql-reference/complex-types/) for details.

```

SELECT * FROM my_namespace.products WHERE pricing['price'] > 50 LIMIT 100


```

---

## LIMIT clause issues

### Invalid limit values

**Error**: `maximum LIMIT is 10000`

**Problem**: LIMIT values must be between 1 and 10,000.

```

-- Invalid - Out of range

SELECT * FROM my_namespace.events LIMIT 50000


-- Valid

SELECT * FROM my_namespace.events LIMIT 10000


```

**Solution**: Use LIMIT values between 1 and 10,000.

### Pagination attempts

**Error**: `unsupported feature: OFFSET clause is not supported`

**Problem**: OFFSET is not supported.

```

-- Invalid - Pagination not supported

SELECT * FROM my_namespace.events LIMIT 100 OFFSET 200


-- Valid - Use cursor-based pagination with ORDER BY and WHERE

-- Page 1

SELECT * FROM my_namespace.events

WHERE timestamp >= '2024-01-01'

ORDER BY timestamp

LIMIT 100


-- Page 2 - Use the last timestamp from the previous page

SELECT * FROM my_namespace.events

WHERE timestamp > '2024-01-01T10:30:00Z'

ORDER BY timestamp

LIMIT 100


```

**Solution**: Implement cursor-based pagination using `ORDER BY` and `WHERE` conditions.

---

## Schema issues

### DDL and DML operations

**Error**: `only read-only queries are allowed`

**Problem**: R2 SQL is a read-only query engine. DDL and DML statements are not supported.

```

-- Invalid - Schema changes not supported

ALTER TABLE my_namespace.events ADD COLUMN new_field STRING

UPDATE my_namespace.events SET status = 200 WHERE user_id = '123'

CREATE TABLE my_namespace.test (id INT)

DROP TABLE my_namespace.events


```

**Solution**: Manage your schema through your data ingestion pipeline and R2 Data Catalog.

---

## Performance optimization

### Query performance issues

If your queries are running slowly:

1. **Always include partition (timestamp) filters**: This is the most important optimization.  
```  
-- Good - Narrows data scan to one day  
SELECT * FROM my_namespace.events  
WHERE timestamp BETWEEN '2024-01-01' AND '2024-01-02'  
LIMIT 100  
```
2. **Use selective filtering**: Include specific conditions to reduce result sets.  
```  
-- Good - Multiple filters reduce scanned data  
SELECT * FROM my_namespace.events  
WHERE status = 200 AND region = 'US' AND timestamp > '2024-01-01'  
LIMIT 100  
```
3. **Select specific columns**: Avoid `SELECT *` when you only need a few fields.  
```  
-- Good - Only reads the columns you need  
SELECT user_id, status, timestamp  
FROM my_namespace.events  
WHERE timestamp > '2024-01-01'  
LIMIT 100  
```
4. **Use EXPLAIN to inspect the execution plan**: Verify that predicate pushdown and file pruning are working.  
```  
EXPLAIN SELECT user_id, status  
FROM my_namespace.events  
WHERE timestamp > '2024-01-01' AND status = 200  
```
5. **Enable compaction**: Enable compaction in R2 Data Catalog to reduce the number of small files scanned per query.

```json
{"@context":"https://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"/directory/","name":"Directory"}},{"@type":"ListItem","position":2,"item":{"@id":"/r2-sql/","name":"R2 SQL"}},{"@type":"ListItem","position":3,"item":{"@id":"/r2-sql/troubleshooting/","name":"Troubleshooting guide"}}]}
```

---

---
title: Tutorials
description: Step-by-step R2 SQL tutorials for querying data stored in R2.
image: https://developers.cloudflare.com/dev-products-preview.png
---

> Documentation Index  
> Fetch the complete documentation index at: https://developers.cloudflare.com/r2-sql/llms.txt  
> Use this file to discover all available pages before exploring further.

[Skip to content](#%5Ftop) 

# Tutorials

```json
{"@context":"https://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"/directory/","name":"Directory"}},{"@type":"ListItem","position":2,"item":{"@id":"/r2-sql/","name":"R2 SQL"}},{"@type":"ListItem","position":3,"item":{"@id":"/r2-sql/tutorials/","name":"Tutorials"}}]}
```

---

---
title: Build an end to end data pipeline
description: This tutorial demonstrates how to build a complete data pipeline using Cloudflare Pipelines, R2 Data Catalog, and R2 SQL.
image: https://developers.cloudflare.com/dev-products-preview.png
---

> Documentation Index  
> Fetch the complete documentation index at: https://developers.cloudflare.com/r2-sql/llms.txt  
> Use this file to discover all available pages before exploring further.

[Skip to content](#%5Ftop) 

# Build an end to end data pipeline

**Last reviewed:**  9 months ago 

Learn how to create an end-to-end data pipeline using Cloudflare Pipelines, R2 Data Catalog, and R2 SQL for real-time transaction analysis.

In this tutorial, you will learn how to build a complete data pipeline using Cloudflare Pipelines, R2 Data Catalog, and R2 SQL. This also includes a sample Python script that creates and sends financial transaction data to your Pipeline that can be queried by R2 SQL or any Apache Iceberg-compatible query engine.

This tutorial demonstrates how to:

* Set up R2 Data Catalog to store our transaction events in an Apache Iceberg table
* Set up a Cloudflare Pipeline
* Create transaction data with fraud patterns to send to your Pipeline
* Query your data using R2 SQL for fraud analysis

## Prerequisites

1. Sign up for a [Cloudflare account ↗](https://dash.cloudflare.com/sign-up).
2. Install [Node.js ↗](https://nodejs.org/en/).
3. Install [Python 3.8+ ↗](https://python.org) for the data generation script.

Node.js version manager

Use a Node version manager like [Volta ↗](https://volta.sh/) or [nvm ↗](https://github.com/nvm-sh/nvm) to avoid permission issues and change Node.js versions.

Wrangler requires a Node version of 16.17.0 or later.

## 1\. Set up authentication

You will need API tokens to interact with Cloudflare services.

1. In the Cloudflare dashboard, go to the **API tokens** page.  
[ Go to **Account API tokens** ](https://dash.cloudflare.com/?to=/:account/api-tokens)
2. Select **Create Token**.
3. Select **Get started** next to Create Custom Token.
4. Enter a name for your API token.
5. Under **Permissions**, choose:  
   * **Workers Pipelines** with Read, Send, and Edit permissions  
   * **Workers R2 Data Catalog** with Read and Edit permissions  
   * **Workers R2 SQL** with Read permissions  
   * **Workers R2 Storage** with Read and Edit permissions
6. Optionally, add a TTL to this token.
7. Select **Continue to summary**.
8. Click **Create Token**
9. Note the **Token value**.

Export your new token as an environment variable:

Terminal window

```

export WRANGLER_R2_SQL_AUTH_TOKEN= #paste your token here


```

If this is your first time using Wrangler, make sure to log in.

Terminal window

```

npx wrangler login


```

## 2\. Create an R2 bucket and enable R2 Data Catalog

* [ Wrangler CLI ](#tab-panel-9561)
* [ Dashboard ](#tab-panel-9562)

Create an R2 bucket:

Terminal window

```

npx wrangler r2 bucket create fraud-pipeline


```

1. In the Cloudflare dashboard, go to the **R2 object storage** page.  
[ Go to **Overview** ](https://dash.cloudflare.com/?to=/:account/r2/overview)
2. Select **Create bucket**.
3. Enter the bucket name: `fraud-pipeline`
4. Select **Create bucket**.

Enable the catalog on your R2 bucket:

* [ Wrangler CLI ](#tab-panel-9563)
* [ Dashboard ](#tab-panel-9564)

Terminal window

```

npx wrangler r2 bucket catalog enable fraud-pipeline


```

When you run this command, take note of the "Warehouse" and "Catalog URI". You will need these later.

1. In the Cloudflare dashboard, go to the **R2 object storage** page.  
[ Go to **Overview** ](https://dash.cloudflare.com/?to=/:account/r2/overview)
2. Select the bucket: `fraud-pipeline`.
3. Switch to the **Settings** tab, scroll down to **R2 Data Catalog**, and select **Enable**.
4. Once enabled, note the **Catalog URI** and **Warehouse name**.

Note

Copy the `warehouse` (ACCOUNTID\_BUCKETNAME) and paste it in the `export` below. We will use it later in the tutorial.

Terminal window

```

export WAREHOUSE= #Paste your warehouse here


```

### (Optional) Enable compaction on your R2 Data Catalog

R2 Data Catalog can automatically compact tables for you. In production event streaming use cases, it is common to end up with many small files, so it is recommended to enable compaction. Since the tutorial only demonstrates a sample use case, this step is optional.

* [ Wrangler CLI ](#tab-panel-9565)
* [ Dashboard ](#tab-panel-9566)

Terminal window

```

npx wrangler r2 bucket catalog compaction enable fraud-pipeline --token $WRANGLER_R2_SQL_AUTH_TOKEN


```

1. In the Cloudflare dashboard, go to the **R2 object storage** page.  
[ Go to **Overview** ](https://dash.cloudflare.com/?to=/:account/r2/overview)
2. Select the bucket: `fraud-pipeline`.
3. Switch to the **Settings** tab, scroll down to **R2 Data Catalog**, click on edit icon, and select **Enable**.
4. You can choose a target file size or leave the default. Click save.

## 3\. Set up the pipeline infrastructure

### 3.1\. Create the Pipeline stream

* [ Wrangler CLI ](#tab-panel-9567)
* [ Dashboard ](#tab-panel-9568)

First, create a schema file called `raw_transactions_schema.json` with the following `json` schema:

```

{

  "fields": [

    { "name": "transaction_id", "type": "string", "required": true },

    { "name": "user_id", "type": "int64", "required": true },

    { "name": "amount", "type": "float64", "required": false },

    { "name": "transaction_timestamp", "type": "string", "required": false },

    { "name": "location", "type": "string", "required": false },

    { "name": "merchant_category", "type": "string", "required": false },

    { "name": "is_fraud", "type": "bool", "required": false }

  ]

}


```

Create a stream to receive incoming fraud detection events:

Terminal window

```

npx wrangler pipelines streams create raw_events_stream \

  --schema-file raw_transactions_schema.json \

  --http-enabled true \

  --http-auth false


```

Note

Note the **HTTP Ingest Endpoint URL** from the output. This is the endpoint you will use to send data to your pipeline.

Terminal window

```

# The http ingest endpoint from the output (see example below)

export STREAM_ENDPOINT= #the http ingest endpoint from the output (see example below)


```

The output should look like this:

Terminal window

```

🌀 Creating stream 'raw_events_stream'...

✨ Successfully created stream 'raw_events_stream' with id 'stream_id'.


Creation Summary:

General:

  Name:  raw_events_stream


HTTP Ingest:

  Enabled:         Yes

  Authentication:  Yes

  Endpoint:        https://stream_id.ingest.cloudflare.com

  CORS Origins:    None


Input Schema:

┌───────────────────────┬────────┬────────────┬──────────┐

│ Field Name            │ Type   │ Unit/Items │ Required │

├───────────────────────┼────────┼────────────┼──────────┤

│ transaction_id        │ string │            │ Yes      │

├───────────────────────┼────────┼────────────┼──────────┤

│ user_id               │ int64  │            │ Yes      │

├───────────────────────┼────────┼────────────┼──────────┤

│ amount                │float64 │            │ No       │

├───────────────────────┼────────┼────────────┼──────────┤

│ transaction_timestamp │ string │            │ No       │

├───────────────────────┼────────┼────────────┼──────────┤

│ location              │ string │            │ No       │

├───────────────────────┼────────┼────────────┼──────────┤

│ merchant_category     │ string │            │ No       │

├───────────────────────┼────────┼────────────┼──────────┤

│ is_fraud              │ bool   │            │ No       │

└───────────────────────┴────────┴────────────┴──────────┘


```

### 3.2\. Create the data sink

Create a sink that writes data to your R2 bucket as Apache Iceberg tables:

Terminal window

```

npx wrangler pipelines sinks create raw_events_sink \

  --type "r2-data-catalog" \

  --bucket "fraud-pipeline" \

  --roll-interval 30 \

  --namespace "fraud_detection" \

  --table "transactions" \

  --catalog-token $WRANGLER_R2_SQL_AUTH_TOKEN


```

Note

This creates a `sink` configuration that will write to the Iceberg table `fraud_detection.transactions` in your R2 Data Catalog every 30 seconds. Pipelines automatically appends an `__ingest_ts` column that is used to partition the table by `DAY`.

### 3.3\. Create the pipeline

Connect your stream to your sink with SQL:

Terminal window

```

npx wrangler pipelines create raw_events_pipeline \

  --sql "INSERT INTO raw_events_sink SELECT * FROM raw_events_stream"


```

1. In the Cloudflare dashboard, go to **Pipelines** \> **Pipelines**.  
[ Go to **Pipelines** ](https://dash.cloudflare.com/?to=/:account/pipelines/overview)
2. Select **Create Pipeline**.
3. **Connect to a Stream**:  
   * Pipeline name: `raw_events`  
   * Enable HTTP endpoint for sending data: Enabled  
   * HTTP authentication: Disabled (default)  
   * Select **Next**
4. **Define Input Schema**:  
   * Select **JSON editor**  
   * Copy in the schema:  
   ```  
   {  
     "fields": [  
       { "name": "transaction_id", "type": "string", "required": true },  
       { "name": "user_id", "type": "int64", "required": true },  
       { "name": "amount", "type": "float64", "required": false },  
       {  
         "name": "transaction_timestamp",  
         "type": "string",  
         "required": false  
       },  
       { "name": "location", "type": "string", "required": false },  
       { "name": "merchant_category", "type": "string", "required": false },  
       { "name": "is_fraud", "type": "bool", "required": false }  
     ]  
   }  
   ```  
   * Select **Next**
5. **Define Sink**:  
   * Select your R2 bucket: `fraud-pipeline`  
   * Storage type: **R2 Data Catalog**  
   * Namespace: `fraud_detection`  
   * Table name: `transactions`  
   * **Advanced Settings**: Change **Maximum Time Interval** to `30 seconds`  
   * Select **Next**
6. **Credentials**:  
   * Disable **Automatically create an Account API token for your sink**  
   * Enter **Catalog Token** from step 1  
   * Select **Next**
7. **Pipeline Definition**:  
   * Leave the default SQL query:  
   ```  
   INSERT INTO raw_events_sink SELECT * FROM raw_events_stream;  
   ```  
   * Select **Create Pipeline**
8. After pipeline creation, note the **Stream ID** for the next step.

## 4\. Generate sample fraud detection data

Create a Python script to generate realistic transaction data with fraud patterns:

fraud\_data\_generator.py

```

import requests

import json

import uuid

import random

import time

import os

from datetime import datetime, timezone, timedelta


# Configuration - exported from the prior steps

STREAM_ENDPOINT = os.environ["STREAM_ENDPOINT"]# From the stream you created

API_TOKEN = os.environ["WRANGLER_R2_SQL_AUTH_TOKEN"] #the same one created earlier

EVENTS_TO_SEND = 1000 # Feel free to adjust this


def generate_transaction():

    """Generate some random transactions with occasional fraud"""


    # User IDs

    high_risk_users = [1001, 1002, 1003, 1004, 1005]

    normal_users = list(range(1006, 2000))


    user_id = random.choice(high_risk_users + normal_users)

    is_high_risk_user = user_id in high_risk_users


    # Generate amounts

    if random.random() < 0.05:

        amount = round(random.uniform(5000, 50000), 2)

    elif random.random() < 0.03:

        amount = round(random.uniform(0.01, 1.00), 2)

    else:

        amount = round(random.uniform(10, 500), 2)


    # Locations

    normal_locations = ["NEW_YORK", "LOS_ANGELES", "CHICAGO", "MIAMI", "SEATTLE", "SAN FRANCISCO"]

    high_risk_locations = ["UNKNOWN_LOCATION", "VPN_EXIT", "MARS", "BAT_CAVE"]


    if is_high_risk_user and random.random() < 0.3:

        location = random.choice(high_risk_locations)

    else:

        location = random.choice(normal_locations)


    # Merchant categories

    normal_merchants = ["GROCERY", "GAS_STATION", "RESTAURANT", "RETAIL"]

    high_risk_merchants = ["GAMBLING", "CRYPTO", "MONEY_TRANSFER", "GIFT_CARDS"]


    if random.random() < 0.1:  # 10% high-risk merchants

        merchant_category = random.choice(high_risk_merchants)

    else:

        merchant_category = random.choice(normal_merchants)


    # Series of checks to either increase fraud score by a certain margin

    fraud_score = 0

    if amount > 2000: fraud_score += 0.4

    if amount < 1: fraud_score += 0.3

    if location in high_risk_locations: fraud_score += 0.5

    if merchant_category in high_risk_merchants: fraud_score += 0.3

    if is_high_risk_user: fraud_score += 0.2


    # Compare the fraud scores

    is_fraud = random.random() < min(fraud_score * 0.3, 0.8)


    # Generate timestamps (some fraud happens at unusual hours)

    base_time = datetime.now(timezone.utc)

    if is_fraud and random.random() < 0.4:  # 40% of fraud at night

        hour = random.randint(0, 5)  # Late night/early morning

        transaction_time = base_time.replace(hour=hour)

    else:

        transaction_time = base_time - timedelta(

            hours=random.randint(0, 168)  # Last week

        )


    return {

        "transaction_id": str(uuid.uuid4()),

        "user_id": user_id,

        "amount": amount,

        "transaction_timestamp": transaction_time.isoformat(),

        "location": location,

        "merchant_category": merchant_category,

        "is_fraud": True if is_fraud else False

    }


def send_batch_to_stream(events, batch_size=100):

    """Send events to Cloudflare Stream in batches"""


    headers = {

        "Authorization": f"Bearer {API_TOKEN}",

        "Content-Type": "application/json"

    }


    total_sent = 0

    fraud_count = 0


    for i in range(0, len(events), batch_size):

        batch = events[i:i + batch_size]

        fraud_in_batch = sum(1 for event in batch if event["is_fraud"] == True)


        try:

            response = requests.post(STREAM_ENDPOINT, headers=headers, json=batch)


            if response.status_code in [200, 201]:

                total_sent += len(batch)

                fraud_count += fraud_in_batch

                print(f"Sent batch of {len(batch)} events (Total: {total_sent})")

            else:

                print(f"Failed to send batch: {response.status_code} - {response.text}")


        except Exception as e:

            print(f"Error sending batch: {e}")


        time.sleep(0.1)


    return total_sent, fraud_count


def main():

    print("Generating fraud detection data...")


    # Generate events

    events = []

    for i in range(EVENTS_TO_SEND):

        events.append(generate_transaction())

        if (i + 1) % 100 == 0:

            print(f"Generated {i + 1} events...")


    fraud_events = sum(1 for event in events if event["is_fraud"] == True)

    print(f"📊 Generated {len(events)} total events ({fraud_events} fraud, {fraud_events/len(events)*100:.1f}%)")


    # Send to stream

    print("Sending data to Pipeline stream...")

    sent, fraud_sent = send_batch_to_stream(events)


    print(f"\nComplete!")

    print(f"   Events sent: {sent:,}")

    print(f"   Fraud events: {fraud_sent:,} ({fraud_sent/sent*100:.1f}%)")

    print(f"   Data is now flowing through your pipeline!")


if __name__ == "__main__":

    main()


```

Install the required Python dependency and run the script:

Terminal window

```

pip install requests

python fraud_data_generator.py


```

## 5\. Query the data with R2 SQL

Now you can analyze your fraud detection data using R2 SQL. Here are some example queries:

### 5.1\. View recent transactions

Terminal window

```

npx wrangler r2 sql query "$WAREHOUSE" "

SELECT

    transaction_id,

    user_id,

    amount,

    location,

    merchant_category,

    is_fraud,

    transaction_timestamp

FROM fraud_detection.transactions

WHERE __ingest_ts > '2025-09-24T01:00:00Z'

AND is_fraud = true

LIMIT 10"


```

### 5.2\. Filter the raw transactions into a new table to highlight high-value transactions

Create a new sink that will write the filtered data to a new Apache Iceberg table in R2 Data Catalog:

Terminal window

```

npx wrangler pipelines sinks create fraud_filter_sink \

  --type "r2-data-catalog" \

  --bucket "fraud-pipeline" \

  --roll-interval 30 \

  --namespace "fraud_detection" \

  --table "fraud_transactions" \

  --catalog-token $WRANGLER_R2_SQL_AUTH_TOKEN


```

Now you will create a new SQL query to process data from the original `raw_events_stream` stream and only write flagged transactions that are over the `amount` of 1,000.

Terminal window

```

npx wrangler pipelines create fraud_events_pipeline \

  --sql "INSERT INTO fraud_filter_sink SELECT * FROM raw_events_stream WHERE is_fraud=true and amount > 1000"


```

Note

It may take a few minutes for the new Pipeline to fully Initialize and start processing the data. Also keep in mind the 30 second `roll-interval`.

Query the table and check the results:

Terminal window

```

npx wrangler r2 sql query "$WAREHOUSE" "

SELECT

    transaction_id,

    user_id,

    amount,

    location,

    merchant_category,

    is_fraud,

    transaction_timestamp

FROM fraud_detection.fraud_transactions

LIMIT 10"


```

Also verify that the non-fraudulent events are being filtered out:

Terminal window

```

npx wrangler r2 sql query "$WAREHOUSE" "

SELECT

    transaction_id,

    user_id,

    amount,

    location,

    merchant_category,

    is_fraud,

    transaction_timestamp

FROM fraud_detection.fraud_transactions

WHERE is_fraud = false

LIMIT 10"


```

You should see the following output:

```

Query executed successfully with no results


```

## Conclusion

You have successfully built an end to end data pipeline using Cloudflare's data platform. Through this tutorial, you have learned to:

1. **Use R2 Data Catalog**: Leveraged Apache Iceberg tables for efficient data storage
2. **Set up Cloudflare Pipelines**: Created streams, sinks, and pipelines for data ingestion
3. **Generated sample data**: Created transaction data with some basic fraud patterns
4. **Query your tables with R2 SQL**: Access raw and processed data tables stored in R2 Data Catalog

```json
{"@context":"https://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"/directory/","name":"Directory"}},{"@type":"ListItem","position":2,"item":{"@id":"/r2-sql/","name":"R2 SQL"}},{"@type":"ListItem","position":3,"item":{"@id":"/r2-sql/tutorials/","name":"Tutorials"}},{"@type":"ListItem","position":4,"item":{"@id":"/r2-sql/tutorials/end-to-end-pipeline/","name":"Build an end to end data pipeline"}}]}
```

---

---
title: Platform
description: R2 SQL platform details including pricing and configuration.
image: https://developers.cloudflare.com/dev-products-preview.png
---

> Documentation Index  
> Fetch the complete documentation index at: https://developers.cloudflare.com/r2-sql/llms.txt  
> Use this file to discover all available pages before exploring further.

[Skip to content](#%5Ftop) 

# Platform

```json
{"@context":"https://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"/directory/","name":"Directory"}},{"@type":"ListItem","position":2,"item":{"@id":"/r2-sql/","name":"R2 SQL"}},{"@type":"ListItem","position":3,"item":{"@id":"/r2-sql/platform/","name":"Platform"}}]}
```

---

---
title: Pricing
description: R2 SQL pricing based on data scanned, with free tier details and billing examples.
image: https://developers.cloudflare.com/dev-products-preview.png
---

> Documentation Index  
> Fetch the complete documentation index at: https://developers.cloudflare.com/r2-sql/llms.txt  
> Use this file to discover all available pages before exploring further.

[Skip to content](#%5Ftop) 

# Pricing

R2 SQL charges based on a single dimension:

* **Data scanned**: The volume of compressed data read from R2 to execute your query.

R2 SQL pricing is additive to standard [R2 storage and operations](https://developers.cloudflare.com/r2/pricing/) and [R2 Data Catalog](https://developers.cloudflare.com/r2/data-catalog/platform/pricing/) charges. When the query engine reads files, those requests count as R2 Class B (read) operations. R2 does not charge for egress, so there is no additional data transfer cost.

All included usage is on a monthly basis.

## R2 SQL pricing

| Free plan    | Paid plan    |                           |
| ------------ | ------------ | ------------------------- |
| Included     | 1 GB / month | 10 GB / month             |
| Data scanned | N/A          | $0.0025 / GB ($2.50 / TB) |

### What counts as data scanned

Data scanned is the compressed bytes read from R2 object storage to answer your query. This matches what you see in your R2 bucket — if a Parquet file is 100 MB on disk, scanning that file bills for 100 MB.

* **Minimum per query**: 10 MB. If a query scans less than 10 MB, you are billed for 10 MB.
* **Failed queries**: Queries that fail due to a system error or syntax error caught before execution are not charged. Queries that fail mid-execution due to a runtime error are also not charged.
* **Metadata-only operations**: Operations such as `EXPLAIN`, `SHOW`, and `DESCRIBE` do not scan data and are free. Standard R2 and R2 Data Catalog request charges still apply.

## Billing examples

### Example 1: Ad-hoc analytics on 500 GB of Parquet data

A user stores 500 GB of Parquet data in R2 Data Catalog and runs queries that scan a total of 50 GB of compressed data during the month.

| Dimension             | Usage        | Included    | Billable     | Cost  |
| --------------------- | ------------ | ----------- | ------------ | ----- |
| R2 storage            | 500 GB-month | 10 GB-month | 490 GB-month | $7.35 |
| R2 SQL (data scanned) | 50 GB        | 10 GB       | 40 GB        | $0.10 |
| **Total**             | **$7.45**    |             |              |       |

### Example 2: Heavy query workload on 10 TB dataset

A data team stores 10 TB of compressed Parquet/Iceberg data and scans 50 TB of data per month across their queries. The team also makes 2 million catalog operations with compaction processing 500 GB.

| Dimension                         | Usage           | Included    | Billable       | Cost    |
| --------------------------------- | --------------- | ----------- | -------------- | ------- |
| R2 storage                        | 10,000 GB-month | 10 GB-month | 9,990 GB-month | $149.85 |
| R2 SQL (data scanned)             | 50,000 GB       | 10 GB       | 49,990 GB      | $124.98 |
| R2 Data Catalog operations        | 2,000,000       | 1,000,000   | 1,000,000      | $9.00   |
| R2 Data Catalog compaction (data) | 500 GB          | 10 GB       | 490 GB         | $2.45   |
| **Total**                         | **$286.28**     |             |                |         |

## Frequently asked questions

### Is there a minimum billing increment per query?

Yes. Each query is billed for a minimum of 10 MB of data scanned. This covers the overhead of initializing the query engine.

### Does data scanned include R2 egress fees?

No. R2 does not charge for egress. The query engine runs within the Cloudflare network adjacent to R2 storage, so there are no data transfer costs.

## Cloudflare billing policy

To learn more about how usage is billed, refer to [Cloudflare Billing Policy](https://developers.cloudflare.com/billing/understand/billing-policy/).

```json
{"@context":"https://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"/directory/","name":"Directory"}},{"@type":"ListItem","position":2,"item":{"@id":"/r2-sql/","name":"R2 SQL"}},{"@type":"ListItem","position":3,"item":{"@id":"/r2-sql/platform/","name":"Platform"}},{"@type":"ListItem","position":4,"item":{"@id":"/r2-sql/platform/pricing/","name":"Pricing"}}]}
```

---

---
title: Reference
description: R2 SQL reference documentation for limitations, best practices, and Wrangler commands.
image: https://developers.cloudflare.com/dev-products-preview.png
---

> Documentation Index  
> Fetch the complete documentation index at: https://developers.cloudflare.com/r2-sql/llms.txt  
> Use this file to discover all available pages before exploring further.

[Skip to content](#%5Ftop) 

# Reference

```json
{"@context":"https://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"/directory/","name":"Directory"}},{"@type":"ListItem","position":2,"item":{"@id":"/r2-sql/","name":"R2 SQL"}},{"@type":"ListItem","position":3,"item":{"@id":"/r2-sql/reference/","name":"Reference"}}]}
```

---

---
title: Limitations and best practices
description: Supported features, known limitations, and best practices for R2 SQL queries.
image: https://developers.cloudflare.com/dev-products-preview.png
---

> Documentation Index  
> Fetch the complete documentation index at: https://developers.cloudflare.com/r2-sql/llms.txt  
> Use this file to discover all available pages before exploring further.

[Skip to content](#%5Ftop) 

# Limitations and best practices

Note

R2 SQL is in open beta. Limitations and best practices will change over time.

This page summarizes supported features, limitations, and best practices.

## Quick reference

| Feature                                                 | Supported | Notes                                                                      |
| ------------------------------------------------------- | --------- | -------------------------------------------------------------------------- |
| SELECT, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT        | Yes       |                                                                            |
| Column aliases (AS)                                     | Yes       |                                                                            |
| Expressions (CASE, CAST, LIKE, BETWEEN, IN, arithmetic) | Yes       | Full expression support                                                    |
| EXPLAIN                                                 | Yes       | Returns execution plan as text or JSON                                     |
| 173 scalar functions                                    | Yes       | Math, string, datetime, regex, crypto, array, map, struct, JSON            |
| 33 aggregate functions                                  | Yes       | Basic, approximate, statistical, bitwise, boolean, positional              |
| Approximate aggregates                                  | Yes       | approx\_distinct, approx\_median, approx\_percentile\_cont, approx\_top\_k |
| Struct / Array / Map column types                       | Yes       | Bracket notation, get\_field(), array functions, map functions             |
| CTEs (WITH ... AS)                                      | Yes       | Can reference different tables and include JOINs                           |
| JOINs (INNER, LEFT, RIGHT, FULL OUTER, CROSS)           | Yes       | All standard join types                                                    |
| Implicit joins (comma FROM)                             | Yes       |                                                                            |
| Subqueries (IN, NOT IN)                                 | Yes       | NOT IN not supported on nullable columns — use NOT EXISTS instead          |
| Subqueries (EXISTS, NOT EXISTS)                         | Yes       | semi-join and anti-join patterns                                           |
| Scalar subqueries                                       | Yes       | In SELECT, WHERE, HAVING                                                   |
| Derived tables (FROM subqueries)                        | Yes       | Can be nested and joined. LATERAL derived tables not supported.            |
| Self-joins                                              | Yes       | Same table with different aliases                                          |
| Window functions (OVER)                                 | No        |                                                                            |
| SELECT DISTINCT                                         | Yes       |                                                                            |
| OFFSET                                                  | No        |                                                                            |
| UNION / INTERSECT / EXCEPT                              | Yes       | UNION ALL also supported                                                   |
| INSERT / UPDATE / DELETE                                | No        | Read-only                                                                  |
| CREATE / DROP / ALTER                                   | No        | Read-only                                                                  |

For the full SQL syntax, refer to the [SQL reference](https://developers.cloudflare.com/r2-sql/sql-reference/).

---

## Unsupported SQL features

| Feature                                                            | Error                                               |
| ------------------------------------------------------------------ | --------------------------------------------------- |
| OFFSET                                                             | unsupported feature: OFFSET clause is not supported |
| Window functions (OVER)                                            | unsupported feature: window functions (OVER clause) |
| INSERT / UPDATE / DELETE                                           | only read-only queries are allowed                  |
| CREATE / DROP / ALTER                                              | only read-only queries are allowed                  |
| UNNEST / PIVOT / UNPIVOT                                           | Not supported                                       |
| Wildcard modifiers (ILIKE, EXCLUDE, EXCEPT, REPLACE, RENAME on \*) | Not supported                                       |
| Nested (parenthesized) joins                                       | Not supported                                       |
| LATERAL derived tables                                             | Not supported                                       |
| LATERAL VIEW / QUALIFY                                             | Not supported                                       |

---

## Unsupported expression patterns

| Pattern                             | Alternative                                                                                                                            |
| ----------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------- |
| func(DISTINCT ...) on any aggregate | Use approx\_distinct for counting                                                                                                      |
| PERCENTILE\_CONT / PERCENTILE\_DISC | Use [approx\_percentile\_cont](https://developers.cloudflare.com/r2-sql/sql-reference/aggregate-functions/#approx%5Fpercentile%5Fcont) |
| MEDIAN                              | Use [approx\_median](https://developers.cloudflare.com/r2-sql/sql-reference/aggregate-functions/#approx%5Fmedian)                      |
| ARRAY\_AGG                          | No alternative (unsupported for memory safety)                                                                                         |
| STRING\_AGG                         | No alternative (unsupported for memory safety)                                                                                         |
| NOT IN subquery on nullable columns | Use NOT EXISTS with a correlated subquery instead                                                                                      |

---

## Runtime constraints

| Constraint                           | Details                                                                                                                                                                                    |
| ------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Multi-table queries                  | JOINs, subqueries (IN, EXISTS, scalar, derived tables), and multi-table CTEs are supported. Performance depends on intermediate result size; use WHERE filters to manage join selectivity. |
| Partitioned and unpartitioned tables | Both partitioned and unpartitioned Iceberg tables are supported.                                                                                                                           |
| Parquet format only                  | No CSV, JSON, or other formats.                                                                                                                                                            |
| Read-only                            | R2 SQL is a query engine, not a database. No writes.                                                                                                                                       |
| now() / current\_time() precision    | Quantized to 10ms boundaries and forced to UTC.                                                                                                                                            |

---

## Common error codes

| Code  | Meaning                                                            |
| ----- | ------------------------------------------------------------------ |
| 40003 | Invalid SQL syntax                                                 |
| 40004 | Invalid query (unsupported feature, unknown column, type mismatch) |
| 80001 | Edge service connection failure (retry)                            |

---

## Best practices

1. Include time-range filters in `WHERE` to limit data scanned.
2. Use specific column names instead of `SELECT *` for better performance.
3. Use `LIMIT` to control result set size.
4. Use approximate aggregation functions (`approx_distinct`, `approx_median`, `approx_percentile_cont`) instead of exact alternatives on large datasets.
5. Enable compaction in R2 Data Catalog to reduce the number of files scanned per query.
6. Use `EXPLAIN` to inspect the execution plan and verify predicate pushdown.
7. Use `WHERE` filters with multi-way joins to reduce intermediate result sizes. Joining three or more large tables without filters can exceed resource limits.
8. Join large fact tables through dimension tables rather than directly joining two large fact tables. For example, join `http_requests` to `firewall_events` through a shared `zones` dimension rather than cross-joining both fact tables.
9. Be cautious with `COUNT(DISTINCT)` across multi-way joins. This combination can produce very large intermediate results. Consider using `approx_distinct()` or breaking the query into smaller steps.
10. Use explicit `JOIN` syntax instead of implicit joins (comma-separated `FROM`) for readability and to ensure the optimizer can choose optimal join ordering.

```json
{"@context":"https://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"/directory/","name":"Directory"}},{"@type":"ListItem","position":2,"item":{"@id":"/r2-sql/","name":"R2 SQL"}},{"@type":"ListItem","position":3,"item":{"@id":"/r2-sql/reference/","name":"Reference"}},{"@type":"ListItem","position":4,"item":{"@id":"/r2-sql/reference/limitations-best-practices/","name":"Limitations and best practices"}}]}
```

---

---
title: Wrangler commands
description: Wrangler CLI commands for querying data with R2 SQL.
image: https://developers.cloudflare.com/dev-products-preview.png
---

> Documentation Index  
> Fetch the complete documentation index at: https://developers.cloudflare.com/r2-sql/llms.txt  
> Use this file to discover all available pages before exploring further.

[Skip to content](#%5Ftop) 

# Wrangler commands

Note

R2 SQL is currently in open beta. Report R2 SQL bugs in [GitHub ↗](https://github.com/cloudflare/workers-sdk/issues/new/choose). R2 SQL expects there to be a [WRANGLER\_R2\_SQL\_AUTH\_TOKEN](https://developers.cloudflare.com/r2-sql/query-data/#authentication) environment variable to be set.

### `r2 sql query`

Execute SQL query against R2 Data Catalog

* [  npm ](#tab-panel-9558)
* [  pnpm ](#tab-panel-9559)
* [  yarn ](#tab-panel-9560)

Terminal window

```

npx wrangler r2 sql query [WAREHOUSE] [QUERY]


```

Terminal window

```

pnpm wrangler r2 sql query [WAREHOUSE] [QUERY]


```

Terminal window

```

yarn wrangler r2 sql query [WAREHOUSE] [QUERY]


```

* `[WAREHOUSE]` ` string ` required  
R2 Data Catalog warehouse name
* `[QUERY]` ` string ` required  
The SQL query to execute

Global flags

* `--v` ` boolean ` alias: --version  
Show version number
* `--cwd` ` string `  
Run as if Wrangler was started in the specified directory instead of the current working directory
* `--config` ` string ` alias: --c  
Path to Wrangler configuration file
* `--env` ` string ` alias: --e  
Environment to use for operations, and for selecting .env and .dev.vars files
* `--env-file` ` string `  
Path to an .env file to load - can be specified multiple times - values from earlier files are overridden by values in later files
* `--experimental-provision` ` boolean ` aliases: --x-provision default: true  
Experimental: Enable automatic resource provisioning
* `--experimental-auto-create` ` boolean ` alias: --x-auto-create default: true  
Automatically provision draft bindings with new resources
* `--install-skills` ` boolean ` default: false  
Install Cloudflare agents skills, if not already present, without asking the user for confirmation

```json
{"@context":"https://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"/directory/","name":"Directory"}},{"@type":"ListItem","position":2,"item":{"@id":"/r2-sql/","name":"R2 SQL"}},{"@type":"ListItem","position":3,"item":{"@id":"/r2-sql/reference/","name":"Reference"}},{"@type":"ListItem","position":4,"item":{"@id":"/r2-sql/reference/wrangler-commands/","name":"Wrangler commands"}}]}
```
