Skip to content

Changelog

New updates and improvements at Cloudflare.

R2 SQL
hero image
  1. R2 SQL now supports an expanded SQL grammar so you can write richer analytical queries without exporting data. This release adds CASE expressions, column aliases, arithmetic in clauses, 163 scalar functions, 33 aggregate functions, EXPLAIN, Common Table Expressions (CTEs),and full struct/array/map access. R2 SQL is Cloudflare's serverless, distributed, analytics query engine for querying Apache Iceberg tables stored in R2 Data Catalog. This page documents the supported SQL syntax.

    Highlights

    • Column aliasesSELECT col AS alias now works in all clauses
    • CASE expressions — conditional logic directly in SQL (searched and simple forms)
    • Scalar functions — 163 new functions across math, string, datetime, regex, crypto, encoding, and type inspection categories
    • Aggregate functions — statistical (variance, stddev, correlation, regression), bitwise, boolean, and positional aggregates join the existing basic and approximate functions
    • Complex types — query struct fields with bracket notation, use 46 array functions, and extract map keys/values
    • Common table expressions (CTEs) — use WITH ... AS to define named temporary result sets. Chained CTEs are supported. All CTEs must reference the same single table.
    • Full expression support — arithmetic, type casting (CAST, TRY_CAST, :: shorthand), and EXTRACT in SELECT, WHERE, GROUP BY, HAVING, and ORDER BY

    Examples

    CASE expressions with statistical aggregates

    SELECT source,
    CASE
    WHEN AVG(price) > 30 THEN 'premium'
    WHEN AVG(price) > 10 THEN 'mid-tier'
    ELSE 'budget'
    END AS tier,
    round(stddev(price), 2) AS price_volatility,
    approx_percentile_cont(price, 0.95) AS p95_price
    FROM my_namespace.sales_data
    GROUP BY source

    Struct and array access

    SELECT product_name,
    pricing['price'] AS price,
    array_to_string(tags, ', ') AS tag_list
    FROM my_namespace.products
    WHERE array_has(tags, 'Action')
    ORDER BY pricing['price'] DESC
    LIMIT 10

    Chained CTEs with time-series analysis

    WITH monthly AS (
    SELECT date_trunc('month', sale_timestamp) AS month,
    department,
    COUNT(*) AS transactions,
    round(AVG(total_amount), 2) AS avg_amount
    FROM my_namespace.sales_data
    WHERE sale_timestamp BETWEEN '2025-01-01T00:00:00Z' AND '2025-12-31T23:59:59Z'
    GROUP BY date_trunc('month', sale_timestamp), department
    ),
    ranked AS (
    SELECT month, department, transactions, avg_amount,
    CASE
    WHEN avg_amount > 1000 THEN 'high-value'
    WHEN avg_amount > 500 THEN 'mid-value'
    ELSE 'standard'
    END AS tier
    FROM monthly
    WHERE transactions > 100
    )
    SELECT * FROM ranked
    ORDER BY month, avg_amount DESC

    For the full function reference and syntax details, refer to the SQL reference. For limitations and best practices, refer to Limitations and best practices.

  1. R2 SQL now supports five approximate aggregation functions for fast analysis of large datasets. These functions trade minor precision for improved performance on high-cardinality data.

    New functions

    • APPROX_PERCENTILE_CONT(column, percentile) — Returns the approximate value at a given percentile (0.0 to 1.0). Works on integer and decimal columns.
    • APPROX_PERCENTILE_CONT_WITH_WEIGHT(column, weight, percentile) — Weighted percentile calculation where each row contributes proportionally to its weight column value.
    • APPROX_MEDIAN(column) — Returns the approximate median. Equivalent to APPROX_PERCENTILE_CONT(column, 0.5).
    • APPROX_DISTINCT(column) — Returns the approximate number of distinct values. Works on any column type.
    • APPROX_TOP_K(column, k) — Returns the k most frequent values with their counts as a JSON array.

    All functions support WHERE filters. All except APPROX_TOP_K support GROUP BY.

    Examples

    -- Percentile analysis on revenue data
    SELECT approx_percentile_cont(total_amount, 0.25),
    approx_percentile_cont(total_amount, 0.5),
    approx_percentile_cont(total_amount, 0.75)
    FROM my_namespace.sales_data
    -- Median per department
    SELECT department, approx_median(total_amount)
    FROM my_namespace.sales_data
    GROUP BY department
    -- Approximate distinct customers by region
    SELECT region, approx_distinct(customer_id)
    FROM my_namespace.sales_data
    GROUP BY region
    -- Top 5 most frequent departments
    SELECT approx_top_k(department, 5)
    FROM my_namespace.sales_data
    -- Combine approximate and standard aggregations
    SELECT COUNT(*),
    AVG(total_amount),
    approx_percentile_cont(total_amount, 0.5),
    approx_distinct(customer_id)
    FROM my_namespace.sales_data
    WHERE region = 'North'

    For the full syntax and additional examples, refer to the SQL reference.

  1. R2 SQL now supports aggregation functions, GROUP BY, HAVING, along with schema discovery commands to make it easy to explore your data catalog.

    Aggregation Functions

    You can now perform aggregations on Apache Iceberg tables in R2 Data Catalog using standard SQL functions including COUNT(*), SUM(), AVG(), MIN(), and MAX(). Combine these with GROUP BY to analyze data across dimensions, and use HAVING to filter aggregated results.

    -- Calculate average transaction amounts by department
    SELECT department, COUNT(*), AVG(total_amount)
    FROM my_namespace.sales_data
    WHERE region = 'North'
    GROUP BY department
    HAVING COUNT(*) > 50
    ORDER BY AVG(total_amount) DESC
    -- Find high-value departments
    SELECT department, SUM(total_amount)
    FROM my_namespace.sales_data
    GROUP BY department
    HAVING SUM(total_amount) > 50000

    Schema Discovery

    New metadata commands make it easy to explore your data catalog and understand table structures:

    • SHOW DATABASES or SHOW NAMESPACES - List all available namespaces
    • SHOW TABLES IN namespace_name - List tables within a namespace
    • DESCRIBE namespace_name.table_name - View table schema and column types
    Terminal window
    npx wrangler r2 sql query "{ACCOUNT_ID}_{BUCKET_NAME}" "DESCRIBE default.sales_data;"
    ⛅️ wrangler 4.54.0
    ─────────────────────────────────────────────
    ┌──────────────────┬────────────────┬──────────┬─────────────────┬───────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────┐
    column_name type required initial_default write_default doc
    ├──────────────────┼────────────────┼──────────┼─────────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
    sale_id BIGINT false Unique identifier for each sales transaction
    ├──────────────────┼────────────────┼──────────┼─────────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
    sale_timestamp TIMESTAMPTZ false Exact date and time when the sale occurred (used for partitioning) │
    ├──────────────────┼────────────────┼──────────┼─────────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
    department TEXT false Product department (8 categories: Electronics, Beauty, Home, Toys, Sports, Food, Clothing, Books) │
    ├──────────────────┼────────────────┼──────────┼─────────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
    category TEXT false Product category grouping (4 categories: Premium, Standard, Budget, Clearance) │
    ├──────────────────┼────────────────┼──────────┼─────────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
    region TEXT false Geographic sales region (5 regions: North, South, East, West, Central) │
    ├──────────────────┼────────────────┼──────────┼─────────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
    product_id INT false Unique identifier for the product sold
    ├──────────────────┼────────────────┼──────────┼─────────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
    quantity INT false Number of units sold in this transaction (range: 1-50) │
    ├──────────────────┼────────────────┼──────────┼─────────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
    unit_price DECIMAL(10, 2) false Price per unit in dollars (range: $5.00-$500.00) │
    ├──────────────────┼────────────────┼──────────┼─────────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
    total_amount DECIMAL(10, 2) false Total sale amount before tax (quantity × unit_price with discounts applied) │
    ├──────────────────┼────────────────┼──────────┼─────────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
    discount_percent INT false Discount percentage applied to this sale (0-50%) │
    ├──────────────────┼────────────────┼──────────┼─────────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
    tax_amount DECIMAL(10, 2) false Tax amount collected on this sale
    ├──────────────────┼────────────────┼──────────┼─────────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
    profit_margin DECIMAL(10, 2) false Profit margin on this sale as a decimal percentage
    ├──────────────────┼────────────────┼──────────┼─────────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
    customer_id INT false Unique identifier for the customer who made the purchase
    ├──────────────────┼────────────────┼──────────┼─────────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
    is_online_sale BOOLEAN false Boolean flag indicating if sale was made online (true) or in-store (false) │
    ├──────────────────┼────────────────┼──────────┼─────────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
    sale_date DATE false Calendar date of the sale (extracted from sale_timestamp) │
    └──────────────────┴────────────────┴──────────┴─────────────────┴───────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────┘
    Read 0 B across 0 files from R2
    On average, 0 B / s

    To learn more about the new aggregation capabilities and schema discovery commands, check out the SQL reference. If you're new to R2 SQL, visit our getting started guide to begin querying your data.

  1. Today, we're launching the open beta for R2 SQL: A serverless, distributed query engine that can efficiently analyze petabytes of data in Apache Iceberg tables managed by R2 Data Catalog.

    R2 SQL is ideal for exploring analytical and time-series data stored in R2, such as logs, events from Pipelines, or clickstream and user behavior data.

    If you already have a table in R2 Data Catalog, running queries is as simple as:

    Terminal window
    npx wrangler r2 sql query YOUR_WAREHOUSE "
    SELECT
    user_id,
    event_type,
    value
    FROM events.user_events
    WHERE event_type = 'CHANGELOG' or event_type = 'BLOG'
    AND __ingest_ts > '2025-09-24T00:00:00Z'
    ORDER BY __ingest_ts DESC
    LIMIT 100"

    To get started with R2 SQL, check out our getting started guide or learn more about supported features in the SQL reference. For a technical deep dive into how we built R2 SQL, read our blog post.