R2 Data Catalog

R2 Data Catalog sinks write processed data from pipelines as Apache Iceberg tables to R2 Data Catalog. Iceberg tables provide ACID transactions, schema evolution, and time travel capabilities for analytics workloads.

To create an R2 Data Catalog sink, run the pipelines sinks create command and specify the sink type, target bucket, namespace, and table name:

Terminal window
npx wrangler pipelines sinks create my-sink \
--type r2-data-catalog \
--bucket my-bucket \
--namespace my_namespace \
--table my_table \
--catalog-token YOUR_CATALOG_TOKEN

The sink creates the specified namespace and table if they do not already exist. A sink cannot be created against an Iceberg table that already exists, so choose a table name that is not yet in use.

Format

R2 Data Catalog sinks support only the Parquet format. JSON format is not supported for Iceberg tables.

Compression options

Choose a Parquet compression codec to balance storage size against read and write speed:

Terminal window
--compression zstd

Available compression options:

  • zstd (default) - Best compression ratio
  • snappy - Fastest compression
  • gzip - Good compression, widely supported
  • lz4 - Fast compression with reasonable ratio
  • uncompressed - No compression

Row group size

Row groups are sets of rows in a Parquet file that are stored together. Their size affects memory usage and query performance. Configure the target row group size in MB:

Terminal window
--target-row-group-size 256
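
As a sketch of how these Parquet settings fit together, the following assembles one create command that sets both the compression codec and the row group size. It assumes the option fragments shown above are appended to the same pipelines sinks create invocation. The sink, bucket, namespace, and table names are the placeholders from the earlier example, and RUN is a hypothetical guard variable introduced here so that the command is only printed unless you opt in.

```shell
# Placeholder names (my-sink, my-bucket, ...) are illustrative, not real resources.
# RUN is a hypothetical guard variable for this sketch, not a wrangler setting.
set -- pipelines sinks create my-sink \
  --type r2-data-catalog \
  --bucket my-bucket \
  --namespace my_namespace \
  --table my_table \
  --catalog-token YOUR_CATALOG_TOKEN \
  --compression snappy \
  --target-row-group-size 256

# Keep a printable copy of the command for review.
SINK_CMD="npx wrangler $*"
echo "$SINK_CMD"

# Only call wrangler when explicitly requested.
if [ "${RUN:-0}" = "1" ]; then
  npx wrangler "$@"
fi
```

Running the snippet as-is only prints the assembled command. Setting RUN=1 would invoke wrangler with the same arguments.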

Batching and rolling policy

The roll interval and roll size control when data is written to Iceberg tables. Choose values based on your latency and query performance needs:

  • Lower values: More frequent writes, smaller files, lower latency
  • Higher values: Less frequent writes, larger files, better query performance

Roll interval

Set how often files are written (default: 300 seconds):

Terminal window
--roll-interval 60 # Write files every 60 seconds

Roll size

Set maximum file size in MB before creating a new file:

Terminal window
--roll-size 100 # Create new file after 100 MB
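
To make the trade-off concrete, here is a small sketch that maps two hypothetical profile names to rolling flags. The low-latency values are the ones used in the examples above. The large-files values pair the documented 300-second default interval with a roll size of 256, which is an illustrative number and not a recommendation from this page. roll_flags is a helper invented for this example.

```shell
# roll_flags is a hypothetical helper; the profile names are made up for this sketch.
roll_flags() {
  case "$1" in
    low-latency)
      # More frequent writes, smaller files.
      echo "--roll-interval 60 --roll-size 100"
      ;;
    large-files)
      # Less frequent writes, larger files. 256 is an illustrative value.
      echo "--roll-interval 300 --roll-size 256"
      ;;
    *)
      echo "unknown profile: $1" >&2
      return 1
      ;;
  esac
}

# Print the flags to append to the sink create command.
roll_flags low-latency
```

The printed flags are meant to be appended to the sink create command alongside the other options.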

Authentication

R2 Data Catalog sinks require an API token with R2 Admin Read & Write permissions. This permission grants the sink access to both R2 Data Catalog and R2 storage.

Terminal window
--catalog-token YOUR_CATALOG_TOKEN
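
One way to avoid typing the token inline is to read it from an environment variable and fail early when it is missing. This is only a sketch: R2_CATALOG_TOKEN is a variable name assumed for this example (this page does not say wrangler reads it on its own), and require_catalog_token is a hypothetical helper.

```shell
# R2_CATALOG_TOKEN is an assumed variable name for this sketch.
# require_catalog_token is a hypothetical helper, not part of wrangler.
require_catalog_token() {
  if [ -z "${R2_CATALOG_TOKEN:-}" ]; then
    echo "R2_CATALOG_TOKEN is not set" >&2
    return 1
  fi
}

# Only report readiness when the token is available.
if require_catalog_token; then
  echo "token found; pass it with: --catalog-token \"\$R2_CATALOG_TOKEN\""
fi
```

With the check in place, the create command can pass the value as --catalog-token "$R2_CATALOG_TOKEN" instead of a literal string.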