date 2025-09-25
R2
R2 sinks write processed data from pipelines as raw files to R2 object storage. They currently support writing to JSON and Parquet formats.
To create an R2 sink, run the pipelines sinks create command, specifying the sink type and the target bucket.
R2 sinks support two output formats:
- JSON: Write data as newline-delimited JSON files.
- Parquet: Write data as Parquet files for better query performance and compression.
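To make the JSON format concrete, the following is a minimal sketch of what newline-delimited JSON looks like: one JSON object per line, each terminated by a newline. The function name to_ndjson and the sample records are hypothetical and only illustrate the layout; this is not how the sink itself is implemented.

```python
import json


def to_ndjson(records):
    """Serialize an iterable of dicts as newline-delimited JSON.

    Each record becomes exactly one line, so a reader can parse the
    file line by line without loading it whole.
    """
    return "".join(json.dumps(record) + "\n" for record in records)


# Hypothetical sample records, for illustration only.
sample = [
    {"user_id": 1, "event": "click"},
    {"user_id": 2, "event": "view"},
]
ndjson = to_ndjson(sample)

# Reading it back: parse each line independently.
parsed = [json.loads(line) for line in ndjson.splitlines()]
```

Because every line is a complete JSON document, files in this format can be split or streamed at line boundaries.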
Compression options for Parquet:
- zstd (default) - Best compression ratio
- snappy - Fastest compression
- gzip - Good compression, widely supported
- lz4 - Fast compression with reasonable ratio
- uncompressed - No compression
Row group size: Row groups are sets of rows in a Parquet file that are stored together, affecting memory usage and query performance. The target row group size is configured in MB.
Files are written with UUID names within the partitioned directory structure. For example, with path analytics and default partitioning, files are written under a prefix such as analytics/year=2025/month=09/day=25/.
Set a path to use as the base directory in your bucket where files are written.
R2 sinks automatically partition files by time using a configurable pattern. The default pattern is year=%Y/month=%m/day=%d (Hive-style partitioning).
For available format specifiers, refer to the strftime documentation.
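As a rough illustration of how a base path, a strftime partition pattern, and a UUID file name combine into an object key, consider the sketch below. The function object_key, the .parquet extension, and the use of a UTC timestamp are assumptions made for the example; the source does not state which timestamp the sink uses or what extension it writes.

```python
import uuid
from datetime import datetime, timezone

# Default pattern from the documentation (Hive-style partitioning).
DEFAULT_PATTERN = "year=%Y/month=%m/day=%d"


def object_key(base_path, when, pattern=DEFAULT_PATTERN, extension="parquet"):
    """Build an illustrative object key: <path>/<partition>/<uuid>.<ext>.

    The extension and the choice of timestamp are assumptions for this
    sketch, not documented sink behavior.
    """
    partition = when.strftime(pattern)
    file_name = f"{uuid.uuid4()}.{extension}"
    parts = (base_path.strip("/"), partition, file_name)
    # Skip empty parts so an empty base path does not produce a leading slash.
    return "/".join(part for part in parts if part)


ts = datetime(2025, 9, 25, 14, 30, tzinfo=timezone.utc)

daily_key = object_key("analytics", ts)
# A custom pattern that also partitions by hour.
hourly_key = object_key("analytics", ts, pattern="year=%Y/month=%m/day=%d/hour=%H")
```

With the default pattern, daily_key begins with analytics/year=2025/month=09/day=25/ followed by a random UUID file name. Adding %H to the pattern inserts an hour=14 directory level.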
Control when files are written to R2. Configure based on your needs:
- Lower values: More frequent writes, smaller files, lower latency
- Higher values: Less frequent writes, larger files, better query performance
Set how often files are written, in seconds (default: 300).
Set the maximum file size in MB before a new file is created.
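The interaction between the two batching settings can be sketched as a simple roll policy: a file is closed and written once it has been open for the configured interval, or once the buffered data reaches the maximum size. The class RollPolicy, its field names, the "whichever comes first" rule, and the 1 MB = 1024 * 1024 bytes conversion are all assumptions for illustration; the source only documents the 300-second default interval.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RollPolicy:
    """Hypothetical model of the sink's batching settings."""

    interval_seconds: float = 300.0       # documented default interval
    max_size_mb: Optional[float] = None   # assumption: None means no size limit

    def should_roll(self, seconds_open, bytes_buffered):
        """Return True when the current file should be written out."""
        if seconds_open >= self.interval_seconds:
            return True
        if self.max_size_mb is not None:
            # Assumption: 1 MB is treated as 1024 * 1024 bytes.
            return bytes_buffered >= self.max_size_mb * 1024 * 1024
        return False


# Lower values: more frequent writes and smaller files.
low_latency = RollPolicy(interval_seconds=60, max_size_mb=100)
# Higher values: fewer, larger files.
batch_heavy = RollPolicy(interval_seconds=900, max_size_mb=1024)
```

Under these assumptions, low_latency rolls a file after 60 seconds even if little data has arrived, while batch_heavy keeps accumulating data for up to 15 minutes unless the size limit is reached first.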
R2 sinks require API credentials (Access Key ID and Secret Access Key) with Object Read & Write permissions to write data to your bucket.
