Getting started
This guide walks you through:
- Creating your first R2 bucket and enabling its data catalog.
- Creating an API token needed for query engines to authenticate with your data catalog.
- Using PyIceberg to create your first Iceberg table in a marimo Python notebook.
- Using PyIceberg to load sample data into your table and query it.
Before you begin, complete these prerequisites:
- Sign up for a Cloudflare account.
- Install Node.js.
Node.js version manager: use a Node version manager such as Volta or nvm to avoid permission issues and to switch between Node.js versions. Wrangler, discussed later in this guide, requires Node.js 16.17.0 or later.
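The version requirement above can be checked from a script before installing Wrangler. The following is a minimal sketch, not part of the official tooling: the helper names are hypothetical, and it assumes `node --version` prints a string of the form `v18.19.1`.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Minimum Node.js version this guide states for Wrangler.
MIN_NODE = (16, 17, 0)


def parse_node_version(text: str) -> Tuple[int, int, int]:
    """Parse a version string such as 'v18.19.1' into (18, 19, 1)."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized Node.js version string: {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def node_is_new_enough(version_text: str) -> bool:
    """Compare the parsed version against MIN_NODE as a tuple."""
    return parse_node_version(version_text) >= MIN_NODE


def installed_node_version() -> Optional[str]:
    """Return the output of `node --version`, or None if node is not on PATH."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run(
        [node, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Calling `installed_node_version()` and passing the result to `node_is_new_enough` tells you whether the Node.js on your PATH satisfies the requirement.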
- If not already logged in, run:
    npx wrangler login
- Create an R2 bucket:
    npx wrangler r2 bucket create r2-data-catalog-tutorial
Alternatively, create the bucket from the Cloudflare dashboard:
- From the Cloudflare dashboard, select R2 Object Storage from the sidebar.
- Select Create bucket.
- Enter the bucket name: r2-data-catalog-tutorial
- Select Create bucket.
Then, enable the catalog on your chosen R2 bucket:
npx wrangler r2 bucket catalog enable r2-data-catalog-tutorial
Alternatively, enable the catalog from the Cloudflare dashboard:
- From the Cloudflare dashboard, select R2 Object Storage from the sidebar.
- Select the bucket: r2-data-catalog-tutorial.
- Switch to the Settings tab, scroll down to R2 Data Catalog, and select Enable.
- Once enabled, note the Catalog URI and Warehouse name.
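If you prefer to script the Wrangler route, the two commands shown in this guide can be run from Python. This is a rough sketch under the assumption that `npx` is on your PATH and that you have already run `npx wrangler login`; the function names are hypothetical.

```python
import subprocess
from typing import List


def wrangler_commands(bucket: str) -> List[List[str]]:
    """Build the two Wrangler commands from this guide for the given bucket."""
    return [
        ["npx", "wrangler", "r2", "bucket", "create", bucket],
        ["npx", "wrangler", "r2", "bucket", "catalog", "enable", bucket],
    ]


def create_bucket_with_catalog(bucket: str) -> None:
    """Run the commands in order, stopping if one exits with a non-zero status."""
    for argv in wrangler_commands(bucket):
        subprocess.run(argv, check=True)
```

Calling `create_bucket_with_catalog("r2-data-catalog-tutorial")` would run both steps in sequence. You still need to note the Catalog URI and Warehouse name afterwards.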
Iceberg clients (including PyIceberg) must authenticate to the catalog with a Cloudflare API token that has both R2 and catalog permissions.
- From the Cloudflare dashboard, select R2 Object Storage from the sidebar.
- Expand the API dropdown and select Manage API tokens.
- Select Create API token.
- Select the R2 Token text to edit your API token name.
- Under Permissions, choose the Admin Read & Write permission.
- Select Create API Token.
- Note the Token value.
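One way to sanity-check the token before opening a notebook is to call the catalog's configuration endpoint directly. The Iceberg REST catalog specification defines `GET /v1/config` with an optional `warehouse` query parameter. The sketch below assumes R2 Data Catalog serves that endpoint under the Catalog URI and accepts the token as a bearer token; both are assumptions, and the function names are hypothetical.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_config_request(
    catalog_uri: str, warehouse: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) a GET request for the catalog's /v1/config endpoint."""
    base = catalog_uri.rstrip("/")
    query = urllib.parse.urlencode({"warehouse": warehouse})
    request = urllib.request.Request(f"{base}/v1/config?{query}", method="GET")
    request.add_header("Authorization", f"Bearer {token}")
    return request


def config_status(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code, including error codes."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

A 200 status would suggest that the URI, warehouse, and token are consistent with each other. A 401 or 403 would point to a problem with the token or its permissions.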
You need a Python package manager; this guide uses uv. If you do not already have uv installed, follow the installing uv guide.
This guide uses marimo as a Python notebook.
- Create a directory where the notebook will be stored:
    mkdir r2-data-catalog-notebook
- Change into the new directory:
    cd r2-data-catalog-notebook
- Create a new Python virtual environment:
    uv venv
- Activate the Python virtual environment:
    source .venv/bin/activate
- Install marimo and the packages the notebook imports (pyarrow, pyiceberg, and pandas) with uv:
    uv pip install marimo pyarrow pyiceberg pandas
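Before creating the notebook, it can help to confirm that the virtual environment is active and that the packages the notebook imports can be found. The following is a small sketch using only the standard library; the helper names are hypothetical.

```python
import importlib.util
import sys
from typing import Dict, Iterable


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != sys.base_prefix


def check_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level package name to whether it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}
```

Running `check_packages(["marimo", "pyiceberg", "pyarrow", "pandas"])` with the environment activated shows which of the notebook's dependencies are still missing.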
- Create a file called r2-data-catalog-tutorial.py.
- Paste the following code snippet into your r2-data-catalog-tutorial.py file:

    import marimo

    __generated_with = "0.11.31"
    app = marimo.App(width="medium")


    @app.cell
    def _():
        import marimo as mo
        return (mo,)


    @app.cell
    def _():
        import pandas
        import pyarrow as pa
        import pyarrow.compute as pc
        import pyarrow.parquet as pq

        from pyiceberg.catalog.rest import RestCatalog
        from pyiceberg.exceptions import NamespaceAlreadyExistsError

        # Define catalog connection details (replace variables)
        WAREHOUSE = "<WAREHOUSE>"
        TOKEN = "<TOKEN>"
        CATALOG_URI = "<CATALOG_URI>"

        # Connect to R2 Data Catalog
        catalog = RestCatalog(
            name="my_catalog",
            warehouse=WAREHOUSE,
            uri=CATALOG_URI,
            token=TOKEN,
        )
        return (
            CATALOG_URI,
            NamespaceAlreadyExistsError,
            RestCatalog,
            TOKEN,
            WAREHOUSE,
            catalog,
            pa,
            pandas,
            pc,
            pq,
        )


    @app.cell
    def _(NamespaceAlreadyExistsError, catalog):
        # Create default namespace if needed
        try:
            catalog.create_namespace("default")
        except NamespaceAlreadyExistsError:
            pass
        return


    @app.cell
    def _(pa):
        # Create simple PyArrow table
        df = pa.table(
            {
                "id": [1, 2, 3],
                "name": ["Alice", "Bob", "Charlie"],
                "score": [80.0, 92.5, 88.0],
            }
        )
        return (df,)


    @app.cell
    def _(catalog, df):
        # Create or load Iceberg table
        test_table = ("default", "people")
        if not catalog.table_exists(test_table):
            print(f"Creating table: {test_table}")
            table = catalog.create_table(
                test_table,
                schema=df.schema,
            )
        else:
            table = catalog.load_table(test_table)
        return table, test_table


    @app.cell
    def _(df, table):
        # Append data
        table.append(df)
        return


    @app.cell
    def _(table):
        print("Table contents:")
        scanned = table.scan().to_arrow()
        print(scanned.to_pandas())
        return (scanned,)


    @app.cell
    def _():
        # Optional cleanup. To run uncomment and run cell
        # print(f"Deleting table: {test_table}")
        # catalog.drop_table(test_table)
        # print("Table dropped.")
        return


    if __name__ == "__main__":
        app.run()
- Replace the CATALOG_URI, WAREHOUSE, and TOKEN variables with the Catalog URI and Warehouse name you noted when enabling the catalog, and the Token value you noted when creating the API token.
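To avoid saving the token in the notebook file, you could read the three values from environment variables instead of pasting them in. The sketch below is one possible approach. The variable names `R2_CATALOG_URI`, `R2_WAREHOUSE`, and `R2_CATALOG_TOKEN` are hypothetical and are not defined by Cloudflare or PyIceberg.

```python
import os
from typing import Dict, Mapping

# Hypothetical environment variable names chosen for this example.
REQUIRED = ("R2_CATALOG_URI", "R2_WAREHOUSE", "R2_CATALOG_TOKEN")


def load_catalog_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return uri, warehouse, and token, failing early if any value is unset."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return {
        "uri": env["R2_CATALOG_URI"],
        "warehouse": env["R2_WAREHOUSE"],
        "token": env["R2_CATALOG_TOKEN"],
    }
```

The returned keys match the keyword arguments the notebook passes to `RestCatalog`, so the connection cell could become `RestCatalog(name="my_catalog", **load_catalog_settings())`.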
In the Python notebook above, you:
- Connect to your catalog.
- Create the default namespace.
- Create a simple PyArrow table.
- Create (or load) the people table in the default namespace.
- Append sample data to the table.
- Print the contents of the table.
- (Optional) Drop the people table created for this tutorial.
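The notebook prints the whole table, but a PyIceberg scan can also filter rows and select columns. The following sketch assumes `table` is the object returned by `catalog.create_table` or `catalog.load_table` in the notebook. The function name is hypothetical, while `row_filter` and `selected_fields` are parameters of PyIceberg's `Table.scan`.

```python
def rows_with_min_score(table, min_score: float):
    """Return the name and score columns for rows with score >= min_score."""
    scan = table.scan(
        # String filter expression evaluated by PyIceberg.
        row_filter=f"score >= {min_score}",
        # Only read the columns needed for the result.
        selected_fields=("name", "score"),
    )
    # Materialize the matching rows as a PyArrow table.
    return scan.to_arrow()
```

If the sample data has been appended exactly once, `rows_with_min_score(table, 85.0)` should return the rows for Bob and Charlie. Running the append cell again adds the same rows a second time, so the result grows with each run.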