
Data persistence with R2


Mount object storage buckets as local filesystem paths to persist data across sandbox lifecycles. This tutorial uses Cloudflare R2, but the same approach works with any S3-compatible provider.

Time to complete: 20 minutes

What you'll build

A Worker that processes data, stores results in an R2 bucket mounted as a local directory, and demonstrates that data persists even after the sandbox is destroyed and recreated.

Key concepts you'll learn:

  • Mounting R2 buckets as filesystem paths
  • Automatic data persistence across sandbox lifecycles
  • Working with mounted storage using standard file operations

Prerequisites

  1. Sign up for a Cloudflare account.
  2. Install Node.js.

Node.js version manager

Use a Node version manager like Volta or nvm to avoid permission issues and to switch between Node.js versions easily. Wrangler, used later in this guide, requires Node.js version 16.17.0 or later.
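
To check which version is active:

Terminal window
node --version
# Should print v16.17.0 or later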

You'll also need:

  • An R2 bucket (you'll create one in step 2)

1. Create your project

Create a new Worker project from the sandbox SDK's minimal template, then move into the project directory:

Terminal window
npm create cloudflare@latest -- data-pipeline --template=cloudflare/sandbox-sdk/examples/minimal
cd data-pipeline

2. Configure R2 binding

Add an R2 bucket binding to your wrangler.json:

wrangler.json
{
  "name": "data-pipeline",
  "compatibility_date": "2025-11-09",
  "durable_objects": {
    "bindings": [
      { "name": "Sandbox", "class_name": "Sandbox" }
    ]
  },
  "r2_buckets": [
    {
      "binding": "DATA_BUCKET",
      "bucket_name": "my-data-bucket"
    }
  ]
}

Replace my-data-bucket with your R2 bucket name. Create the bucket first in the Cloudflare dashboard.
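
You can also create the bucket from the command line with Wrangler; the name must match the bucket_name in your binding:

Terminal window
npx wrangler r2 bucket create my-data-bucket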

3. Build the data processor

Replace src/index.ts with code that mounts R2 and processes data:

JavaScript
import { getSandbox } from "@cloudflare/sandbox";
export { Sandbox } from "@cloudflare/sandbox";

export default {
  async fetch(request, env) {
    const url = new URL(request.url);
    const sandbox = getSandbox(env.Sandbox, "data-processor");

    // Mount R2 bucket to /data directory
    await sandbox.mountBucket("my-data-bucket", "/data", {
      endpoint: "https://YOUR_ACCOUNT_ID.r2.cloudflarestorage.com",
    });

    if (url.pathname === "/process") {
      // Process data and save to mounted R2
      const result = await sandbox.exec("python", {
        args: [
          "-c",
          `
import json
import os
from datetime import datetime

# Read input (or create sample data)
data = [
    {'id': 1, 'value': 42},
    {'id': 2, 'value': 87},
    {'id': 3, 'value': 15}
]

# Process: calculate sum and average
total = sum(item['value'] for item in data)
avg = total / len(data)

# Save results to mounted R2 (/data is the mounted bucket)
result = {
    'timestamp': datetime.now().isoformat(),
    'total': total,
    'average': avg,
    'processed_count': len(data)
}

os.makedirs('/data/results', exist_ok=True)
with open('/data/results/latest.json', 'w') as f:
    json.dump(result, f, indent=2)

print(json.dumps(result))
`,
        ],
      });

      return Response.json({
        message: "Data processed and saved to R2",
        result: JSON.parse(result.stdout),
      });
    }

    if (url.pathname === "/results") {
      // Read results from mounted R2
      const result = await sandbox.exec("cat", {
        args: ["/data/results/latest.json"],
      });

      if (!result.success) {
        return Response.json(
          { error: "No results found yet" },
          { status: 404 },
        );
      }

      return Response.json({
        message: "Results retrieved from R2",
        data: JSON.parse(result.stdout),
      });
    }

    if (url.pathname === "/destroy") {
      // Destroy sandbox to demonstrate persistence
      await sandbox.destroy();
      return Response.json({
        message: "Sandbox destroyed. Data persists in R2!",
      });
    }

    return new Response(
      `
Data Pipeline with Persistent Storage

Endpoints:
- POST /process - Process data and save to R2
- GET /results - Retrieve results from R2
- POST /destroy - Destroy sandbox (data survives!)

Try this flow:
1. POST /process (processes and saves to R2)
2. POST /destroy (destroys sandbox)
3. GET /results (data still accessible from R2)
`,
      { headers: { "Content-Type": "text/plain" } },
    );
  },
};
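
As mentioned at the start, the same call works with any S3-compatible provider; only the endpoint (and the credentials behind it) change. Here is a minimal sketch: the endpoint is an example, and the commented-out credential option names are assumptions for illustration, not confirmed SDK options.

JavaScript
// Hypothetical: mount a bucket from another S3-compatible provider.
// Only the endpoint differs from the R2 call above.
await sandbox.mountBucket("my-backup-bucket", "/backup", {
  endpoint: "https://s3.us-east-1.amazonaws.com",
  // The SDK also supports explicit credentials (see "Credential
  // management" under "What you learned"); these option names are
  // assumptions, not confirmed API:
  // accessKeyId: env.BACKUP_ACCESS_KEY_ID,
  // secretAccessKey: env.BACKUP_SECRET_ACCESS_KEY,
});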

4. Local development limitation

Bucket mounting does not work when running locally with wrangler dev. Deploy your Worker in the next step and test against the deployed URL, as step 6 does.

5. Deploy to production

Generate R2 API tokens:

  1. Go to R2 > Overview in the Cloudflare dashboard
  2. Select Manage R2 API Tokens
  3. Create a token with Object Read & Write permissions
  4. Copy the Access Key ID and Secret Access Key

Set up credentials as Worker secrets:

Terminal window
npx wrangler secret put AWS_ACCESS_KEY_ID
# Paste your R2 Access Key ID
npx wrangler secret put AWS_SECRET_ACCESS_KEY
# Paste your R2 Secret Access Key

Worker secrets are encrypted and only accessible to your deployed Worker. The SDK automatically detects these credentials when mountBucket() is called.
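
To confirm the secrets exist (Wrangler lists secret names only, never their values):

Terminal window
npx wrangler secret list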

Deploy your Worker:

Terminal window
npx wrangler deploy

After deployment, wrangler outputs your Worker URL (e.g., https://data-pipeline.yourname.workers.dev).
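
As a quick smoke test, fetch the root path; the Worker responds with the plain-text endpoint listing from step 3:

Terminal window
curl https://data-pipeline.yourname.workers.dev/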

6. Test the persistence flow

Now test against your deployed Worker. Replace YOUR_WORKER_URL with your actual Worker URL:

Terminal window
# 1. Process data (saves to R2)
curl -X POST https://YOUR_WORKER_URL/process
# Returns: { "message": "Data processed...", "result": { "total": 144, "average": 48, ... } }
# 2. Verify data is accessible
curl https://YOUR_WORKER_URL/results
# Returns the same results from R2
# 3. Destroy the sandbox
curl -X POST https://YOUR_WORKER_URL/destroy
# Returns: { "message": "Sandbox destroyed. Data persists in R2!" }
# 4. Access results again (from new sandbox)
curl https://YOUR_WORKER_URL/results
# Still works! Data persisted across sandbox lifecycle

The key insight: After destroying the sandbox, the next request creates a new sandbox instance, mounts the same R2 bucket, and finds the data still there.
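
You can also verify the object directly in the bucket, independent of any sandbox. Assuming the bucket name from step 2:

Terminal window
npx wrangler r2 object get my-data-bucket/results/latest.json --file latest.json
cat latest.json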

What you learned

In this tutorial, you built a data pipeline that demonstrates filesystem persistence through R2 bucket mounting:

  • Mounting buckets: Use mountBucket() to make R2 accessible as a local directory
  • Standard file operations: Access mounted buckets using familiar filesystem commands (cat, Python open(), etc.)
  • Automatic persistence: Data written to mounted directories survives sandbox destruction
  • Credential management: Configure R2 access using environment variables or explicit credentials

Next steps