
Data persistence with R2


Mount object storage buckets as local filesystem paths to persist data across sandbox lifecycles. This tutorial uses Cloudflare R2, but the same approach works with any S3-compatible provider.

Time to complete: 20 minutes

What you'll build

A Worker that processes data, stores results in an R2 bucket mounted as a local directory, and demonstrates that data persists even after the sandbox is destroyed and recreated.

Key concepts you'll learn:

  • Mounting R2 buckets as filesystem paths
  • Automatic data persistence across sandbox lifecycles
  • Working with mounted storage using standard file operations

Prerequisites

  1. Sign up for a Cloudflare account.
  2. Install Node.js.

Node.js version manager

Use a Node version manager like Volta or nvm to avoid permission issues and to switch between Node.js versions easily. Wrangler, used later in this guide, requires Node.js version 16.17.0 or later.
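
To check which version is active:

Terminal window
node --version
# Should print v16.17.0 or later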

You'll also need:

  • An R2 bucket (you'll create one in step 2)

1. Create your project

Create a new Worker project from the sandbox SDK's minimal template, then move into the project directory:

Terminal window
npm create cloudflare@latest -- data-pipeline --template=cloudflare/sandbox-sdk/examples/minimal
cd data-pipeline

2. Configure R2 binding

Add an R2 bucket binding to your wrangler.json:

wrangler.json
{
  "name": "data-pipeline",
  "compatibility_date": "2025-11-09",
  "durable_objects": {
    "bindings": [
      { "name": "Sandbox", "class_name": "Sandbox" }
    ]
  },
  "r2_buckets": [
    {
      "binding": "DATA_BUCKET",
      "bucket_name": "my-data-bucket"
    }
  ]
}

Replace my-data-bucket with your R2 bucket name. Create the bucket first in the Cloudflare dashboard.
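
You can also create the bucket from the command line with Wrangler; the name must match the bucket_name in your binding:

Terminal window
npx wrangler r2 bucket create my-data-bucket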

3. Build the data processor

Replace src/index.ts with code that mounts R2 and processes data:

JavaScript
import { getSandbox } from "@cloudflare/sandbox";
export { Sandbox } from "@cloudflare/sandbox";

export default {
  async fetch(request, env) {
    const url = new URL(request.url);
    const sandbox = getSandbox(env.Sandbox, "data-processor");

    // Mount R2 bucket to /data directory
    await sandbox.mountBucket("my-data-bucket", "/data", {
      endpoint: "https://YOUR_ACCOUNT_ID.r2.cloudflarestorage.com",
    });

    if (url.pathname === "/process") {
      // Process data and save to mounted R2
      const result = await sandbox.exec("python", {
        args: [
          "-c",
          `
import json
import os
from datetime import datetime

# Read input (or create sample data)
data = [
    {'id': 1, 'value': 42},
    {'id': 2, 'value': 87},
    {'id': 3, 'value': 15}
]

# Process: calculate sum and average
total = sum(item['value'] for item in data)
avg = total / len(data)

# Save results to mounted R2 (/data is the mounted bucket)
result = {
    'timestamp': datetime.now().isoformat(),
    'total': total,
    'average': avg,
    'processed_count': len(data)
}

os.makedirs('/data/results', exist_ok=True)
with open('/data/results/latest.json', 'w') as f:
    json.dump(result, f, indent=2)

print(json.dumps(result))
`,
        ],
      });

      return Response.json({
        message: "Data processed and saved to R2",
        result: JSON.parse(result.stdout),
      });
    }

    if (url.pathname === "/results") {
      // Read results from mounted R2
      const result = await sandbox.exec("cat", {
        args: ["/data/results/latest.json"],
      });

      if (!result.success) {
        return Response.json(
          { error: "No results found yet" },
          { status: 404 },
        );
      }

      return Response.json({
        message: "Results retrieved from R2",
        data: JSON.parse(result.stdout),
      });
    }

    if (url.pathname === "/destroy") {
      // Destroy sandbox to demonstrate persistence
      await sandbox.destroy();
      return Response.json({
        message: "Sandbox destroyed. Data persists in R2!",
      });
    }

    return new Response(
      `
Data Pipeline with Persistent Storage

Endpoints:
- POST /process - Process data and save to R2
- GET /results - Retrieve results from R2
- POST /destroy - Destroy sandbox (data survives!)

Try this flow:
1. POST /process (processes and saves to R2)
2. POST /destroy (destroys sandbox)
3. GET /results (data still accessible from R2)
`,
      { headers: { "Content-Type": "text/plain" } },
    );
  },
};
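
As mentioned at the start, the same call works with any S3-compatible provider; only the endpoint (and the credentials behind it) change. Here is a minimal sketch: the endpoint is an example, and the commented-out credential option names are assumptions for illustration, not confirmed SDK options.

JavaScript
// Hypothetical: mount a bucket from another S3-compatible provider.
// Only the endpoint differs from the R2 call above.
await sandbox.mountBucket("my-backup-bucket", "/backup", {
  endpoint: "https://s3.us-east-1.amazonaws.com",
  // The SDK also supports explicit credentials (see "Credential
  // management" under "What you learned"); these option names are
  // assumptions, not confirmed API:
  // accessKeyId: env.BACKUP_ACCESS_KEY_ID,
  // secretAccessKey: env.BACKUP_SECRET_ACCESS_KEY,
});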

4. Local development limitation

Bucket mounting does not work when running locally with wrangler dev. Deploy your Worker in the next step and test against the deployed URL, as step 6 does.

5. Deploy to production

Generate R2 API tokens:

  1. Go to R2 > Overview in the Cloudflare dashboard
  2. Select Manage R2 API Tokens
  3. Create a token with Object Read & Write permissions
  4. Copy the Access Key ID and Secret Access Key

Set up credentials as Worker secrets:

Terminal window
npx wrangler secret put AWS_ACCESS_KEY_ID
# Paste your R2 Access Key ID
npx wrangler secret put AWS_SECRET_ACCESS_KEY
# Paste your R2 Secret Access Key

Worker secrets are encrypted and only accessible to your deployed Worker. The SDK automatically detects these credentials when mountBucket() is called.
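
To confirm the secrets exist (Wrangler lists secret names only, never their values):

Terminal window
npx wrangler secret list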

Deploy your Worker:

Terminal window
npx wrangler deploy

After deployment, wrangler outputs your Worker URL (e.g., https://data-pipeline.yourname.workers.dev).
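
As a quick smoke test, fetch the root path; the Worker responds with the plain-text endpoint listing from step 3:

Terminal window
curl https://data-pipeline.yourname.workers.dev/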

6. Test the persistence flow

Now test against your deployed Worker. Replace YOUR_WORKER_URL with your actual Worker URL:

Terminal window
# 1. Process data (saves to R2)
curl -X POST https://YOUR_WORKER_URL/process
# Returns: { "message": "Data processed...", "result": { "total": 144, "average": 48, ... } }
# 2. Verify data is accessible
curl https://YOUR_WORKER_URL/results
# Returns the same results from R2
# 3. Destroy the sandbox
curl -X POST https://YOUR_WORKER_URL/destroy
# Returns: { "message": "Sandbox destroyed. Data persists in R2!" }
# 4. Access results again (from new sandbox)
curl https://YOUR_WORKER_URL/results
# Still works! Data persisted across sandbox lifecycle

The key insight: After destroying the sandbox, the next request creates a new sandbox instance, mounts the same R2 bucket, and finds the data still there.
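
You can also verify the object directly in the bucket, independent of any sandbox. Assuming the bucket name from step 2:

Terminal window
npx wrangler r2 object get my-data-bucket/results/latest.json --file latest.json
cat latest.json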

What you learned

In this tutorial, you built a data pipeline that demonstrates filesystem persistence through R2 bucket mounting:

  • Mounting buckets: Use mountBucket() to make R2 accessible as a local directory
  • Standard file operations: Access mounted buckets using familiar filesystem commands (cat, Python open(), etc.)
  • Automatic persistence: Data written to mounted directories survives sandbox destruction
  • Credential management: Configure R2 access using environment variables or explicit credentials

Next steps