Sippy
Sippy is a data migration service that allows you to copy data from other cloud providers to R2 as the data is requested, without paying unnecessary cloud egress fees typically associated with moving large amounts of data.
Migration-specific egress fees are reduced because Sippy copies objects within the flow of requests your application already makes, where you would already be paying egress fees.
When enabled for an R2 bucket, Sippy implements the following migration strategy across Workers, S3 API, and public buckets:
- When an object is requested, it is served from your R2 bucket if it is found.
- If the object is not found in R2, the object will simultaneously be returned from your source storage bucket and copied to R2.
- All other operations, including put and delete, continue to work as usual.
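No application changes are needed to take advantage of this flow. As a rough sketch, a Worker reading from an R2 binding keeps using an ordinary `get()` call; with Sippy enabled on the bucket, a miss is filled from the source bucket behind the scenes. The binding name and object key below are placeholders.

```ts
export default {
  async fetch(request: Request, env: { MY_BUCKET: R2Bucket }): Promise<Response> {
    // An ordinary R2 read. With Sippy enabled, an object missing from R2
    // is returned from the source bucket and copied into R2 at the same time.
    const object = await env.MY_BUCKET.get("images/logo.png");
    if (object === null) {
      return new Response("Object not found", { status: 404 });
    }
    return new Response(object.body, {
      headers: {
        "Content-Type": object.httpMetadata?.contentType ?? "application/octet-stream",
      },
    });
  },
};
```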
Using Sippy as part of your migration strategy can be a good choice when:
- You want to start migrating your data, but you want to avoid paying upfront egress fees to facilitate the migration of your data all at once.
- You want to experiment by serving frequently accessed objects from R2 to eliminate egress fees, without investing time in data migration.
- You have frequently changing data and are looking to conduct a migration while avoiding downtime. Sippy can be used to serve requests while Super Slurper can be used to migrate your remaining data.
If you are looking to migrate all of your data from an existing cloud provider to R2 at one time, we recommend using Super Slurper.
Before getting started, you will need:
- An existing R2 bucket. If you don't already have one, refer to Create buckets.
- API credentials for your source object storage bucket.
- (Wrangler only) Cloudflare R2 Access Key ID and Secret Access Key with read and write permissions. For more information, refer to Authentication.
- From the Cloudflare dashboard, select R2 from the sidebar.
- Select the bucket you'd like to migrate objects to.
- Switch to the Settings tab, then scroll down to the Incremental migration card.
- Select Enable and enter details for the AWS / GCS bucket you'd like to migrate objects from. The credentials you enter must have permissions to read from this bucket. Cloudflare also recommends scoping your credentials to only allow reads from this bucket.
- Select Enable.
To begin, install npm. Then install Wrangler, the Developer Platform CLI.
Log in to Wrangler with the `wrangler login` command. Then run the `r2 bucket sippy enable` command:
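A minimal invocation might look like this, where `<BUCKET_NAME>` is a placeholder for your R2 bucket:

```sh
npx wrangler r2 bucket sippy enable <BUCKET_NAME>
```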
This will prompt you to select between supported object storage providers and lead you through setup.
For information on required parameters and examples of how to enable Sippy, refer to the API documentation. For information about getting started with the Cloudflare API, refer to Make API calls.
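As an illustrative sketch only, enabling Sippy from an S3 source might look like the following request. The endpoint path and body fields shown here are assumptions, so confirm the exact schema against the API documentation linked above:

```sh
curl -X PUT \
  "https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/r2/buckets/<BUCKET_NAME>/sippy" \
  -H "Authorization: Bearer <API_TOKEN>" \
  -H "Content-Type: application/json" \
  --data '{
    "source": {
      "provider": "aws",
      "region": "<SOURCE_REGION>",
      "bucket": "<SOURCE_BUCKET>",
      "accessKeyId": "<AWS_ACCESS_KEY_ID>",
      "secretAccessKey": "<AWS_SECRET_ACCESS_KEY>"
    },
    "destination": {
      "provider": "r2",
      "accessKeyId": "<R2_ACCESS_KEY_ID>",
      "secretAccessKey": "<R2_SECRET_ACCESS_KEY>"
    }
  }'
```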
When enabled, Sippy exposes metrics that help you understand the progress of your ongoing migrations.
| Metric | Description |
| --- | --- |
| Requests served by Sippy | The percentage of overall requests served by R2 over a period of time. |
| Data migrated by Sippy | The amount of data that has been copied from the source bucket to R2 over a period of time. Reported in bytes. |
To view current and historical metrics:
- Log in to the Cloudflare dashboard and select your account.
- Go to the R2 tab and select your bucket.
- Select the Metrics tab.
You can optionally select a time window to query. This defaults to the last 24 hours.
- From the Cloudflare dashboard, select R2 from the sidebar.
- Select the bucket you'd like to disable Sippy for.
- Switch to the Settings tab and scroll down to the Incremental migration card.
- Select Disable.
To disable Sippy, run the `r2 bucket sippy disable` command:
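A minimal invocation might look like this, with `<BUCKET_NAME>` standing in for your R2 bucket:

```sh
npx wrangler r2 bucket sippy disable <BUCKET_NAME>
```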
For more information on required parameters and examples of how to disable Sippy, refer to the API documentation.
Cloudflare currently supports copying data from the following cloud object storage providers to R2:
- Amazon S3
- Google Cloud Storage (GCS)
When Sippy is enabled, it changes the behavior of certain actions on your R2 bucket across Workers, S3 API, and public buckets.
| Action | New behavior |
| --- | --- |
| `GetObject` | Calls to `GetObject` will first attempt to retrieve the object from your R2 bucket. If the object is not present, the object will be served from the source storage bucket and simultaneously uploaded to the requested R2 bucket. |
| `HeadObject` | Behaves similarly to `GetObject`, but only retrieves object metadata. Will not copy objects to the requested R2 bucket. |
| `PutObject` | No change to behavior. Calls to `PutObject` will add objects to the requested R2 bucket. |
| `DeleteObject` | No change to behavior. Calls to `DeleteObject` will delete objects in the requested R2 bucket. |
Actions not listed above have no change in behavior. For more information, refer to Workers API reference or S3 API compatibility.
To copy objects from Amazon S3, Sippy requires access permissions to your bucket. While you can use any AWS Identity and Access Management (IAM) user credentials with the correct permissions, Cloudflare recommends you create a user with a narrow set of permissions.
To create credentials with the correct permissions:
- Log in to your AWS IAM account.
- Create a policy with the following format and replace `<BUCKET_NAME>` with the bucket you want to grant access to:
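A minimal read-only policy along these lines should be sufficient; the statement below is a sketch, so adjust it to your own requirements:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::<BUCKET_NAME>",
        "arn:aws:s3:::<BUCKET_NAME>/*"
      ]
    }
  ]
}
```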
- Create a new user and attach the created policy to that user.
You can now use both the Access Key ID and Secret Access Key when enabling Sippy.
To copy objects from Google Cloud Storage (GCS), Sippy requires access permissions to your bucket. Cloudflare recommends using the Google Cloud predefined `Storage Object Viewer` role.
To create credentials with the correct permissions:
- Log in to your Google Cloud console.
- Go to IAM & Admin > Service Accounts.
- Create a service account with the predefined `Storage Object Viewer` role.
- Go to the Keys tab of the service account you created.
- Select Add Key > Create a new key and download the JSON key file.
You can now use this JSON key file when enabling Sippy via Wrangler or API.
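Equivalently, the service account, role binding, and key can be created from the command line. This is a sketch using gcloud; the service account name and project ID are placeholders, and the bucket-level role binding assumes a reasonably recent gcloud release:

```sh
# Create a dedicated service account for Sippy.
gcloud iam service-accounts create sippy-reader --display-name="Sippy reader"

# Grant it read-only access to the source bucket.
gcloud storage buckets add-iam-policy-binding gs://<BUCKET_NAME> \
  --member="serviceAccount:sippy-reader@<PROJECT_ID>.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"

# Download a JSON key for the service account.
gcloud iam service-accounts keys create key.json \
  --iam-account="sippy-reader@<PROJECT_ID>.iam.gserviceaccount.com"
```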
While R2's ETag generation is compatible with S3's during the regular course of operations, ETags are not guaranteed to be equal when an object is migrated using Sippy. Sippy makes autonomous decisions about the operations it uses when migrating objects to optimize for performance and network usage. It may choose to migrate an object in multiple parts, which affects ETag calculation.
For example, a 320 MiB object originally uploaded to S3 using a single `PutObject` operation might be migrated to R2 via multipart operations. In this case, its ETag on R2 will not be the same as its ETag on S3.
Similarly, an object originally uploaded to S3 using multipart operations might also have a different ETag on R2 if the part sizes Sippy chooses for its migration differ from the part sizes this object was originally uploaded with.
Relying on matching ETags before and after the migration is therefore discouraged.