Storing Data

A common need is to access data from within your Worker. This data might be included in the response being returned, or used to make routing or security decisions in your Worker.

Speed is generally critical in these applications, and making a request back to your origin service or database is often too slow. It’s necessary to store or cache the data at or near Cloudflare’s edge.

There are several common strategies we have used successfully to store data:

Use Workers KV

Workers KV is an eventually consistent key-value store which operates on the same machines that execute your Workers. It makes it possible to store large amounts of data which can be read by your Workers extremely quickly.
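As a minimal sketch, a request handler might read a value from a KV namespace and return it in the response. The namespace binding and key name here are hypothetical; in a real Worker the binding is configured for your script, and the handler is invoked from your fetch event listener.

```javascript
// Sketch: serve a value stored in Workers KV.
// "kv" stands in for a KV namespace binding (hypothetical name);
// get() resolves to null when the key does not exist.
async function handleRequest(request, kv) {
  const value = await kv.get("greeting");
  if (value === null) {
    return new Response("Not found", { status: 404 });
  }
  return new Response(value);
}
```

Because KV is eventually consistent, a recently written value may take some time to be visible in every data center.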

Include data in the worker

The Worker script can contain a relatively large amount of data inline (up to about 1 MB) in its source code. If your application requires no more data than that, it’s possible to include your full data object in your code.
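For example, a small routing table can be compiled directly into the script as an ordinary JavaScript object. The paths and origin hostnames below are illustrative:

```javascript
// Sketch: data included inline in the Worker source.
// Look up an origin by the first path segment of the request URL.
const ROUTES = {
  "/api": "https://api.example.com",
  "/app": "https://app.example.com",
};

function originFor(pathname) {
  // Match on the first path segment; fall back to a default origin.
  const prefix = "/" + pathname.split("/")[1];
  return ROUTES[prefix] || "https://www.example.com";
}
```

Since the data ships with the code, reads are as fast as any local variable access, at the cost of redeploying the Worker whenever the data changes.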

You can even use our API to dynamically update your worker whenever your data changes.

Load data files through the Cloudflare Cache

Subrequests made with the Fetch API are cached at Cloudflare’s edge. This means that if your Worker loads a data file from a URL that is appropriately configured to allow caching (i.e. it returns an appropriate Cache-Control header with a max-age value, or it is a Cloudflare-powered site that has configured caching directly), it’s very likely that file will be cached in the very data center which is running your Worker.
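A sketch of this pattern, assuming a hypothetical data-file URL that serves a suitable Cache-Control header: the Worker simply fetches the file, and repeated subrequests are served from the edge cache.

```javascript
// Sketch: load a JSON data file via fetch(). When the URL allows caching,
// the subrequest is typically served from Cloudflare's edge cache rather
// than going back to the origin on every request.
async function loadData() {
  const response = await fetch("https://data.example.com/lookup.json");
  if (!response.ok) {
    throw new Error("data file unavailable: " + response.status);
  }
  return response.json();
}
```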

A common method of doing this is to store your data in a file in Google Cloud Storage, and use a CNAME to host that file as a part of a Cloudflare-powered domain. As your Worker gets more traffic, that file will be more and more likely to be available in the edge cache around the world, making file loads very fast.

If you have more data than can practically be stored in a single file, it’s common to shard your data: create multiple files, each of which stores a portion of your data. Then, when it’s time to load data, use a hash or prefix of your key to determine which file to load.
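Sharding by a hash of the key can be sketched as below. The file-naming scheme, shard count of 16, and djb2 hash are all illustrative choices; any stable hash and any shard count that keeps each file a manageable size will work.

```javascript
// Sketch: key-based sharding across several data files.
const SHARD_COUNT = 16;

function shardFor(key) {
  // djb2 string hash; ">>> 0" keeps the value an unsigned 32-bit integer.
  let hash = 5381;
  for (let i = 0; i < key.length; i++) {
    hash = ((hash * 33) ^ key.charCodeAt(i)) >>> 0;
  }
  return hash % SHARD_COUNT;
}

async function lookup(key) {
  // Fetch only the shard file that can contain this key.
  const url = `https://data.example.com/shard-${shardFor(key)}.json`;
  const shard = await (await fetch(url)).json();
  return shard[key];
}
```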

Cache in global memory

Your Worker script is not launched for each individual request. Internally the Worker system manages launching and decommissioning instances of your scripts based on request volume. If your Worker gets a large amount of traffic, it’s likely your Worker will be left running on many of our edge nodes for an extended period of time. It’s possible to use global JavaScript variables to cache data in these running instances.

You can, for example, load data from your origin, and store it in a global variable. That variable will be available the next time your Worker is executed on that node (if it hasn’t been stopped and started in between).
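A minimal sketch of this pattern, with a hypothetical origin URL and a one-minute refresh interval: the first request on an instance pays the cost of fetching the data, and subsequent requests read the global variable.

```javascript
// Sketch: cache origin data in a global variable. The cache persists across
// requests handled by the same Worker instance, but not across instances
// or restarts, so the code must always be able to repopulate it.
let cachedSettings = null;
let cachedAt = 0;
const TTL_MS = 60 * 1000; // refresh at most once a minute

async function getSettings() {
  const now = Date.now();
  if (cachedSettings === null || now - cachedAt > TTL_MS) {
    const response = await fetch("https://origin.example.com/settings.json");
    cachedSettings = await response.json();
    cachedAt = now;
  }
  return cachedSettings;
}
```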

Keep in mind that the memory available for such caching is limited. Each running instance of your Worker can use up to 128 MB of RAM, including space used to store your code, local variables used during execution, etc. If your Worker uses too much memory, it will be killed and restarted, possibly disrupting users. Even when it is within the limits, a Worker that is running close to its limits may slow down due to increased garbage collection activity. So, as a rule of thumb, you should not store more than a couple of megabytes of data in global variables.

Alternatives

If these solutions don’t meet your data-storage needs, please reach out. Our engineering teams are always interested in brainstorming novel uses of Workers.