A common need is to retrieve data in your Worker. This data might be included in the response being returned, or might be used to make routing or security decisions in your Worker.
Speed is generally critical in these applications, and making a request back to your origin service or database is too slow. It's necessary to store or cache the data at or near Cloudflare's edge.
There are several common strategies we have used successfully to store data:
Workers KV is an eventually consistent key-value store that operates on the same machines that execute your Workers. It makes it possible to store large amounts of data which can be read by your Workers extremely quickly.
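A minimal sketch of reading from KV inside a Worker. The `SETTINGS` binding name and the `greeting` key are hypothetical; in practice you would configure the KV namespace binding for your Worker first.

```javascript
// Sketch: read a value from a KV namespace in a Worker.
// SETTINGS is a hypothetical KV namespace binding configured for this Worker.
// The registration is guarded so the sketch also loads outside the Workers runtime.
if (typeof addEventListener === 'function') {
  addEventListener('fetch', event => {
    event.respondWith(handleRequest(event.request));
  });
}

async function handleRequest(request) {
  // KV reads are eventually consistent: a very recent write made in
  // another location may not be visible here yet.
  const greeting = await SETTINGS.get('greeting');
  return new Response(greeting || 'Hello');
}
```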
The Worker script can contain a relatively large amount of data inline (up to about 1 MB) in its source code. If your application requires no more data than that, it’s possible to include your full data object in your code.
You can even use our API to dynamically update your Worker whenever your data changes.
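For example, a small lookup table can live directly in the script source. The data here is hypothetical; anything up to roughly 1 MB of source works the same way.

```javascript
// Sketch: a small dataset embedded directly in the Worker script
// (hypothetical example data).
const COUNTRY_CURRENCY = {
  US: 'USD',
  GB: 'GBP',
  JP: 'JPY',
};

function currencyFor(countryCode) {
  // Fall back to a default when the country code is unknown.
  return COUNTRY_CURRENCY[countryCode] || 'USD';
}
```

Because the data ships with the code, a lookup is just an in-memory object access with no network round trip at all.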
Subrequests made with the Fetch API are cached at Cloudflare's edge. This means that if you load a data file in your Worker from a URL that is appropriately configured to allow caching (i.e. it returns an appropriate Cache-Control header with a max-age value, or it is a Cloudflare-powered site that has configured caching directly), it's very likely that file will be cached in the very data center which is running your Worker.
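A sketch of such a load, assuming a hypothetical `data.example.com` URL whose responses carry a cache-friendly Cache-Control header:

```javascript
// Sketch: load a data file with fetch(). If the URL returns a suitable
// Cache-Control header (or is a Cloudflare zone with caching configured),
// repeat loads will very likely be served from the local edge cache.
// The URL is hypothetical.
async function loadData() {
  const response = await fetch('https://data.example.com/config.json');
  if (!response.ok) {
    throw new Error(`data fetch failed: ${response.status}`);
  }
  return response.json();
}
```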
A common method of doing this is to store your data in a file in Google Cloud Storage, and use a CNAME to host that file as part of a Cloudflare-powered domain. As your Worker gets more traffic, that file becomes increasingly likely to be available in the edge cache around the world, making file loads very fast.
If you have more data than can be practically stored in a single file, it's common to shard your data. Create multiple files, each of which stores a portion of your data. Then, when it's time to load a value, use a hash or prefix of its key to determine which file to load.
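A minimal sketch of that shard lookup, assuming 16 shard files hosted at a hypothetical URL; any stable hash function works here.

```javascript
// Sketch: map a key to the shard file that holds it, using a simple
// string hash modulo the shard count. Shard count and URL are hypothetical.
const SHARD_COUNT = 16;

function shardUrlFor(key) {
  // djb2-style hash: stable across requests, so the same key always
  // maps to the same shard file.
  let hash = 5381;
  for (let i = 0; i < key.length; i++) {
    hash = ((hash * 33) + key.charCodeAt(i)) >>> 0;
  }
  const shard = hash % SHARD_COUNT;
  return `https://data.example.com/shard-${shard}.json`;
}
```

A Worker would fetch `shardUrlFor(key)` (benefiting from the edge caching described above) and then look the key up inside that smaller file.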
Finally, you can cache data in your Worker's own memory. You can, for example, load data from your origin and store it in a global variable. That variable will be available the next time your Worker is executed on that machine (if the instance hasn't been stopped and restarted in between).
Keep in mind that the memory available for such caching is limited. Each running instance of your Worker can use up to 128 MB of RAM, including the space used to store your code, local variables used during execution, etc. If your Worker uses too much memory, it will be killed and restarted, possibly disrupting users. Even within the limit, a Worker running close to it may slow down due to increased garbage-collection activity. So, as a rule of thumb, you should not store more than a couple of megabytes of data in global variables.
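A sketch of that pattern, caching a loaded value in module-level variables with a time limit. The loader function and the 60-second TTL are hypothetical; pick a TTL that matches how stale your application can tolerate the data being.

```javascript
// Sketch: cache a loaded value in global variables so later requests
// handled by the same Worker instance skip the origin round trip.
// loadFromOrigin and the TTL are hypothetical.
let cachedData = null;
let cachedAt = 0;
const TTL_MS = 60 * 1000; // refresh after 60 seconds

async function getData(loadFromOrigin) {
  const now = Date.now();
  if (cachedData !== null && now - cachedAt < TTL_MS) {
    // Served from this instance's memory; no network request.
    return cachedData;
  }
  cachedData = await loadFromOrigin();
  cachedAt = now;
  return cachedData;
}
```

Note this cache is per-instance: each machine (and each restart) starts cold, so it complements, rather than replaces, the edge-cache approaches above.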
If these solutions don’t meet your data-storage needs, please reach out. Our engineering teams are always interested in brainstorming novel uses of Workers.