Scaling and Routing

Scaling container instances with `get()`

Currently, Containers are only scaled manually by calling BINDING.get() with a unique ID, then starting the container. Unless manualStart is set to true on the Container class, each instance will start when get() is called.

// gets 3 container instances
env.MY_CONTAINER.get(idOne)
env.MY_CONTAINER.get(idTwo)
env.MY_CONTAINER.get(idThree)

Each instance will run until its sleepAfter time has elapsed, or until it is manually stopped.

This behavior is very useful when you want explicit control over the lifecycle of container instances. For instance, you may want to spin up a container backend instance for a specific user, or you may briefly run a code sandbox to isolate AI-generated code, or you may want to run a short-lived batch job.

The `getRandom` helper function

However, sometimes you want to run multiple instances of a container and easily route requests to them.

Currently, the best way to achieve this is with the temporary getRandom helper function:

import { Container, getRandom } from "@cloudflare/containers";

const INSTANCE_COUNT = 3;

class Backend extends Container {
  defaultPort = 8080;
  sleepAfter = "2h";
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // note: "getRandom" to be replaced with latency-aware routing in the near future
    const containerInstance = getRandom(env.BACKEND, INSTANCE_COUNT)
    return containerInstance.fetch(request);
  },
};

We have provided the getRandom function as a stopgap solution to route to multiple stateless container instances. It will randomly select one of N instances for each request and route to it. Unfortunately, it has two major downsides:

It requires that the user set a fixed number of instances to route to.
It will randomly select each instance, regardless of location.

We plan to fix these issues with built-in autoscaling and routing features in the near future.

Autoscaling and routing (unreleased)

You will be able to turn autoscaling on for a Container, by setting the autoscale property to on the Container class:

class MyBackend extends Container {
  autoscale = true;
  defaultPort = 8080;
}

This instructs the platform to automatically scale instances based on incoming traffic and resource usage (memory, CPU).

Container instances will be launched automatically to serve local traffic, and will be stopped when they are no longer needed.

To route requests to the correct instance, you will use the getContainer() helper function to get a container instance, then pass requests to it:

export default {
  async fetch(request, env) {
    return getContainer(env.MY_BACKEND).fetch(request);
  },
};

This will send traffic to the nearest ready instance of a container. If a container is overloaded or has not yet launched, requests will be routed to potentially more distant container. Container readiness can be automatically determined based on resource use, but will also be configurable with custom readiness checks.

Autoscaling and latency-aware routing will be available in the near future, and will be documented in more detail when released. Until then, you can use the getRandom helper function to route requests to multiple container instances.

Was this helpful?

Community
X
Discord
YouTube
GitHub