Workers Binding

You can use Workers Bindings to interact with the Batch API.

Send a Batch request

Send your initial batch inference request by composing a JSON payload containing an array of individual inference requests and the queueRequest: true property (which is what controlls queueing behavior).

export interface Env {
  AI: Ai;
}
export default {
  async fetch(request, env): Promise<Response> {
    const embeddings = await env.AI.run(
      "@cf/baai/bge-m3",
      {
        requests: [
          {
            query: "This is a story about Cloudflare",
            contexts: [
              {
                text: "This is a story about an orange cloud",
              },
              {
                text: "This is a story about a llama",
              },
              {
                text: "This is a story about a hugging emoji",
              },
            ],
          },
        ],
      },
      { queueRequest: true },
    );

    return Response.json(embeddings);
  },
} satisfies ExportedHandler<Env>;

{
  "status": "queued",
  "model": "@cf/baai/bge-m3",
  "request_id": "000-000-000"
}

You will get a response with the following values:

status: Indicates that your request is queued.
request_id: A unique identifier for the batch request.
model: The model used for the batch inference.

Of these, the request_id is important for when you need to poll the batch status.

Poll batch status

Once your batch request is queued, use the request_id to poll for its status. During processing, the API returns a status queued or running indicating that the request is still in the queue or being processed.

export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const status = await env.AI.run("@cf/baai/bge-m3", {
      request_id: "000-000-000",
    });

    return Response.json(status);
  },
} satisfies ExportedHandler<Env>;

{
  "responses": [
    {
      "id": 0,
      "result": {
        "response": [
          { "id": 0, "score": 0.73974609375 },
          { "id": 1, "score": 0.642578125 },
          { "id": 2, "score": 0.6220703125 }
        ]
      },
      "success": true,
      "external_reference": null
    }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 0, "total_tokens": 12 }
}

When the inference is complete, the API returns a final HTTP status code of 200 along with an array of responses. Each response object corresponds to an individual input prompt, identified by an id that maps to the index of the prompt in your original request.

Was this helpful?

Community
X
Discord
YouTube
GitHub