Workers Binding
You can use Workers Bindings to interact with the Batch API.
Send your initial batch inference request by composing a JSON payload containing an array of individual inference requests and the queueRequest: true property, which controls queueing behavior.
You will get a response with the following values:
status: Indicates that your request is queued.
request_id: A unique identifier for the batch request.
model: The model used for the batch inference.
Of these, keep the request_id: you need it to poll the batch status.
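The queueing step above might be sketched as follows. This is a minimal illustration, not the definitive call shape: the AiBinding interface is a hand-written stand-in for the binding your Worker receives, the requests key and the per-item prompt field are assumed payload names, and passing queueRequest: true as a separate options argument is an assumption about how the binding accepts that property. The queueBatch helper is hypothetical.

```typescript
// Hand-written stand-in for the AI binding (assumption: the real type
// is provided by the Workers runtime and may differ).
interface AiBinding {
  run(
    model: string,
    inputs: unknown,
    options?: { queueRequest?: boolean },
  ): Promise<unknown>;
}

// The three values the text says come back once a batch is queued.
interface QueuedBatch {
  status: string;
  request_id: string;
  model: string;
}

// Hypothetical helper: wraps each prompt in a request object and queues the batch.
async function queueBatch(
  ai: AiBinding,
  model: string,
  prompts: string[],
): Promise<QueuedBatch> {
  // Assumed payload shape: an array of individual requests under `requests`.
  const payload = { requests: prompts.map((prompt) => ({ prompt })) };

  // Assumption: queueRequest is passed as an option alongside the payload.
  const result = (await ai.run(model, payload, {
    queueRequest: true,
  })) as QueuedBatch;

  if (!result.request_id) {
    throw new Error("Batch was not queued: response has no request_id");
  }
  return result;
}
```

Inside a Worker you would pass the binding from your environment (for example env.AI, if that is the name configured for it) and store the returned request_id for later polling.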
Once your batch request is queued, use the request_id to poll for its status. During processing, the API returns a status of queued or running, indicating that the request is still in the queue or being processed.
When the inference is complete, the API returns a final HTTP status code of 200 along with an array of responses. Each response object corresponds to an individual input prompt, identified by an id that maps to the index of the prompt in your original request.
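The polling flow described above could look roughly like this. It is a sketch under stated assumptions: the AiBinding interface is a hand-written stand-in for the real binding, polling by passing { request_id } as the input is an assumed call shape, the responses field name is assumed, and the pollBatch helper with its interval and attempt limits is hypothetical.

```typescript
// Hand-written stand-in for the AI binding (assumption: the real type
// is provided by the Workers runtime and may differ).
interface AiBinding {
  run(
    model: string,
    inputs: unknown,
    options?: { queueRequest?: boolean },
  ): Promise<unknown>;
}

// Each response carries an id that maps to the prompt's index in the original request.
interface BatchResponse {
  id: number;
  [key: string]: unknown;
}

// Assumed shape of a poll result: a status while pending, responses when done.
interface BatchPollResult {
  status?: string;
  responses?: BatchResponse[];
}

// Hypothetical helper: polls until responses arrive, then orders them by id.
async function pollBatch(
  ai: AiBinding,
  model: string,
  requestId: string,
  intervalMs = 5000,
  maxAttempts = 60,
): Promise<BatchResponse[]> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Assumption: the status is fetched by sending the request_id as the input.
    const result = (await ai.run(model, {
      request_id: requestId,
    })) as BatchPollResult;

    if (Array.isArray(result.responses)) {
      // Sort a copy by id so position i matches prompt i of the original request.
      return [...result.responses].sort((a, b) => a.id - b.id);
    }
    if (result.status !== "queued" && result.status !== "running") {
      throw new Error(`Unexpected batch status: ${String(result.status)}`);
    }
    // Still queued or running: wait before the next poll.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Batch ${requestId} did not complete after ${maxAttempts} polls`);
}
```

Sorting by id means the returned array can be read positionally against the prompts you submitted. For long-running batches, you may prefer to store the request_id and check it from a later invocation rather than waiting inside a single request.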
