granite-4.0-h-micro
Text Generation • ibm-granite

Granite 4.0 instruct models deliver strong performance across benchmarks, achieving industry-leading results in key agentic tasks such as instruction following and function calling. Their efficiency makes them well suited to a wide range of use cases, including retrieval-augmented generation (RAG), multi-agent workflows, and edge deployments.
| Model Info | |
|---|---|
| Context Window | 131,000 tokens |
| Unit Pricing | $0.017 per M input tokens, $0.11 per M output tokens |
Playground
Try out this model with the Workers AI LLM Playground. It does not require any setup or authentication and is an instant way to preview and test a model directly in the browser.

Launch the LLM Playground

Usage
Worker - Streaming
```ts
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const messages = [
      { role: "system", content: "You are a friendly assistant" },
      {
        role: "user",
        content: "What is the origin of the phrase Hello, World",
      },
    ];

    const stream = await env.AI.run("@cf/ibm-granite/granite-4.0-h-micro", {
      messages,
      stream: true,
    });

    return new Response(stream, {
      headers: { "content-type": "text/event-stream" },
    });
  },
} satisfies ExportedHandler<Env>;
```
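On the client, the stream this Worker returns can be drained with fetch and a TextDecoderStream. The following is a minimal sketch, assuming each SSE event carries a JSON payload with a `response` field and the stream terminates with `data: [DONE]` (the usual Workers AI text-stream shape; verify against your deployment). The `readCompletion` helper and its URL argument are illustrative, not part of the docs.

```ts
// Sketch: consume the SSE stream returned by the Worker above.
// Assumes events look like `data: {"response":"..."}` ending with
// `data: [DONE]` -- an assumption to verify against your deployment.
async function readCompletion(url: string): Promise<string> {
  const res = await fetch(url);
  const reader = res.body!.pipeThrough(new TextDecoderStream()).getReader();
  let buffer = "";
  let text = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += value;
    const events = buffer.split("\n\n"); // SSE events are blank-line separated
    buffer = events.pop() ?? ""; // keep any partial event for the next chunk
    for (const event of events) {
      const data = event.replace(/^data:\s*/, "");
      if (data === "[DONE]") continue;
      text += JSON.parse(data).response ?? "";
    }
  }
  return text;
}
```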
Worker
```ts
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const messages = [
      { role: "system", content: "You are a friendly assistant" },
      {
        role: "user",
        content: "What is the origin of the phrase Hello, World",
      },
    ];
    const response = await env.AI.run("@cf/ibm-granite/granite-4.0-h-micro", {
      messages,
    });

    return Response.json(response);
  },
} satisfies ExportedHandler<Env>;
```
Python
```python
import os

import requests

ACCOUNT_ID = "your-account-id"
AUTH_TOKEN = os.environ.get("CLOUDFLARE_AUTH_TOKEN")

prompt = "Tell me all about PEP-8"
response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/ibm-granite/granite-4.0-h-micro",
    headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
    json={
        "messages": [
            {"role": "system", "content": "You are a friendly assistant"},
            {"role": "user", "content": prompt},
        ]
    },
)
result = response.json()
print(result)
```
curl
```sh
curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/ibm-granite/granite-4.0-h-micro \
  -X POST \
  -H "Authorization: Bearer $CLOUDFLARE_AUTH_TOKEN" \
  -d '{ "messages": [{ "role": "system", "content": "You are a friendly assistant" }, { "role": "user", "content": "Why is pizza so good" }]}'
```
Parameters
Input

The input must match one of the following three shapes.

Prompt

- prompt (string, required, min length 1): The input text prompt for the model to generate a response.
- lora (string): Name of the LoRA (Low-Rank Adaptation) model to fine-tune the base model.
- response_format (object): JSON mode settings (see the sketch after this list).
  - type (string): One of "json_object" or "json_schema".
  - json_schema: The schema for the response when type is "json_schema".
- raw (boolean): If true, a chat template is not applied and you must adhere to the specific model's expected formatting.
- stream (boolean): If true, the response will be streamed back incrementally using SSE (Server-Sent Events).
- max_tokens (integer, default 2000): The maximum number of tokens to generate in the response.
- temperature (number, default 0.6, min 0, max 5): Controls the randomness of the output; higher values produce more random results.
- top_p (number, min 0.001, max 1): Controls how many candidate tokens the model considers when sampling. Lower values make outputs more predictable; higher values allow for more varied and creative responses.
- top_k (integer, min 1, max 50): Limits the model to choosing from the top k most probable tokens. Lower values make responses more focused; higher values introduce more variety.
- seed (integer, min 1, max 9999999999): Random seed for reproducibility of the generation.
- repetition_penalty (number, min 0, max 2): Penalty for repeated tokens; higher values discourage repetition.
- frequency_penalty (number, min -2, max 2): Decreases the likelihood of the model repeating the same lines verbatim.
- presence_penalty (number, min -2, max 2): Increases the likelihood of the model introducing new topics.
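A Prompt-shaped request with JSON mode enabled might look like the following minimal sketch, assuming it runs inside a Worker fetch handler with an AI binding as in the Usage examples. The company schema (name, founded) is an illustrative assumption, not something the model docs define.

```ts
// Sketch: Prompt-shaped input with structured JSON output.
// The schema below (name/founded) is illustrative, not part of the docs.
const result = await env.AI.run("@cf/ibm-granite/granite-4.0-h-micro", {
  prompt: "Describe the company Cloudflare as JSON.",
  max_tokens: 512,
  temperature: 0.2, // low temperature for more deterministic output
  response_format: {
    type: "json_schema",
    json_schema: {
      type: "object",
      properties: {
        name: { type: "string" },
        founded: { type: "number" },
      },
      required: ["name", "founded"],
    },
  },
});
```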
Messages

- messages (array, required): An array of message objects representing the conversation history. Each item has:
  - role (string, required): The role of the message sender (e.g., 'user', 'assistant', 'system', 'tool').
  - content (string, required): The content of the message as a string.
- functions (array): Each item has:
  - name (string, required)
  - code (string, required)
- tools (array): A list of tools available for the assistant to use (see the sketch after this list). Each item is one of:
  - A bare tool definition:
    - name (string, required): The name of the tool. The more descriptive, the better.
    - description (string, required): A brief description of what the tool does.
    - parameters (object, required): Schema defining the parameters accepted by the tool.
      - type (string, required): The type of the parameters object (usually 'object').
      - required (array of strings): List of required parameter names.
      - properties (object, required): Definitions of each parameter. Each entry has:
        - type (string, required): The data type of the parameter.
        - description (string, required): A description of the expected parameter.
  - A typed function wrapper:
    - type (string, required): Specifies the type of tool (e.g., 'function').
    - function (object, required): Details of the function tool, with the same name, description, and parameters fields as the bare tool definition above.
- response_format, raw, stream, max_tokens, temperature, top_p, top_k, seed, repetition_penalty, frequency_penalty, presence_penalty: Same as in the Prompt shape.
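A Messages-shaped request that offers the model a single tool, as referenced in the list above, might look like the following sketch (again assuming a Worker context with an AI binding). The get_weather tool is an illustrative assumption; when the model decides to call it, the answer arrives as tool_calls in the output (see the Chat Completion Response shape below) rather than as plain text.

```ts
// Sketch: Messages-shaped input with one tool on offer.
// get_weather is an illustrative assumption, not a built-in tool.
const result = await env.AI.run("@cf/ibm-granite/granite-4.0-h-micro", {
  messages: [{ role: "user", content: "What is the weather in Austin?" }],
  tools: [
    {
      name: "get_weather",
      description: "Returns the current weather for a city.",
      parameters: {
        type: "object",
        required: ["city"],
        properties: {
          city: { type: "string", description: "City name to look up." },
        },
      },
    },
  ],
});
```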
Async Batch

- requests (array, required): Each item is either a Prompt object or a Messages object, exactly as described above, with one difference: max_tokens defaults to 256 rather than 2000.
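Queuing a batch might look like the following sketch. The `queueRequest` invocation option is an assumption based on the Workers AI batch API; check that batch inference is available for your account and this model before relying on it.

```ts
// Sketch: queue an async batch of mixed Prompt- and Messages-shaped requests.
// The `queueRequest` option is an assumption from the Workers AI batch API.
const queued = await env.AI.run(
  "@cf/ibm-granite/granite-4.0-h-micro",
  {
    requests: [
      { prompt: "Summarize Server-Sent Events in one sentence." },
      { messages: [{ role: "user", content: "What is a LoRA adapter?" }] },
    ],
  },
  { queueRequest: true },
);
// queued.request_id can later be used to obtain the results
// (see the Async Response output shape below).
```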
Output
The output matches one of the following four shapes.

Chat Completion Response

- id (string): Unique identifier for the completion.
- object (string): Object type identifier ("chat.completion").
- created (number): Unix timestamp of when the completion was created.
- model (string): Model used for the completion.
- choices (array): List of completion choices. Each item has:
  - index (number): Index of the choice in the list.
  - message (object): The message generated by the model.
    - role (string, required): Role of the message author.
    - content (string, required): The content of the message.
    - reasoning_content (string): Internal reasoning content (if available).
    - tool_calls (array): Tool calls made by the assistant. Each item has:
      - id (string, required): Unique identifier for the tool call.
      - type (string, required): Type of tool call ("function").
      - function (object, required):
        - name (string, required): Name of the function to call.
        - arguments (string, required): JSON string of arguments for the function.
  - finish_reason (string): Reason why the model stopped generating.
  - stop_reason (string or null): Stop reason (may be null).
  - logprobs (object or null): Log probabilities (if requested).
- usage (object): Usage statistics for the inference request.
  - prompt_tokens (number, default 0): Total number of tokens in input.
  - completion_tokens (number, default 0): Total number of tokens in output.
  - total_tokens (number, default 0): Total number of input and output tokens.
- prompt_logprobs (object or null): Log probabilities for the prompt (if requested).
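Handling this shape from a Worker might look like the following sketch. The property access mirrors the list above; the cast is only there because the binding's return type is broad.

```ts
// Sketch: read a non-streaming chat completion result.
const response = (await env.AI.run("@cf/ibm-granite/granite-4.0-h-micro", {
  messages: [{ role: "user", content: "Hello!" }],
})) as any; // cast for illustration; shape per the list above

const choice = response.choices?.[0];
if (choice?.message?.tool_calls?.length) {
  // The model asked to call a tool: arguments arrive as a JSON string.
  const call = choice.message.tool_calls[0];
  console.log(`tool: ${call.function.name}`, JSON.parse(call.function.arguments));
} else {
  console.log(choice?.message?.content);
}
console.log(`tokens used: ${response.usage?.total_tokens ?? 0}`);
```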
Text Completion Response

- id (string): Unique identifier for the completion.
- object (string): Object type identifier ("text_completion").
- created (number): Unix timestamp of when the completion was created.
- model (string): Model used for the completion.
- choices (array): List of completion choices. Each item has:
  - index (number, required): Index of the choice in the list.
  - text (string, required): The generated text completion.
  - finish_reason (string, required): Reason why the model stopped generating.
  - stop_reason (string or null): Stop reason (may be null).
  - logprobs (object or null): Log probabilities (if requested).
  - prompt_logprobs (object or null): Log probabilities for the prompt (if requested).
- usage (object): Usage statistics for the inference request.
  - prompt_tokens (number, default 0): Total number of tokens in input.
  - completion_tokens (number, default 0): Total number of tokens in output.
  - total_tokens (number, default 0): Total number of input and output tokens.
Event Stream

A string with content type text/event-stream, returned when stream is true.

Async Response

- request_id (string): The async request id that can be used to obtain the results.
API Schemas
The following schemas are based on JSON Schema.

Input:
{ "type": "object", "oneOf": [ { "title": "Prompt", "properties": { "prompt": { "type": "string", "minLength": 1, "description": "The input text prompt for the model to generate a response." }, "lora": { "type": "string", "description": "Name of the LoRA (Low-Rank Adaptation) model to fine-tune the base model." }, "response_format": { "title": "JSON Mode", "type": "object", "properties": { "type": { "type": "string", "enum": [ "json_object", "json_schema" ] }, "json_schema": {} } }, "raw": { "type": "boolean", "default": false, "description": "If true, a chat template is not applied and you must adhere to the specific model's expected formatting." }, "stream": { "type": "boolean", "default": false, "description": "If true, the response will be streamed back incrementally using SSE, Server Sent Events." }, "max_tokens": { "type": "integer", "default": 2000, "description": "The maximum number of tokens to generate in the response." }, "temperature": { "type": "number", "default": 0.6, "minimum": 0, "maximum": 5, "description": "Controls the randomness of the output; higher values produce more random results." }, "top_p": { "type": "number", "minimum": 0.001, "maximum": 1, "description": "Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses." }, "top_k": { "type": "integer", "minimum": 1, "maximum": 50, "description": "Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises." }, "seed": { "type": "integer", "minimum": 1, "maximum": 9999999999, "description": "Random seed for reproducibility of the generation." }, "repetition_penalty": { "type": "number", "minimum": 0, "maximum": 2, "description": "Penalty for repeated tokens; higher values discourage repetition." }, "frequency_penalty": { "type": "number", "minimum": -2, "maximum": 2, "description": "Decreases the likelihood of the model repeating the same lines verbatim." }, "presence_penalty": { "type": "number", "minimum": -2, "maximum": 2, "description": "Increases the likelihood of the model introducing new topics." } }, "required": [ "prompt" ] }, { "title": "Messages", "properties": { "messages": { "type": "array", "description": "An array of message objects representing the conversation history.", "items": { "type": "object", "properties": { "role": { "type": "string", "description": "The role of the message sender (e.g., 'user', 'assistant', 'system', 'tool')." }, "content": { "type": "string", "description": "The content of the message as a string." } }, "required": [ "role", "content" ] } }, "functions": { "type": "array", "items": { "type": "object", "properties": { "name": { "type": "string" }, "code": { "type": "string" } }, "required": [ "name", "code" ] } }, "tools": { "type": "array", "description": "A list of tools available for the assistant to use.", "items": { "type": "object", "oneOf": [ { "properties": { "name": { "type": "string", "description": "The name of the tool. More descriptive the better." }, "description": { "type": "string", "description": "A brief description of what the tool does." }, "parameters": { "type": "object", "description": "Schema defining the parameters accepted by the tool.", "properties": { "type": { "type": "string", "description": "The type of the parameters object (usually 'object')." 
}, "required": { "type": "array", "description": "List of required parameter names.", "items": { "type": "string" } }, "properties": { "type": "object", "description": "Definitions of each parameter.", "additionalProperties": { "type": "object", "properties": { "type": { "type": "string", "description": "The data type of the parameter." }, "description": { "type": "string", "description": "A description of the expected parameter." } }, "required": [ "type", "description" ] } } }, "required": [ "type", "properties" ] } }, "required": [ "name", "description", "parameters" ] }, { "properties": { "type": { "type": "string", "description": "Specifies the type of tool (e.g., 'function')." }, "function": { "type": "object", "description": "Details of the function tool.", "properties": { "name": { "type": "string", "description": "The name of the function." }, "description": { "type": "string", "description": "A brief description of what the function does." }, "parameters": { "type": "object", "description": "Schema defining the parameters accepted by the function.", "properties": { "type": { "type": "string", "description": "The type of the parameters object (usually 'object')." }, "required": { "type": "array", "description": "List of required parameter names.", "items": { "type": "string" } }, "properties": { "type": "object", "description": "Definitions of each parameter.", "additionalProperties": { "type": "object", "properties": { "type": { "type": "string", "description": "The data type of the parameter." }, "description": { "type": "string", "description": "A description of the expected parameter." } }, "required": [ "type", "description" ] } } }, "required": [ "type", "properties" ] } }, "required": [ "name", "description", "parameters" ] } }, "required": [ "type", "function" ] } ] } }, "response_format": { "title": "JSON Mode", "type": "object", "properties": { "type": { "type": "string", "enum": [ "json_object", "json_schema" ] }, "json_schema": {} } }, "raw": { "type": "boolean", "default": false, "description": "If true, a chat template is not applied and you must adhere to the specific model's expected formatting." }, "stream": { "type": "boolean", "default": false, "description": "If true, the response will be streamed back incrementally using SSE, Server Sent Events." }, "max_tokens": { "type": "integer", "default": 2000, "description": "The maximum number of tokens to generate in the response." }, "temperature": { "type": "number", "default": 0.6, "minimum": 0, "maximum": 5, "description": "Controls the randomness of the output; higher values produce more random results." }, "top_p": { "type": "number", "minimum": 0.001, "maximum": 1, "description": "Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses." }, "top_k": { "type": "integer", "minimum": 1, "maximum": 50, "description": "Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises." }, "seed": { "type": "integer", "minimum": 1, "maximum": 9999999999, "description": "Random seed for reproducibility of the generation." }, "repetition_penalty": { "type": "number", "minimum": 0, "maximum": 2, "description": "Penalty for repeated tokens; higher values discourage repetition." 
}, "frequency_penalty": { "type": "number", "minimum": -2, "maximum": 2, "description": "Decreases the likelihood of the model repeating the same lines verbatim." }, "presence_penalty": { "type": "number", "minimum": -2, "maximum": 2, "description": "Increases the likelihood of the model introducing new topics." } }, "required": [ "messages" ] }, { "title": "Async Batch", "type": "object", "properties": { "requests": { "type": "array", "items": { "type": "object", "oneOf": [ { "title": "Prompt", "properties": { "prompt": { "type": "string", "minLength": 1, "description": "The input text prompt for the model to generate a response." }, "lora": { "type": "string", "description": "Name of the LoRA (Low-Rank Adaptation) model to fine-tune the base model." }, "response_format": { "title": "JSON Mode", "type": "object", "properties": { "type": { "type": "string", "enum": [ "json_object", "json_schema" ] }, "json_schema": {} } }, "raw": { "type": "boolean", "default": false, "description": "If true, a chat template is not applied and you must adhere to the specific model's expected formatting." }, "stream": { "type": "boolean", "default": false, "description": "If true, the response will be streamed back incrementally using SSE, Server Sent Events." }, "max_tokens": { "type": "integer", "default": 256, "description": "The maximum number of tokens to generate in the response." }, "temperature": { "type": "number", "default": 0.6, "minimum": 0, "maximum": 5, "description": "Controls the randomness of the output; higher values produce more random results." }, "top_p": { "type": "number", "minimum": 0.001, "maximum": 1, "description": "Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses." }, "top_k": { "type": "integer", "minimum": 1, "maximum": 50, "description": "Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises." }, "seed": { "type": "integer", "minimum": 1, "maximum": 9999999999, "description": "Random seed for reproducibility of the generation." }, "repetition_penalty": { "type": "number", "minimum": 0, "maximum": 2, "description": "Penalty for repeated tokens; higher values discourage repetition." }, "frequency_penalty": { "type": "number", "minimum": -2, "maximum": 2, "description": "Decreases the likelihood of the model repeating the same lines verbatim." }, "presence_penalty": { "type": "number", "minimum": -2, "maximum": 2, "description": "Increases the likelihood of the model introducing new topics." } }, "required": [ "prompt" ] }, { "title": "Messages", "properties": { "messages": { "type": "array", "description": "An array of message objects representing the conversation history.", "items": { "type": "object", "properties": { "role": { "type": "string", "description": "The role of the message sender (e.g., 'user', 'assistant', 'system', 'tool')." }, "content": { "type": "string", "description": "The content of the message as a string." 
} }, "required": [ "role", "content" ] } }, "functions": { "type": "array", "items": { "type": "object", "properties": { "name": { "type": "string" }, "code": { "type": "string" } }, "required": [ "name", "code" ] } }, "tools": { "type": "array", "description": "A list of tools available for the assistant to use.", "items": { "type": "object", "oneOf": [ { "properties": { "name": { "type": "string", "description": "The name of the tool. More descriptive the better." }, "description": { "type": "string", "description": "A brief description of what the tool does." }, "parameters": { "type": "object", "description": "Schema defining the parameters accepted by the tool.", "properties": { "type": { "type": "string", "description": "The type of the parameters object (usually 'object')." }, "required": { "type": "array", "description": "List of required parameter names.", "items": { "type": "string" } }, "properties": { "type": "object", "description": "Definitions of each parameter.", "additionalProperties": { "type": "object", "properties": { "type": { "type": "string", "description": "The data type of the parameter." }, "description": { "type": "string", "description": "A description of the expected parameter." } }, "required": [ "type", "description" ] } } }, "required": [ "type", "properties" ] } }, "required": [ "name", "description", "parameters" ] }, { "properties": { "type": { "type": "string", "description": "Specifies the type of tool (e.g., 'function')." }, "function": { "type": "object", "description": "Details of the function tool.", "properties": { "name": { "type": "string", "description": "The name of the function." }, "description": { "type": "string", "description": "A brief description of what the function does." }, "parameters": { "type": "object", "description": "Schema defining the parameters accepted by the function.", "properties": { "type": { "type": "string", "description": "The type of the parameters object (usually 'object')." }, "required": { "type": "array", "description": "List of required parameter names.", "items": { "type": "string" } }, "properties": { "type": "object", "description": "Definitions of each parameter.", "additionalProperties": { "type": "object", "properties": { "type": { "type": "string", "description": "The data type of the parameter." }, "description": { "type": "string", "description": "A description of the expected parameter." } }, "required": [ "type", "description" ] } } }, "required": [ "type", "properties" ] } }, "required": [ "name", "description", "parameters" ] } }, "required": [ "type", "function" ] } ] } }, "response_format": { "title": "JSON Mode", "type": "object", "properties": { "type": { "type": "string", "enum": [ "json_object", "json_schema" ] }, "json_schema": {} } }, "raw": { "type": "boolean", "default": false, "description": "If true, a chat template is not applied and you must adhere to the specific model's expected formatting." }, "stream": { "type": "boolean", "default": false, "description": "If true, the response will be streamed back incrementally using SSE, Server Sent Events." }, "max_tokens": { "type": "integer", "default": 256, "description": "The maximum number of tokens to generate in the response." }, "temperature": { "type": "number", "default": 0.6, "minimum": 0, "maximum": 5, "description": "Controls the randomness of the output; higher values produce more random results." 
}, "top_p": { "type": "number", "minimum": 0.001, "maximum": 1, "description": "Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses." }, "top_k": { "type": "integer", "minimum": 1, "maximum": 50, "description": "Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises." }, "seed": { "type": "integer", "minimum": 1, "maximum": 9999999999, "description": "Random seed for reproducibility of the generation." }, "repetition_penalty": { "type": "number", "minimum": 0, "maximum": 2, "description": "Penalty for repeated tokens; higher values discourage repetition." }, "frequency_penalty": { "type": "number", "minimum": -2, "maximum": 2, "description": "Decreases the likelihood of the model repeating the same lines verbatim." }, "presence_penalty": { "type": "number", "minimum": -2, "maximum": 2, "description": "Increases the likelihood of the model introducing new topics." } }, "required": [ "messages" ] } ] } } }, "required": [ "requests" ] } ]}
{ "oneOf": [ { "type": "object", "contentType": "application/json", "title": "Chat Completion Response", "properties": { "id": { "type": "string", "description": "Unique identifier for the completion" }, "object": { "type": "string", "enum": [ "chat.completion" ], "description": "Object type identifier" }, "created": { "type": "number", "description": "Unix timestamp of when the completion was created" }, "model": { "type": "string", "description": "Model used for the completion" }, "choices": { "type": "array", "description": "List of completion choices", "items": { "type": "object", "properties": { "index": { "type": "number", "description": "Index of the choice in the list" }, "message": { "type": "object", "description": "The message generated by the model", "properties": { "role": { "type": "string", "description": "Role of the message author" }, "content": { "type": "string", "description": "The content of the message" }, "reasoning_content": { "type": "string", "description": "Internal reasoning content (if available)" }, "tool_calls": { "type": "array", "description": "Tool calls made by the assistant", "items": { "type": "object", "properties": { "id": { "type": "string", "description": "Unique identifier for the tool call" }, "type": { "type": "string", "enum": [ "function" ], "description": "Type of tool call" }, "function": { "type": "object", "properties": { "name": { "type": "string", "description": "Name of the function to call" }, "arguments": { "type": "string", "description": "JSON string of arguments for the function" } }, "required": [ "name", "arguments" ] } }, "required": [ "id", "type", "function" ] } } }, "required": [ "role", "content" ] }, "finish_reason": { "type": "string", "description": "Reason why the model stopped generating" }, "stop_reason": { "type": [ "string", "null" ], "description": "Stop reason (may be null)" }, "logprobs": { "type": [ "object", "null" ], "description": "Log probabilities (if requested)" } } } }, "usage": { "type": "object", "description": "Usage statistics for the inference request", "properties": { "prompt_tokens": { "type": "number", "description": "Total number of tokens in input", "default": 0 }, "completion_tokens": { "type": "number", "description": "Total number of tokens in output", "default": 0 }, "total_tokens": { "type": "number", "description": "Total number of input and output tokens", "default": 0 } } }, "prompt_logprobs": { "type": [ "object", "null" ], "description": "Log probabilities for the prompt (if requested)" } } }, { "type": "object", "contentType": "application/json", "title": "Text Completion Response", "properties": { "id": { "type": "string", "description": "Unique identifier for the completion" }, "object": { "type": "string", "enum": [ "text_completion" ], "description": "Object type identifier" }, "created": { "type": "number", "description": "Unix timestamp of when the completion was created" }, "model": { "type": "string", "description": "Model used for the completion" }, "choices": { "type": "array", "description": "List of completion choices", "items": { "type": "object", "properties": { "index": { "type": "number", "description": "Index of the choice in the list" }, "text": { "type": "string", "description": "The generated text completion" }, "finish_reason": { "type": "string", "description": "Reason why the model stopped generating" }, "stop_reason": { "type": [ "string", "null" ], "description": "Stop reason (may be null)" }, "logprobs": { "type": [ "object", "null" ], "description": "Log 
probabilities (if requested)" }, "prompt_logprobs": { "type": [ "object", "null" ], "description": "Log probabilities for the prompt (if requested)" } }, "required": [ "index", "text", "finish_reason" ] } }, "usage": { "type": "object", "description": "Usage statistics for the inference request", "properties": { "prompt_tokens": { "type": "number", "description": "Total number of tokens in input", "default": 0 }, "completion_tokens": { "type": "number", "description": "Total number of tokens in output", "default": 0 }, "total_tokens": { "type": "number", "description": "Total number of input and output tokens", "default": 0 } } } } }, { "type": "string", "contentType": "text/event-stream", "format": "binary" }, { "type": "object", "contentType": "application/json", "title": "Async response", "properties": { "request_id": { "type": "string", "description": "The async request id that can be used to obtain the results." } } } ]}
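If you mirror these schemas in your own service, requests can be checked with any JSON Schema validator before they are sent. A minimal sketch with Ajv follows; the choice of Ajv and the input-schema.json filename are assumptions, and the schema file is simply the Input document above saved to disk.

```ts
import Ajv from "ajv";
import { readFileSync } from "node:fs";

// Assumption: the Input JSON Schema above was saved as input-schema.json.
const inputSchema = JSON.parse(readFileSync("input-schema.json", "utf8"));

const ajv = new Ajv({ strict: false }); // tolerate vendor keywords like contentType
const validateInput = ajv.compile(inputSchema);

const body = { prompt: "Hello" }; // a minimal Prompt-shaped request
if (!validateInput(body)) {
  console.error(validateInput.errors); // e.g. minLength or oneOf violations
}
```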