gpt-oss-20b

Text Generation · OpenAI · Hosted

OpenAI’s open-weight models are designed for powerful reasoning, agentic tasks, and versatile developer use cases; gpt-oss-20b targets lower-latency, local, or specialized use cases.

Model Info
Context Window: 128,000 tokens
Function Calling: Yes
Reasoning: Yes
Unit Pricing: $0.20 per million input tokens, $0.30 per million output tokens
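As a rough planning aid, the per-token prices and the context window above can be turned into a small cost and budget check. This is a sketch, not a billing calculator: it assumes usage is billed linearly at the listed rates, and it assumes the prompt and the generated tokens share the 128,000-token window. The function names are illustrative.

```typescript
// Unit prices from the Model Info table, in USD per million tokens.
const INPUT_USD_PER_M = 0.2;
const OUTPUT_USD_PER_M = 0.3;
// Context window from the Model Info table, in tokens.
const CONTEXT_WINDOW = 128000;

// Estimated cost of one request in USD, assuming linear per-token billing.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  return (inputTokens * INPUT_USD_PER_M + outputTokens * OUTPUT_USD_PER_M) / 1000000;
}

// Assumption: prompt tokens plus max_tokens must fit in the context window.
function fitsContextWindow(inputTokens: number, maxTokens: number): boolean {
  return inputTokens + maxTokens <= CONTEXT_WINDOW;
}

// Example: a 2,000-token prompt with a 500-token answer.
console.log(estimateCostUsd(2000, 500).toFixed(6)); // → 0.000550
```

At these rates, one million input tokens plus one million output tokens cost $0.50.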

Usage

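export default {
  async fetch(request, env): Promise<Response> {
    // env.AI is the Workers AI binding declared in the Wrangler configuration.
    const response = await env.AI.run('@cf/openai/gpt-oss-20b', {
      instructions: 'You are a concise assistant.', // system-style guidance
      input: 'What is the origin of the phrase Hello, World?', // user input
    });
    // Return the model output to the client as JSON.
    return Response.json(response);
  },
} satisfies ExportedHandler<Env>;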
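The raw schema further down also lists a "Messages" input variant (a `messages` array of `role`/`content` objects) alongside the sampling parameters. The sketch below builds such a request. To keep it runnable outside Workers, `AiRunner` is a hypothetical minimal interface standing in for the `env.AI` binding; in a Worker you would pass `env.AI` instead, possibly with a type cast, since the real binding's type is richer. Whether this model prefers `messages` or the `instructions`/`input` form shown above is not settled by this page, so treat this as an assumption based on the schema.

```typescript
// Hypothetical stand-in for the Workers AI binding; only `run` is modeled.
interface AiRunner {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

const MODEL = "@cf/openai/gpt-oss-20b";

// Sends a system + user conversation using the "Messages" input variant.
async function askWithMessages(ai: AiRunner, question: string): Promise<unknown> {
  return ai.run(MODEL, {
    messages: [
      { role: "system", content: "You are a concise assistant." },
      { role: "user", content: question },
    ],
    max_tokens: 512, // raise the 256-token default for longer answers
    temperature: 0.2, // below the 0.6 default for steadier output
  });
}
```

Injecting the runner also makes the handler easy to unit-test with a stub, as the check below does.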

Parameters

Synchronous — Send a request and receive a complete response
Input format
prompt (string, required, minLength: 1): The input text prompt from which the model generates a response.
lora (string): Name of the LoRA (Low-Rank Adaptation) adapter to apply to the base model.
raw (boolean, default: false): If true, no chat template is applied, and the prompt must follow the model's own expected formatting.
stream (boolean, default: false): If true, the response is streamed back incrementally as server-sent events (SSE).
max_tokens (integer, default: 256): The maximum number of tokens to generate in the response.
temperature (number, default: 0.6, minimum: 0, maximum: 5): Controls the randomness of the output; higher values produce more random results.
top_p (number, minimum: 0.001, maximum: 1): Nucleus sampling threshold. The model samples only from the smallest set of most probable tokens whose cumulative probability reaches top_p. Lower values make outputs more predictable; higher values allow more varied responses.
top_k (integer, minimum: 1, maximum: 50): Restricts sampling to the k most probable tokens. Lower values make responses more focused; higher values introduce more variety.
seed (integer, minimum: 1, maximum: 9999999999): Random seed for reproducible generation.
repetition_penalty (number, minimum: 0, maximum: 2): Penalty applied to repeated tokens; higher values discourage repetition.
frequency_penalty (number, minimum: -2, maximum: 2): Penalizes tokens in proportion to how often they have already appeared; positive values reduce verbatim repetition.
presence_penalty (number, minimum: -2, maximum: 2): Penalizes tokens that have already appeared at least once; positive values encourage the model to introduce new topics.
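Because out-of-range values are easy to send by accident, a client-side check against the documented constraints can catch them before a request is made. This is a sketch covering only the "Prompt" input variant and only the ranges listed above; `validateParams` and `PARAM_RANGES` are illustrative names, not part of any Workers AI API, and the server remains the authority on what it accepts.

```typescript
// Documented constraints for the optional numeric parameters.
interface Range {
  min: number;
  max: number;
  integer: boolean;
}

const PARAM_RANGES: Record<string, Range> = {
  temperature: { min: 0, max: 5, integer: false },
  top_p: { min: 0.001, max: 1, integer: false },
  top_k: { min: 1, max: 50, integer: true },
  seed: { min: 1, max: 9999999999, integer: true },
  repetition_penalty: { min: 0, max: 2, integer: false },
  frequency_penalty: { min: -2, max: 2, integer: false },
  presence_penalty: { min: -2, max: 2, integer: false },
};

// Returns a list of human-readable problems; an empty list means valid.
function validateParams(params: Record<string, unknown>): string[] {
  const errors: string[] = [];
  const prompt = params.prompt;
  if (typeof prompt !== "string" || prompt.length < 1) {
    errors.push("prompt must be a non-empty string");
  }
  for (const [name, range] of Object.entries(PARAM_RANGES)) {
    const value = params[name];
    if (value === undefined) continue; // optional parameter omitted
    if (typeof value !== "number" || Number.isNaN(value)) {
      errors.push(`${name} must be a number`);
    } else if (range.integer && !Number.isInteger(value)) {
      errors.push(`${name} must be an integer`);
    } else if (value < range.min || value > range.max) {
      errors.push(`${name} must be between ${range.min} and ${range.max}`);
    }
  }
  return errors;
}
```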
Streaming — Send a request with `stream: true` and receive server-sent events
Input format: identical to the synchronous input format above, with `stream` set to `true`.
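With `stream: true`, the response arrives as server-sent events, and a network chunk can end in the middle of an event, so the client has to buffer. The sketch below is a minimal incremental parser. It assumes LF line endings, events separated by a blank line, and, for the payload, the shape other Workers AI text-generation models use: `data: {"response": "..."}` per event, ending with `data: [DONE]`. That payload shape is an assumption for this particular model; check the actual stream before relying on it.

```typescript
// Incremental parser for server-sent events (assumes "\n" line endings).
class SseDataParser {
  private buffer = "";

  // Feeds one decoded text chunk; returns the data payloads of every
  // event completed by this chunk. Partial events stay buffered.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const payloads: string[] = [];
    let sep: number;
    while ((sep = this.buffer.indexOf("\n\n")) !== -1) {
      const rawEvent = this.buffer.slice(0, sep);
      this.buffer = this.buffer.slice(sep + 2);
      // Multiple data: lines in one event are joined with "\n" per the SSE format.
      const dataLines = rawEvent
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, ""));
      if (dataLines.length > 0) payloads.push(dataLines.join("\n"));
    }
    return payloads;
  }
}

// Appends text from assumed {"response": "..."} payloads to `out`.
// Returns true once the [DONE] sentinel is seen.
function appendTokens(payloads: string[], out: string[]): boolean {
  for (const payload of payloads) {
    if (payload === "[DONE]") return true;
    const parsed = JSON.parse(payload) as { response?: string };
    if (typeof parsed.response === "string") out.push(parsed.response);
  }
  return false;
}
```

In a Worker, the chunks would typically come from piping the returned stream through a `TextDecoderStream` and feeding each decoded string to `push`.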
Batch — Send multiple requests in a single API call

API Schemas (Raw)

Synchronous — Send a request and receive a complete response
{
  "type": "object",
  "oneOf": [
    {
      "title": "Prompt",
      "properties": {
        "prompt": {
          "type": "string",
          "minLength": 1,
          "description": "The input text prompt for the model to generate a response."
        },
        "lora": {
          "type": "string",
          "description": "Name of the LoRA (Low-Rank Adaptation) model to fine-tune the base model."
        },
        "response_format": {
          "title": "JSON Mode",
          "type": "object",
          "properties": {
            "type": {
              "type": "string",
              "enum": [
                "json_object",
                "json_schema"
              ]
            },
            "json_schema": {}
          }
        },
        "raw": {
          "type": "boolean",
          "default": false,
          "description": "If true, a chat template is not applied and you must adhere to the specific model's expected formatting."
        },
        "stream": {
          "type": "boolean",
          "default": false,
          "description": "If true, the response will be streamed back incrementally using SSE, Server Sent Events."
        },
        "max_tokens": {
          "type": "integer",
          "default": 256,
          "description": "The maximum number of tokens to generate in the response."
        },
        "temperature": {
          "type": "number",
          "default": 0.6,
          "minimum": 0,
          "maximum": 5,
          "description": "Controls the randomness of the output; higher values produce more random results."
        },
        "top_p": {
          "type": "number",
          "minimum": 0.001,
          "maximum": 1,
          "description": "Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses."
        },
        "top_k": {
          "type": "integer",
          "minimum": 1,
          "maximum": 50,
          "description": "Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises."
        },
        "seed": {
          "type": "integer",
          "minimum": 1,
          "maximum": 9999999999,
          "description": "Random seed for reproducibility of the generation."
        },
        "repetition_penalty": {
          "type": "number",
          "minimum": 0,
          "maximum": 2,
          "description": "Penalty for repeated tokens; higher values discourage repetition."
        },
        "frequency_penalty": {
          "type": "number",
          "minimum": -2,
          "maximum": 2,
          "description": "Decreases the likelihood of the model repeating the same lines verbatim."
        },
        "presence_penalty": {
          "type": "number",
          "minimum": -2,
          "maximum": 2,
          "description": "Increases the likelihood of the model introducing new topics."
        }
      },
      "required": [
        "prompt"
      ]
    },
    {
      "title": "Messages",
      "properties": {
        "messages": {
          "type": "array",
          "description": "An array of message objects representing the conversation history.",
          "items": {
            "type": "object",
            "properties": {
              "role": {
                "type": "string",
                "description": "The role of the message sender (e.g., 'user', 'assistant', 'system', 'tool')."
              },
              "content": {
                "oneOf": [
                  {
                    "type": "string",
                    "description": "The content of the message as a string."
                  },
                  {
                    "type": "array",
                    "description": "Array of text content parts.",
                    "items": {
                      "type": "object",
                      "properties": {
                        "type": {
                          "type": "string",
                          "description": "Type of the content (text)"
                        },
                        "text": {
                          "type": "string",
                          "description": "Text content"
                        }
                      }
                    }
                  }
                ]
              }
            },
            "required": [
              "role",
              "content"
            ]
          }
        },
        "functions": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": {
                "type": "string"
              },
              "code": {
                "type": "string"
              }
            },
            "required": [
              "name",
              "code"
            ]
          }
        },
        "tools": {
          "type": "array",
          "description": "A list of tools available for the assistant to use.",
          "items": {
            "type": "object",
            "oneOf": [
              {
                "properties": {
                  "name": {
                    "type": "string",
                    "description": "The name of the tool. More descriptive the better."
                  },
                  "description": {
                    "type": "string",
                    "description": "A brief description of what the tool does."
                  },
                  "parameters": {
                    "type": "object",
                    "description": "Schema defining the parameters accepted by the tool.",
                    "properties": {
                      "type": {
                        "type": "string",
                        "description": "The type of the parameters object (usually 'object')."
                      },
                      "required": {
                        "type": "array",
                        "description": "List of required parameter names.",
                        "items": {
                          "type": "string"
                        }
                      },
                      "properties": {
                        "type": "object",
                        "description": "Definitions of each parameter.",
                        "additionalProperties": {
                          "type": "object",
                          "properties": {
                            "type": {
                              "type": "string",
                              "description": "The data type of the parameter."
                            },
                            "description": {
                              "type": "string",
                              "description": "A description of the expected parameter."
                            }
                          },
                          "required": [
                            "type",
                            "description"
                          ]
                        }
                      }
                    },
                    "required": [
                      "type",
                      "properties"
                    ]
                  }
                },
                "required": [
                  "name",
                  "description",
                  "parameters"
                ]
              },
              {
                "properties": {
                  "type": {
                    "type": "string",
                    "description": "Specifies the type of tool (e.g., 'function')."
                  },
                  "function": {
                    "type": "object",
                    "description": "Details of the function tool.",
                    "properties": {
                      "name": {
                        "type": "string",
                        "description": "The name of the function."
                      },
                      "description": {
                        "type": "string",
                        "description": "A brief description of what the function does."
                      },
                      "parameters": {
                        "type": "object",
                        "description": "Schema defining the parameters accepted by the function.",
                        "properties": {
                          "type": {
                            "type": "string",
                            "description": "The type of the parameters object (usually 'object')."
                          },
                          "required": {
                            "type": "array",
                            "description": "List of required parameter names.",
                            "items": {
                              "type": "string"
                            }
                          },
                          "properties": {
                            "type": "object",
                            "description": "Definitions of each parameter.",
                            "additionalProperties": {
                              "type": "object",
                              "properties": {
                                "type": {
                                  "type": "string",
                                  "description": "The data type of the parameter."
                                },
                                "description": {
                                  "type": "string",
                                  "description": "A description of the expected parameter."
                                }
                              },
                              "required": [
                                "type",
                                "description"
                              ]
                            }
                          }
                        },
                        "required": [
                          "type",
                          "properties"
                        ]
                      }
                    },
                    "required": [
                      "name",
                      "description",
                      "parameters"
                    ]
                  }
                },
                "required": [
                  "type",
                  "function"
                ]
              }
            ]
          }
        },
        "response_format": {
          "title": "JSON Mode",
          "type": "object",
          "properties": {
            "type": {
              "type": "string",
              "enum": [
                "json_object",
                "json_schema"
              ]
            },
            "json_schema": {}
          }
        },
        "raw": {
          "type": "boolean",
          "default": false,
          "description": "If true, a chat template is not applied and you must adhere to the specific model's expected formatting."
        },
        "stream": {
          "type": "boolean",
          "default": false,
          "description": "If true, the response will be streamed back incrementally using SSE, Server Sent Events."
        },
        "max_tokens": {
          "type": "integer",
          "default": 256,
          "description": "The maximum number of tokens to generate in the response."
        },
        "temperature": {
          "type": "number",
          "default": 0.6,
          "minimum": 0,
          "maximum": 5,
          "description": "Controls the randomness of the output; higher values produce more random results."
        },
        "top_p": {
          "type": "number",
          "minimum": 0.001,
          "maximum": 1,
          "description": "Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses."
        },
        "top_k": {
          "type": "integer",
          "minimum": 1,
          "maximum": 50,
          "description": "Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises."
        },
        "seed": {
          "type": "integer",
          "minimum": 1,
          "maximum": 9999999999,
          "description": "Random seed for reproducibility of the generation."
        },
        "repetition_penalty": {
          "type": "number",
          "minimum": 0,
          "maximum": 2,
          "description": "Penalty for repeated tokens; higher values discourage repetition."
        },
        "frequency_penalty": {
          "type": "number",
          "minimum": -2,
          "maximum": 2,
          "description": "Decreases the likelihood of the model repeating the same lines verbatim."
        },
        "presence_penalty": {
          "type": "number",
          "minimum": -2,
          "maximum": 2,
          "description": "Increases the likelihood of the model introducing new topics."
        }
      },
      "required": [
        "messages"
      ]
    }
  ]
}
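The `tools` entry of the Messages schema above accepts two item shapes: a flat `{ name, description, parameters }` object, or a `{ type, function: { name, description, parameters } }` wrapper. The sketch below types both shapes from the schema and normalizes either one into the wrapped form. The `get_weather` tool is a hypothetical example, and `requiredNamesDefined` is an extra sanity check of my own; the schema itself does not require every `required` name to appear under `properties`.

```typescript
// Shapes taken from the "tools" entry of the raw schema.
interface ParameterSpec {
  type: string;
  description: string;
}

interface ToolParameters {
  type: string; // usually "object"
  required?: string[];
  properties: Record<string, ParameterSpec>;
}

interface FlatTool {
  name: string;
  description: string;
  parameters: ToolParameters;
}

interface FunctionTool {
  type: string; // e.g. "function"
  function: FlatTool;
}

type Tool = FlatTool | FunctionTool;

// Normalizes either accepted shape into the function-wrapped form.
function toFunctionTool(tool: Tool): FunctionTool {
  return "function" in tool ? tool : { type: "function", function: tool };
}

// Sanity check: every name in `required` is defined under `properties`.
function requiredNamesDefined(params: ToolParameters): boolean {
  return (params.required ?? []).every((name) =>
    Object.prototype.hasOwnProperty.call(params.properties, name),
  );
}

// Hypothetical example tool; the name and fields are illustrative only.
const getWeather: FlatTool = {
  name: "get_weather",
  description: "Returns the current weather for a city.",
  parameters: {
    type: "object",
    required: ["city"],
    properties: {
      city: { type: "string", description: "City name, e.g. Lisbon" },
    },
  },
};
```

Normalizing to one shape keeps downstream code, such as logging or dispatching tool calls by name, from having to branch on both forms.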
Streaming — Send a request with `stream: true` and receive server-sent events
Identical to the synchronous schema above.
Batch — Send multiple requests in a single API call
{
  "type": "object",
  "title": "Responses_Async",
  "properties": {
    "requests": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "input": {
            "anyOf": [
              {
                "type": "string"
              },
              {
                "items": {},
                "type": "array"
              }
            ],
            "description": "Responses API Input messages. Refer to OpenAI Responses API docs to learn more about supported content types"
          },
          "reasoning": {
            "type": "object",
            "properties": {
              "effort": {
                "type": "string",
                "description": "Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.",
                "enum": [
                  "low",
                  "medium",
                  "high"
                ]
              },
              "summary": {
                "type": "string",
                "description": "A summary of the reasoning performed by the model. This can be useful for debugging and understanding the model's reasoning process. One of auto, concise, or detailed.",
                "enum": [
                  "auto",
                  "concise",
                  "detailed"
                ]
              }
            }
          }
        },
        "required": [
          "input"
        ]
      }
    }
  },
  "required": [
    "requests"
  ]
}
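The batch schema wraps a list of Responses-style requests, each with a required `input` and an optional `reasoning` object whose `effort` and `summary` fields take the enumerated values above. The sketch below builds such a body from plain prompts. `buildBatchBody` is a hypothetical helper, and its default effort of `"low"` is my choice for faster, cheaper answers, not a documented default. How the body is submitted to the batch endpoint is outside the scope of this sketch.

```typescript
// Enumerated values from the batch schema's "reasoning" object.
type ReasoningEffort = "low" | "medium" | "high";
type ReasoningSummary = "auto" | "concise" | "detailed";

interface BatchRequest {
  input: string | unknown[]; // string, or an array of Responses API input items
  reasoning?: { effort?: ReasoningEffort; summary?: ReasoningSummary };
}

interface BatchBody {
  requests: BatchRequest[];
}

// Builds one batch entry per prompt, all sharing the same reasoning effort.
function buildBatchBody(prompts: string[], effort: ReasoningEffort = "low"): BatchBody {
  return {
    requests: prompts.map((input) => ({ input, reasoning: { effort } })),
  };
}

const body = buildBatchBody(["Summarize server-sent events.", "Define LoRA."], "medium");
console.log(JSON.stringify(body));
```

Lower effort settings trade some answer quality for speed and fewer reasoning tokens, which often suits large batches of simple prompts.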