qwen3-30b-a3b-fp8
Text Generation • Qwen • Hosted

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support.
| Model Info | |
|---|---|
| Context Window | 32,768 tokens |
| Function calling | Yes |
| Reasoning | Yes |
| Batch | Yes |
| Unit Pricing | $0.051 per M input tokens, $0.34 per M output tokens |
Playground
Try out this model with the Workers AI LLM Playground. It does not require any setup or authentication and is an instant way to preview and test a model directly in the browser.
Launch the LLM Playground

Usage
Workers binding, streaming the response back as server-sent events:

```ts
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const messages = [
      { role: "system", content: "You are a friendly assistant" },
      {
        role: "user",
        content: "What is the origin of the phrase Hello, World",
      },
    ];

    const stream = await env.AI.run("@cf/qwen/qwen3-30b-a3b-fp8", {
      messages,
      stream: true,
    });

    return new Response(stream, {
      headers: { "content-type": "text/event-stream" },
    });
  },
} satisfies ExportedHandler<Env>;
```

Workers binding, returning the complete response as JSON:

```ts
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const messages = [
      { role: "system", content: "You are a friendly assistant" },
      {
        role: "user",
        content: "What is the origin of the phrase Hello, World",
      },
    ];
    const response = await env.AI.run("@cf/qwen/qwen3-30b-a3b-fp8", {
      messages,
    });

    return Response.json(response);
  },
} satisfies ExportedHandler<Env>;
```

Python, via the REST API:

```python
import os

import requests

ACCOUNT_ID = "your-account-id"
AUTH_TOKEN = os.environ.get("CLOUDFLARE_AUTH_TOKEN")

prompt = "Tell me all about PEP-8"
response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/qwen/qwen3-30b-a3b-fp8",
    headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
    json={
        "messages": [
            {"role": "system", "content": "You are a friendly assistant"},
            {"role": "user", "content": prompt},
        ]
    },
)
result = response.json()
print(result)
```

curl:

```sh
curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/qwen/qwen3-30b-a3b-fp8 \
  -X POST \
  -H "Authorization: Bearer $CLOUDFLARE_AUTH_TOKEN" \
  -d '{ "messages": [{ "role": "system", "content": "You are a friendly assistant" }, { "role": "user", "content": "Why is pizza so good" }]}'
```

Parameters
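As a hedged sketch of combining the sampling parameters documented below in one REST call (reusing the `your-account-id` placeholder from the Usage examples; the parameter values chosen here are illustrative, and the request is only sent when `CLOUDFLARE_AUTH_TOKEN` is set):

```python
import json
import os

ACCOUNT_ID = "your-account-id"  # placeholder, as in the Usage examples
MODEL = "@cf/qwen/qwen3-30b-a3b-fp8"

# Sampling parameters from the request schema; all values below are illustrative.
payload = {
    "messages": [
        {"role": "system", "content": "You are a friendly assistant"},
        {"role": "user", "content": "Summarize PEP-8 in one sentence"},
    ],
    "max_tokens": 256,          # default 2000
    "temperature": 0.6,         # range 0-5, default 0.6
    "top_p": 0.9,               # range 0.001-1
    "top_k": 40,                # range 1-50
    "seed": 42,                 # fixed seed for reproducible sampling
    "repetition_penalty": 1.1,  # range 0-2
}

AUTH_TOKEN = os.environ.get("CLOUDFLARE_AUTH_TOKEN")
if AUTH_TOKEN:
    import requests  # only needed when actually calling the API

    resp = requests.post(
        f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}",
        headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
        json=payload,
    )
    print(json.dumps(resp.json(), indent=2))
```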
Synchronous — Send a request and receive a complete response
Input:

- `prompt` string, required, minLength: 1. The input text prompt for the model to generate a response.
- `lora` string. Name of the LoRA (Low-Rank Adaptation) model to fine-tune the base model.
- `response_format` object. JSON mode settings (`json_object` or `json_schema`).
- `raw` boolean, default: false. If true, a chat template is not applied and you must adhere to the specific model's expected formatting.
- `stream` boolean, default: false. If true, the response will be streamed back incrementally using SSE, Server Sent Events.
- `max_tokens` integer, default: 2000. The maximum number of tokens to generate in the response.
- `temperature` number, default: 0.6, minimum: 0, maximum: 5. Controls the randomness of the output; higher values produce more random results.
- `top_p` number, minimum: 0.001, maximum: 1. Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses.
- `top_k` integer, minimum: 1, maximum: 50. Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises.
- `seed` integer, minimum: 1, maximum: 9999999999. Random seed for reproducibility of the generation.
- `repetition_penalty` number, minimum: 0, maximum: 2. Penalty for repeated tokens; higher values discourage repetition.
- `frequency_penalty` number, minimum: -2, maximum: 2. Decreases the likelihood of the model repeating the same lines verbatim.
- `presence_penalty` number, minimum: -2, maximum: 2. Increases the likelihood of the model introducing new topics.

Output:

- `id` string. Unique identifier for the completion.
- `object` string, enum: chat.completion. Object type identifier.
- `created` number. Unix timestamp of when the completion was created.
- `model` string. Model used for the completion.
- `choices` array. List of completion choices.
- `usage` object. Usage statistics for the inference request.
- `prompt_logprobs` object. Log probabilities for the prompt (if requested).

Streaming — Send a request with `stream: true` and receive server-sent events
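With `stream: true` the endpoint returns Server-Sent Events. A minimal sketch of decoding them, assuming each event is a `data:` line carrying a JSON chunk and the stream ends with a `data: [DONE]` sentinel (treat that chunk shape as an assumption, not something this page specifies):

```python
import json


def parse_sse_line(line: str):
    """Return the decoded JSON payload of one SSE `data:` line, or None
    for blank lines, comments, and the terminal [DONE] sentinel."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return None
    return json.loads(data)


# Example with a canned chunk of the assumed shape:
chunk = parse_sse_line('data: {"response": "Hello"}')
print(chunk)  # {'response': 'Hello'}
```

When reading a real response, you could iterate over `resp.iter_lines(decode_unicode=True)` from a `requests.post(..., stream=True)` call and feed each line to the parser.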
Input:

- `prompt` string, required, minLength: 1. The input text prompt for the model to generate a response.
- `lora` string. Name of the LoRA (Low-Rank Adaptation) model to fine-tune the base model.
- `response_format` object. JSON mode settings (`json_object` or `json_schema`).
- `raw` boolean, default: false. If true, a chat template is not applied and you must adhere to the specific model's expected formatting.
- `stream` boolean, default: false. If true, the response will be streamed back incrementally using SSE, Server Sent Events.
- `max_tokens` integer, default: 2000. The maximum number of tokens to generate in the response.
- `temperature` number, default: 0.6, minimum: 0, maximum: 5. Controls the randomness of the output; higher values produce more random results.
- `top_p` number, minimum: 0.001, maximum: 1. Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses.
- `top_k` integer, minimum: 1, maximum: 50. Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises.
- `seed` integer, minimum: 1, maximum: 9999999999. Random seed for reproducibility of the generation.
- `repetition_penalty` number, minimum: 0, maximum: 2. Penalty for repeated tokens; higher values discourage repetition.
- `frequency_penalty` number, minimum: -2, maximum: 2. Decreases the likelihood of the model repeating the same lines verbatim.
- `presence_penalty` number, minimum: -2, maximum: 2. Increases the likelihood of the model introducing new topics.

Output:

- A `text/event-stream` response: the completion is streamed back as binary-encoded server-sent events.

Batch — Send multiple requests in a single API call
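Per the Async Batch schema below, the batch route accepts a required `requests` array whose entries are either prompt-style or messages-style requests. A sketch of assembling and sanity-checking such a payload (the queueing endpoint itself is not shown here):

```python
# A batch payload matching the Async Batch schema: each entry carries
# either `prompt` or `messages`, plus optional sampling parameters.
batch_payload = {
    "requests": [
        {"prompt": "Why is pizza so good", "max_tokens": 128},
        {
            "messages": [
                {"role": "system", "content": "You are a friendly assistant"},
                {"role": "user", "content": "Tell me all about PEP-8"},
            ],
            "temperature": 0.6,
        },
    ]
}


def check_batch(payload: dict) -> bool:
    """Minimal check mirroring the schema: a non-empty `requests` array
    whose entries carry exactly one of `prompt` or `messages`."""
    reqs = payload.get("requests")
    if not isinstance(reqs, list) or not reqs:
        return False
    return all(("prompt" in r) != ("messages" in r) for r in reqs)


print(check_batch(batch_payload))  # True
```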
Input:

- `requests` array, required. The individual requests to run, each in the request format described above.

Output:

- `id` string. Unique identifier for the completion.
- `object` string, enum: chat.completion. Object type identifier.
- `created` number. Unix timestamp of when the completion was created.
- `model` string. Model used for the completion.
- `choices` array. List of completion choices.
- `usage` object. Usage statistics for the inference request.
- `prompt_logprobs` object. Log probabilities for the prompt (if requested).

API Schemas (Raw)
Synchronous — Send a request and receive a complete response
{ "title": "Prompt", "properties": { "prompt": { "type": "string", "minLength": 1, "description": "The input text prompt for the model to generate a response." }, "lora": { "type": "string", "description": "Name of the LoRA (Low-Rank Adaptation) model to fine-tune the base model." }, "response_format": { "title": "JSON Mode", "type": "object", "properties": { "type": { "type": "string", "enum": [ "json_object", "json_schema" ] }, "json_schema": {} } }, "raw": { "type": "boolean", "default": false, "description": "If true, a chat template is not applied and you must adhere to the specific model's expected formatting." }, "stream": { "type": "boolean", "default": false, "description": "If true, the response will be streamed back incrementally using SSE, Server Sent Events." }, "max_tokens": { "type": "integer", "default": 2000, "description": "The maximum number of tokens to generate in the response." }, "temperature": { "type": "number", "default": 0.6, "minimum": 0, "maximum": 5, "description": "Controls the randomness of the output; higher values produce more random results." }, "top_p": { "type": "number", "minimum": 0.001, "maximum": 1, "description": "Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses." }, "top_k": { "type": "integer", "minimum": 1, "maximum": 50, "description": "Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises." }, "seed": { "type": "integer", "minimum": 1, "maximum": 9999999999, "description": "Random seed for reproducibility of the generation." }, "repetition_penalty": { "type": "number", "minimum": 0, "maximum": 2, "description": "Penalty for repeated tokens; higher values discourage repetition." 
}, "frequency_penalty": { "type": "number", "minimum": -2, "maximum": 2, "description": "Decreases the likelihood of the model repeating the same lines verbatim." }, "presence_penalty": { "type": "number", "minimum": -2, "maximum": 2, "description": "Increases the likelihood of the model introducing new topics." } }, "required": [ "prompt" ]}

{ "type": "object", "contentType": "application/json", "title": "Chat Completion Response", "properties": { "id": { "type": "string", "description": "Unique identifier for the completion" }, "object": { "type": "string", "enum": [ "chat.completion" ], "description": "Object type identifier" }, "created": { "type": "number", "description": "Unix timestamp of when the completion was created" }, "model": { "type": "string", "description": "Model used for the completion" }, "choices": { "type": "array", "description": "List of completion choices", "items": { "type": "object", "properties": { "index": { "type": "number", "description": "Index of the choice in the list" }, "message": { "type": "object", "description": "The message generated by the model", "properties": { "role": { "type": "string", "description": "Role of the message author" }, "content": { "type": "string", "description": "The content of the message" }, "reasoning_content": { "type": "string", "description": "Internal reasoning content (if available)" }, "tool_calls": { "type": "array", "description": "Tool calls made by the assistant", "items": { "type": "object", "properties": { "id": { "type": "string", "description": "Unique identifier for the tool call" }, "type": { "type": "string", "enum": [ "function" ], "description": "Type of tool call" }, "function": { "type": "object", "properties": { "name": { "type": "string", "description": "Name of the function to call" }, "arguments": { "type": "string", "description": "JSON string of arguments for the function" } }, "required": [ "name", "arguments" ] } }, "required": [ "id", "type", "function" ] } } }, "required": 
[ "role", "content" ] }, "finish_reason": { "type": "string", "description": "Reason why the model stopped generating" }, "stop_reason": { "type": [ "string", "null" ], "description": "Stop reason (may be null)" }, "logprobs": { "type": [ "object", "null" ], "description": "Log probabilities (if requested)" } } } }, "usage": { "type": "object", "description": "Usage statistics for the inference request", "properties": { "prompt_tokens": { "type": "number", "description": "Total number of tokens in input", "default": 0 }, "completion_tokens": { "type": "number", "description": "Total number of tokens in output", "default": 0 }, "total_tokens": { "type": "number", "description": "Total number of input and output tokens", "default": 0 } } }, "prompt_logprobs": { "type": [ "object", "null" ], "description": "Log probabilities for the prompt (if requested)" } }}

Streaming — Send a request with `stream: true` and receive server-sent events
{ "title": "Prompt", "properties": { "prompt": { "type": "string", "minLength": 1, "description": "The input text prompt for the model to generate a response." }, "lora": { "type": "string", "description": "Name of the LoRA (Low-Rank Adaptation) model to fine-tune the base model." }, "response_format": { "title": "JSON Mode", "type": "object", "properties": { "type": { "type": "string", "enum": [ "json_object", "json_schema" ] }, "json_schema": {} } }, "raw": { "type": "boolean", "default": false, "description": "If true, a chat template is not applied and you must adhere to the specific model's expected formatting." }, "stream": { "type": "boolean", "default": false, "description": "If true, the response will be streamed back incrementally using SSE, Server Sent Events." }, "max_tokens": { "type": "integer", "default": 2000, "description": "The maximum number of tokens to generate in the response." }, "temperature": { "type": "number", "default": 0.6, "minimum": 0, "maximum": 5, "description": "Controls the randomness of the output; higher values produce more random results." }, "top_p": { "type": "number", "minimum": 0.001, "maximum": 1, "description": "Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses." }, "top_k": { "type": "integer", "minimum": 1, "maximum": 50, "description": "Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises." }, "seed": { "type": "integer", "minimum": 1, "maximum": 9999999999, "description": "Random seed for reproducibility of the generation." }, "repetition_penalty": { "type": "number", "minimum": 0, "maximum": 2, "description": "Penalty for repeated tokens; higher values discourage repetition." 
}, "frequency_penalty": { "type": "number", "minimum": -2, "maximum": 2, "description": "Decreases the likelihood of the model repeating the same lines verbatim." }, "presence_penalty": { "type": "number", "minimum": -2, "maximum": 2, "description": "Increases the likelihood of the model introducing new topics." } }, "required": [ "prompt" ]}

{ "type": "string", "contentType": "text/event-stream", "format": "binary"}

Batch — Send multiple requests in a single API call
{ "title": "Async Batch", "type": "object", "properties": { "requests": { "type": "array", "items": { "type": "object", "oneOf": [ { "title": "Prompt", "properties": { "prompt": { "type": "string", "minLength": 1, "description": "The input text prompt for the model to generate a response." }, "lora": { "type": "string", "description": "Name of the LoRA (Low-Rank Adaptation) model to fine-tune the base model." }, "response_format": { "title": "JSON Mode", "type": "object", "properties": { "type": { "type": "string", "enum": [ "json_object", "json_schema" ] }, "json_schema": {} } }, "raw": { "type": "boolean", "default": false, "description": "If true, a chat template is not applied and you must adhere to the specific model's expected formatting." }, "stream": { "type": "boolean", "default": false, "description": "If true, the response will be streamed back incrementally using SSE, Server Sent Events." }, "max_tokens": { "type": "integer", "default": 256, "description": "The maximum number of tokens to generate in the response." }, "temperature": { "type": "number", "default": 0.6, "minimum": 0, "maximum": 5, "description": "Controls the randomness of the output; higher values produce more random results." }, "top_p": { "type": "number", "minimum": 0.001, "maximum": 1, "description": "Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses." }, "top_k": { "type": "integer", "minimum": 1, "maximum": 50, "description": "Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises." }, "seed": { "type": "integer", "minimum": 1, "maximum": 9999999999, "description": "Random seed for reproducibility of the generation." 
}, "repetition_penalty": { "type": "number", "minimum": 0, "maximum": 2, "description": "Penalty for repeated tokens; higher values discourage repetition." }, "frequency_penalty": { "type": "number", "minimum": -2, "maximum": 2, "description": "Decreases the likelihood of the model repeating the same lines verbatim." }, "presence_penalty": { "type": "number", "minimum": -2, "maximum": 2, "description": "Increases the likelihood of the model introducing new topics." } }, "required": [ "prompt" ] }, { "title": "Messages", "properties": { "messages": { "type": "array", "description": "An array of message objects representing the conversation history.", "items": { "type": "object", "properties": { "role": { "type": "string", "description": "The role of the message sender (e.g., 'user', 'assistant', 'system', 'tool')." }, "content": { "oneOf": [ { "type": "string", "description": "The content of the message as a string." }, { "type": "array", "description": "Array of text content parts.", "items": { "type": "object", "properties": { "type": { "type": "string", "description": "Type of the content (text)" }, "text": { "type": "string", "description": "Text content" } } } } ] } }, "required": [ "role", "content" ] } }, "functions": { "type": "array", "items": { "type": "object", "properties": { "name": { "type": "string" }, "code": { "type": "string" } }, "required": [ "name", "code" ] } }, "tools": { "type": "array", "description": "A list of tools available for the assistant to use.", "items": { "type": "object", "oneOf": [ { "properties": { "name": { "type": "string", "description": "The name of the tool. More descriptive the better." }, "description": { "type": "string", "description": "A brief description of what the tool does." }, "parameters": { "type": "object", "description": "Schema defining the parameters accepted by the tool.", "properties": { "type": { "type": "string", "description": "The type of the parameters object (usually 'object')." 
}, "required": { "type": "array", "description": "List of required parameter names.", "items": { "type": "string" } }, "properties": { "type": "object", "description": "Definitions of each parameter.", "additionalProperties": { "type": "object", "properties": { "type": { "type": "string", "description": "The data type of the parameter." }, "description": { "type": "string", "description": "A description of the expected parameter." } }, "required": [ "type", "description" ] } } }, "required": [ "type", "properties" ] } }, "required": [ "name", "description", "parameters" ] }, { "properties": { "type": { "type": "string", "description": "Specifies the type of tool (e.g., 'function')." }, "function": { "type": "object", "description": "Details of the function tool.", "properties": { "name": { "type": "string", "description": "The name of the function." }, "description": { "type": "string", "description": "A brief description of what the function does." }, "parameters": { "type": "object", "description": "Schema defining the parameters accepted by the function.", "properties": { "type": { "type": "string", "description": "The type of the parameters object (usually 'object')." }, "required": { "type": "array", "description": "List of required parameter names.", "items": { "type": "string" } }, "properties": { "type": "object", "description": "Definitions of each parameter.", "additionalProperties": { "type": "object", "properties": { "type": { "type": "string", "description": "The data type of the parameter." }, "description": { "type": "string", "description": "A description of the expected parameter." 
} }, "required": [ "type", "description" ] } } }, "required": [ "type", "properties" ] } }, "required": [ "name", "description", "parameters" ] } }, "required": [ "type", "function" ] } ] } }, "response_format": { "title": "JSON Mode", "type": "object", "properties": { "type": { "type": "string", "enum": [ "json_object", "json_schema" ] }, "json_schema": {} } }, "raw": { "type": "boolean", "default": false, "description": "If true, a chat template is not applied and you must adhere to the specific model's expected formatting." }, "stream": { "type": "boolean", "default": false, "description": "If true, the response will be streamed back incrementally using SSE, Server Sent Events." }, "max_tokens": { "type": "integer", "default": 256, "description": "The maximum number of tokens to generate in the response." }, "temperature": { "type": "number", "default": 0.6, "minimum": 0, "maximum": 5, "description": "Controls the randomness of the output; higher values produce more random results." }, "top_p": { "type": "number", "minimum": 0.001, "maximum": 1, "description": "Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses." }, "top_k": { "type": "integer", "minimum": 1, "maximum": 50, "description": "Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises." }, "seed": { "type": "integer", "minimum": 1, "maximum": 9999999999, "description": "Random seed for reproducibility of the generation." }, "repetition_penalty": { "type": "number", "minimum": 0, "maximum": 2, "description": "Penalty for repeated tokens; higher values discourage repetition." }, "frequency_penalty": { "type": "number", "minimum": -2, "maximum": 2, "description": "Decreases the likelihood of the model repeating the same lines verbatim." 
}, "presence_penalty": { "type": "number", "minimum": -2, "maximum": 2, "description": "Increases the likelihood of the model introducing new topics." } }, "required": [ "messages" ] } ] } } }, "required": [ "requests" ]}

{ "type": "object", "contentType": "application/json", "title": "Chat Completion Response", "properties": { "id": { "type": "string", "description": "Unique identifier for the completion" }, "object": { "type": "string", "enum": [ "chat.completion" ], "description": "Object type identifier" }, "created": { "type": "number", "description": "Unix timestamp of when the completion was created" }, "model": { "type": "string", "description": "Model used for the completion" }, "choices": { "type": "array", "description": "List of completion choices", "items": { "type": "object", "properties": { "index": { "type": "number", "description": "Index of the choice in the list" }, "message": { "type": "object", "description": "The message generated by the model", "properties": { "role": { "type": "string", "description": "Role of the message author" }, "content": { "type": "string", "description": "The content of the message" }, "reasoning_content": { "type": "string", "description": "Internal reasoning content (if available)" }, "tool_calls": { "type": "array", "description": "Tool calls made by the assistant", "items": { "type": "object", "properties": { "id": { "type": "string", "description": "Unique identifier for the tool call" }, "type": { "type": "string", "enum": [ "function" ], "description": "Type of tool call" }, "function": { "type": "object", "properties": { "name": { "type": "string", "description": "Name of the function to call" }, "arguments": { "type": "string", "description": "JSON string of arguments for the function" } }, "required": [ "name", "arguments" ] } }, "required": [ "id", "type", "function" ] } } }, "required": [ "role", "content" ] }, "finish_reason": { "type": "string", "description": "Reason why the model stopped generating" }, 
"stop_reason": { "type": [ "string", "null" ], "description": "Stop reason (may be null)" }, "logprobs": { "type": [ "object", "null" ], "description": "Log probabilities (if requested)" } } } }, "usage": { "type": "object", "description": "Usage statistics for the inference request", "properties": { "prompt_tokens": { "type": "number", "description": "Total number of tokens in input", "default": 0 }, "completion_tokens": { "type": "number", "description": "Total number of tokens in output", "default": 0 }, "total_tokens": { "type": "number", "description": "Total number of input and output tokens", "default": 0 } } }, "prompt_logprobs": { "type": [ "object", "null" ], "description": "Log probabilities for the prompt (if requested)" } }}
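Per the Chat Completion Response schema above, the generated text lives at `choices[0].message.content` (with `reasoning_content` optionally alongside it). A defensive extraction sketch over a canned response of that shape (the field values below are illustrative):

```python
def extract_text(completion: dict) -> str:
    """Pull the assistant text out of a chat.completion object,
    returning an empty string when no choice is present."""
    choices = completion.get("choices") or []
    if not choices:
        return ""
    return choices[0].get("message", {}).get("content", "")


# Canned example matching the response schema (values are illustrative):
sample = {
    "id": "cmpl-123",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "@cf/qwen/qwen3-30b-a3b-fp8",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello, World!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 4, "total_tokens": 16},
}

print(extract_text(sample))  # Hello, World!
```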