mistral-small-3.1-24b-instruct

Text Generation · MistralAI · Hosted

Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance. With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks.

Model Info
Context Window: 128,000 tokens
Function calling: Yes
Unit Pricing: $0.35 per M input tokens, $0.56 per M output tokens
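
As a rough illustration of the unit pricing above, the following sketch estimates the cost of a single request from its token counts. The helper name `estimateCostUsd` and the constant names are hypothetical; only the two per-million-token rates come from this page.

```typescript
// Rates from the Model Info section above (USD per one million tokens).
const INPUT_USD_PER_M_TOKENS = 0.35;
const OUTPUT_USD_PER_M_TOKENS = 0.56;

// Hypothetical helper: estimates the cost of one request in USD.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  return (
    (inputTokens / 1_000_000) * INPUT_USD_PER_M_TOKENS +
    (outputTokens / 1_000_000) * OUTPUT_USD_PER_M_TOKENS
  );
}

// Example: a 100,000-token prompt with a 2,000-token completion.
const estimatedCost = estimateCostUsd(100_000, 2_000);
console.log(estimatedCost.toFixed(5));
```

For the example values this works out to roughly $0.036, most of it from the input side, which is typical when a large share of the 128,000-token context window is used.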

Playground

Try out this model with the Workers AI LLM Playground. It requires no setup or authentication and offers an instant way to preview and test a model directly in the browser.

Launch the LLM Playground

Usage

TypeScript
export interface Env {
  // Workers AI binding configured for this Worker.
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const messages = [
      { role: "system", content: "You are a friendly assistant" },
      {
        role: "user",
        content: "What is the origin of the phrase Hello, World",
      },
    ];

    // With stream: true, the result is streamed back as server-sent events.
    const stream = await env.AI.run("@cf/mistralai/mistral-small-3.1-24b-instruct", {
      messages,
      stream: true,
    });

    return new Response(stream, {
      headers: { "content-type": "text/event-stream" },
    });
  },
} satisfies ExportedHandler<Env>;
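
For comparison with the streaming example above, here is a minimal sketch of a non-streaming call that sends a `prompt` and reads the `response` string described under Output. The `AiBinding` interface and the `generateText` helper are hypothetical and exist only to keep the sketch self-contained; in a Worker the binding would be `env.AI`.

```typescript
// Minimal structural stand-in for the Workers AI binding (an assumption made
// for this sketch; the real `Ai` type comes from the Workers runtime types).
interface AiBinding {
  run(model: string, input: Record<string, unknown>): Promise<unknown>;
}

const MODEL = "@cf/mistralai/mistral-small-3.1-24b-instruct";

// Hypothetical helper: sends a single prompt and returns the generated text.
async function generateText(ai: AiBinding, prompt: string): Promise<string> {
  // Without `stream: true`, the call resolves once the full response is ready.
  const result = await ai.run(MODEL, { prompt, max_tokens: 256 });

  // The synchronous output is documented as carrying a `response` string.
  if (typeof result === "object" && result !== null) {
    const response = (result as { response?: unknown }).response;
    if (typeof response === "string") {
      return response;
    }
  }
  throw new Error("unexpected response shape");
}
```

Inside a `fetch` handler this could be called as `generateText(env.AI, "...")`, although a cast may be needed depending on the type definitions in use.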

Parameters

Input

prompt
string, required, minLength: 1. The input text prompt for the model to generate a response.
guided_json
object. JSON schema that should be fulfilled for the response.
raw
boolean, default: false. If true, a chat template is not applied and you must adhere to the specific model's expected formatting.
stream
boolean, default: false. If true, the response will be streamed back incrementally using server-sent events (SSE).
max_tokens
integer, default: 256. The maximum number of tokens to generate in the response.
temperature
number, default: 0.15, minimum: 0, maximum: 5. Controls the randomness of the output; higher values produce more random results.
top_p
number, minimum: 0, maximum: 2. Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses.
top_k
integer, minimum: 1, maximum: 50. Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises.
seed
integer, minimum: 1, maximum: 9999999999. Random seed for reproducibility of the generation.
repetition_penalty
number, minimum: 0, maximum: 2. Penalty for repeated tokens; higher values discourage repetition.
frequency_penalty
number, minimum: 0, maximum: 2. Decreases the likelihood of the model repeating the same lines verbatim.
presence_penalty
number, minimum: 0, maximum: 2. Increases the likelihood of the model introducing new topics.
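
To make the constraints above concrete, the sketch below checks a request object against the documented ranges before it is sent, and shows one possible shape for a `guided_json` schema. The helper `validateInput`, the `RANGES` table, and the example schema are illustrative assumptions, not part of the Workers AI API; the service performs its own validation.

```typescript
// Documented [minimum, maximum] ranges from the Input list above.
const RANGES: Record<string, [number, number]> = {
  temperature: [0, 5],
  top_p: [0, 2],
  top_k: [1, 50],
  seed: [1, 9999999999],
  repetition_penalty: [0, 2],
  frequency_penalty: [0, 2],
  presence_penalty: [0, 2],
};

// Parameters documented as integers.
const INTEGER_PARAMS = new Set(["max_tokens", "top_k", "seed"]);

// Hypothetical helper: returns a list of problems, empty if none were found.
function validateInput(input: Record<string, unknown>): string[] {
  const problems: string[] = [];
  const prompt = input["prompt"];
  if (typeof prompt !== "string" || prompt.length < 1) {
    problems.push("prompt must be a non-empty string");
  }
  for (const [name, value] of Object.entries(input)) {
    if (typeof value !== "number") continue;
    if (INTEGER_PARAMS.has(name) && !Number.isInteger(value)) {
      problems.push(`${name} must be an integer`);
    }
    const range = RANGES[name];
    if (range && (value < range[0] || value > range[1])) {
      problems.push(`${name} must be between ${range[0]} and ${range[1]}`);
    }
  }
  return problems;
}

// Example request: the guided_json schema here is made up for illustration.
const exampleInput = {
  prompt: "Extract the city and country from: 'I live in Lyon, France.'",
  guided_json: {
    type: "object",
    properties: { city: { type: "string" }, country: { type: "string" } },
    required: ["city", "country"],
  },
  temperature: 0.15,
  max_tokens: 128,
  seed: 42,
};

console.log(validateInput(exampleInput)); // prints []
```

Returning a list of problems rather than throwing on the first one makes it easier to report every out-of-range value at once.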

Output

Synchronous: send a request and receive a complete response
response
string. The generated text response from the model.
Streaming: send a request with `stream: true` and receive server-sent events
type: string
contentType: text/event-stream
format: binary
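
The streaming output is only described here as `text/event-stream`, so the sketch below shows one way a client might reassemble the generated text from that stream. It assumes each event is a line of the form `data: {"response":"..."}` and that the stream ends with `data: [DONE]`; neither detail is specified on this page, and the helper `collectSseText` is hypothetical.

```typescript
// Hypothetical helper: concatenates the text carried by a block of SSE data.
// Assumes `data: {"response":"..."}` events and a final `data: [DONE]` marker.
function collectSseText(sseText: string): string {
  let text = "";
  for (const line of sseText.split(/\r?\n/)) {
    // Skip blank separator lines, comments, and other SSE fields.
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break;
    try {
      const event = JSON.parse(payload) as { response?: unknown };
      if (typeof event.response === "string") {
        text += event.response;
      }
    } catch {
      // Ignore payloads that are not valid JSON, such as a truncated chunk.
    }
  }
  return text;
}

const sampleStream =
  'data: {"response":"Hello"}\n\n' +
  'data: {"response":", World"}\n\n' +
  "data: [DONE]\n\n";

console.log(collectSseText(sampleStream)); // prints: Hello, World
```

A real stream arrives as bytes in arbitrary chunks, so production code would decode it with `TextDecoder` and buffer incomplete lines across chunk boundaries before parsing.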
