
llama-3.1-8b-instruct-fast

Text Generation · Meta · Hosted

[Fast version] The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models. The Llama 3.1 instruction-tuned, text-only models are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks.

Model Info
Context Window: 128,000 tokens
Terms and License: link

Playground

Try out this model in the Workers AI LLM Playground. It requires no setup or authentication and is an instant way to preview and test a model directly in the browser.

Launch the LLM Playground

Usage

TypeScript
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const messages = [
      { role: "system", content: "You are a friendly assistant" },
      {
        role: "user",
        content: "What is the origin of the phrase Hello, World",
      },
    ];
    const stream = await env.AI.run("@cf/meta/llama-3.1-8b-instruct-fast", {
      messages,
      stream: true,
    });
    return new Response(stream, {
      headers: { "content-type": "text/event-stream" },
    });
  },
} satisfies ExportedHandler<Env>;
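The streamed body above is a `text/event-stream`. A minimal client-side sketch for reassembling the streamed text is below; the `data: {"response": "..."}` event shape and the `data: [DONE]` terminator are assumptions about the Workers AI stream format, so verify them against an actual response before relying on this.

```typescript
// Sketch: reassemble the generated text from a Workers AI SSE body.
// Assumes each event is `data: {"response":"<token>"}` and the stream
// ends with `data: [DONE]` — confirm against a real response.
export function collectTokens(sseBody: string): string {
  let text = "";
  for (const line of sseBody.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length).trim();
    if (payload === "[DONE]") break;
    const event = JSON.parse(payload) as { response?: string };
    text += event.response ?? "";
  }
  return text;
}

// Example: two events followed by the terminator.
const sample =
  'data: {"response":"Hello"}\n\ndata: {"response":", World"}\n\ndata: [DONE]\n\n';
console.log(collectTokens(sample)); // "Hello, World"
```

In a browser you would feed decoded chunks from `response.body` through the same parsing logic incrementally rather than buffering the whole body.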

Parameters

Synchronous — Send a request and receive a complete response
frequency_penalty
number (min: 0, max: 2). Decreases the likelihood of the model repeating the same lines verbatim.
lora
string. Name of the LoRA (Low-Rank Adaptation) model to fine-tune the base model.
max_tokens
integer (default: 256). The maximum number of tokens to generate in the response.
presence_penalty
number (min: 0, max: 2). Increases the likelihood of the model introducing new topics.
prompt
string, required (minLength: 1, maxLength: 131072). The input text prompt for the model to generate a response.
raw
boolean (default: false). If true, a chat template is not applied and you must adhere to the specific model's expected formatting.
repetition_penalty
number (min: 0, max: 2). Penalty for repeated tokens; higher values discourage repetition.
seed
integer (min: 1, max: 9999999999). Random seed for reproducibility of the generation.
stream
boolean (default: false). If true, the response is streamed back incrementally using Server-Sent Events (SSE).
temperature
number (default: 0.6, min: 0, max: 5). Controls the randomness of the output; higher values produce more random results.
top_k
integer (min: 1, max: 50). Limits the model to choosing from the top k most probable tokens. Lower values make responses more focused; higher values introduce more variety.
top_p
number (min: 0, max: 2). Adjusts the creativity of responses by controlling how many candidate tokens the model considers. Lower values make outputs more predictable; higher values allow more varied and creative responses.
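The numeric ranges above can be enforced client-side before a request goes out. A small illustrative sketch follows; the helper name `clampParams` is not part of the API, and the fallback values for `top_k` and `top_p` are placeholders — only `temperature`'s 0.6 and `max_tokens`'s 256 defaults come from the table above.

```typescript
// Illustrative helper (not part of the Workers AI API): clamp sampling
// parameters into the documented ranges before sending a request.
interface SamplingParams {
  temperature?: number; // documented: 0–5, default 0.6
  top_k?: number;       // documented: 1–50
  top_p?: number;       // documented: 0–2
  max_tokens?: number;  // documented default: 256
}

const clamp = (v: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, v));

export function clampParams(p: SamplingParams): Required<SamplingParams> {
  return {
    temperature: clamp(p.temperature ?? 0.6, 0, 5),
    top_k: Math.round(clamp(p.top_k ?? 50, 1, 50)),
    top_p: clamp(p.top_p ?? 1, 0, 2),
    max_tokens: Math.max(1, Math.floor(p.max_tokens ?? 256)),
  };
}

// Out-of-range values are pulled back to the documented bounds.
console.log(clampParams({ temperature: 9, top_k: 0 }));
```

Passing the clamped object straight into the options of `env.AI.run` keeps requests inside the documented bounds rather than relying on server-side rejection.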
Streaming — Send a request with `stream: true` and receive server-sent events
The streaming request accepts the same parameters as the synchronous request above.

API Schemas (Raw)

Synchronous — Send a request and receive a complete response
{
  "properties": {
    "frequency_penalty": {
      "description": "Decreases the likelihood of the model repeating the same lines verbatim.",
      "maximum": 2,
      "minimum": 0,
      "type": "number"
    },
    "image": {
      "oneOf": [
        {
          "description": "An array of integers that represent the image data constrained to 8-bit unsigned integer values",
          "items": {
            "description": "A value between 0 and 255",
            "type": "number"
          },
          "type": "array"
        },
        {
          "description": "Binary string representing the image contents.",
          "format": "binary",
          "type": "string"
        }
      ]
    },
    "lora": {
      "description": "Name of the LoRA (Low-Rank Adaptation) model to fine-tune the base model.",
      "type": "string"
    },
    "max_tokens": {
      "default": 256,
      "description": "The maximum number of tokens to generate in the response.",
      "type": "integer"
    },
    "presence_penalty": {
      "description": "Increases the likelihood of the model introducing new topics.",
      "maximum": 2,
      "minimum": 0,
      "type": "number"
    },
    "prompt": {
      "description": "The input text prompt for the model to generate a response.",
      "maxLength": 131072,
      "minLength": 1,
      "type": "string"
    },
    "raw": {
      "default": false,
      "description": "If true, a chat template is not applied and you must adhere to the specific model's expected formatting.",
      "type": "boolean"
    },
    "repetition_penalty": {
      "description": "Penalty for repeated tokens; higher values discourage repetition.",
      "maximum": 2,
      "minimum": 0,
      "type": "number"
    },
    "seed": {
      "description": "Random seed for reproducibility of the generation.",
      "maximum": 9999999999,
      "minimum": 1,
      "type": "integer"
    },
    "stream": {
      "default": false,
      "description": "If true, the response will be streamed back incrementally using SSE, Server Sent Events.",
      "type": "boolean"
    },
    "temperature": {
      "default": 0.6,
      "description": "Controls the randomness of the output; higher values produce more random results.",
      "maximum": 5,
      "minimum": 0,
      "type": "number"
    },
    "top_k": {
      "description": "Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises.",
      "maximum": 50,
      "minimum": 1,
      "type": "integer"
    },
    "top_p": {
      "description": "Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses.",
      "maximum": 2,
      "minimum": 0,
      "type": "number"
    }
  },
  "required": [
    "prompt"
  ],
  "title": "Prompt"
}
Streaming — Send a request with `stream: true` and receive server-sent events
The streaming input schema is identical to the synchronous schema above.