# llama-3.1-8b-instruct-fast

Text Generation • Meta • Hosted

Fast version. The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models. The Llama 3.1 instruction-tuned, text-only models are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks.
| Model Info | |
|---|---|
| Context Window | 128,000 tokens |
| Terms and License | link |
## Playground

Try out this model in the Workers AI LLM Playground. It requires no setup or authentication and is an instant way to preview and test a model directly in the browser.

Launch the LLM Playground

## Usage
Worker — streaming:

```ts
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const messages = [
      { role: "system", content: "You are a friendly assistant" },
      {
        role: "user",
        content: "What is the origin of the phrase Hello, World",
      },
    ];

    const stream = await env.AI.run("@cf/meta/llama-3.1-8b-instruct-fast", {
      messages,
      stream: true,
    });

    return new Response(stream, {
      headers: { "content-type": "text/event-stream" },
    });
  },
} satisfies ExportedHandler<Env>;
```

Worker:

```ts
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const messages = [
      { role: "system", content: "You are a friendly assistant" },
      {
        role: "user",
        content: "What is the origin of the phrase Hello, World",
      },
    ];

    const response = await env.AI.run("@cf/meta/llama-3.1-8b-instruct-fast", {
      messages,
    });

    return Response.json(response);
  },
} satisfies ExportedHandler<Env>;
```

Python:

```python
import os

import requests

ACCOUNT_ID = "your-account-id"
AUTH_TOKEN = os.environ.get("CLOUDFLARE_AUTH_TOKEN")

prompt = "Tell me all about PEP-8"
response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/meta/llama-3.1-8b-instruct-fast",
    headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
    json={
        "messages": [
            {"role": "system", "content": "You are a friendly assistant"},
            {"role": "user", "content": prompt},
        ]
    },
)
result = response.json()
print(result)
```

curl:

```sh
curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/meta/llama-3.1-8b-instruct-fast \
  -X POST \
  -H "Authorization: Bearer $CLOUDFLARE_AUTH_TOKEN" \
  -d '{ "messages": [{ "role": "system", "content": "You are a friendly assistant" }, { "role": "user", "content": "Why is pizza so good" }]}'
```

## Parameters
### Input

| Parameter | Type | Constraints / Default | Description |
|---|---|---|---|
| prompt | string | required; minLength 1, maxLength 131,072 | The input text prompt for the model to generate a response. |
| raw | boolean | default: false | If true, a chat template is not applied and you must adhere to the specific model's expected formatting. |
| stream | boolean | default: false | If true, the response will be streamed back incrementally using SSE (Server-Sent Events). |
| max_tokens | integer | default: 256 | The maximum number of tokens to generate in the response. |
| temperature | number | default: 0.6; min 0, max 5 | Controls the randomness of the output; higher values produce more random results. |
| top_p | number | min 0, max 2 | Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses. |
| top_k | integer | min 1, max 50 | Limits the AI to choose from the top "k" most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises. |
| seed | integer | min 1, max 9999999999 | Random seed for reproducibility of the generation. |
| repetition_penalty | number | min 0, max 2 | Penalty for repeated tokens; higher values discourage repetition. |
| frequency_penalty | number | min 0, max 2 | Decreases the likelihood of the model repeating the same lines verbatim. |
| presence_penalty | number | min 0, max 2 | Increases the likelihood of the model introducing new topics. |
| lora | string | | Name of the LoRA (Low-Rank Adaptation) model to fine-tune the base model. |
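The sampling parameters described above can be passed alongside `messages` in the request body. Below is a minimal stdlib-only sketch of a REST call that sets several of them; the parameter values chosen here are purely illustrative, and the endpoint and payload shape follow the curl example earlier on this page.

```python
import json
import os
import urllib.request

# Substitute your own account ID; the token is read from the environment.
ACCOUNT_ID = os.environ.get("CLOUDFLARE_ACCOUNT_ID", "your-account-id")
AUTH_TOKEN = os.environ.get("CLOUDFLARE_AUTH_TOKEN", "")
MODEL = "@cf/meta/llama-3.1-8b-instruct-fast"


def build_payload(prompt: str) -> dict:
    """Assemble a request body that sets a few sampling parameters.

    The values below are illustrative, chosen within the documented ranges.
    """
    return {
        "messages": [
            {"role": "system", "content": "You are a friendly assistant"},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 256,          # cap on generated tokens (default 256)
        "temperature": 0.6,         # 0-5, default 0.6; higher = more random
        "top_k": 40,                # 1-50; sample only from the k likeliest tokens
        "repetition_penalty": 1.1,  # 0-2; discourage repeated tokens
        "seed": 42,                 # 1-9999999999; reproducible sampling
    }


def run(prompt: str) -> dict:
    """POST the payload to the Workers AI REST endpoint and return the JSON body."""
    url = (
        "https://api.cloudflare.com/client/v4/accounts/"
        f"{ACCOUNT_ID}/ai/run/{MODEL}"
    )
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {AUTH_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


if __name__ == "__main__" and AUTH_TOKEN:
    print(json.dumps(run("Tell me all about PEP-8"), indent=2))
```

Omitting a parameter falls back to its documented default (e.g. `temperature` 0.6, `max_tokens` 256), so only the knobs you actually want to change need to appear in the body.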
### Output

Synchronous — send a request and receive a complete response:

| Field | Type | Description |
|---|---|---|
| response | string | The generated text response from the model |
| tool_calls | array | An array of tool call requests made during the response generation |

Streaming — send a request with `stream: true` and receive server-sent events (`text/event-stream`; the body is a binary stream of events).
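A streamed response arrives as `data:` lines in the event-stream body. The sketch below collects the incremental text from such lines; it assumes each event carries JSON like `{"response": "..."}` and that the stream ends with a `data: [DONE]` sentinel, which is a common SSE convention for this kind of API — verify the exact event shape against the streaming documentation before relying on it.

```python
import json


def extract_text(sse_lines):
    """Concatenate the incremental text carried by 'data:' lines.

    `sse_lines` is any iterable of decoded text lines from the event stream.
    Assumed event shape (hedged): data: {"response": "..."} per chunk,
    terminated by a literal 'data: [DONE]' line.
    """
    parts = []
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        parts.append(json.loads(payload).get("response", ""))
    return "".join(parts)
```

For example, feeding it the lines `data: {"response": "Hel"}`, `data: {"response": "lo"}`, `data: [DONE]` yields `"Hello"`. The same parser works whether the lines come from `requests` with `stream=True`, `urllib`, or a file capture of the stream.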