
discolm-german-7b-v1-awq Beta

Text Generation · thebloke · Hosted

DiscoLM German 7b is a Mistral-based large language model with a focus on German-language applications. AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization.

Model Info
Deprecated: 10/1/2025
Context Window: 4,096 tokens
More information: link
Beta: Yes

Playground

Try out this model in the Workers AI LLM Playground. It requires no setup or authentication and is an instant way to preview and test the model directly in the browser.

Launch the LLM Playground

Usage

TypeScript
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const messages = [
      { role: "system", content: "You are a friendly assistant" },
      {
        role: "user",
        content: "What is the origin of the phrase Hello, World",
      },
    ];

    const stream = await env.AI.run("@cf/thebloke/discolm-german-7b-v1-awq", {
      messages,
      stream: true,
    });

    return new Response(stream, {
      headers: { "content-type": "text/event-stream" },
    });
  },
} satisfies ExportedHandler<Env>;
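If you want a complete response instead of a stream, omit `stream: true`; the call then resolves to an object with a `response` string. The sketch below factors that call into a small helper. The `Ai` interface and the mock binding in the usage note are simplified stand-ins written for illustration; in a deployed Worker, `env.AI` is supplied by the runtime binding, not defined by you.

```typescript
// A minimal sketch of a non-streaming call. The Ai interface below is a
// simplified, hand-written stand-in for the real Workers AI binding type.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface Ai {
  run(
    model: string,
    input: { messages: ChatMessage[]; max_tokens?: number },
  ): Promise<{ response: string }>;
}

// Ask the model a question and return the full generated text.
async function generate(ai: Ai, userPrompt: string): Promise<string> {
  const messages: ChatMessage[] = [
    { role: "system", content: "You are a friendly assistant" },
    { role: "user", content: userPrompt },
  ];
  // Without `stream: true`, the promise resolves to { response: "..." }.
  const result = await ai.run("@cf/thebloke/discolm-german-7b-v1-awq", {
    messages,
  });
  return result.response;
}
```

Inside a Worker you would call `generate(env.AI, prompt)` from your `fetch` handler and return the string with `Response.json(...)` or similar.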

Parameters

Input

prompt
string (required, minLength: 1). The input text prompt for the model to generate a response.
lora
string. Name of the LoRA (Low-Rank Adaptation) model used to fine-tune the base model.
raw
boolean (default: false). If true, a chat template is not applied and you must adhere to the specific model's expected formatting.
stream
boolean (default: false). If true, the response will be streamed back incrementally using Server-Sent Events (SSE).
max_tokens
integer (default: 256). The maximum number of tokens to generate in the response.
temperature
number (default: 0.6, minimum: 0, maximum: 5). Controls the randomness of the output; higher values produce more random results.
top_p
number (minimum: 0.001, maximum: 1). Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses.
top_k
integer (minimum: 1, maximum: 50). Limits the AI to choosing from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises.
seed
integer (minimum: 1, maximum: 9999999999). Random seed for reproducibility of the generation.
repetition_penalty
number (minimum: 0, maximum: 2). Penalty for repeated tokens; higher values discourage repetition.
frequency_penalty
number (minimum: -2, maximum: 2). Decreases the likelihood of the model repeating the same lines verbatim.
presence_penalty
number (minimum: -2, maximum: 2). Increases the likelihood of the model introducing new topics.
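Values outside the documented ranges may be rejected by the API, so it can be convenient to clamp them client-side before building a request. The helper below is purely illustrative (it is not part of the Workers AI API) and encodes the ranges and defaults listed above.

```typescript
// Hypothetical client-side helper that applies the documented defaults
// and clamps each sampling parameter into its documented range.
interface SamplingParams {
  temperature?: number; // 0 to 5, default 0.6
  top_p?: number;       // 0.001 to 1
  top_k?: number;       // 1 to 50
  max_tokens?: number;  // default 256
}

const clamp = (x: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, x));

function normalizeSampling(p: SamplingParams): SamplingParams {
  return {
    temperature: p.temperature === undefined ? 0.6 : clamp(p.temperature, 0, 5),
    top_p: p.top_p === undefined ? undefined : clamp(p.top_p, 0.001, 1),
    top_k: p.top_k === undefined ? undefined : Math.round(clamp(p.top_k, 1, 50)),
    max_tokens: p.max_tokens === undefined ? 256 : Math.round(p.max_tokens),
  };
}
```

The resulting object can be spread into the `env.AI.run` input alongside `messages` or `prompt`.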

Output

Synchronous — Send a request and receive a complete response
response
string. The generated text response from the model.
Streaming — Send a request with `stream: true` and receive server-sent events
type: string (format: binary)
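When `stream: true` is set, the response body is a raw server-sent event stream. In the typical Workers AI framing, each event carries a JSON payload such as `data: {"response":"…"}` and the stream ends with a `data: [DONE]` sentinel; treat the function below as a sketch of client-side reassembly under that assumption, not a guaranteed contract.

```typescript
// Sketch: reassemble the generated text from a captured SSE payload.
// Assumes events of the form `data: {"response":"…"}` terminated by
// `data: [DONE]`, the usual Workers AI streaming shape.
function collectSse(payload: string): string {
  let text = "";
  for (const rawLine of payload.split("\n")) {
    const line = rawLine.trim();
    if (!line.startsWith("data:")) continue; // skip blank and comment lines
    const data = line.slice("data:".length).trim();
    if (data === "[DONE]") break;            // end-of-stream sentinel
    const event = JSON.parse(data) as { response?: string };
    text += event.response ?? "";
  }
  return text;
}
```

In a browser or Worker client you would feed decoded chunks from the response body's `ReadableStream` into the same line-by-line logic rather than buffering the whole payload first.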

API Schemas (Raw)

Synchronous Input
Synchronous Output
Streaming Input
Streaming Output