Skip to content
Cloudflare Docs
l

llava-1.5-7b-hf Beta

Image-to-Textllava-hf
@cf/llava-hf/llava-1.5-7b-hf

LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.

Features
BetaYes

Usage

Workers - TypeScript

export interface Env {
AI: Ai;
}
export default {
async fetch(request: Request, env: Env): Promise<Response> {
const res = await fetch("https://cataas.com/cat");
const blob = await res.arrayBuffer();
const input = {
image: [...new Uint8Array(blob)],
prompt: "Generate a caption for this image",
max_tokens: 512,
};
const response = await env.AI.run(
"@cf/llava-hf/llava-1.5-7b-hf",
input
);
return new Response(JSON.stringify(response));
},
} satisfies ExportedHandler<Env>;

Parameters

* indicates a required field

Input

  • 0 string

    Binary string representing the image contents.

  • 1 object

    • temperature number

      Controls the randomness of the output; higher values produce more random results.

    • prompt string

      The input text prompt for the model to generate a response.

    • raw boolean

      If true, a chat template is not applied and you must adhere to the specific model's expected formatting.

    • top_p number

      Controls the creativity of the AI's responses by adjusting how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses.

    • top_k number

      Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises.

    • seed number

      Random seed for reproducibility of the generation.

    • repetition_penalty number

      Penalty for repeated tokens; higher values discourage repetition.

    • frequency_penalty number

      Decreases the likelihood of the model repeating the same lines verbatim.

    • presence_penalty number

      Increases the likelihood of the model introducing new topics.

    • image * one of

      • 0 array

        An array of integers that represent the image data constrained to 8-bit unsigned integer values

        • items number

          A value between 0 and 255

      • 1 string

        Binary string representing the image contents.

    • max_tokens integer default 512

      The maximum number of tokens to generate in the response.

Output

  • description string

API Schemas

The following schemas are based on JSON Schema

{
"oneOf": [
{
"type": "string",
"format": "binary",
"description": "Binary string representing the image contents."
},
{
"type": "object",
"properties": {
"temperature": {
"type": "number",
"description": "Controls the randomness of the output; higher values produce more random results."
},
"prompt": {
"type": "string",
"description": "The input text prompt for the model to generate a response."
},
"raw": {
"type": "boolean",
"default": false,
"description": "If true, a chat template is not applied and you must adhere to the specific model's expected formatting."
},
"top_p": {
"type": "number",
"description": "Controls the creativity of the AI's responses by adjusting how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses."
},
"top_k": {
"type": "number",
"description": "Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises."
},
"seed": {
"type": "number",
"description": "Random seed for reproducibility of the generation."
},
"repetition_penalty": {
"type": "number",
"description": "Penalty for repeated tokens; higher values discourage repetition."
},
"frequency_penalty": {
"type": "number",
"description": "Decreases the likelihood of the model repeating the same lines verbatim."
},
"presence_penalty": {
"type": "number",
"description": "Increases the likelihood of the model introducing new topics."
},
"image": {
"oneOf": [
{
"type": "array",
"description": "An array of integers that represent the image data constrained to 8-bit unsigned integer values",
"items": {
"type": "number",
"description": "A value between 0 and 255"
}
},
{
"type": "string",
"format": "binary",
"description": "Binary string representing the image contents."
}
]
},
"max_tokens": {
"type": "integer",
"default": 512,
"description": "The maximum number of tokens to generate in the response."
}
},
"required": [
"image"
]
}
]
}