uform-gen2-qwen-500m Beta
Image-to-Text • unum
UForm-Gen is a small generative vision-language model designed primarily for image captioning and visual question answering. The model was pre-trained on an internal image captioning dataset and fine-tuned on public instruction datasets: SVIT, LVIS, and VQA datasets.
Usage
Workers - TypeScript
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Fetch a sample image and read it as raw bytes.
    const res = await fetch("https://cataas.com/cat");
    const blob = await res.arrayBuffer();

    // Pass the image as an array of 8-bit unsigned integers.
    const input = {
      image: [...new Uint8Array(blob)],
      prompt: "Generate a caption for this image",
      max_tokens: 512,
    };

    const response = await env.AI.run(
      "@cf/unum/uform-gen2-qwen-500m",
      input
    );

    return new Response(JSON.stringify(response));
  },
} satisfies ExportedHandler<Env>;
Parameters
* indicates a required field
Input
One of:
- 0 (string): Binary string representing the image contents.
- 1 (object):
  - temperature (number): Controls the randomness of the output; higher values produce more random results.
  - prompt (string): The input text prompt for the model to generate a response.
  - raw (boolean, default false): If true, a chat template is not applied and you must adhere to the specific model's expected formatting.
  - top_p (number): Controls the creativity of the AI's responses by adjusting how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses.
  - top_k (number): Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises.
  - seed (number): Random seed for reproducibility of the generation.
  - repetition_penalty (number): Penalty for repeated tokens; higher values discourage repetition.
  - frequency_penalty (number): Decreases the likelihood of the model repeating the same lines verbatim.
  - presence_penalty (number): Increases the likelihood of the model introducing new topics.
  - image * (one of):
    - 0 (array): An array of integers that represent the image data, constrained to 8-bit unsigned integer values.
      - items (number): A value between 0 and 255.
    - 1 (string): Binary string representing the image contents.
  - max_tokens (integer, default 512): The maximum number of tokens to generate in the response.
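As an illustration of how the object form of the input fits together, here is a minimal sketch of a builder for it. The `UformInput` interface and `buildUformInput` function are hypothetical names introduced for this example; only the field names come from the parameter list above. Whether a given sampling parameter has a visible effect on this model is not stated on this page, so treat the values as placeholders.

```typescript
// Hypothetical type: a subset of the documented object-form input fields.
interface UformInput {
  image: number[]; // required: bytes as integers in the range 0..255
  prompt?: string;
  max_tokens?: number;
  temperature?: number;
  top_p?: number;
  top_k?: number;
  seed?: number;
}

// Hypothetical helper: converts raw bytes to the array form of `image`
// and merges in any optional parameters.
function buildUformInput(
  bytes: Uint8Array,
  options: Omit<UformInput, "image"> = {}
): UformInput {
  if (bytes.length === 0) {
    throw new Error("image is required and must not be empty");
  }
  // A Uint8Array already guarantees every element is within 0..255.
  return { image: Array.from(bytes), ...options };
}

// Example: a visual question answering prompt with a fixed seed.
// The four bytes are a stand-in for real image data.
const input = buildUformInput(new Uint8Array([137, 80, 78, 71]), {
  prompt: "What color is the cat?",
  temperature: 0.2,
  seed: 42,
  max_tokens: 64,
});
```

The resulting object can be passed as the second argument to `env.AI.run("@cf/unum/uform-gen2-qwen-500m", input)`, in the same way as the `input` object in the Usage section.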
Output
- description (string)
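Because the output is a JSON object with a single `description` string, a caller may want to narrow the response before using it. The sketch below shows one way to do that with a type guard. `UformOutput`, `isUformOutput`, and `getDescription` are hypothetical names, and the check assumes `description` is always present, which the schema does not explicitly require.

```typescript
// Hypothetical type mirroring the documented output shape.
interface UformOutput {
  description: string;
}

// Type guard: true only when `value` is an object with a string `description`.
function isUformOutput(value: unknown): value is UformOutput {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as Record<string, unknown>).description === "string"
  );
}

// Returns the generated text, or throws if the response has another shape.
function getDescription(value: unknown): string {
  if (!isUformOutput(value)) {
    throw new Error("Unexpected response shape: missing string `description`");
  }
  return value.description;
}
```

For example, `getDescription(response)` could replace `JSON.stringify(response)` in the Worker when only the caption text should be returned.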
API Schemas
The following schemas are based on JSON Schema
{
  "oneOf": [
    {
      "type": "string",
      "format": "binary",
      "description": "Binary string representing the image contents."
    },
    {
      "type": "object",
      "properties": {
        "temperature": {
          "type": "number",
          "description": "Controls the randomness of the output; higher values produce more random results."
        },
        "prompt": {
          "type": "string",
          "description": "The input text prompt for the model to generate a response."
        },
        "raw": {
          "type": "boolean",
          "default": false,
          "description": "If true, a chat template is not applied and you must adhere to the specific model's expected formatting."
        },
        "top_p": {
          "type": "number",
          "description": "Controls the creativity of the AI's responses by adjusting how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses."
        },
        "top_k": {
          "type": "number",
          "description": "Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises."
        },
        "seed": {
          "type": "number",
          "description": "Random seed for reproducibility of the generation."
        },
        "repetition_penalty": {
          "type": "number",
          "description": "Penalty for repeated tokens; higher values discourage repetition."
        },
        "frequency_penalty": {
          "type": "number",
          "description": "Decreases the likelihood of the model repeating the same lines verbatim."
        },
        "presence_penalty": {
          "type": "number",
          "description": "Increases the likelihood of the model introducing new topics."
        },
        "image": {
          "oneOf": [
            {
              "type": "array",
              "description": "An array of integers that represent the image data constrained to 8-bit unsigned integer values",
              "items": {
                "type": "number",
                "description": "A value between 0 and 255"
              }
            },
            {
              "type": "string",
              "format": "binary",
              "description": "Binary string representing the image contents."
            }
          ]
        },
        "max_tokens": {
          "type": "integer",
          "default": 512,
          "description": "The maximum number of tokens to generate in the response."
        }
      },
      "required": [
        "image"
      ]
    }
  ]
}
{
  "type": "object",
  "contentType": "application/json",
  "properties": {
    "description": {
      "type": "string"
    }
  }
}
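The `image` constraints in the input schema (either a binary string, or an array whose items are values between 0 and 255) can be checked on the client before a request is sent. The following is a rough sketch of such a check, not a full JSON Schema validator. `validateImageField` is a hypothetical helper, and requiring whole numbers is an assumption drawn from the "array of integers" wording in the description.

```typescript
// Hypothetical helper: returns a list of problems with an `image` value.
// An empty list means the value matches one of the two documented forms.
function validateImageField(image: unknown): string[] {
  // String form: accepted as-is; its contents are not inspected here.
  if (typeof image === "string") {
    return [];
  }
  if (!Array.isArray(image)) {
    return ["image must be an array of numbers or a binary string"];
  }
  const errors: string[] = [];
  image.forEach((value: unknown, index: number) => {
    const ok =
      typeof value === "number" &&
      Number.isInteger(value) &&
      value >= 0 &&
      value <= 255;
    if (!ok) {
      errors.push(`image[${index}] must be an integer between 0 and 255`);
    }
  });
  return errors;
}
```

Calling `validateImageField([0, 128, 255])` returns an empty array, while an out-of-range or non-numeric element produces one message per offending index.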