Workers AI Models

Meta
Summarization

BART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. You can use this model for text summarization.

Cloudflare-hosted
Deprecated

bge-base-en-v1.5

BAAI general embedding (Base) model that transforms any given text into a 768-dimensional vector

Cloudflare-hosted
Batch

bge-large-en-v1.5

BAAI general embedding (Large) model that transforms any given text into a 1024-dimensional vector

Cloudflare-hosted
Batch

bge-m3

Multi-Functionality, Multi-Linguality, and Multi-Granularity embeddings model.

Cloudflare-hosted

bge-reranker-base

BAAI
Text Classification

Unlike an embedding model, a reranker takes a query and a document as input and directly outputs a similarity score rather than an embedding. You get a relevance score by passing a query and a passage to the reranker, and that score can be mapped to a float value in [0, 1] with a sigmoid function.

Cloudflare-hosted
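
The sigmoid mapping mentioned in the bge-reranker-base description can be sketched as follows. This is a minimal illustration, not the model's own code: the raw scores, passage labels, and the `rank_passages` helper are hypothetical, and it assumes the reranker returns one unbounded real-valued score per passage.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw relevance score (any real number) into [0, 1]."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rank_passages(raw_scores: dict) -> list:
    """Return (passage, normalized score) pairs, most relevant first."""
    normalized = {passage: sigmoid(score) for passage, score in raw_scores.items()}
    return sorted(normalized.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores for three passages against a single query.
raw = {"passage A": 4.2, "passage B": -1.3, "passage C": 0.0}
ranked = rank_passages(raw)

for passage, score in ranked:
    print(f"{passage}: {score:.3f}")
```

Because the sigmoid is monotonic, the ordering of passages is the same before and after normalization; the mapping only makes scores comparable against a fixed threshold.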

bge-small-en-v1.5

BAAI general embedding (Small) model that transforms any given text into a 384-dimensional vector

Cloudflare-hosted
Batch
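
The three English bge embedding models differ in output dimension (384, 768, and 1024, per their descriptions), so vectors from different models cannot be compared directly. The sketch below shows a dimension check and a cosine similarity computation. The helper names are hypothetical and the toy vectors stand in for real embeddings; nothing here calls the actual models.

```python
import math

# Output dimensions as stated in the catalog descriptions.
EMBEDDING_DIMS = {
    "bge-small-en-v1.5": 384,
    "bge-base-en-v1.5": 768,
    "bge-large-en-v1.5": 1024,
}


def check_dimension(model: str, vector: list) -> None:
    """Raise ValueError if the vector length does not match the model's dimension."""
    expected = EMBEDDING_DIMS[model]
    if len(vector) != expected:
        raise ValueError(f"{model} expects {expected} values, got {len(vector)}")


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 384-dimensional vectors (placeholders for bge-small-en-v1.5 output).
u = [1.0, 0.0] * 192
v = [0.0, 1.0] * 192

check_dimension("bge-small-en-v1.5", u)
print(cosine_similarity(u, u))  # identical direction
print(cosine_similarity(u, v))  # orthogonal vectors
```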

deepseek-r1-distill-qwen-32b

DeepSeek
Text Generation

DeepSeek-R1-Distill-Qwen-32B is a model distilled from DeepSeek-R1 based on Qwen2.5. It outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

Cloudflare-hosted
Reasoning

detr-resnet-50

Meta
Object Detection

DEtection TRansformer (DETR) model trained end-to-end on COCO 2017 object detection (118k annotated images).

Cloudflare-hosted

distilbert-sst-2-int8

HuggingFace
Text Classification

Distilled BERT model that was fine-tuned on SST-2 for sentiment classification.

Cloudflare-hosted

dreamshaper-8-lcm

lykon
Text-to-Image

Stable Diffusion model that has been fine-tuned to be better at photorealism without sacrificing range.

Cloudflare-hosted

embeddinggemma-300m

Google
Text Embeddings

EmbeddingGemma is a 300M parameter, state-of-the-art for its size, open embedding model from Google, built from Gemma 3 (with T5Gemma initialization) and the same research and technology used to create Gemini models. EmbeddingGemma produces vector representations of text, making it well-suited for search and retrieval tasks, including classification, clustering, and semantic similarity search. This model was trained with data in 100+ spoken languages.

Cloudflare-hosted

flux

Deepgram
Automatic Speech Recognition

Flux is the first conversational speech recognition model built specifically for voice agents.

Cloudflare-hosted
Partner
Real-time

flux-1-schnell

FLUX.1 [schnell] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions.

Cloudflare-hosted

flux-2-dev

FLUX.2 [dev] is an image model from Black Forest Labs where you can generate highly realistic and detailed images, with multi-reference support.

Cloudflare-hosted
Partner

flux-2-klein-4b

FLUX.2 [klein] is an ultra-fast, distilled image model. It unifies image generation and editing in a single model, delivering state-of-the-art quality enabling interactive workflows, real-time previews, and latency-critical applications.

Cloudflare-hosted
Partner

flux-2-klein-9b

FLUX.2 [klein] 9B is an ultra-fast, distilled image model with enhanced quality. It unifies image generation and editing in a single model, delivering state-of-the-art quality enabling interactive workflows, real-time previews, and latency-critical applications.

Cloudflare-hosted
Partner

gemma-2b-it-lora

This is a Gemma-2B base model that Cloudflare dedicates to inference with LoRA adapters. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

Cloudflare-hosted
LoRA

gemma-3-12b-it

Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. They are multimodal, handling text and image input and generating text output, with a large 128K context window and multilingual support in over 140 languages, and they are available in more sizes than previous versions.

Cloudflare-hosted
LoRA
Deprecated

gemma-4-26b-a4b-it

Gemma 4 is Google's most intelligent family of open models, built from Gemini 3 research to maximize intelligence-per-parameter.

Cloudflare-hosted
Function calling
Reasoning
Vision

gemma-7b-it

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants.

Cloudflare-hosted
LoRA
Deprecated

gemma-7b-it-lora

This is a Gemma-7B base model that Cloudflare dedicates to inference with LoRA adapters. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

Cloudflare-hosted
LoRA

gemma-sea-lion-v4-27b-it

aisingapore
Text Generation

SEA-LION (Southeast Asian Languages In One Network) is a collection of large language models (LLMs) that have been pretrained and instruction-tuned for the Southeast Asia (SEA) region.

Cloudflare-hosted

glm-4.7-flash

Zhipu AI
Text Generation

GLM-4.7-Flash is a fast and efficient multilingual text generation model with a 131,072 token context window. Optimized for dialogue, instruction-following, and multi-turn tool calling across 100+ languages.

Cloudflare-hosted
Function calling
Reasoning

glm-5.2

Zhipu AI
Text Generation

Z.ai's flagship agentic coding model

Cloudflare-hosted
Function calling
Reasoning

gpt-oss-120b

OpenAI
Text Generation

OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases – gpt-oss-120b is for production, general purpose, high reasoning use-cases.

Cloudflare-hosted
Function calling
Reasoning

gpt-oss-20b

OpenAI
Text Generation

OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases – gpt-oss-20b is for lower latency, and local or specialized use-cases.

Cloudflare-hosted
Function calling
Reasoning

granite-4.0-h-micro

IBM
Text Generation

Granite 4.0 instruct models deliver strong performance across benchmarks, achieving industry-leading results in key agentic tasks like instruction following and function calling. These efficiencies make the models well-suited for a wide range of use cases like retrieval-augmented generation (RAG), multi-agent workflows, and edge deployments.

Cloudflare-hosted
Function calling

hermes-2-pro-mistral-7b