Models

llama-4-scout-17b-16e-instruct

OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases – gpt-oss-120b is for production, general purpose, high reasoning use-cases.

Cloudflare-hosted
Function calling
Reasoning

📌

Text Generation • Moonshot AI

Meta's Llama 4 Scout is a 17 billion parameter model with 16 experts that is natively multimodal. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

Cloudflare-hosted
Batch
Function calling
Vision

kimi-k3

Kimi K3 is Moonshot's flagship 2.8 trillion-parameter model, built on Kimi Delta Attention (a hybrid linear attention mechanism) with Attention Residuals. It offers native visual understanding, always-on reasoning, and a 1M-token context window for long-horizon coding, knowledge work, and deep reasoning tasks.

Third-party

gpt-5.6-luna

GPT-5.6 Luna is an OpenAI GPT-5.6 model optimized for cost-sensitive workloads, using the Responses API for efficient text generation.

Third-party

gpt-5.6-terra

GPT-5.6 Terra is an OpenAI GPT-5.6 model that balances intelligence and cost, using the Responses API for reasoning and stateful context management.

Third-party

gpt-5.6-sol

GPT-5.6 Sol is OpenAI's frontier GPT-5.6 model for complex professional work, using the Responses API for reasoning and stateful context management.

Third-party

claude-sonnet-5

Image-to-Text • Moondream

Claude Sonnet 5 is Anthropic's most agentic Sonnet model yet, built for coding, tool use, reasoning, and long-horizon professional work at lower cost than Opus-class models.

Third-party

moondream3.1-9B-A2B

Moondream 3 is a fast, efficient 9B mixture-of-experts vision language model (2B active parameters) that delivers frontier-level visual reasoning for tasks like object detection, pointing, OCR, and structured output.

Cloudflare-hosted
Vision

hh1.1-r2v

Alibaba's HappyHorse 1.1 reference-to-video model. Takes 1-9 reference images (characters and scenes) and a prompt that choreographs them into a single video, keeping each subject's identity consistent. Supports 720P and 1080P output with durations from 3 to 15 seconds.

Third-party

hh1.1-i2v

Image-to-Video • Pruna AI

Alibaba's HappyHorse 1.1 image-to-video model. Animates a reference image with an optional text prompt, with smoother motion, natural skin textures, and improved close-up quality over 1.0. Supports 720P and 1080P output with durations from 3 to 15 seconds.

Third-party

hh1.1-t2v

Text-to-Video • Alibaba

Alibaba's HappyHorse 1.1 text-to-video model. Generates videos from a text prompt with stronger dynamic expressiveness, better visual quality, and improved instruction following over 1.0. Configurable resolution, aspect ratio, and duration (3-15s).

Third-party

p-video

Text-to-Video • Pruna AI

Pruna's P-Video is a premium video generation model supporting text-to-video, image-to-video, and audio-conditioned generation up to 1080p at 24 or 48 fps, with configurable duration up to 20 seconds.

Third-party

p-video-animate

Pruna's P-Video-Animate takes a source video and a subject reference image, then animates the referenced subject using the motion and audio from the source video.

Third-party

p-video-replace

Image-to-Video • Pruna AI

Pruna's P-Video-Replace takes a source video and one or more identity reference images, then places the referenced person or people into the video while preserving the source motion and audio.

Third-party

p-video-avatar

Image-to-Video • Pruna AI

Pruna's P-Video-Avatar generates talking-head videos from a single portrait image driven by a text script or audio file, with multiple voices, languages, and output resolutions.

Third-party

p-image-try-on

Image-to-Image • Pruna AI

Pruna's P-Image Try-On virtually fits one or more garments onto a person's photo. Provide a photo of a person plus garment reference images and the model realistically dresses the person in the provided garments.

Third-party

p-image-upscale

Image-to-Image • Pruna AI

Pruna's P-Image-Upscale increases image resolution using AI, targeting 1-128 megapixels with optional detail and realism enhancement for sharper, cleaner results.

Third-party

p-image-edit

Image-to-Image • Pruna AI

Pruna's P-Image-Edit edits and composes 1-5 reference images with text instructions. It supports complex compositions, style transfers, and targeted edits with flexible output aspect ratios.

Third-party

p-image

Text-to-Image • Pruna AI

Pruna's P-Image is an ultra-fast text-to-image model with automatic prompt enhancement and 2-stage refinement, combining exceptional speed with high-quality output and flexible aspect ratios.

Third-party

gemini-3.5-flash

Text Generation • Zhipu AI

Gemini 3.5 Flash is Google's fast multimodal model with frontier intelligence, superior search, and grounding capabilities.

Third-party

krea-2-large

Text-to-Image • krea

More than 2x the size of Medium, with softer post-training. Outputs are rawer, more textured, and more flexible — at its best, Large produces results Medium can't match. Strongest on photorealism, raw looks (motion blur, grain, low dynamic range), and expressive and artistic styles.

Third-party

krea-2-medium

Text-to-Image • krea

Smaller, faster, more cost-efficient. Extensive post-training makes outputs especially stable and consistent across generations. Strongest on illustration, anime, painting, and other expressive or artistic styles.

Third-party

krea-2-medium-turbo

Text-to-Image • krea

The fastest Krea 2 model, built for low-cost iteration on expressive illustrations, style-driven concepts, and rapid visual exploration. Keeps the Krea 2 style system and expressive visual range but uses a distilled sampling schedule so you can move through ideas much faster. Especially useful for expressive illustration, graphic styles, typography experiments, and quick campaign or concept directions.

Third-party

aleph-2

Text-to-Video • RunwayML

RunwayML's video editing model. Edit one frame to update your whole video, make changes across multiple shots, and work with up to 30 seconds of video. Supports keyframe-guided editing for precise control over specific moments in the clip.

Third-party

glm-5.2

Z.ai's flagship agentic coding model

Cloudflare-hosted
Function calling
Reasoning

claude-fable-5

Text Generation • deepseek

Claude Fable 5 is Anthropic's most capable widely released model, built for the most demanding reasoning and long-horizon agentic work. Adaptive thinking is always on, and the model supports a 1M token context window with up to 128k output tokens per request.

Third-party

deepseek-v4-pro

DeepSeek V4 Pro is a high-capability reasoning model from DeepSeek, served via Fireworks infrastructure for production-grade inference.

Third-party

grok-voice

websocket • xAI

xAI's real-time voice conversation model with low-latency audio input and output streaming.

Third-party

grok-tts

Text-to-Speech • xAI

xAI's Grok text-to-speech model. Generates high-fidelity spoken audio in 5 expressive voices (eve, ara, rex, sal, leo) with 20+ supported languages. Supports inline speech tags for laughter, whispers, and pauses.

Third-party

grok-stt

Automatic Speech Recognition • xAI

xAI's Grok speech-to-text model. Transcribes audio files into text across 25 languages with word-level timestamps, multichannel transcription, speaker diarization, and key-term biasing.

Third-party

grok-imagine-video-1.5-preview

Image-to-Video • xAI

xAI's next-generation video generation model. Generates, edits, and extends videos from text and image inputs. Supports multiple aspect ratios and resolutions with improved quality over the previous generation.

Third-party

Text Generation • MiniMax

MiniMax's M3 language model with frontier coding and agentic capabilities, a 1M token context window, and multilingual support.

Third-party
Zero data retention

wan-2.7-i2v

Alibaba's Wan 2.7 image-to-video model that generates videos from a reference image with optional text prompts. Supports 720P and 1080P output with durations from 2 to 15 seconds.

Third-party
Zero data retention

claude-opus-4.8

Claude Opus 4.8 is Anthropic's most capable generally available model, with a step-change improvement in agentic coding over Claude Opus 4.7. It uses adaptive thinking to calibrate reasoning per task and supports a one million token context window at standard pricing.

Third-party
Zero data retention

flux-2-pro-preview

FLUX.2 [pro] Preview is Black Forest Labs' recommended default for production image generation and editing — tracks the latest [pro] weights with strong multi-reference support.

Third-party

flux-2-max

FLUX.2 [max] is Black Forest Labs' highest-quality image model — top editing consistency, strongest prompt following, and grounding search for visualizations of real-time information.

Third-party

flux-2-flex

grok-imagine-image-quality

FLUX.2 [flex] is Black Forest Labs' fine-grained control variant of FLUX.2 — exposes tunable inference steps, guidance, and prompt upsampling for typography-heavy and production workflows.

Third-party

grok-imagine-video

Text-to-Video • xAI

xAI's video generation model. Generates, edits, and extends videos from text and image inputs with native synchronized audio including dialogue, sound effects, and music. Supports multiple creative modes (normal, fun, custom).

Third-party

Text-to-Image • xAI

xAI's higher-fidelity text-to-image model optimized for sharper details, more accurate compositions, and stronger text rendering. Supports image editing via reference images and masks. Trades speed for quality compared to grok-imagine-image. Default output at 2k resolution.

Third-party

grok-4.3

grok-4.20-multi-agent-0309

xAI's Grok 4.3 model with a 1M-token context window and strong agentic tool calling with minimal hallucinations. Accepts text and image inputs, and supports function calling, structured outputs, and configurable reasoning effort (none, low, medium, high).

Third-party

grok-imagine-image

Text-to-Image • xAI

xAI's Grok Imagine image model. Generates and edits images from text and reference-image inputs with configurable aspect ratio and resolution.

Third-party

grok-4.20-0309-non-reasoning

xAI's Grok 4.20 multi-agent model with a 2M-token context window. Multiple agents collaborate in parallel to perform deep research tasks, with function calling, structured outputs, and reasoning capabilities.

Third-party

xAI's Grok 4.20 non-reasoning model. Skips the thinking trace for fast, single-pass responses while keeping the same training as the reasoning variant.

Third-party

grok-4.20-0309-reasoning

xAI's Grok 4.20 reasoning model. Uses extended thinking to work through complex problems, returning a reasoning trace alongside the final answer.

Third-party

q3-pro

Text-to-Video • Vidu

Vidu Q3 Pro is a high-quality video generation model supporting text-to-video, image-to-video, and start/end-frame-to-video workflows with audio and up to 16-second clips.

Third-party
Zero data retention

q3-turbo

Text-to-Video • Vidu

Vidu Q3 Turbo is a faster version of Vidu Q3 optimized for lower latency video generation while maintaining audio support and up to 16-second clips.

Third-party
Zero data retention

gen-4.5

Text-to-Video • RunwayML

RunwayML's video generation model supporting both text-to-video and image-to-video with customizable duration, aspect ratio, and content moderation controls.

Third-party

recraftv4-vector

Generate production-ready SVG vector graphics from text prompts with clean geometry, structured layers, and editable paths.

Third-party
Zero data retention

recraftv4-pro

Recraft V4 Pro generates high-resolution, art-directed images at 2048px+ with strong composition, text rendering, and design taste. Built for print and production work.

Third-party
Zero data retention

recraftv4-pro-vector

recraftv4-1-utility-vector

Generate detailed, production-ready SVG vector graphics from text prompts with fine geometry, scalable to any size for print and design work.

Third-party
Zero data retention

Generate production-ready SVG vector graphics from text prompts with a general-purpose model suited for a wide range of design and illustration tasks.

Third-party
Zero data retention

recraftv4-1-vector

Generate production-ready SVG vector graphics from text prompts with high aesthetic quality, clean geometry, structured layers, and editable paths.

Third-party
Zero data retention

recraftv4-1-utility-pro

recraftv4-1-utility-pro-vector

Recraft V4.1 Utility Pro is a general-purpose text-to-image model producing high-resolution 2048px+ output for a wide range of production and print use cases.

Third-party
Zero data retention

Generate detailed, high-resolution SVG vector graphics from text prompts with a general-purpose model, scalable to any size for print and large-scale design work.

Third-party
Zero data retention

recraftv4-1-pro-vector

Generate detailed, high-resolution SVG vector graphics from text prompts with high aesthetic quality, fine geometry, scalable to any size for print and design work.

Third-party
Zero data retention

recraftv4-1-utility

Recraft V4.1 Utility is a general-purpose text-to-image model balancing quality and flexibility for a wide range of everyday use cases at standard resolution.

Third-party
Zero data retention

recraftv4-1

Recraft V4.1 generates art-directed images tuned for high aesthetics, with strong composition, accurate text rendering, and refined design taste. Fast and cost-efficient at standard resolution.

Third-party
Zero data retention

recraftv4-1-pro

Recraft V4.1 Pro generates high-resolution, art-directed images at 2048px+ tuned for high aesthetics, with strong composition, text rendering, and refined design taste. Built for print and production work.

Third-party
Zero data retention

recraftv4

Recraft V4 generates art-directed images with strong composition, accurate text rendering, and design taste built in. Fast and cost-efficient at standard resolution.

Third-party
Zero data retention

Text-to-Video • PixVerse

Pixverse v6 is the latest Pixverse video model with support for up to 15-second videos, customizable duration from 1 to 15 seconds, and audio generation.

Third-party
Zero data retention

recraftv3

Recraft V3 is the previous-generation text-to-image model from Recraft, well-suited to design-quality compositions, brand-aware imagery, and accurate text rendering.

Third-party
Zero data retention

v5.6

Text-to-Video • PixVerse

Pixverse v5.6 is a video generation model supporting text-to-video and image-to-video with audio generation, customizable aspect ratios, and up to 1080p output.

Third-party
Zero data retention

tts-1-hd

Text-to-Speech • OpenAI

OpenAI's high-definition text-to-speech model producing higher quality audio output.

Third-party
Zero data retention

tts-1

Text-to-Speech • OpenAI

OpenAI's text-to-speech model optimized for real-time use with low latency.

Third-party
Zero data retention

o4-mini

OpenAI's fast, lightweight reasoning model optimized for multi-step problem solving at lower cost.

Third-party
Zero data retention

o3-mini

o3-mini is the lightweight, low-cost reasoning variant of o3, well suited to quick analytical tasks at scale.

Third-party
Zero data retention

o3 is OpenAI’s general-purpose reasoning model, balancing strong analytical performance with reasonable latency and cost.

Third-party
Zero data retention

gpt-image-2

Text-to-Image • OpenAI

OpenAI's next-generation image model that creates and edits images from text prompts, with support for multiple quality levels, sizes, and output formats. Note: transparent backgrounds are not supported — use openai/gpt-image-1.5 for transparent PNGs.

Third-party
Zero data retention

gpt-5.5-pro

GPT-5.5 pro uses OpenAI's Responses API with built-in tools, improved reasoning, and stateful context management.

Third-party
Zero data retention

gpt-image-1.5

Text-to-Image • OpenAI

OpenAI's image generation model that creates and edits images from text prompts, supporting multiple quality levels and output sizes.

Third-party
Zero data retention

gpt-5.5

GPT-5.5 is OpenAI's flagship model with strong coding, reasoning, and multimodal capabilities.

Third-party
Zero data retention

gpt-5.4-nano

GPT-5.4 nano is OpenAI's smallest and fastest model, optimized for edge and low-latency use cases.

Third-party
Zero data retention

gpt-5.4-pro

GPT-5.4 pro uses OpenAI's Responses API with built-in tools, improved reasoning, and stateful context management.

Third-party
Zero data retention

gpt-5.4

GPT-5.4 is OpenAI's flagship model with strong coding, reasoning, and multimodal capabilities.

Third-party
Zero data retention

gpt-5.4-mini

GPT-5.4 mini is a smaller, faster, and more cost-efficient version of GPT-5.4 for lightweight tasks.

Third-party
Zero data retention

gpt-5.1-chat

GPT-5.1 Chat is the chat-tuned variant of GPT-5.1, optimised for back-and-forth conversation and instruction following.

Third-party

gpt-5.1

GPT-5.1 is OpenAI’s incremental improvement over GPT-5, with stronger coding, reasoning, and writing.

Third-party
Zero data retention

gpt-5-nano

GPT-5 Nano is OpenAI’s smallest GPT-5 variant, optimized for low latency and cheap, high-throughput tasks.

Third-party
Zero data retention

gpt-5-chat

GPT-5 Chat is the chat-tuned variant of GPT-5, optimised for back-and-forth conversation and instruction following.

Third-party
Zero data retention

gpt-5-mini

GPT-5 Mini is the lightweight, low-cost variant of GPT-5, well suited to high-volume coding and reasoning tasks.

Third-party
Zero data retention

gpt-5

OpenAI's model excelling at coding, writing, and reasoning.

Third-party
Zero data retention

gpt-4o-transcribe

A speech-to-text model that uses GPT-4o to transcribe audio with improved word error rate and better language recognition compared to original Whisper models.

Third-party
Zero data retention

gpt-4o-mini

GPT-4o Mini is the lightweight, low-cost variant of GPT-4o, well suited to high-volume tasks with multimodal inputs.

Third-party
Zero data retention

gpt-4o

GPT-4o is OpenAI’s multimodal flagship, accepting text and images and responding quickly across a wide range of tasks.

Third-party
Zero data retention

gpt-4.1-mini

Fast, affordable version of GPT-4.1 with a million-token context window.

Third-party
Zero data retention

gpt-4.1-nano

GPT-4.1 Nano is OpenAI’s smallest and cheapest GPT-4.1 variant, optimized for high-throughput, low-latency tasks.

Third-party
Zero data retention

gpt-4.1

Music Generation • MiniMax

OpenAI's flagship GPT model for complex tasks with a million-token context window.

Third-party
Zero data retention

speech-2.8-turbo

Text-to-Speech • MiniMax

MiniMax Speech 2.8 Turbo turns text into natural, expressive speech with voice cloning, emotion control, and 40+ language support at faster speeds.

Third-party
Zero data retention

music-2.6

MiniMax's music generation model that creates full-length songs with vocals from text prompts and lyrics, or instrumental tracks. Supports BPM/key control and auto-generated lyrics.

Third-party
Zero data retention

speech-2.8-hd

Text-to-Speech • MiniMax

MiniMax Speech 2.8 HD focuses on studio-grade audio generation with emotion control, multilingual support (40+ languages), and voice cloning.

Third-party
Zero data retention

m2.7

Text Generation • MiniMax

MiniMax's M2.7 language model with multilingual capabilities.

Third-party
Zero data retention

hailuo-2.3-fast

Text-to-Video • MiniMax

A lower-latency version of Hailuo 2.3 that preserves core motion quality, visual consistency, and stylization while enabling faster iteration.

Third-party
Zero data retention

tts-2

Text-to-Speech • Inworld

Inworld's most powerful and expressive text-to-speech model. Builds on TTS 1.5 with rich expressive speech, real-time latency, natural language steering (e.g. [whisper], [say excitedly]), and stronger multilingual support across 15 production languages plus 90+ experimental languages.

Third-party
Zero data retention

hailuo-2.3

Text-to-Video • MiniMax

A high-fidelity video generation model optimized for realistic human motion, cinematic VFX, expressive characters, and strong prompt and style adherence across text-to-video and image-to-video workflows.

Third-party
Zero data retention

tts-1.5-max

Text-to-Speech • Inworld

Highest-quality text-to-speech with under 200ms latency, emotion control, and 15-language support.

Third-party
Zero data retention

tts-1.5-mini

Text-to-Speech • Inworld

Ultra-fast, cost-efficient text-to-speech with approximately 120ms latency and 15-language support.

Third-party
Zero data retention

veo-3.1-fast

A faster version of Veo 3.1 optimized for lower latency while maintaining high-quality video and audio output.

Third-party
Zero data retention

veo-3-fast

A faster version of Veo 3 optimized for lower latency video generation with audio support.

Third-party
Zero data retention

veo-3.1

Google's latest video generation model with improved quality, motion, and audio generation.

Third-party
Zero data retention

nano-banana-pro

Google's higher-quality image generation model with improved detail and prompt adherence.

Third-party
Zero data retention

veo-3

Google's video generation model capable of producing high-quality videos with optional audio from text prompts.

Third-party
Zero data retention

nano-banana

Google's fast image generation model producing high-quality images from text prompts.

Third-party
Zero data retention

nano-banana-2

Google's second-generation image generation model with improved quality and speed.

Third-party
Zero data retention

gemini-3.1-pro

Google's most intelligent Gemini model with improved reasoning, a medium thinking level, and a 1M token context window.

Third-party
Zero data retention

imagen-4

Google's latest image generation model producing high-quality, photorealistic images from text prompts with support for multiple aspect ratios.

Third-party
Zero data retention

gemini-3.1-flash-tts

Text-to-Speech • Google

Third-party
Zero data retention

gemini-3-flash

Gemini 3 Flash is Google's fast multimodal model with frontier intelligence, superior search, and grounding capabilities.

Third-party
Zero data retention

gemini-3.1-flash-lite

Google's lightest and most cost-efficient Gemini model for high-throughput tasks.

Third-party
Zero data retention

gemini-2.5-flash-lite

Google's lightest and most cost-efficient Gemini 2.5 model for high-throughput tasks.

Third-party
Zero data retention

gemini-2.5-pro

Google's most capable Gemini 2.5 model with strong reasoning, thinking support, and a 1M token context window.

Third-party
Zero data retention

gemini-2.5-flash

Google's fast multimodal Gemini 2.5 model with strong reasoning and a 1M token context window.

Third-party
Zero data retention

seedream-4.5

Seedream 4.5 builds on 4.0 with multi-reference image support, batch generation, and sequential image generation.

Third-party

seedream-5-lite

Seedream 5 Lite is a lighter, faster version of the Seedream 5 family with multi-reference and batch generation support.

Third-party

seedream-4.0

Text-to-Video • ByteDance

Seedream 4.0 is ByteDance's image creation model that combines text-to-image generation and image editing into a single architecture, offering fast, high-resolution output up to 4K.

Third-party

seedance-2.0-fast

Faster variant of ByteDance's Seedance 2.0 video model. Trades some quality for speed while sharing the same multimodal architecture. Supports text-to-video, image-to-video, native audio generation, multimodal references (images, videos, audio), video editing, and video extension.

Third-party

universal-3-pro

Automatic Speech Recognition • AssemblyAI

AssemblyAI's Universal 3 Pro speech recognition model for high-accuracy transcription.

Third-party
Zero data retention

seedance-2.0

Text-to-Video • ByteDance

ByteDance's next-generation video model with a unified multimodal architecture. Generates high-quality video with synchronized audio from text, images, video clips, and audio inputs. Supports multimodal references (up to 9 images, 3 videos, 3 audio files), native audio generation, video editing, video extension, intelligent duration, and adaptive aspect ratio.

Third-party

claude-sonnet-4.5

Claude Sonnet 4.5 is the best coding model to date, with significant improvements across the entire development lifecycle.

Third-party
Zero data retention

claude-sonnet-4.6

Claude Sonnet 4.6 is Anthropic's latest balanced model offering strong coding, reasoning, and agentic capabilities with improved instruction following.

Third-party
Zero data retention

claude-opus-4.6

Claude Opus 4.6 is Anthropic's flagship language model built for complex, multi-step work in coding, financial analysis, and legal reasoning. It uses extended thinking to work through complex problems carefully and features a one million token context window.

Third-party
Zero data retention

claude-opus-4.7

Claude Opus 4.7 is Anthropic's most capable generally available model, with a step-change improvement in agentic coding over Claude Opus 4.6. It uses adaptive thinking to calibrate reasoning per task and supports a one million token context window at standard pricing.

Third-party
Zero data retention

claude-opus-4.5

Claude Opus 4.5 brings further reasoning, coding, and agentic improvements over Opus 4.1, with stronger tool use and tighter instruction following.

Third-party
Zero data retention

wan-2.6-image

Text-to-Image • Alibaba

Alibaba's Wan 2.6 text-to-image model generating images from text prompts with optional negative prompts and customizable dimensions.

Third-party
Zero data retention

claude-haiku-4.5

Text Generation • Alibaba

Claude Haiku 4.5 delivers similar levels of coding performance at one-third the cost and more than twice the speed of larger models.

Third-party
Zero data retention

qwen3-max

Alibaba's Qwen 3 Max is a large language model with strong coding, reasoning, and multilingual capabilities, served via DashScope's OpenAI-compatible endpoint.

Third-party
Zero data retention

qwen3.5-397b-a17b

Text Generation • Alibaba

Alibaba's Qwen 3.5 is a 397B-parameter mixture-of-experts model with 17B active parameters, offering strong reasoning capabilities with efficient inference.

Third-party
Zero data retention

hh1-t2v

Text-to-Video • Alibaba

Alibaba's HappyHorse 1.0 text-to-video model. Generates videos from a text prompt with configurable resolution, aspect ratio, and duration (3-15s).

Third-party
Zero data retention

hh1-i2v

Text Generation • Moonshot AI

Alibaba's HappyHorse 1.0 image-to-video model. Animates a reference image with an optional text prompt. Supports 720P and 1080P output with durations from 3 to 15 seconds.

Third-party
Zero data retention

kimi-k2.6

Kimi K2.6 is a frontier-scale open-source 1T parameter model with a 262.1k context window, multi-turn tool calling, vision inputs, and structured outputs for agentic workloads.

Cloudflare-hosted
Function calling
Reasoning
Vision

gemma-4-26b-a4b-it

Text Generation • Moonshot AI

Gemma 4 is Google's most intelligent family of open models, built from Gemini 3 research to maximize intelligence-per-parameter.

Cloudflare-hosted
Function calling
Reasoning
Vision

nemotron-3-120b-a12b

Text Generation • NVIDIA

NVIDIA Nemotron 3 Super is a hybrid MoE model with leading accuracy for multi-agent applications and specialized agentic AI systems.

Cloudflare-hosted
Function calling
Reasoning

kimi-k2.5

Kimi K2.5 is a frontier-scale open-source model with a 256k context window, multi-turn tool calling, vision inputs, and structured outputs for agentic workloads.

Cloudflare-hosted
Function calling
Deprecated
Reasoning
Vision

flux-2-klein-9b

FLUX.2 [klein] 9B is an ultra-fast, distilled image model with enhanced quality. It unifies image generation and editing in a single model, delivering state-of-the-art quality enabling interactive workflows, real-time previews, and latency-critical applications.

Cloudflare-hosted
Partner

flux-2-klein-4b

FLUX.2 [klein] is an ultra-fast, distilled image model. It unifies image generation and editing in a single model, delivering state-of-the-art quality enabling interactive workflows, real-time previews, and latency-critical applications.

Cloudflare-hosted
Partner

flux-2-dev

Text-to-Speech • Deepgram

FLUX.2 [dev] is an image model from Black Forest Labs where you can generate highly realistic and detailed images, with multi-reference support.

Cloudflare-hosted
Partner

aura-2-es

Aura-2 is a context-aware text-to-speech (TTS) model that applies natural pacing, expressiveness, and fillers based on the context of the provided text. The quality of your text input directly impacts the naturalness of the audio output.

Cloudflare-hosted
Batch
Partner
Real-time

aura-2-en

Text-to-Speech • Deepgram

Cloudflare-hosted
Batch
Partner
Real-time

granite-4.0-h-micro

Text Generation • IBM

Granite 4.0 instruct models deliver strong performance across benchmarks, achieving industry-leading results in key agentic tasks like instruction following and function calling. These efficiencies make the models well-suited for a wide range of use cases like retrieval-augmented generation (RAG), multi-agent workflows, and edge deployments.

Cloudflare-hosted
Function calling

flux

Automatic Speech Recognition • Deepgram

Flux is the first conversational speech recognition model built specifically for voice agents.

Cloudflare-hosted
Partner
Real-time

plamo-embedding-1b

Text Embeddings • pfnet

PLaMo-Embedding-1B is a Japanese text embedding model developed by Preferred Networks, Inc. It can convert Japanese text input into numerical vectors and can be used for a wide range of applications, including information retrieval, text classification, and clustering.

Cloudflare-hosted

gemma-sea-lion-v4-27b-it

Text Generation • aisingapore

SEA-LION stands for Southeast Asian Languages In One Network, which is a collection of Large Language Models (LLMs) which have been pretrained and instruct-tuned for the Southeast Asia (SEA) region.

Cloudflare-hosted

indictrans2-en-indic-1B

Translation • ai4bharat

IndicTrans2 is the first open-source transformer-based multilingual NMT model that supports high-quality translations across all the 22 scheduled Indic languages

Cloudflare-hosted

embeddinggemma-300m

Text Embeddings • Google

EmbeddingGemma is a 300M parameter, state-of-the-art for its size, open embedding model from Google, built from Gemma 3 (with T5Gemma initialization) and the same research and technology used to create Gemini models. EmbeddingGemma produces vector representations of text, making it well-suited for search and retrieval tasks, including classification, clustering, and semantic similarity search. This model was trained with data in 100+ spoken languages.

Cloudflare-hosted

aura-1

Text-to-Speech • Deepgram

Aura is a context-aware text-to-speech (TTS) model that applies natural pacing, expressiveness, and fillers based on the context of the provided text. The quality of your text input directly impacts the naturalness of the audio output.

Cloudflare-hosted
Batch
Partner
Real-time

lucid-origin

Text-to-Image • Leonardo

Lucid Origin from Leonardo.AI is their most adaptable and prompt-responsive model to date. Whether you're generating images with sharp graphic design, stunning full-HD renders, or highly specific creative direction, it adheres closely to your prompts, renders text with accuracy, and supports a wide array of visual styles and aesthetics – from stylized concept art to crisp product mockups.

Cloudflare-hosted
Partner

phoenix-1.0

Text-to-Image • Leonardo

Phoenix 1.0 is a model by Leonardo.Ai that generates images with exceptional prompt adherence and coherent text.

Cloudflare-hosted
Partner

gpt-oss-20b

Voice Activity Detection • Pipecat

OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases – gpt-oss-20b is for lower latency, and local or specialized use-cases.

Cloudflare-hosted
Function calling
Reasoning

smart-turn-v2

An open source, community-driven, native audio turn detection model in 2nd version

Cloudflare-hosted
Batch
Real-time

qwen3-embedding-0.6b

Text Embeddings • Qwen

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks.

Cloudflare-hosted

nova-3

Automatic Speech Recognition • Deepgram

Transcribe audio using Deepgram’s speech-to-text model

Cloudflare-hosted
Batch
Partner
Real-time

qwen3-30b-a3b-fp8

Text Generation • Qwen

Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.

Cloudflare-hosted
Batch
Function calling
Reasoning

gemma-3-12b-it

mistral-small-3.1-24b-instruct

Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Gemma 3 models are multimodal, handling text and image input and generating text output, with a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions.

Cloudflare-hosted
LoRA
Deprecated

qwen2.5-coder-32b-instruct

Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance. With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks.

Cloudflare-hosted
Function calling

qwq-32b

Text Generation • Qwen

QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, which is capable of achieving competitive performance against state-of-the-art reasoning models, e.g., DeepSeek-R1, o1-mini.

Cloudflare-hosted
LoRA
Reasoning

Text Generation • Qwen

Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder has covered six mainstream model sizes, 0.5, 1.5, 3, 7, 14, 32 billion parameters, to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:

Cloudflare-hosted
LoRA

bge-reranker-base

Text Classification • BAAI

Different from embedding model, reranker uses question and document as input and directly output similarity instead of embedding. You can get a relevance score by inputting query and passage to the reranker. And the score can be mapped to a float value in [0,1] by sigmoid function.

Cloudflare-hosted

llama-guard-3-8b

deepseek-r1-distill-qwen-32b

Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM – it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated.

Cloudflare-hosted
LoRA

Text Generation • DeepSeek

DeepSeek-R1-Distill-Qwen-32B is a model distilled from DeepSeek-R1 based on Qwen2.5. It outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

Cloudflare-hosted
Reasoning

llama-3.3-70b-instruct-fp8-fast

Llama 3.3 70B quantized to fp8 precision, optimized to be faster.

Cloudflare-hosted
Batch
Function calling

llama-3.2-1b-instruct

The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks.

Cloudflare-hosted

llama-3.2-3b-instruct

llama-3.2-11b-vision-instruct

The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks.

Cloudflare-hosted

The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.

Cloudflare-hosted
LoRA
Vision

flux-1-schnell

llama-3.1-8b-instruct-awq

FLUX.1 [schnell] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions.

Cloudflare-hosted

llama-3.1-8b-instruct-fp8

Quantized (int4) generative text model with 8 billion parameters from Meta.

Cloudflare-hosted
Deprecated

Llama 3.1 8B quantized to FP8 precision

Cloudflare-hosted

melotts

Text-to-Speech • MyShell

MeloTTS is a high-quality multi-lingual text-to-speech library by MyShell.ai.

Cloudflare-hosted

llama-3.1-8b-instruct

The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models. The Llama 3.1 instruction tuned text only models are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.

Cloudflare-hosted
Deprecated

bge-m3

Multi-Functionality, Multi-Linguality, and Multi-Granularity embeddings model.

Cloudflare-hosted

meta-llama-3-8b-instruct

Generation over generation, Meta Llama 3 demonstrates state-of-the-art performance on a wide range of industry benchmarks and offers new capabilities, including improved reasoning.

Cloudflare-hosted
Deprecated

whisper-large-v3-turbo

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation.

Cloudflare-hosted
Batch

llama-3-8b-instruct-awq

Quantized (int4) generative text model with 8 billion parameters from Meta.

Cloudflare-hosted
Deprecated

llava-1.5-7b-hfBeta

Image-to-Text • llava-hf

LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.

Cloudflare-hosted

whisper-tiny-enBeta

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalize to many datasets and domains without the need for fine-tuning. This is the English-only version of the Whisper Tiny model which was trained on the task of speech recognition.

Cloudflare-hosted

llama-3-8b-instruct

mistral-7b-instruct-v0.2Beta

Generation over generation, Meta Llama 3 demonstrates state-of-the-art performance on a wide range of industry benchmarks and offers new capabilities, including improved reasoning.

Cloudflare-hosted
Deprecated

The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.2. Mistral-7B-v0.2 has the following changes compared to Mistral-7B-v0.1: 32k context window (vs 8k context in v0.1), rope-theta = 1e6, and no Sliding-Window Attention.

Cloudflare-hosted
LoRA
Deprecated

gemma-7b-it-loraBeta

This is a Gemma-7B base model that Cloudflare dedicates for inference with LoRA adapters. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

Cloudflare-hosted
LoRA

gemma-2b-it-loraBeta

llama-2-7b-chat-hf-loraBeta

This is a Gemma-2B base model that Cloudflare dedicates for inference with LoRA adapters. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

Cloudflare-hosted
LoRA

This is a Llama2 base model that Cloudflare dedicated for inference with LoRA adapters. Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format.

Cloudflare-hosted
LoRA

gemma-7b-itBeta

hermes-2-pro-mistral-7bBeta

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants.

Cloudflare-hosted
LoRA
Deprecated

Text Generation • nousresearch

Hermes 2 Pro on Mistral 7B is the new flagship 7B Hermes! Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

Cloudflare-hosted
Function calling
Deprecated

mistral-7b-instruct-v0.2-loraBeta

Text Generation • Microsoft

The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.2.

Cloudflare-hosted
LoRA

uform-gen2-qwen-500mBeta

Image-to-Text • Unum

UForm-Gen is a small generative vision-language model primarily designed for Image Captioning and Visual Question Answering. The model was pre-trained on the internal image captioning dataset and fine-tuned on public instructions datasets: SVIT, LVIS, VQAs datasets.

Cloudflare-hosted
Deprecated

bart-large-cnnBeta

Summarization • Meta

BART is a transformer encoder-encoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. You can use this model for text summarization.

Cloudflare-hosted
Deprecated

phi-2Beta

Phi-2 is a Transformer-based model with a next-word prediction objective, trained on 1.4T tokens from multiple passes on a mixture of Synthetic and Web datasets for NLP and coding.

Cloudflare-hosted
Deprecated

sqlcoder-7b-2Beta

Text Generation • Defog

This model is intended to be used by non-technical users to understand data inside their SQL databases.

Cloudflare-hosted
Deprecated

detr-resnet-50Beta

Object Detection • Meta

DEtection TRansformer (DETR) model trained end-to-end on COCO 2017 object detection (118k annotated images).

Cloudflare-hosted

stable-diffusion-xl-lightningBeta

stable-diffusion-v1-5-img2imgBeta

SDXL-Lightning is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images in a few steps.

Cloudflare-hosted

dreamshaper-8-lcm

Text-to-Image • lykon

Stable Diffusion model that has been fine-tuned to be better at photorealism without sacrificing range.

Cloudflare-hosted

Text-to-Image • RunwayML

Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images. Img2img generate a new image from an input image with Stable Diffusion.

Cloudflare-hosted

stable-diffusion-v1-5-inpaintingBeta

Text-to-Image • RunwayML

Stable Diffusion Inpainting is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input, with the extra capability of inpainting the pictures by using a mask.

Cloudflare-hosted

stable-diffusion-xl-base-1.0Beta

Text-to-Image • Stability.ai

Diffusion-based text-to-image generative model by Stability AI. Generates and modify images based on text prompts.

Cloudflare-hosted

bge-large-en-v1.5

BAAI general embedding (Large) model that transforms any given text into a 1024-dimensional vector

Cloudflare-hosted
Batch

bge-small-en-v1.5

BAAI general embedding (Small) model that transforms any given text into a 384-dimensional vector

Cloudflare-hosted
Batch

llama-2-7b-chat-fp16

Full precision (fp16) generative text model with 7 billion parameters from Meta

Cloudflare-hosted
Deprecated

mistral-7b-instruct-v0.1

Instruct fine-tuned version of the Mistral-7b generative text model with 7 billion parameters

Cloudflare-hosted
LoRA
Deprecated

bge-base-en-v1.5

Text Classification • HuggingFace

BAAI general embedding (Base) model that transforms any given text into a 768-dimensional vector

Cloudflare-hosted
Batch

distilbert-sst-2-int8

Distilled BERT model that was finetuned on SST-2 for sentiment classification

Cloudflare-hosted

llama-2-7b-chat-int8

Image Classification • Microsoft

Quantized (int8) generative text model with 7 billion parameters from Meta

Cloudflare-hosted
Deprecated

m2m100-1.2b

Translation • Meta

Multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation

Cloudflare-hosted
Batch

resnet-50

50 layers deep image classification CNN trained on more than 1M images from ImageNet

Cloudflare-hosted

whisper

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

Cloudflare-hosted

llama-3.1-70b-instruct

llama-3.1-8b-instruct-fast

Cloudflare-hosted
Deprecated