Models

Meta's Llama 4 Scout is a 17 billion parameter model with 16 experts that is natively multimodal. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

Batch
Function calling
Vision

claude-opus-4.7

Claude Opus 4.7 is Anthropic's most capable generally available model to date. It is highly autonomous and performs exceptionally well on long-horizon agentic work, knowledge work, vision tasks, and memory tasks.

qwen3.5-397b-a17b

Text Generation • Alibaba • Proxied

Alibaba's Qwen 3.5 is a 397B-parameter mixture-of-experts model with 17B active parameters, offering strong reasoning capabilities with efficient inference.

qwen3-max

Text Generation • Alibaba • Proxied

Alibaba's Qwen 3 Max is a large language model with strong coding, reasoning, and multilingual capabilities, served via DashScope's OpenAI-compatible endpoint.

Text-to-Video • PixVerse • Proxied

PixVerse v6 is the latest PixVerse video model, with audio generation and customizable durations from 1 to 15 seconds.

v5.6

Text-to-Video • PixVerse • Proxied

PixVerse v5.6 is a video generation model supporting text-to-video and image-to-video with audio generation, customizable aspect ratios, and up to 1080p output.

q3-turbo

Text-to-Video • Vidu • Proxied

Vidu Q3 Turbo is a faster version of Vidu Q3 optimized for lower-latency video generation while maintaining audio support and up to 16-second clips.

q3-pro

Text-to-Video • Vidu • Proxied

Vidu Q3 Pro is a high-quality video generation model supporting text-to-video, image-to-video, and start/end-frame-to-video workflows with audio and up to 16-second clips.

wan-2.6-image

Text-to-Image • Alibaba • Proxied

Alibaba's Wan 2.6 text-to-image model generating images from text prompts with optional negative prompts and customizable dimensions.

gen-4.5

Text-to-Video • RunwayML • Proxied

RunwayML's video generation model supporting both text-to-video and image-to-video with customizable duration, aspect ratio, and content moderation controls.

music-2.6

Music Generation • MiniMax • Proxied

MiniMax's music generation model that creates full-length songs with vocals from text prompts and lyrics, or instrumental tracks. Supports BPM/key control and auto-generated lyrics.

gpt-image-1.5

Text-to-Image • OpenAI • Proxied

OpenAI's image generation model that creates and edits images from text prompts, supporting multiple quality levels and output sizes.

imagen-4

Google's latest image generation model producing high-quality, photorealistic images from text prompts with support for multiple aspect ratios.

universal-3-pro

Automatic Speech Recognition • AssemblyAI • Proxied

AssemblyAI's Universal 3 Pro speech recognition model for high-accuracy transcription.

tts-1.5-mini

Text-to-Speech • Inworld • Proxied

Ultra-fast, cost-efficient text-to-speech with approximately 120ms latency and 15-language support.

tts-1.5-max

Text-to-Speech • Inworld • Proxied

Highest-quality text-to-speech with under 200ms latency, emotion control, and 15-language support.

speech-2.8-turbo

Text-to-Speech • MiniMax • Proxied

MiniMax Speech 2.8 Turbo turns text into natural, expressive speech with voice cloning, emotion control, and 40+ language support at faster speeds.

m2.7

Text Generation • MiniMax • Proxied

MiniMax's M2.7 language model with multilingual capabilities.

speech-2.8-hd

Text-to-Speech • MiniMax • Proxied

MiniMax Speech 2.8 HD focuses on studio-grade audio generation with emotion control, multilingual support (40+ languages), and voice cloning.

hailuo-2.3-fast

Text-to-Video • MiniMax • Proxied

A lower-latency version of Hailuo 2.3 that preserves core motion quality, visual consistency, and stylization while enabling faster iteration.

hailuo-2.3

Text-to-Video • MiniMax • Proxied

A high-fidelity video generation model optimized for realistic human motion, cinematic VFX, expressive characters, and strong prompt and style adherence across text-to-video and image-to-video workflows.

recraftv4-pro-vector

Generate detailed, production-ready SVG vector graphics from text prompts with fine geometry, scalable to any size for print and design work.

recraftv4-vector

Generate production-ready SVG vector graphics from text prompts with clean geometry, structured layers, and editable paths.

recraftv4

Recraft V4 generates art-directed images with strong composition, accurate text rendering, and design taste built in. Fast and cost-efficient at standard resolution.

recraftv4-pro

Recraft V4 Pro generates high-resolution, art-directed images at 2048px+ with strong composition, text rendering, and design taste. Built for print and production work.

gemini-3-flash

Text Generation • Google • Proxied

Gemini 3 Flash is Google's fast multimodal model with frontier intelligence, superior search, and grounding capabilities.

gemini-3.1-flash-lite

Text Generation • Google • Proxied

Google's lightest and most cost-efficient Gemini model for high-throughput tasks.

gemini-3.1-pro

Text Generation • Google • Proxied

Google's most intelligent Gemini model with improved reasoning, a medium thinking level, and a 1M token context window.

tts-1

Text-to-Speech • OpenAI • Proxied

OpenAI's text-to-speech model optimized for real-time use with low latency.

tts-1-hd

Text-to-Speech • OpenAI • Proxied

OpenAI's high-definition text-to-speech model producing higher quality audio output.

gpt-4o-transcribe

Automatic Speech Recognition • OpenAI • Proxied

A speech-to-text model that uses GPT-4o to transcribe audio with improved word error rate and better language recognition compared to original Whisper models.

o4-mini

OpenAI's fast, lightweight reasoning model optimized for multi-step problem solving at lower cost.

gpt-4.1

OpenAI's flagship GPT model for complex tasks with a million-token context window.

gpt-4.1-mini

Fast, affordable version of GPT-4.1 with a million-token context window.

gpt-5

OpenAI's model excelling at coding, writing, and reasoning.

gpt-5.4-nano

GPT-5.4 Nano is OpenAI's smallest and fastest model, optimized for edge and low-latency use cases.

gpt-5.4-mini

GPT-5.4 Mini is a smaller, faster, and more cost-efficient version of GPT-5.4 for lightweight tasks.

claude-haiku-4.5

Claude Haiku 4.5 delivers coding performance similar to that of larger models at one-third the cost and more than twice the speed.

claude-sonnet-4

Claude Sonnet 4 delivers superior coding and reasoning while responding more precisely to instructions, a significant upgrade over previous versions.

claude-sonnet-4.5

Claude Sonnet 4.5 is the best coding model to date, with significant improvements across the entire development lifecycle.

claude-sonnet-4.6

Claude Sonnet 4.6 is Anthropic's latest balanced model offering strong coding, reasoning, and agentic capabilities with improved instruction following.

claude-opus-4.6

Claude Opus 4.6 is Anthropic's flagship language model built for complex, multi-step work in coding, financial analysis, and legal reasoning. It uses extended thinking to work through complex problems carefully and features a one million token context window.

flux-2-klein-9b

Text-to-Image • Black Forest Labs • Proxied

FLUX 2 Klein is Black Forest Labs' 9-billion parameter image generation model optimized for fast inference.

seedream-5-lite

Text-to-Image • ByteDance • Proxied

Seedream 5 Lite is a lighter, faster version of the Seedream 5 family with multi-reference and batch generation support.

seedream-4.5

Text-to-Image • ByteDance • Proxied

Seedream 4.5 builds on 4.0 with multi-reference image support, batch generation, and sequential image generation.

seedream-4.0

Text-to-Image • ByteDance • Proxied

Seedream 4.0 is ByteDance's image creation model that combines text-to-image generation and image editing into a single architecture, offering fast, high-resolution output up to 4K.

nano-banana-2

Google's second-generation image generation model with improved quality and speed.

nano-banana-pro

Google's higher-quality image generation model with improved detail and prompt adherence.

nano-banana

Google's fast image generation model producing high-quality images from text prompts.

veo-3.1-fast

A faster version of Veo 3.1 optimized for lower latency while maintaining high-quality video and audio output.

veo-3-fast

A faster version of Veo 3 optimized for lower-latency video generation with audio support.

veo-3.1

Google's latest video generation model with improved quality, motion, and audio generation.

veo-3

Google's video generation model capable of producing high-quality videos with optional audio from text prompts.

kimi-k2.5

Text Generation • Moonshot AI • Proxied

Kimi K2.5 is Moonshot AI's language model with strong coding, reasoning, and multilingual capabilities.

gpt-5.4

GPT-5.4 is OpenAI's flagship model with strong coding, reasoning, and multimodal capabilities.

gemma-4-26b-a4b-it

Gemma 4 is Google's most intelligent family of open models, built from Gemini 3 research to maximize intelligence-per-parameter.

Function calling
Reasoning
Vision

nemotron-3-120b-a12b

Text Generation • NVIDIA • Hosted

NVIDIA Nemotron 3 Super is a hybrid MoE model with leading accuracy for multi-agent applications and specialized agentic AI systems.

Function calling
Reasoning

flux-2-klein-9b

FLUX.2 [klein] 9B is an ultra-fast, distilled image model with enhanced quality. It unifies image generation and editing in a single model, delivering state-of-the-art quality enabling interactive workflows, real-time previews, and latency-critical applications.

Partner

flux-2-klein-4b

FLUX.2 [klein] is an ultra-fast, distilled image model. It unifies image generation and editing in a single model, delivering state-of-the-art quality enabling interactive workflows, real-time previews, and latency-critical applications.

Partner

flux-2-dev

FLUX.2 [dev] is an image model from Black Forest Labs where you can generate highly realistic and detailed images, with multi-reference support.

Partner

aura-2-es

Text-to-Speech • Deepgram • Hosted

Aura-2 is a context-aware text-to-speech (TTS) model that applies natural pacing, expressiveness, and fillers based on the context of the provided text. The quality of your text input directly impacts the naturalness of the audio output.

Batch
Partner
Real-time

aura-2-en

Text-to-Speech • Deepgram • Hosted

Batch
Partner
Real-time

granite-4.0-h-micro

Text Generation • IBM • Hosted

Granite 4.0 instruct models deliver strong performance across benchmarks, achieving industry-leading results in key agentic tasks like instruction following and function calling. These efficiencies make the models well-suited for a wide range of use cases like retrieval-augmented generation (RAG), multi-agent workflows, and edge deployments.

Function calling

flux

Automatic Speech Recognition • Deepgram • Hosted

Flux is the first conversational speech recognition model built specifically for voice agents.

Partner
Real-time

plamo-embedding-1b

Text Embeddings • pfnet • Hosted

PLaMo-Embedding-1B is a Japanese text embedding model developed by Preferred Networks, Inc. It can convert Japanese text input into numerical vectors and can be used for a wide range of applications, including information retrieval, text classification, and clustering.

gemma-sea-lion-v4-27b-it

Text Generation • aisingapore • Hosted

SEA-LION (Southeast Asian Languages In One Network) is a collection of large language models (LLMs) that have been pretrained and instruct-tuned for the Southeast Asia (SEA) region.

indictrans2-en-indic-1B

Translation • ai4bharat • Hosted

IndicTrans2 is the first open-source transformer-based multilingual NMT model that supports high-quality translations across all 22 scheduled Indic languages.

embeddinggemma-300m

Text Embeddings • Google • Hosted

EmbeddingGemma is a 300M parameter, state-of-the-art for its size, open embedding model from Google, built from Gemma 3 (with T5Gemma initialization) and the same research and technology used to create Gemini models. EmbeddingGemma produces vector representations of text, making it well-suited for search and retrieval tasks, including classification, clustering, and semantic similarity search. This model was trained with data in 100+ spoken languages.

aura-1

Text-to-Speech • Deepgram • Hosted

Aura is a context-aware text-to-speech (TTS) model that applies natural pacing, expressiveness, and fillers based on the context of the provided text. The quality of your text input directly impacts the naturalness of the audio output.

Batch
Partner
Real-time

lucid-origin

Text-to-Image • Leonardo • Hosted

Lucid Origin from Leonardo.AI is their most adaptable and prompt-responsive model to date. Whether you're generating images with sharp graphic design, stunning full-HD renders, or highly specific creative direction, it adheres closely to your prompts, renders text with accuracy, and supports a wide array of visual styles and aesthetics – from stylized concept art to crisp product mockups.

Partner

phoenix-1.0

Text-to-Image • Leonardo • Hosted

Phoenix 1.0 is a model by Leonardo.Ai that generates images with exceptional prompt adherence and coherent text.

Partner

gpt-oss-20b

Text Generation • OpenAI • Hosted

OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases – gpt-oss-20b is for lower latency, and local or specialized use-cases.

Function calling
Reasoning

smart-turn-v2

Voice Activity Detection • Pipecat • Hosted

The second version of an open-source, community-driven, native audio turn detection model.

Batch
Real-time

qwen3-embedding-0.6b

Text Embeddings • Qwen • Hosted

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks.

nova-3

Automatic Speech Recognition • Deepgram • Hosted

Transcribe audio using Deepgram’s speech-to-text model

Batch
Partner
Real-time

qwen3-30b-a3b-fp8

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.

Batch
Function calling
Reasoning

gemma-3-12b-it

Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Gemma 3 models are multimodal, handling text and image input and generating text output, with a large 128K context window and multilingual support in over 140 languages, and are available in more sizes than previous versions.

LoRA

mistral-small-3.1-24b-instruct

Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance. With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks.

Function calling

qwq-32b

QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, which is capable of achieving competitive performance against state-of-the-art reasoning models, e.g., DeepSeek-R1, o1-mini.

LoRA
Reasoning

qwen2.5-coder-32b-instruct

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder covers six mainstream model sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the needs of different developers. Qwen2.5-Coder brings several improvements over CodeQwen1.5.

LoRA

bge-reranker-base

Text Classification • BAAI • Hosted

Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by passing a query and a passage to the reranker, and the score can be mapped to a float value in [0,1] with the sigmoid function.
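The sigmoid mapping mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the model's own post-processing: the raw scores and passage names below are hypothetical values invented for the example, and no reranker API is called.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw relevance score (any real number) into the range [0, 1]."""
    # Branch on the sign so math.exp() never receives a large positive
    # argument, which would raise OverflowError.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


# Hypothetical raw scores for three passages scored against one query.
raw_scores = {"passage_a": 4.2, "passage_b": -1.3, "passage_c": 0.0}

# Normalize each score, then rank passages from most to least relevant.
normalized = {name: sigmoid(score) for name, score in raw_scores.items()}
ranked = sorted(normalized, key=normalized.get, reverse=True)
```

Because the sigmoid is monotonic, normalizing does not change the ranking order; it only makes scores comparable against a fixed threshold such as 0.5.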

llama-guard-3-8b

Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM – it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated.

LoRA

deepseek-r1-distill-qwen-32b

Text Generation • DeepSeek • Hosted

DeepSeek-R1-Distill-Qwen-32B is a model distilled from DeepSeek-R1 based on Qwen2.5. It outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

Reasoning

llama-3.3-70b-instruct-fp8-fast

Llama 3.3 70B quantized to fp8 precision, optimized to be faster.

Batch
Function calling

llama-3.2-1b-instruct

The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks.

llama-3.2-3b-instruct

The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks.

llama-3.2-11b-vision-instruct

The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.

LoRA
Vision

flux-1-schnell

FLUX.1 [schnell] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions.

llama-3.1-8b-instruct-awq

Quantized (int4) generative text model with 8 billion parameters from Meta.

llama-3.1-8b-instruct-fp8

Llama 3.1 8B quantized to FP8 precision

melotts

Text-to-Speech • MyShell • Hosted

MeloTTS is a high-quality multi-lingual text-to-speech library by MyShell.ai.

llama-3.1-8b-instruct

The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models. The Llama 3.1 instruction tuned text only models are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.

bge-m3

Multi-Functionality, Multi-Linguality, and Multi-Granularity embeddings model.

meta-llama-3-8b-instruct

Generation over generation, Meta Llama 3 demonstrates state-of-the-art performance on a wide range of industry benchmarks and offers new capabilities, including improved reasoning.

whisper-large-v3-turbo

Automatic Speech Recognition • OpenAI • Hosted

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation.

Batch

llama-3-8b-instruct-awq

Quantized (int4) generative text model with 8 billion parameters from Meta.

llava-1.5-7b-hf Beta

Image-to-Text • llava-hf • Hosted

LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.

una-cybertron-7b-v2-bf16 Beta

Text Generation • fblgit • Hosted

Cybertron 7B v2 is a 7B MistralAI-based model, the best in its series. It was trained with SFT, DPO and UNA (Unified Neural Alignment) on multiple datasets.

Deprecated

whisper-tiny-en Beta

Automatic Speech Recognition • OpenAI • Hosted

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalize to many datasets and domains without the need for fine-tuning. This is the English-only version of the Whisper Tiny model which was trained on the task of speech recognition.

llama-3-8b-instruct

Generation over generation, Meta Llama 3 demonstrates state-of-the-art performance on a wide range of industry benchmarks and offers new capabilities, including improved reasoning.

mistral-7b-instruct-v0.2 Beta

The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.2. Mistral-7B-v0.2 has the following changes compared to Mistral-7B-v0.1: 32k context window (vs 8k context in v0.1), rope-theta = 1e6, and no Sliding-Window Attention.

LoRA

gemma-7b-it-lora Beta

This is a Gemma-7B base model that Cloudflare dedicates for inference with LoRA adapters. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

LoRA

gemma-2b-it-lora Beta

This is a Gemma-2B base model that Cloudflare dedicates for inference with LoRA adapters. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

LoRA

llama-2-7b-chat-hf-lora Beta

This is a Llama2 base model that Cloudflare dedicated for inference with LoRA adapters. Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format.

LoRA

gemma-7b-it Beta

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants.

LoRA

starling-lm-7b-beta Beta

Text Generation • nexusflow • Hosted

We introduce Starling-LM-7B-beta, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). Starling-LM-7B-beta is trained from Openchat-3.5-0106 with our new reward model Nexusflow/Starling-RM-34B and policy optimization method Fine-Tuning Language Models from Human Preferences (PPO).

Deprecated

hermes-2-pro-mistral-7b Beta

Text Generation • nousresearch • Hosted

Hermes 2 Pro on Mistral 7B is the new flagship 7B Hermes! Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

Function calling

mistral-7b-instruct-v0.2-lora Beta

The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.2.

LoRA

qwen1.5-1.8b-chat Beta

Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud.

Deprecated

uform-gen2-qwen-500m Beta

Image-to-Text • Unum • Hosted

UForm-Gen is a small generative vision-language model primarily designed for Image Captioning and Visual Question Answering. The model was pre-trained on the internal image captioning dataset and fine-tuned on public instructions datasets: SVIT, LVIS, VQAs datasets.

bart-large-cnn Beta

Summarization • Meta • Hosted

BART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. You can use this model for text summarization.

phi-2 Beta

Text Generation • Microsoft • Hosted

Phi-2 is a Transformer-based model with a next-word prediction objective, trained on 1.4T tokens from multiple passes on a mixture of Synthetic and Web datasets for NLP and coding.

tinyllama-1.1b-chat-v1.0 Beta

Text Generation • tinyllama • Hosted

The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. This is the chat model finetuned on top of TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T.

Deprecated

qwen1.5-14b-chat-awq Beta

Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization.

Deprecated

qwen1.5-7b-chat-awq Beta

Deprecated

qwen1.5-0.5b-chat Beta

Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud.

Deprecated

discolm-german-7b-v1-awq Beta

DiscoLM German 7b is a Mistral-based large language model with a focus on German-language applications. AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization.

Deprecated

falcon-7b-instruct Beta

Text Generation • TII UAE • Hosted

Falcon-7B-Instruct is a 7B parameters causal decoder-only model built by TII based on Falcon-7B and finetuned on a mixture of chat/instruct datasets.

Deprecated

openchat-3.5-0106 Beta

Text Generation • openchat • Hosted

OpenChat is an innovative library of open-source language models, fine-tuned with C-RLFT - a strategy inspired by offline reinforcement learning.

Deprecated

sqlcoder-7b-2 Beta

Text Generation • Defog • Hosted

This model is intended to be used by non-technical users to understand data inside their SQL databases.

deepseek-math-7b-instruct Beta

Text Generation • DeepSeek • Hosted

DeepSeekMath-Instruct 7B is a mathematically instructed tuning model derived from DeepSeekMath-Base 7B. DeepSeekMath is initialized with DeepSeek-Coder-v1.5 7B and continues pre-training on math-related tokens sourced from Common Crawl, together with natural language and code data for 500B tokens.

Deprecated

detr-resnet-50 Beta

Object Detection • Meta • Hosted

DEtection TRansformer (DETR) model trained end-to-end on COCO 2017 object detection (118k annotated images).

stable-diffusion-xl-lightning Beta

Text-to-Image • ByteDance • Hosted

SDXL-Lightning is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images in a few steps.

dreamshaper-8-lcm

Text-to-Image • lykon • Hosted

Stable Diffusion model that has been fine-tuned to be better at photorealism without sacrificing range.

stable-diffusion-v1-5-img2img Beta

Text-to-Image • RunwayML • Hosted

Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images. Img2img generates a new image from an input image with Stable Diffusion.

stable-diffusion-v1-5-inpainting Beta

Text-to-Image • RunwayML • Hosted

Stable Diffusion Inpainting is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input, with the extra capability of inpainting the pictures by using a mask.

deepseek-coder-6.7b-instruct-awq Beta

Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese.

Deprecated

deepseek-coder-6.7b-base-awq Beta

Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese.

Deprecated

llamaguard-7b-awq Beta

Llama Guard is a model for classifying the safety of LLM prompts and responses, using a taxonomy of safety risks.

Deprecated

neural-chat-7b-v3-1-awq Beta

This model is a 7B parameter LLM fine-tuned on the Intel Gaudi 2 processor from mistralai/Mistral-7B-v0.1 on the open source dataset Open-Orca/SlimOrca.

Deprecated

openhermes-2.5-mistral-7b-awq Beta

OpenHermes 2.5 Mistral 7B is a state-of-the-art Mistral fine-tune, a continuation of the OpenHermes 2 model, trained on additional code datasets.

Deprecated

llama-2-13b-chat-awq Beta

Llama 2 13B Chat AWQ is an efficient, accurate and blazing-fast low-bit weight quantized Llama 2 variant.

Deprecated

mistral-7b-instruct-v0.1-awq Beta

Mistral 7B Instruct v0.1 AWQ is an efficient, accurate and blazing-fast low-bit weight quantized Mistral variant.

Deprecated

zephyr-7b-beta-awq Beta

Zephyr 7B Beta AWQ is an efficient, accurate and blazing-fast low-bit weight quantized Zephyr model variant.

Deprecated

stable-diffusion-xl-base-1.0 Beta

Text-to-Image • Stability.ai • Hosted

Diffusion-based text-to-image generative model by Stability AI. Generates and modifies images based on text prompts.

bge-large-en-v1.5

BAAI general embedding (Large) model that transforms any given text into a 1024-dimensional vector

Batch

bge-small-en-v1.5

BAAI general embedding (Small) model that transforms any given text into a 384-dimensional vector

Batch

llama-2-7b-chat-fp16

Full precision (fp16) generative text model with 7 billion parameters from Meta

mistral-7b-instruct-v0.1

Instruct fine-tuned version of the Mistral-7b generative text model with 7 billion parameters

LoRA

bge-base-en-v1.5

BAAI general embedding (Base) model that transforms any given text into a 768-dimensional vector

Batch

distilbert-sst-2-int8

Text Classification • HuggingFace • Hosted

Distilled BERT model that was finetuned on SST-2 for sentiment classification

llama-2-7b-chat-int8

Quantized (int8) generative text model with 7 billion parameters from Meta

m2m100-1.2b

Translation • Meta • Hosted

Multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation

Batch

resnet-50

Image Classification • Microsoft • Hosted

50 layers deep image classification CNN trained on more than 1M images from ImageNet

whisper

Automatic Speech Recognition • OpenAI • Hosted

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

llama-3.1-70b-instruct

llama-3.1-8b-instruct-fast