# Changelog
URL: https://developers.cloudflare.com/workers-ai/changelog/
import { ProductReleaseNotes } from "~/components";
---
# Demos and architectures
URL: https://developers.cloudflare.com/workers-ai/demos/
import { ExternalResources, GlossaryTooltip, ResourcesBySelector } from "~/components"
Workers AI can be used to build dynamic and performant services. The following demo applications and reference architectures showcase how to use Workers AI optimally within your architecture.
## Demos
Explore the following demo applications for Workers AI.
## Reference architectures
Explore the following reference architectures that use Workers AI:
---
# Glossary
URL: https://developers.cloudflare.com/workers-ai/glossary/
import { Glossary } from "~/components";
Review the definitions for terms used across Cloudflare's Workers AI documentation.
---
# Overview
URL: https://developers.cloudflare.com/workers-ai/
import { CardGrid, Description, Feature, LinkTitleCard, Plan, RelatedProduct, Render, LinkButton, Flex } from "~/components"
Run machine learning models, powered by serverless GPUs, on Cloudflare's global network.
Workers AI allows you to run AI models in a serverless way, without having to worry about scaling, maintaining, or paying for unused infrastructure. You can invoke models running on GPUs on Cloudflare's network from your own code, whether from [Workers](/workers/), [Pages](/pages/), or anywhere via [the Cloudflare API](/api/resources/ai/methods/run/).
Workers AI gives you access to:
- **50+ [open-source models](/workers-ai/models/)**, available as a part of our model catalog
- Serverless, **pay-for-what-you-use** [pricing model](/workers-ai/platform/pricing/)
- All as part of a **fully-featured developer platform**, including [AI Gateway](/ai-gateway/), [Vectorize](/vectorize/), [Workers](/workers/) and more...
[Get started](/workers-ai/get-started/) or watch a Workers AI demo.
***
## Features
Workers AI comes with a curated set of popular open-source models that enable you to do tasks such as image classification, text generation, object detection and more.
***
## Related products
Observe and control your AI applications with caching, rate limiting, request retries, model fallback, and more.
Build full-stack AI applications with Vectorize, Cloudflare's vector database. Adding Vectorize enables you to perform tasks such as semantic search, recommendations, and anomaly detection, or to provide context and memory to an LLM.
Build serverless applications and deploy instantly across the globe for exceptional performance, reliability, and scale.
Create full-stack applications that are instantly deployed to the Cloudflare global network.
Store large amounts of unstructured data without the costly egress bandwidth fees associated with typical cloud storage services.
Create new serverless SQL databases to query from your Workers and Pages projects.
A globally distributed coordination API with strongly consistent storage.
Create a global, low-latency, key-value data storage.
***
## More resources
Build and deploy your first Workers AI application.
Learn about Free and Paid plans.
Learn about Workers AI limits.
Learn how you can build and deploy ambitious AI applications to Cloudflare's global network.
Learn which storage option is best for your project.
Connect with the Workers community on Discord to ask questions, share what you are building, and discuss the platform with other developers.
Follow @CloudflareDev on Twitter to learn about product announcements, and what is new in Cloudflare Workers.
---
# Privacy
URL: https://developers.cloudflare.com/workers-ai/privacy/
Cloudflare processes certain customer data in order to provide the Workers AI service, subject to our [Privacy Policy](https://www.cloudflare.com/privacypolicy/) and [Self-Serve Subscription Agreement](https://www.cloudflare.com/terms/) or [Enterprise Subscription Agreement](https://www.cloudflare.com/enterpriseterms/) (as applicable).
Cloudflare neither creates nor trains the AI models made available on Workers AI. The models constitute Third-Party Services and may be subject to open source or other license terms that apply between you and the model provider. Be sure to review the license terms applicable to each model (if any).
Your inputs (e.g., text prompts, image submissions, audio files, etc.), outputs (e.g., generated text/images, translations, etc.), embeddings, and training data constitute Customer Content.
For Workers AI:
* You own, and are responsible for, all of your Customer Content.
* Cloudflare does not make your Customer Content available to any other Cloudflare customer.
* Cloudflare does not use your Customer Content to (1) train any AI models made available on Workers AI or (2) improve any Cloudflare or third-party services, and would not do so unless we received your explicit consent.
* Your Customer Content for Workers AI may be stored by Cloudflare if you specifically use a storage service (e.g., R2, KV, DO, Vectorize, etc.) in conjunction with Workers AI.
---
# Errors
URL: https://developers.cloudflare.com/workers-ai/workers-ai-errors/
Below is a list of Workers AI errors.
| **Name** | **Internal Code** | **HTTP Code** | **Description** |
| ------------------------------------- | ----------------- | ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| No such model | `5007` | `400` | No such model `${model}` or task |
| Invalid data | `5004` | `400` | Invalid data type for base64 input: `${type}` |
| Finetune missing required files | `3039` | `400` | Finetune is missing required files `(model.safetensors and config.json) ` |
| Incomplete request | `3003` | `400` | Request is missing headers or body: `{what}` |
| Account not allowed for private model | `5018` | `403` | The account is not allowed to access this model |
| Model agreement | `5016` | `403` | User has not agreed to Llama3.2 model terms |
| Account blocked | `3023` | `403` | Service unavailable for account |
| Account not allowed for private model | `3041` | `403` | The account is not allowed to access this model |
| Deprecated SDK version | `5019` | `405` | Request trying to use deprecated SDK version |
| LoRa unsupported | `5005` | `405` | The model `${this.model}` does not support LoRa inference |
| Invalid model ID | `3042` | `404` | The model name is invalid |
| Request too large | `3006` | `413` | Request is too large |
| Timeout | `3007` | `408` | Request timeout |
| Aborted | `3008` | `408` | Request was aborted |
| Account limited | `3036` | `429` | You have used up your daily free allocation of 10,000 neurons. Please upgrade to Cloudflare's Workers Paid plan if you would like to continue usage. |
| Out of capacity | `3040` | `429` | No more data centers to forward the request to |
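When you call a model through a Workers [binding](/workers-ai/configuration/bindings/), these errors typically surface as thrown exceptions; over the REST API they appear in the `errors` array of the JSON response. A minimal defensive sketch (the model and prompt are illustrative, and the exact error message format may vary):
```js
export default {
  async fetch(request, env) {
    try {
      // Example model and prompt; substitute your own.
      const answer = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
        prompt: "What is the origin of the phrase 'Hello, World'",
      });
      return Response.json(answer);
    } catch (err) {
      // The thrown message generally includes the internal code and description
      // from the table above (for example, capacity or daily allocation errors).
      return new Response(`Workers AI error: ${err.message}`, { status: 500 });
    }
  },
};
```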
---
# AI SDK
URL: https://developers.cloudflare.com/workers-ai/configuration/ai-sdk/
Workers AI can be used with the [AI SDK](https://sdk.vercel.ai/) for JavaScript and TypeScript codebases.
## Setup
Install the [`workers-ai-provider` provider](https://sdk.vercel.ai/providers/community-providers/cloudflare-workers-ai):
```bash
npm install workers-ai-provider
```
Then, add an AI binding in your Workers project Wrangler file:
```toml
[ai]
binding = "AI"
```
## Models
The AI SDK can be configured to work with [any AI model](/workers-ai/models/).
```js
import { createWorkersAI } from 'workers-ai-provider';
const workersai = createWorkersAI({ binding: env.AI });
// Choose any model: https://developers.cloudflare.com/workers-ai/models/
const model = workersai('@cf/meta/llama-3.1-8b-instruct', {});
```
## Generate Text
Once you have selected your model, you can generate text from a given prompt.
```ts
import { createWorkersAI } from 'workers-ai-provider';
import { generateText } from 'ai';
type Env = {
AI: Ai;
};
export default {
async fetch(_: Request, env: Env) {
const workersai = createWorkersAI({ binding: env.AI });
const result = await generateText({
model: workersai('@cf/meta/llama-2-7b-chat-int8'),
prompt: 'Write a 50-word essay about hello world.',
});
return new Response(result.text);
},
};
```
## Stream Text
For longer responses, consider streaming the response so that output is delivered as the generation completes.
```ts
import { createWorkersAI } from 'workers-ai-provider';
import { streamText } from 'ai';
type Env = {
AI: Ai;
};
export default {
async fetch(_: Request, env: Env) {
const workersai = createWorkersAI({ binding: env.AI });
const result = streamText({
model: workersai('@cf/meta/llama-2-7b-chat-int8'),
prompt: 'Write a 50-word essay about hello world.',
});
return result.toTextStreamResponse({
headers: {
// add these headers to ensure that the
// response is chunked and streamed
'Content-Type': 'text/x-unknown',
'content-encoding': 'identity',
'transfer-encoding': 'chunked',
},
});
},
};
```
## Generate Structured Objects
You can provide a Zod schema to generate a structured JSON response.
```ts
import { createWorkersAI } from 'workers-ai-provider';
import { generateObject } from 'ai';
import { z } from 'zod';
type Env = {
AI: Ai;
};
export default {
async fetch(_: Request, env: Env) {
const workersai = createWorkersAI({ binding: env.AI });
const result = await generateObject({
model: workersai('@cf/meta/llama-3.1-8b-instruct'),
prompt: 'Generate a Lasagna recipe',
schema: z.object({
recipe: z.object({
ingredients: z.array(z.string()),
description: z.string(),
}),
}),
});
return Response.json(result.object);
},
};
```
---
# Workers Bindings
URL: https://developers.cloudflare.com/workers-ai/configuration/bindings/
import { Type, MetaInfo, WranglerConfig } from "~/components";
## Workers
[Workers](/workers/) provides a serverless execution environment that allows you to create new applications or augment existing ones.
To use Workers AI with Workers, you must create a Workers AI [binding](/workers/runtime-apis/bindings/). Bindings allow your Workers to interact with resources, like Workers AI, on the Cloudflare Developer Platform. You create bindings on the Cloudflare dashboard or by updating your [Wrangler file](/workers/wrangler/configuration/).
To bind Workers AI to your Worker, add the following to the end of your Wrangler file:
```toml
[ai]
binding = "AI" # i.e. available in your Worker on env.AI
```
## Pages Functions
[Pages Functions](/pages/functions/) allow you to build full-stack applications with Cloudflare Pages by executing code on the Cloudflare network. Functions are Workers under the hood.
To configure a Workers AI binding in your Pages Function, you must use the Cloudflare dashboard. Refer to [Workers AI bindings](/pages/functions/bindings/#workers-ai) for instructions.
## Methods
### async env.AI.run()
`async env.AI.run()` runs a model. It takes the model name as the first parameter and an options object as the second parameter.
```javascript
const answer = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
prompt: "What is the origin of the phrase 'Hello, World'"
});
```
**Parameters**
* `model` - The model to run.
**Supported options**
* `stream` - Returns a stream of results as they are available.
```javascript
const answer = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
prompt: "What is the origin of the phrase 'Hello, World'",
stream: true
});
return new Response(answer, {
headers: { "content-type": "text/event-stream" }
});
```
---
# Hugging Face Chat UI
URL: https://developers.cloudflare.com/workers-ai/configuration/hugging-face-chat-ui/
Use Workers AI with [Chat UI](https://github.com/huggingface/chat-ui?tab=readme-ov-file#text-embedding-models), an open-source chat interface offered by Hugging Face.
## Prerequisites
You will need the following:
* A [Cloudflare account](https://dash.cloudflare.com)
* Your [Account ID](/fundamentals/setup/find-account-and-zone-ids/)
* An [API token](/workers-ai/get-started/rest-api/#1-get-api-token-and-account-id) for Workers AI
## Setup
First, decide how to reference your Account ID and API token (either directly in your `.env.local` using the `CLOUDFLARE_ACCOUNT_ID` and `CLOUDFLARE_API_TOKEN` variables or in the endpoint configuration).
Then, follow the rest of the setup instructions in the [Chat UI GitHub repository](https://github.com/huggingface/chat-ui?tab=readme-ov-file#text-embedding-models).
When setting up your models, specify the `cloudflare` endpoint.
```json
{
"name" : "nousresearch/hermes-2-pro-mistral-7b",
"tokenizer": "nousresearch/hermes-2-pro-mistral-7b",
"parameters": {
"stop": ["<|im_end|>"]
},
"endpoints" : [
{
"type": "cloudflare",
// optionally specify these if not included in .env.local
"accountId": "your-account-id",
"apiToken": "your-api-token"
//
}
]
}
```
## Supported models
This template works with any [text generation model](/workers-ai/models/) whose name begins with the `@hf` prefix.
---
# Configuration
URL: https://developers.cloudflare.com/workers-ai/configuration/
import { DirectoryListing } from "~/components";
---
# OpenAI compatible API endpoints
URL: https://developers.cloudflare.com/workers-ai/configuration/open-ai-compatibility/
import { Render } from "~/components"
## Usage
### Workers AI
Normally, Workers AI requires you to specify the model name in the cURL endpoint or within the `env.AI.run` function.
With OpenAI compatible endpoints, you can leverage the [openai-node SDK](https://github.com/openai/openai-node) to make calls to Workers AI. This allows you to use Workers AI by simply changing the base URL and the model name.
```js title="OpenAI SDK Example"
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: env.CLOUDFLARE_API_KEY,
baseURL: `https://api.cloudflare.com/client/v4/accounts/${env.CLOUDFLARE_ACCOUNT_ID}/ai/v1`
});
const chatCompletion = await openai.chat.completions.create({
messages: [{ role: "user", content: "Make some robot noises" }],
model: "@cf/meta/llama-3.1-8b-instruct",
});
const embeddings = await openai.embeddings.create({
model: "@cf/baai/bge-large-en-v1.5",
input: "I love matcha"
});
```
```bash title="cURL example"
curl --request POST \
--url https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/v1/chat/completions \
--header "Authorization: Bearer {api_token}" \
--header "Content-Type: application/json" \
--data '
{
"model": "@cf/meta/llama-3.1-8b-instruct",
"messages": [
{
"role": "user",
"content": "how to build a wooden spoon in 3 short steps? give as short as answer as possible"
}
]
}
'
```
### AI Gateway
These endpoints are also compatible with [AI Gateway](/ai-gateway/providers/workersai/#openai-compatible-endpoints).
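As a rough sketch, routing the OpenAI SDK example above through AI Gateway only requires changing the `baseURL`. The gateway URL format below is an assumption based on the linked AI Gateway documentation; confirm it there and replace `{gateway_id}` with the name of your gateway:
```js
import OpenAI from "openai";

// Assumed AI Gateway base URL for the Workers AI OpenAI-compatible endpoints;
// verify the exact format in the AI Gateway docs linked above.
const openai = new OpenAI({
  apiKey: env.CLOUDFLARE_API_KEY,
  baseURL: `https://gateway.ai.cloudflare.com/v1/${env.CLOUDFLARE_ACCOUNT_ID}/{gateway_id}/workers-ai/v1`,
});

const chatCompletion = await openai.chat.completions.create({
  messages: [{ role: "user", content: "Make some robot noises" }],
  model: "@cf/meta/llama-3.1-8b-instruct",
});
```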
---
# Function calling
URL: https://developers.cloudflare.com/workers-ai/function-calling/
import { Stream, TabItem, Tabs } from "~/components";
Function calling enables you to take a Large Language Model (LLM) response and use it to execute functions or interact with external APIs. The developer usually defines a set of functions and the required input schema for each function, which we call `tools`. The model then intelligently determines when it needs to make a tool call, and it returns a JSON output that you then feed to another function or API.
In essence, function calling allows you to perform actions with LLMs by executing code or making additional API calls.
## How can I use function calling?
Workers AI has [embedded function calling](/workers-ai/function-calling/embedded/), which allows you to execute function code alongside your inference calls. We have a package called [`@cloudflare/ai-utils`](https://www.npmjs.com/package/@cloudflare/ai-utils) to help facilitate this, which we have open-sourced on [GitHub](https://github.com/cloudflare/ai-utils).
For industry-standard function calling, take a look at the documentation on [Traditional Function Calling](/workers-ai/function-calling/traditional/).
To show you the value of embedded function calling, take a look at the example below that compares traditional function calling with embedded function calling. Embedded function calling allowed us to cut down the lines of code from 77 to 31.
```sh
# The ai-utils package enables embedded function calling
npm i @cloudflare/ai-utils
```
```js title="Embedded function calling example"
import {
createToolsFromOpenAPISpec,
runWithTools,
autoTrimTools,
} from "@cloudflare/ai-utils";
export default {
async fetch(request, env, ctx) {
const response = await runWithTools(
env.AI,
"@hf/nousresearch/hermes-2-pro-mistral-7b",
{
messages: [{ role: "user", content: "Who is Cloudflare on github?" }],
tools: [
// You can pass the OpenAPI spec link or contents directly
...(await createToolsFromOpenAPISpec(
"https://gist.githubusercontent.com/mchenco/fd8f20c8f06d50af40b94b0671273dc1/raw/f9d4b5cd5944cc32d6b34cad0406d96fd3acaca6/partial_api.github.com.json",
{
overrides: [
{
// for all requests to api.github.com, we need to add a User-Agent.
matcher: ({ url, method }) => {
return url.hostname === "api.github.com";
},
values: {
headers: {
"User-Agent":
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
},
},
},
],
},
)),
],
},
).then((response) => {
return response;
});
return new Response(JSON.stringify(response));
},
};
```
```js title="Traditional function calling example"
export default {
async fetch(request, env, ctx) {
const response = await env.AI.run(
"@hf/nousresearch/hermes-2-pro-mistral-7b",
{
messages: [{ role: "user", content: "Who is Cloudflare on github?" }],
tools: [
{
name: "getGithubUser",
description:
"Provides publicly available information about someone with a GitHub account.",
parameters: {
type: "object",
properties: {
username: {
type: "string",
description: "The handle for the GitHub user account.",
},
},
required: ["username"],
},
},
],
},
);
const selected_tool = response.tool_calls[0];
let res;
if (selected_tool.name == "getGithubUser") {
try {
const username = selected_tool.arguments.username;
const url = `https://api.github.com/users/${username}`;
res = await fetch(url, {
headers: {
// Github API requires a User-Agent header
"User-Agent":
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
},
}).then((res) => res.json());
} catch (error) {
return error;
}
}
const finalResponse = await env.AI.run(
"@hf/nousresearch/hermes-2-pro-mistral-7b",
{
messages: [
{
role: "user",
content: "Who is Cloudflare on github?",
},
{
role: "assistant",
content: "",
tool_call: selected_tool.name,
},
{
role: "tool",
name: selected_tool.name,
content: JSON.stringify(res),
},
],
tools: [
{
name: "getGithubUser",
description:
"Provides publicly available information about someone with a GitHub account.",
parameters: {
type: "object",
properties: {
username: {
type: "string",
description: "The handle for the GitHub user account.",
},
},
required: ["username"],
},
},
],
},
);
return new Response(JSON.stringify(finalResponse));
},
};
```
## What models support function calling?
There are open-source models which have been fine-tuned to do function calling. When browsing our [model catalog](/workers-ai/models/), look for models with the function calling property beside it. For example, [@hf/nousresearch/hermes-2-pro-mistral-7b](/workers-ai/models/hermes-2-pro-mistral-7b/) is a fine-tuned variant of Mistral 7B that you can use for function calling.
---
# Traditional
URL: https://developers.cloudflare.com/workers-ai/function-calling/traditional/
This page shows how you can do traditional function calling, as defined by industry standards. Workers AI also offers [embedded function calling](/workers-ai/function-calling/embedded/), which is drastically easier than traditional function calling.
With traditional function calling, you define an array of tools with the name, description, and tool arguments. The example below shows how you would pass a tool called `getWeather` in an inference request to a model.
```js title="Traditional function calling example"
const response = await env.AI.run("@hf/nousresearch/hermes-2-pro-mistral-7b", {
messages: [
{
role: "user",
content: "what is the weather in london?",
},
],
tools: [
{
name: "getWeather",
description: "Return the weather for a latitude and longitude",
parameters: {
type: "object",
properties: {
latitude: {
type: "string",
description: "The latitude for the given location",
},
longitude: {
type: "string",
description: "The longitude for the given location",
},
},
required: ["latitude", "longitude"],
},
},
],
});
return new Response(JSON.stringify(response.tool_calls));
```
The LLM will then return a JSON object with the required arguments and the name of the tool that was called. You can then pass this JSON object to make an API call.
```json
[{"arguments":{"latitude":"51.5074","longitude":"-0.1278"},"name":"getWeather"}]
```
For a working example on how to do function calling, take a look at our [demo app](https://github.com/craigsdennis/lightbulb-moment-tool-calling/blob/main/src/index.ts).
---
# Fine-tunes
URL: https://developers.cloudflare.com/workers-ai/fine-tunes/
import { Feature } from "~/components"
Learn how to use Workers AI to get fine-tuned inference.
Upload a LoRA adapter and run fine-tuned inference with one of our base models.
***
## What is fine-tuning?
Fine-tuning is a general term for modifying an AI model by continuing to train it with additional data. The goal of fine-tuning is to increase the probability that a generation is similar to your dataset. Training a model from scratch is not practical for many use cases, given how expensive and time-consuming it can be. By fine-tuning an existing pre-trained model, you benefit from its capabilities while also accomplishing your desired task.
[Low-Rank Adaptation](https://arxiv.org/abs/2106.09685) (LoRA) is a specific fine-tuning method that can be applied to various model architectures, not just LLMs. In traditional fine-tuning methods, it is common for the pre-trained model weights to be directly modified or fused with additional fine-tune weights. LoRA, on the other hand, allows the fine-tune weights and the pre-trained model to remain separate, so the pre-trained model stays unchanged. The end result is that you can train models to be more accurate at specific tasks, such as generating code, having a specific personality, or generating images in a specific style.
---
# Using LoRA adapters
URL: https://developers.cloudflare.com/workers-ai/fine-tunes/loras/
import { TabItem, Tabs } from "~/components"
Workers AI supports fine-tuned inference with adapters trained with [Low-Rank Adaptation](https://blog.cloudflare.com/fine-tuned-inference-with-loras). This feature is in open beta and free during this period.
## Limitations
* We only support LoRAs for the following models (must not be quantized):
* `@cf/meta-llama/llama-2-7b-chat-hf-lora`
* `@cf/mistral/mistral-7b-instruct-v0.2-lora`
* `@cf/google/gemma-2b-it-lora`
* `@cf/google/gemma-7b-it-lora`
* Adapter must be trained with rank `r <= 8`. You can check the rank of a pre-trained LoRA adapter through the adapter's `config.json` file
* LoRA adapter file must be < 100MB
* LoRA adapter files must be named `adapter_config.json` and `adapter_model.safetensors` exactly
* You can test up to 30 LoRA adapters per account
***
## Choosing compatible LoRA adapters
### Finding open-source LoRA adapters
We have started a [Hugging Face Collection](https://huggingface.co/collections/Cloudflare/workers-ai-compatible-loras-6608dd9f8d305a46e355746e) that lists a few LoRA adapters that are compatible with Workers AI. Generally, any LoRA adapter that fits our limitations above should work.
### Training your own LoRA adapters
To train your own LoRA adapter, follow the [tutorial](/workers-ai/tutorials/fine-tune-models-with-autotrain).
***
## Uploading LoRA adapters
In order to run inference with LoRAs on Workers AI, you'll need to create a new fine tune on your account and upload your adapter files. You should have an `adapter_model.safetensors` file with the model weights and an `adapter_config.json` file with your config information. *Note that we only accept adapter files in these formats.*
Right now, you can't edit a fine tune's asset files after you upload it. We will support this soon, but for now you will need to create a new fine tune and upload files again if you would like to use a new LoRA.
Before you upload your LoRA adapter, you'll need to edit your `adapter_config.json` file to include `model_type` as one of `mistral`, `gemma`, or `llama`, as shown below.
```json null {10}
{
"alpha_pattern": {},
"auto_mapping": null,
...
"target_modules": [
"q_proj",
"v_proj"
],
"task_type": "CAUSAL_LM",
"model_type": "mistral",
}
```
### Wrangler
You can create a finetune and upload your LoRA adapter via wrangler with the following commands:
```bash title="wrangler CLI" {1,7}
npx wrangler ai finetune create
# Creating new finetune "test-lora" for model "@cf/mistral/mistral-7b-instruct-v0.2-lora"...
# Uploading file "/Users/abcd/Downloads/adapter_config.json" to "test-lora"...
# Uploading file "/Users/abcd/Downloads/adapter_model.safetensors" to "test-lora"...
# Assets uploaded, finetune "test-lora" is ready to use.
npx wrangler ai finetune list
┌──────────────────────────────────────┬───────────┬─────────────┐
│ finetune_id                          │ name      │ description │
├──────────────────────────────────────┼───────────┼─────────────┤
│ 00000000-0000-0000-0000-000000000000 │ test-lora │             │
└──────────────────────────────────────┴───────────┴─────────────┘
```
### REST API
Alternatively, you can use our REST API to create a finetune and upload your adapter files. You will need a Cloudflare API Token with `Workers AI: Edit` permissions to make calls to our REST API, which you can generate via the Cloudflare Dashboard.
#### Creating a fine-tune on your account
```bash title="cURL"
## Input: user-defined name of fine tune
## Output: unique finetune_id
curl -X POST https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/finetunes/ \
-H "Authorization: Bearer {API_TOKEN}" \
-H 'Content-Type: application/json' \
-d '{
"model": "SUPPORTED_MODEL_NAME",
"name": "FINETUNE_NAME",
"description": "OPTIONAL_DESCRIPTION"
}'
```
#### Uploading your adapter weights and config
You have to call the upload endpoint each time you want to upload a new file, so you usually run this once for `adapter_model.safetensors` and once for `adapter_config.json`. Make sure you include the `@` before the path to your files.
You can either use the finetune `name` or `id` that you used when you created the fine tune.
```bash title="cURL"
## Input: finetune_id, adapter_model.safetensors, then adapter_config.json
## Output: success true/false
curl -X POST https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/finetunes/{FINETUNE_ID}/finetune-assets/ \
-H 'Authorization: Bearer {API_TOKEN}' \
-H 'Content-Type: multipart/form-data' \
-F 'file_name=adapter_model.safetensors' \
-F 'file=@{PATH/TO/adapter_model.safetensors}'
```
#### List fine-tunes in your account
You can call this method to confirm what fine-tunes you have created in your account.
```bash title="cURL"
## Input: n/a
## Output: success true/false
curl -X GET https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/finetunes/ \
-H 'Authorization: Bearer {API_TOKEN}'
```
```json title="Example JSON output"
{
"success": true,
"result": [
[{
"id": "00000000-0000-0000-0000-000000000",
"model": "@cf/meta-llama/llama-2-7b-chat-hf-lora",
"name": "llama2-finetune",
"description": "test"
},
{
"id": "00000000-0000-0000-0000-000000000",
"model": "@cf/mistralai/mistral-7b-instruct-v0.2-lora",
"name": "mistral-finetune",
"description": "test"
}]
]
}
```
***
## Running inference with LoRAs
To make inference requests and apply the LoRA adapter, you will need your model and finetune `name` or `id`. You should use the chat template that your LoRA was trained on, but you can try running it with `raw: true` and the messages template like below.
```javascript null {5-6}
const response = await env.AI.run(
"@cf/mistralai/mistral-7b-instruct-v0.2-lora", //the model supporting LoRAs
{
messages: [{"role": "user", "content": "Hello world"}],
raw: true, //skip applying the default chat template
lora: "00000000-0000-0000-0000-000000000", //the finetune id OR name
}
);
```
```bash null {5-6}
curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/mistral/mistral-7b-instruct-v0.2-lora \
-H 'Authorization: Bearer {API_TOKEN}' \
-d '{
"messages": [{"role": "user", "content": "Hello world"}],
"raw": "true",
"lora": "00000000-0000-0000-0000-000000000"
}'
```
---
# Public LoRA adapters
URL: https://developers.cloudflare.com/workers-ai/fine-tunes/public-loras/
Cloudflare offers a few public LoRA adapters that can immediately be used for fine-tuned inference. You can try them out immediately via our [playground](https://playground.ai.cloudflare.com).
Public LoRAs are named `cf-public-x` (where `x` describes the adapter), and the `cf-public-` prefix is reserved for Cloudflare.
:::note
Have more LoRAs you would like to see? Let us know on [Discord](https://discord.cloudflare.com).
:::
| Name | Description | Compatible with |
| -------------------------------------------------------------------------- | ---------------------------------- | ----------------------------------------------------------------------------------- |
| [cf-public-magicoder](https://huggingface.co/predibase/magicoder) | Coding tasks in multiple languages | `@cf/mistral/mistral-7b-instruct-v0.1` `@hf/mistral/mistral-7b-instruct-v0.2` |
| [cf-public-jigsaw-classification](https://huggingface.co/predibase/jigsaw) | Toxic comment classification | `@cf/mistral/mistral-7b-instruct-v0.1` `@hf/mistral/mistral-7b-instruct-v0.2` |
| [cf-public-cnn-summarization](https://huggingface.co/predibase/cnn) | Article summarization | `@cf/mistral/mistral-7b-instruct-v0.1` `@hf/mistral/mistral-7b-instruct-v0.2` |
You can also list these public LoRAs with an API call:
```bash
curl https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/finetunes/public \
--header 'Authorization: Bearer {cf_token}'
```
## Running inference with public LoRAs
To run inference with public LoRAs, you just need to define the LoRA name in the request.
We recommend that you use the prompt template that the LoRA was trained on. You can find this in the HuggingFace repos linked above for each adapter.
### cURL
```bash null {10}
curl https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/@cf/mistral/mistral-7b-instruct-v0.1 \
--header 'Authorization: Bearer {cf_token}' \
--data '{
"messages": [
{
"role": "user",
"content": "Write a python program to check if a number is even or odd."
}
],
"lora": "cf-public-magicoder"
}'
```
### JavaScript
```js null {11}
const answer = await env.AI.run('@cf/mistral/mistral-7b-instruct-v0.1',
{
stream: true,
raw: true,
messages: [
{
"role": "user",
"content": "Summarize the following: Some newspapers, TV channels and well-known companies publish false news stories to fool people on 1 April. One of the earliest examples of this was in 1957 when a programme on the BBC, the UKs national TV channel, broadcast a report on how spaghetti grew on trees. The film showed a family in Switzerland collecting spaghetti from trees and many people were fooled into believing it, as in the 1950s British people didnt eat much pasta and many didnt know how it was made! Most British people wouldnt fall for the spaghetti trick today, but in 2008 the BBC managed to fool their audience again with their Miracles of Evolution trailer, which appeared to show some special penguins that had regained the ability to fly. Two major UK newspapers, The Daily Telegraph and the Daily Mirror, published the important story on their front pages."
}
],
lora: "cf-public-cnn-summarization"
});
```
---
# Dashboard
URL: https://developers.cloudflare.com/workers-ai/get-started/dashboard/
import { Render } from "~/components"
Follow this guide to create a Workers AI application using the Cloudflare dashboard.
## Prerequisites
Sign up for a [Cloudflare account](https://dash.cloudflare.com/sign-up/workers-and-pages) if you have not already.
## Setup
To create a Workers AI application:
1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com) and select your account.
2. Go to **Compute (Workers)** > **Workers & Pages**.
3. Select **Create**.
4. Under **Start from a template**, select **LLM App**. After you select your template, an [AI binding](/workers-ai/configuration/bindings/) will be created for you in the dashboard.
5. Review the provided code and select **Deploy**.
6. Preview your Worker at its provided [`workers.dev`](/workers/configuration/routing/workers-dev/) subdomain.
## Development
---
# Get started
URL: https://developers.cloudflare.com/workers-ai/get-started/
import { DirectoryListing } from "~/components"
There are several options to build your Workers AI projects on Cloudflare. To get started, choose your preferred method:
:::note
These examples are geared towards creating new Workers AI projects. For help adding Workers AI to an existing Worker, refer to [Workers Bindings](/workers-ai/configuration/bindings/).
:::
---
# REST API
URL: https://developers.cloudflare.com/workers-ai/get-started/rest-api/
This guide will instruct you through setting up and deploying your first Workers AI project. You will use the Workers AI REST API to experiment with a large language model (LLM).
## Prerequisites
Sign up for a [Cloudflare account](https://dash.cloudflare.com/sign-up/workers-and-pages) if you have not already.
## 1. Get API token and Account ID
You need your API token and Account ID to use the REST API.
To get these values:
1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com) and select your account.
2. Go to **AI** > **Workers AI**.
3. Select **Use REST API**.
4. Get your API token:
1. Select **Create a Workers AI API Token**.
2. Review the prefilled information.
3. Select **Create API Token**.
4. Select **Copy API Token**.
5. Save that value for future use.
5. For **Get Account ID**, copy the value for **Account ID**. Save that value for future use.
:::note
If you choose to [create an API token](/fundamentals/api/get-started/create-token/) instead of using the template, that token will need permissions for both `Workers AI - Read` and `Workers AI - Edit`.
:::
## 2. Run a model via API
After creating your API token, authenticate your requests to the API by including the API token in the request.
You will use the [Execute AI model](/api/resources/ai/methods/run/) endpoint to run the [`@cf/meta/llama-3.1-8b-instruct`](/workers-ai/models/llama-3.1-8b-instruct/) model:
```bash
curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/meta/llama-3.1-8b-instruct \
-H 'Authorization: Bearer {API_TOKEN}' \
-d '{ "prompt": "Where did the phrase Hello World come from" }'
```
Replace the values for `{ACCOUNT_ID}` and `{API_TOKEN}`.
The API response will look like the following:
```json
{
"result": {
"response": "Hello, World first appeared in 1974 at Bell Labs when Brian Kernighan included it in the C programming language example. It became widely used as a basic test program due to simplicity and clarity. It represents an inviting greeting from a program to the world."
},
"success": true,
"errors": [],
"messages": []
}
```
This example execution uses the `@cf/meta/llama-3.1-8b-instruct` model, but you can use any of the models in the [Workers AI models catalog](/workers-ai/models/). If you use another model, replace the model name in the request URL with your desired model.
By completing this guide, you have created a Cloudflare account (if you did not have one already) and an API token that grants Workers AI read permissions to your account. You executed the [`@cf/meta/llama-3.1-8b-instruct`](/workers-ai/models/llama-3.1-8b-instruct/) model using a cURL command from the terminal and received an answer to your prompt in a JSON response.
## Related resources
- [Models](/workers-ai/models/) - Browse the Workers AI models catalog.
- [AI SDK](/workers-ai/configuration/ai-sdk) - Learn how to integrate with an AI model.
---
# CLI
URL: https://developers.cloudflare.com/workers-ai/get-started/workers-wrangler/
import { Render, PackageManagers, WranglerConfig } from "~/components";
This guide will instruct you through setting up and deploying your first Workers AI project. You will use [Workers](/workers/), a Workers AI binding, and a large language model (LLM) to deploy your first AI-powered application on the Cloudflare global network.
## 1. Create a Worker project
You will create a new Worker project using the `create-cloudflare` CLI (C3). [C3](https://github.com/cloudflare/workers-sdk/tree/main/packages/create-cloudflare) is a command-line tool designed to help you set up and deploy new applications to Cloudflare.
Create a new project named `hello-ai` by running:
Running `npm create cloudflare@latest` will prompt you to install the [`create-cloudflare` package](https://www.npmjs.com/package/create-cloudflare), and lead you through setup. C3 will also install [Wrangler](/workers/wrangler/), the Cloudflare Developer Platform CLI.
This will create a new `hello-ai` directory. Your new `hello-ai` directory will include:
- A `"Hello World"` [Worker](/workers/get-started/guide/#3-write-code) at `src/index.ts`.
- A [`wrangler.jsonc`](/workers/wrangler/configuration/) configuration file.
Go to your application directory:
```sh
cd hello-ai
```
## 2. Connect your Worker to Workers AI
You must create an AI binding for your Worker to connect to Workers AI. [Bindings](/workers/runtime-apis/bindings/) allow your Workers to interact with resources, like Workers AI, on the Cloudflare Developer Platform.
To bind Workers AI to your Worker, add the following to the end of your Wrangler file:
```toml
[ai]
binding = "AI"
```
Your binding is [available in your Worker code](/workers/reference/migrate-to-module-workers/#bindings-in-es-modules-format) on [`env.AI`](/workers/runtime-apis/handlers/fetch/).
You can also bind Workers AI to a Pages Function. For more information, refer to [Functions Bindings](/pages/functions/bindings/#workers-ai).
## 3. Run an inference task in your Worker
You are now ready to run an inference task in your Worker. In this case, you will use an LLM, [`llama-3.1-8b-instruct`](/workers-ai/models/llama-3.1-8b-instruct/), to answer a question.
Update the `index.ts` file in your `hello-ai` application directory with the following code:
```typescript title="src/index.ts"
export interface Env {
// If you set another name in the Wrangler config file as the value for 'binding',
// replace "AI" with the variable name you defined.
AI: Ai;
}
export default {
async fetch(request, env): Promise<Response> {
const response = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
prompt: "What is the origin of the phrase Hello, World",
});
return new Response(JSON.stringify(response));
},
} satisfies ExportedHandler<Env>;
```
Up to this point, you have created an AI binding for your Worker and configured your Worker to be able to execute the Llama 3.1 model. You can now test your project locally before you deploy globally.
## 4. Develop locally with Wrangler
While in your project directory, test Workers AI locally by running [`wrangler dev`](/workers/wrangler/commands/#dev):
```sh
npx wrangler dev
```
You will be prompted to log in after you run `wrangler dev`. Wrangler will give you a URL (most likely `localhost:8787`) where you can review your Worker. After you go to the URL Wrangler provides, a message will render that resembles the following example:
```json
{
"response": "Ah, a most excellent question, my dear human friend! *adjusts glasses*\n\nThe origin of the phrase \"Hello, World\" is a fascinating tale that spans several decades and multiple disciplines. It all began in the early days of computer programming, when a young man named Brian Kernighan was tasked with writing a simple program to demonstrate the basics of a new programming language called C.\nKernighan, a renowned computer scientist and author, was working at Bell Labs in the late 1970s when he created the program. He wanted to showcase the language's simplicity and versatility, so he wrote a basic \"Hello, World!\" program that printed the familiar greeting to the console.\nThe program was included in Kernighan and Ritchie's influential book \"The C Programming Language,\" published in 1978. The book became a standard reference for C programmers, and the \"Hello, World!\" program became a sort of \"Hello, World!\" for the programming community.\nOver time, the phrase \"Hello, World!\" became a shorthand for any simple program that demonstrated the basics"
}
```
## 5. Deploy your AI Worker
Before deploying your AI Worker globally, log in with your Cloudflare account by running:
```sh
npx wrangler login
```
You will be directed to a web page asking you to log in to the Cloudflare dashboard. After you have logged in, you will be asked if Wrangler can make changes to your Cloudflare account. Scroll down and select **Allow** to continue.
Finally, deploy your Worker to make your project accessible on the Internet. To deploy your Worker, run:
```sh
npx wrangler deploy
```
```sh output
https://hello-ai.<YOUR_SUBDOMAIN>.workers.dev
```
Your Worker will be deployed to your custom [`workers.dev`](/workers/configuration/routing/workers-dev/) subdomain. You can now visit the URL to run your AI Worker.
By finishing this tutorial, you have created a Worker, connected it to Workers AI through an AI binding, and run an inference task using the Llama 3.1 model.
## Related resources
- [Cloudflare Developers community on Discord](https://discord.cloudflare.com) - Submit feature requests, report bugs, and share your feedback directly with the Cloudflare team by joining the Cloudflare Discord server.
- [Models](/workers-ai/models/) - Browse the Workers AI models catalog.
- [AI SDK](/workers-ai/configuration/ai-sdk) - Learn how to integrate with an AI model.
---
# Guides
URL: https://developers.cloudflare.com/workers-ai/guides/
import { DirectoryListing } from "~/components";
---
# Prompting
URL: https://developers.cloudflare.com/workers-ai/guides/prompting/
import { Code } from "~/components";
export const scopedExampleOne = `{
messages: [
{ role: "system", content: "you are a very funny comedian and you like emojis" },
{ role: "user", content: "tell me a joke about cloudflare" },
],
};`;
export const scopedExampleTwo = `{
messages: [
{ role: "system", content: "you are a professional computer science assistant" },
{ role: "user", content: "what is WASM?" },
{ role: "assistant", content: "WASM (WebAssembly) is a binary instruction format that is designed to be a platform-agnostic" },
{ role: "user", content: "does Python compile to WASM?" },
{ role: "assistant", content: "No, Python does not directly compile to WebAssembly" },
{ role: "user", content: "what about Rust?" },
],
};`;
export const unscopedExampleOne = `{
prompt: "tell me a joke about cloudflare";
}`;
export const unscopedExampleTwo = `{
prompt: "[INST]comedian[/INST]\n[INST]tell me a joke about cloudflare[/INST]",
raw: true
};`;
Part of getting good results from text generation models is asking questions correctly. LLMs are usually trained with specific predefined templates, which should then be used with the model's tokenizer for better results when doing inference tasks.
There are two ways to prompt text generation models with Workers AI:
:::note[Important]
We recommend using unscoped prompts for inference with LoRA.
:::
### Scoped Prompts
This is the recommended method. With scoped prompts, Workers AI takes the burden of knowing and using different chat templates for different models and provides a unified interface to developers when building prompts and creating text generation tasks.
Scoped prompts are a list of messages. Each message defines two keys: the role and the content.
Typically, the role can be one of three options:
- `system` - System messages define the AI's personality. You can use them to set rules and describe how you expect the AI to behave.
- `user` - User messages are where you actually query the AI by providing a question or a conversation.
- `assistant` - Assistant messages hint to the AI about the desired output format. Not all models support this role.
OpenAI has a [good explanation](https://platform.openai.com/docs/guides/text-generation#messages-and-roles) of how they use these roles with their GPT models. Even though chat templates are flexible, other text generation models tend to follow the same conventions.
Here's an input example of a scoped prompt using system and user roles:
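```js
{
  messages: [
    { role: "system", content: "you are a very funny comedian and you like emojis" },
    { role: "user", content: "tell me a joke about cloudflare" },
  ],
}
```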
Here's a better example of a chat session using multiple iterations between the user and the assistant.
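```js
{
  messages: [
    { role: "system", content: "you are a professional computer science assistant" },
    { role: "user", content: "what is WASM?" },
    { role: "assistant", content: "WASM (WebAssembly) is a binary instruction format that is designed to be a platform-agnostic" },
    { role: "user", content: "does Python compile to WASM?" },
    { role: "assistant", content: "No, Python does not directly compile to WebAssembly" },
    { role: "user", content: "what about Rust?" },
  ],
}
```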
Note that different LLMs are trained with different templates for different use cases. While Workers AI tries its best to abstract the specifics of each LLM template from the developer through a unified API, you should always refer to the model documentation for details (we provide links in the table above.) For example, instruct models like Codellama are fine-tuned to respond to a user-provided instruction, while chat models expect fragments of dialogs as input.
### Unscoped Prompts
You can use unscoped prompts to send a single question to the model without worrying about providing any context. Workers AI will automatically convert your `prompt` input to a reasonable default scoped prompt internally so that you get the best possible prediction.
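For example:
```js
{
  prompt: "tell me a joke about cloudflare"
}
```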
You can also use unscoped prompts to construct the model chat template manually. In this case, you can use the raw parameter. Here's an input example of a [Mistral](https://docs.mistral.ai/models/#chat-template) chat template prompt:
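```js
{
  prompt: "[INST]comedian[/INST]\n[INST]tell me a joke about cloudflare[/INST]",
  raw: true
}
```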
---
# JSON Mode
URL: https://developers.cloudflare.com/workers-ai/json-mode/
import { Code } from "~/components";
export const jsonModeSchema = `{
response_format: {
title: "JSON Mode",
type: "object",
properties: {
type: {
type: "string",
enum: ["json_object", "json_schema"],
},
json_schema: {},
}
}
}`;
export const jsonModeRequestExample = `{
"messages": [
{
"role": "system",
"content": "Extract data about a country."
},
{
"role": "user",
"content": "Tell me about India."
}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"capital": {
"type": "string"
},
"languages": {
"type": "array",
"items": {
"type": "string"
}
}
},
"required": [
"name",
"capital",
"languages"
]
}
}
}`;
export const jsonModeResponseExample = `{
"response": {
"name": "India",
"capital": "New Delhi",
"languages": [
"Hindi",
"English",
"Bengali",
"Telugu",
"Marathi",
"Tamil",
"Gujarati",
"Urdu",
"Kannada",
"Odia",
"Malayalam",
"Punjabi",
"Sanskrit"
]
}
}`;
When we want text-generation AI models to interact with databases, services, and external systems programmatically, typically when using tool calling or building AI agents, we must have structured response formats rather than natural language.
Workers AI supports JSON Mode, enabling applications to request a structured output response when interacting with AI models.
## Schema
JSON Mode is compatible with OpenAI's implementation; to enable it, add the `response_format` property to the request object using the following convention:
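```js
{
  response_format: {
    title: "JSON Mode",
    type: "object",
    properties: {
      type: {
        type: "string",
        enum: ["json_object", "json_schema"],
      },
      json_schema: {},
    }
  }
}
```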
Where `json_schema` must be a valid [JSON Schema](https://json-schema.org/) declaration.
## JSON Mode example
When using JSON Mode, pass the schema as part of the request you send to the LLM, as in the example below.
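```json
{
  "messages": [
    {
      "role": "system",
      "content": "Extract data about a country."
    },
    {
      "role": "user",
      "content": "Tell me about India."
    }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "capital": {
          "type": "string"
        },
        "languages": {
          "type": "array",
          "items": {
            "type": "string"
          }
        }
      },
      "required": [
        "name",
        "capital",
        "languages"
      ]
    }
  }
}
```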
The LLM will follow the schema and return a response such as the one below:
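```json
{
  "response": {
    "name": "India",
    "capital": "New Delhi",
    "languages": [
      "Hindi",
      "English",
      "Bengali",
      "Telugu",
      "Marathi",
      "Tamil",
      "Gujarati",
      "Urdu",
      "Kannada",
      "Odia",
      "Malayalam",
      "Punjabi",
      "Sanskrit"
    ]
  }
}
```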
As you can see, the model is complying with the JSON schema definition in the request and responding with a validated JSON object.
## Supported Models
This is the list of models that now support JSON Mode:
- [@cf/meta/llama-3.1-8b-instruct-fast](/workers-ai/models/llama-3.1-8b-instruct-fast/)
- [@cf/meta/llama-3.1-70b-instruct](/workers-ai/models/llama-3.1-70b-instruct/)
- [@cf/meta/llama-3.3-70b-instruct-fp8-fast](/workers-ai/models/llama-3.3-70b-instruct-fp8-fast/)
- [@cf/meta/llama-3-8b-instruct](/workers-ai/models/llama-3-8b-instruct/)
- [@cf/meta/llama-3.1-8b-instruct](/workers-ai/models/llama-3.1-8b-instruct/)
- [@cf/meta/llama-3.2-11b-vision-instruct](/workers-ai/models/llama-3.2-11b-vision-instruct/)
- [@hf/nousresearch/hermes-2-pro-mistral-7b](/workers-ai/models/hermes-2-pro-mistral-7b/)
- [@hf/thebloke/deepseek-coder-6.7b-instruct-awq](/workers-ai/models/deepseek-coder-6.7b-instruct-awq/)
- [@cf/deepseek-ai/deepseek-r1-distill-qwen-32b](/workers-ai/models/deepseek-r1-distill-qwen-32b/)
We will continue extending this list to keep up with new and requested models.
Note that Workers AI can't guarantee that the model responds according to the requested JSON Schema. Depending on the complexity of the task and adequacy of the JSON Schema, the model may not be able to satisfy the request in extreme situations. If that's the case, then an error `JSON Mode couldn't be met` is returned and must be handled.
JSON Mode currently doesn't support streaming.
---
# Models
URL: https://developers.cloudflare.com/workers-ai/models/
import ModelCatalog from "~/pages/workers-ai/models/index.astro";
---
# Platform
URL: https://developers.cloudflare.com/workers-ai/platform/
import { DirectoryListing } from "~/components";
---
# Limits
URL: https://developers.cloudflare.com/workers-ai/platform/limits/
import { Render } from "~/components"
Workers AI is now Generally Available. We've updated our rate limits to reflect this.
Note that model inferences in local mode using Wrangler will also count towards these limits. Beta models may have lower rate limits while we work on performance and scale.
Rate limits are applied per task type by default, with some per-model limits defined as follows:
## Rate limits by task type
### [Automatic Speech Recognition](/workers-ai/models/)
* 720 requests per minute
### [Image Classification](/workers-ai/models/)
* 3000 requests per minute
### [Image-to-Text](/workers-ai/models/)
* 720 requests per minute
### [Object Detection](/workers-ai/models/)
* 3000 requests per minute
### [Summarization](/workers-ai/models/)
* 1500 requests per minute
### [Text Classification](/workers-ai/models/#text-classification)
* 2000 requests per minute
### [Text Embeddings](/workers-ai/models/#text-embeddings)
* 3000 requests per minute
* [@cf/baai/bge-large-en-v1.5](/workers-ai/models/bge-large-en-v1.5/) is 1500 requests per minute
### [Text Generation](/workers-ai/models/#text-generation)
* 300 requests per minute
* [@hf/thebloke/mistral-7b-instruct-v0.1-awq](/workers-ai/models/mistral-7b-instruct-v0.1-awq/) is 400 requests per minute
* [@cf/microsoft/phi-2](/workers-ai/models/phi-2/) is 720 requests per minute
* [@cf/qwen/qwen1.5-0.5b-chat](/workers-ai/models/qwen1.5-0.5b-chat/) is 1500 requests per minute
* [@cf/qwen/qwen1.5-1.8b-chat](/workers-ai/models/qwen1.5-1.8b-chat/) is 720 requests per minute
* [@cf/qwen/qwen1.5-14b-chat-awq](/workers-ai/models/qwen1.5-14b-chat-awq/) is 150 requests per minute
* [@cf/tinyllama/tinyllama-1.1b-chat-v1.0](/workers-ai/models/tinyllama-1.1b-chat-v1.0/) is 720 requests per minute
### [Text-to-Image](/workers-ai/models/#text-to-image)
* 720 requests per minute
* [@cf/runwayml/stable-diffusion-v1-5-img2img](/workers-ai/models/stable-diffusion-v1-5-img2img/) is 1500 requests per minute
### [Translation](/workers-ai/models/#translation)
* 720 requests per minute
---
# Pricing
URL: https://developers.cloudflare.com/workers-ai/platform/pricing/
:::note
Workers AI has updated pricing to be more granular, with per-model unit-based pricing presented, but still billing in neurons in the back end.
:::
Workers AI is included in both the [Free and Paid Workers plans](/workers/platform/pricing/) and is priced at **$0.011 per 1,000 Neurons**.
Our free allocation allows anyone to use a total of **10,000 Neurons per day at no charge**. To use more than 10,000 Neurons per day, you need to sign up for the [Workers Paid plan](/workers/platform/pricing/#workers). On Workers Paid, you will be charged at $0.011 / 1,000 Neurons for any usage above the free allocation of 10,000 Neurons per day.
You can monitor your Neuron usage in the [Cloudflare Workers AI dashboard](https://dash.cloudflare.com/?to=/:account/ai/workers-ai).
All limits reset daily at 00:00 UTC. If you exceed any one of the above limits, further operations will fail with an error.
| | Free allocation | Pricing |
| ------------ | ---------------------- | ----------------------------- |
| Workers Free | 10,000 Neurons per day | N/A - Upgrade to Workers Paid |
| Workers Paid | 10,000 Neurons per day | $0.011 / 1,000 Neurons |
## What are Neurons?
Neurons are our way of measuring AI outputs across different models, representing the GPU compute needed to perform your request. Our serverless model allows you to pay only for what you use without having to worry about renting, managing, or scaling GPUs.
## LLM model pricing
| Model | Price in Tokens | Price in Neurons |
| -------------------------------------------- | ---------------------------------------------------------- | ------------------------------------------------------------------------- |
| @cf/meta/llama-3.2-1b-instruct | $0.027 per M input tokens $0.201 per M output tokens | 2457 neurons per M input tokens 18252 neurons per M output tokens |
| @cf/meta/llama-3.2-3b-instruct | $0.051 per M input tokens $0.335 per M output tokens | 4625 neurons per M input tokens 30475 neurons per M output tokens |
| @cf/meta/llama-3.1-8b-instruct-fp8-fast | $0.045 per M input tokens $0.384 per M output tokens | 4119 neurons per M input tokens 34868 neurons per M output tokens |
| @cf/meta/llama-3.2-11b-vision-instruct | $0.049 per M input tokens $0.676 per M output tokens | 4410 neurons per M input tokens 61493 neurons per M output tokens |
| @cf/meta/llama-3.1-70b-instruct-fp8-fast | $0.293 per M input tokens $2.253 per M output tokens | 26668 neurons per M input tokens 204805 neurons per M output tokens |
| @cf/meta/llama-3.3-70b-instruct-fp8-fast | $0.293 per M input tokens $2.253 per M output tokens | 26668 neurons per M input tokens 204805 neurons per M output tokens |
| @cf/deepseek-ai/deepseek-r1-distill-qwen-32b | $0.497 per M input tokens $4.881 per M output tokens | 45170 neurons per M input tokens 443756 neurons per M output tokens |
| @cf/mistral/mistral-7b-instruct-v0.1 | $0.110 per M input tokens $0.190 per M output tokens | 10000 neurons per M input tokens 17300 neurons per M output tokens |
| @cf/meta/llama-3.1-8b-instruct | $0.282 per M input tokens $0.827 per M output tokens | 25608 neurons per M input tokens 75147 neurons per M output tokens |
| @cf/meta/llama-3.1-8b-instruct-fp8 | $0.152 per M input tokens $0.287 per M output tokens | 13778 neurons per M input tokens 26128 neurons per M output tokens |
| @cf/meta/llama-3.1-8b-instruct-awq | $0.123 per M input tokens $0.266 per M output tokens | 11161 neurons per M input tokens 24215 neurons per M output tokens |
| @cf/meta/llama-3-8b-instruct | $0.282 per M input tokens $0.827 per M output tokens | 25608 neurons per M input tokens 75147 neurons per M output tokens |
| @cf/meta/llama-3-8b-instruct-awq | $0.123 per M input tokens $0.266 per M output tokens | 11161 neurons per M input tokens 24215 neurons per M output tokens |
| @cf/meta/llama-2-7b-chat-fp16 | $0.556 per M input tokens $6.667 per M output tokens | 50505 neurons per M input tokens 606061 neurons per M output tokens |
| @cf/meta/llama-guard-3-8b | $0.484 per M input tokens $0.030 per M output tokens | 44003 neurons per M input tokens 2730 neurons per M output tokens |
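As a rough worked example using the `@cf/meta/llama-3.2-1b-instruct` rates above (the request and response sizes are hypothetical), a call with 1,000 input tokens and 500 output tokens uses about 11.6 Neurons, which works out to roughly $0.00013 beyond the free allocation:
```js
// Hypothetical usage estimate for @cf/meta/llama-3.2-1b-instruct,
// using the per-million-token Neuron rates from the table above.
const NEURONS_PER_M_INPUT = 2457;
const NEURONS_PER_M_OUTPUT = 18252;
const USD_PER_1000_NEURONS = 0.011;

const inputTokens = 1000; // assumed prompt size
const outputTokens = 500; // assumed response size

const neurons =
  (inputTokens / 1_000_000) * NEURONS_PER_M_INPUT +
  (outputTokens / 1_000_000) * NEURONS_PER_M_OUTPUT; // ~11.58 Neurons

const costUsd = (neurons / 1000) * USD_PER_1000_NEURONS; // ~$0.000127
console.log({ neurons, costUsd });
```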
## Other model pricing
| Model | Price in Tokens | Price in Neurons |
| ------------------------------------- | ---------------------------------------------------------- | ------------------------------------------------------------------------ |
| @cf/black-forest-labs/flux-1-schnell | $0.0000528 per 512x512 tile $0.0001056 per step | 4.80 neurons per 512x512 tile 9.60 neurons per step |
| @cf/huggingface/distilbert-sst-2-int8 | $0.026 per M input tokens | 2394 neurons per M input tokens |
| @cf/baai/bge-small-en-v1.5 | $0.020 per M input tokens | 1841 neurons per M input tokens |
| @cf/baai/bge-base-en-v1.5 | $0.067 per M input tokens | 6058 neurons per M input tokens |
| @cf/baai/bge-large-en-v1.5 | $0.204 per M input tokens | 18582 neurons per M input tokens |
| @cf/meta/m2m100-1.2b | $0.342 per M input tokens $0.342 per M output tokens | 31050 neurons per M input tokens 31050 neurons per M output tokens |
| @cf/microsoft/resnet-50 | $2.51 per M images | 228055 neurons per M images |
| @cf/openai/whisper | $0.0005 per audio minute | 41.14 neurons per audio minute |
---
# Build a Retrieval Augmented Generation (RAG) AI
URL: https://developers.cloudflare.com/workers-ai/tutorials/build-a-retrieval-augmented-generation-ai/
import { Details, Render, PackageManagers, WranglerConfig } from "~/components";
This guide will walk you through setting up and deploying your first application with Cloudflare AI. You will build a fully-featured AI-powered application, using tools like Workers AI, Vectorize, D1, and Cloudflare Workers.
At the end of this tutorial, you will have built an AI tool that allows you to store information and query it using a Large Language Model. This pattern, known as Retrieval Augmented Generation, or RAG, is a useful project you can build by combining multiple aspects of Cloudflare's AI toolkit. You do not need to have experience working with AI tools to build this application.
You will also need access to [Vectorize](/vectorize/platform/pricing/). During this tutorial, we will show how you can optionally integrate with [Anthropic Claude](http://anthropic.com) as well. You will need an [Anthropic API key](https://docs.anthropic.com/en/api/getting-started) to do so.
## 1. Create a new Worker project
C3 (`create-cloudflare-cli`) is a command-line tool designed to help you set up and deploy Workers to Cloudflare as fast as possible.
Open a terminal window and run C3 to create your Worker project:
In your project directory, C3 has generated several files.
1. `wrangler.jsonc`: Your [Wrangler](/workers/wrangler/configuration/#sample-wrangler-configuration) configuration file.
2. `worker.js` (in `/src`): A minimal `'Hello World!'` Worker written in [ES module](/workers/reference/migrate-to-module-workers/) syntax.
3. `package.json`: A minimal Node dependencies configuration file.
4. `package-lock.json`: Refer to [`npm` documentation on `package-lock.json`](https://docs.npmjs.com/cli/v9/configuring-npm/package-lock-json).
5. `node_modules`: Refer to [`npm` documentation `node_modules`](https://docs.npmjs.com/cli/v7/configuring-npm/folders#node-modules).
Now, move into your newly created directory:
```sh
cd rag-ai-tutorial
```
## 2. Develop with Wrangler CLI
The Workers command-line interface, [Wrangler](/workers/wrangler/install-and-update/), allows you to [create](/workers/wrangler/commands/#init), [test](/workers/wrangler/commands/#dev), and [deploy](/workers/wrangler/commands/#deploy) your Workers projects. C3 will install Wrangler in projects by default.
After you have created your first Worker, run the [`wrangler dev`](/workers/wrangler/commands/#dev) command in the project directory to start a local server for developing your Worker. This will allow you to test your Worker locally during development.
```sh
npx wrangler dev --remote
```
:::note
If you have not used Wrangler before, it will try to open your web browser to log in with your Cloudflare account.
If you have issues with this step or you do not have access to a browser interface, refer to the [`wrangler login`](/workers/wrangler/commands/#login) documentation for more information.
:::
You will now be able to go to [http://localhost:8787](http://localhost:8787) to see your Worker running. Any changes you make to your code will trigger a rebuild, and reloading the page will show you the up-to-date output of your Worker.
## 3. Adding the AI binding
To begin using Cloudflare's AI products, you can add the `ai` block to the [Wrangler configuration file](/workers/wrangler/configuration/). This will set up a binding to Cloudflare's AI models in your code that you can use to interact with the available AI models on the platform.
This example features the [`@cf/meta/llama-3-8b-instruct` model](/workers-ai/models/llama-3-8b-instruct/), which generates text.
```toml
[ai]
binding = "AI"
```
Now, find the `src/index.js` file. Inside the `fetch` handler, you can query the `AI` binding:
```js
export default {
async fetch(request, env, ctx) {
const answer = await env.AI.run("@cf/meta/llama-3-8b-instruct", {
messages: [{ role: "user", content: `What is the square root of 9?` }],
});
return new Response(JSON.stringify(answer));
},
};
```
By querying the LLM through the `AI` binding, we can interact with Cloudflare AI's large language models directly in our code.
You can deploy your Worker using `wrangler`:
```sh
npx wrangler deploy
```
Making a request to your Worker will now generate a text response from the LLM, and return it as a JSON object.
```sh
curl https://example.username.workers.dev
```
```sh output
{"response":"Answer: The square root of 9 is 3."}
```
## 4. Adding embeddings using Cloudflare D1 and Vectorize
Embeddings allow you to add additional capabilities to the language models you can use in your Cloudflare AI projects. This is done via **Vectorize**, Cloudflare's vector database.
To begin using Vectorize, create a new embeddings index using `wrangler`. This index will store vectors with 768 dimensions, and will use cosine similarity to determine which vectors are most similar to each other:
```sh
npx wrangler vectorize create vector-index --dimensions=768 --metric=cosine
```
Then, add the configuration details for your new Vectorize index to the [Wrangler configuration file](/workers/wrangler/configuration/):
```toml
# ... existing wrangler configuration
[[vectorize]]
binding = "VECTOR_INDEX"
index_name = "vector-index"
```
A vector index allows you to store a collection of dimensions, which are floating point numbers used to represent your data. When you want to query the vector database, you can also convert your query into dimensions. **Vectorize** is designed to efficiently determine which stored vectors are most similar to your query.
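As a rough sketch of what this looks like with the `VECTOR_INDEX` binding configured above (the same `upsert` and `query` calls appear later in this tutorial; here they are shown in isolation, inside a handler where `env` is available):

```ts
// A vector is stored as an id plus an array of floating-point values
// (768 of them, to match the index created above).
const vector = Array.from({ length: 768 }, () => Math.random()); // placeholder values

await env.VECTOR_INDEX.upsert([{ id: "1", values: vector }]);

// Querying with a vector of the same dimensionality returns the closest stored vectors.
const { matches } = await env.VECTOR_INDEX.query(vector, { topK: 3 });
// matches is ordered by similarity, e.g. [{ id: "1", score: 0.99, ... }]
```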
To implement the search feature, you must set up a D1 database from Cloudflare. D1 stores your app's data (in this case, the text of each note), while Vectorize stores its vector representation. When a search query matches a stored vector, you can use the record ID stored alongside it to look up and return the matching data from D1.
Create a new D1 database using `wrangler`:
```sh
npx wrangler d1 create database
```
Then, paste the configuration details output from the previous command into the [Wrangler configuration file](/workers/wrangler/configuration/):
```toml
# ... existing wrangler configuration
[[d1_databases]]
binding = "DB" # available in your Worker on env.DB
database_name = "database"
database_id = "abc-def-geh" # replace this with a real database_id (UUID)
```
In this application, we'll create a `notes` table in D1, which will allow us to store notes and later retrieve them in Vectorize. To create this table, run a SQL command using `wrangler d1 execute`:
```sh
npx wrangler d1 execute database --remote --command "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
```
Now, we can add a new note to our database using `wrangler d1 execute`:
```sh
npx wrangler d1 execute database --remote --command "INSERT INTO notes (text) VALUES ('The best pizza topping is pepperoni')"
```
## 5. Creating a workflow
Before we begin creating notes, we will introduce a [Cloudflare Workflow](/workflows). This will allow us to define a durable workflow that can safely and robustly execute all the steps of the RAG process.
To begin, add a new `[[workflows]]` block to your [Wrangler configuration file](/workers/wrangler/configuration/):
```toml
# ... existing wrangler configuration
[[workflows]]
name = "rag"
binding = "RAG_WORKFLOW"
class_name = "RAGWorkflow"
```
In `src/index.js`, add a new class called `RAGWorkflow` that extends `WorkflowEntrypoint`:
```js
import { WorkflowEntrypoint } from "cloudflare:workers";
export class RAGWorkflow extends WorkflowEntrypoint {
async run(event, step) {
await step.do('example step', async () => {
console.log("Hello World!")
})
}
}
```
This class will define a single workflow step that will log "Hello World!" to the console. You can add as many steps as you need to your workflow.
On its own, this workflow will not do anything. To execute the workflow, we will call the `RAG_WORKFLOW` binding, passing in any parameters that the workflow needs to properly complete. Here is an example of how we can call the workflow:
```js
env.RAG_WORKFLOW.create({ params: { text } })
```
## 6. Creating notes and adding them to Vectorize
To expand your Worker to handle multiple routes, we will add `hono`, a routing library for Workers. This will allow us to create a new route for adding notes to our database. Install `hono` using `npm`:
```sh
npm install hono
```
Then, import `hono` into your `src/index.js` file. You should also update the `fetch` handler to use `hono`:
```js
import { Hono } from "hono";
const app = new Hono();
app.get("/", async (c) => {
const answer = await c.env.AI.run("@cf/meta/llama-3-8b-instruct", {
messages: [{ role: "user", content: `What is the square root of 9?` }],
});
return c.json(answer);
});
export default app;
```
This will establish a route at the root path `/` that is functionally equivalent to the previous version of your application.
Now, we can update our workflow to begin adding notes to our database, and generating the related embeddings for them.
This example features the [`@cf/baai/bge-base-en-v1.5` model](/workers-ai/models/bge-base-en-v1.5/), which can be used to create an embedding. Embeddings are stored and retrieved inside [Vectorize](/vectorize/), Cloudflare's vector database. The user query is also turned into an embedding so that it can be used for searching within Vectorize.
```js
import { WorkflowEntrypoint } from "cloudflare:workers";
export class RAGWorkflow extends WorkflowEntrypoint {
async run(event, step) {
const env = this.env
const { text } = event.payload
const record = await step.do(`create database record`, async () => {
const query = "INSERT INTO notes (text) VALUES (?) RETURNING *"
const { results } = await env.DB.prepare(query)
.bind(text)
.run()
const record = results[0]
if (!record) throw new Error("Failed to create note")
return record;
})
const embedding = await step.do(`generate embedding`, async () => {
const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: text })
const values = embeddings.data[0]
if (!values) throw new Error("Failed to generate vector embedding")
return values
})
await step.do(`insert vector`, async () => {
return env.VECTOR_INDEX.upsert([
{
id: record.id.toString(),
values: embedding,
}
]);
})
}
}
```
The workflow does the following things:
1. Accepts a `text` parameter.
2. Inserts a new row into the `notes` table in D1, and retrieves the `id` of the new row.
3. Converts the `text` into a vector using the embeddings model via the `AI` binding.
4. Upserts the `id` and the vector into the `vector-index` index in Vectorize.
By doing this, you will create a new vector representation of the note, which can be used to retrieve the note later.
To complete the code, we will add a route that allows users to submit notes to the database. This route will parse the JSON request body, get the `text` parameter, and create a new instance of the workflow, passing the parameter:
```js
app.post('/notes', async (c) => {
const { text } = await c.req.json();
if (!text) return c.text("Missing text", 400);
await c.env.RAG_WORKFLOW.create({ params: { text } })
return c.text("Created note", 201);
})
```
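Once the updated Worker is deployed, you can exercise this route with a simple request. The hostname below is the example one used earlier in this tutorial; replace it with your own Worker URL:

```ts
// Create a note; the Worker responds with 201 and starts the RAG workflow.
const res = await fetch("https://example.username.workers.dev/notes", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ text: "The best pizza topping is pepperoni" }),
});
console.log(res.status); // 201
```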
## 7. Querying Vectorize to retrieve notes
To complete your code, you can update the root path (`/`) to query Vectorize. You will convert the query into a vector, and then use the `vector-index` index to find the most similar vectors.
The `topK` parameter limits the number of vectors returned by the function. For instance, providing a `topK` of 1 will only return the _most similar_ vector based on the query. Setting `topK` to 5 will return the 5 most similar vectors.
Given a list of similar vectors, you can retrieve the notes that match the record IDs stored alongside those vectors. In this case, we are only retrieving a single note - but you may customize this as needed.
You can insert the text of those notes as context into the prompt for the LLM binding. This is the basis of Retrieval-Augmented Generation, or RAG: providing additional context from data outside of the LLM to enhance the text generated by the LLM.
We'll update the prompt to include the context, and to ask the LLM to use the context when responding:
```js
import { Hono } from "hono";
const app = new Hono();
// Existing post route...
// app.post('/notes', async (c) => { ... })
app.get('/', async (c) => {
const question = c.req.query('text') || "What is the square root of 9?"
const embeddings = await c.env.AI.run('@cf/baai/bge-base-en-v1.5', { text: question })
const vectors = embeddings.data[0]
const vectorQuery = await c.env.VECTOR_INDEX.query(vectors, { topK: 1 });
let vecId;
if (vectorQuery.matches && vectorQuery.matches.length > 0 && vectorQuery.matches[0]) {
vecId = vectorQuery.matches[0].id;
} else {
console.log("No matching vector found or vectorQuery.matches is empty");
}
let notes = []
if (vecId) {
const query = `SELECT * FROM notes WHERE id = ?`
const { results } = await c.env.DB.prepare(query).bind(vecId).all()
if (results) notes = results.map(vec => vec.text)
}
const contextMessage = notes.length
? `Context:\n${notes.map(note => `- ${note}`).join("\n")}`
: ""
const systemPrompt = `When answering the question or responding, use the context provided, if it is provided and relevant.`
const { response: answer } = await c.env.AI.run(
'@cf/meta/llama-3-8b-instruct',
{
messages: [
...(notes.length ? [{ role: 'system', content: contextMessage }] : []),
{ role: 'system', content: systemPrompt },
{ role: 'user', content: question }
]
}
)
return c.text(answer);
});
app.onError((err, c) => {
return c.text(String(err), 500);
});
export default app;
```
## 8. Adding Anthropic Claude model (optional)
If you are working with larger documents, you have the option to use Anthropic's [Claude models](https://claude.ai/), which have large context windows and are well-suited to RAG workflows.
To begin, install the `@anthropic-ai/sdk` package:
```sh
npm install @anthropic-ai/sdk
```
In `src/index.js`, you can update the `GET /` route to check for the `ANTHROPIC_API_KEY` environment variable. If it's set, we can generate text using the Anthropic SDK. If it isn't set, we'll fall back to the existing Workers AI code:
```js
import Anthropic from '@anthropic-ai/sdk';
app.get('/', async (c) => {
// ... Existing code
const systemPrompt = `When answering the question or responding, use the context provided, if it is provided and relevant.`
let modelUsed = ""
let response = null
if (c.env.ANTHROPIC_API_KEY) {
const anthropic = new Anthropic({
apiKey: c.env.ANTHROPIC_API_KEY
})
const model = "claude-3-5-sonnet-latest"
modelUsed = model
const message = await anthropic.messages.create({
max_tokens: 1024,
model,
messages: [
{ role: 'user', content: question }
],
system: [systemPrompt, notes.length ? contextMessage : ''].join(" ")
})
response = {
response: message.content.map(content => content.text).join("\n")
}
} else {
const model = "@cf/meta/llama-3.1-8b-instruct"
modelUsed = model
response = await c.env.AI.run(
model,
{
messages: [
...(notes.length ? [{ role: 'system', content: contextMessage }] : []),
{ role: 'system', content: systemPrompt },
{ role: 'user', content: question }
]
}
)
}
if (response) {
c.header('x-model-used', modelUsed)
return c.text(response.response)
} else {
return c.text("We were unable to generate output", 500)
}
})
```
Finally, you'll need to set the `ANTHROPIC_API_KEY` environment variable in your Workers application. You can do this by using `wrangler secret put`:
```sh
npx wrangler secret put ANTHROPIC_API_KEY
```
## 9. Deleting notes and vectors
If you no longer need a note, you can delete it from the database. Any time that you delete a note, you will also need to delete the corresponding vector from Vectorize. You can implement this by building a `DELETE /notes/:id` route in your `src/index.js` file:
```js
app.delete("/notes/:id", async (c) => {
const { id } = c.req.param();
const query = `DELETE FROM notes WHERE id = ?`;
await c.env.DB.prepare(query).bind(id).run();
await c.env.VECTOR_INDEX.deleteByIds([id]);
return c.body(null, 204);
});
```
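For example, deleting the note with a hypothetical id of `1` (again using the example hostname from earlier):

```ts
// Removes the note from D1 and its corresponding vector from Vectorize.
await fetch("https://example.username.workers.dev/notes/1", { method: "DELETE" });
```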
## 10. Text splitting (optional)
For large pieces of text, it is recommended to split the text into smaller chunks. This allows LLMs to more effectively gather relevant context, without needing to retrieve large pieces of text.
To implement this, we'll add a new NPM package to our project, `@langchain/textsplitters`:
```sh
npm install @langchain/textsplitters
```
The `RecursiveCharacterTextSplitter` class provided by this package will split the text into smaller chunks. It can be customized to your liking, but the default config works in most cases:
```js
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
const text = "Some long piece of text...";
const splitter = new RecursiveCharacterTextSplitter({
// These can be customized to change the chunking size
// chunkSize: 1000,
// chunkOverlap: 200,
});
const output = await splitter.createDocuments([text]);
console.log(output) // [{ pageContent: 'Some long piece of text...' }]
```
To use this splitter, we'll update the workflow to split the text into smaller chunks. We'll then iterate over the chunks and run the rest of the workflow for each chunk of text:
```js
export class RAGWorkflow extends WorkflowEntrypoint {
async run(event, step) {
const env = this.env
const { text } = event.payload;
let texts = await step.do('split text', async () => {
const splitter = new RecursiveCharacterTextSplitter();
const output = await splitter.createDocuments([text]);
return output.map(doc => doc.pageContent);
})
console.log("RecursiveCharacterTextSplitter generated ${texts.length} chunks")
for (const index in texts) {
const text = texts[index]
const record = await step.do(`create database record: ${index}/${texts.length}`, async () => {
const query = "INSERT INTO notes (text) VALUES (?) RETURNING *"
const { results } = await env.DB.prepare(query)
.bind(text)
.run()
const record = results[0]
if (!record) throw new Error("Failed to create note")
return record;
})
const embedding = await step.do(`generate embedding: ${index}/${texts.length}`, async () => {
const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: text })
const values = embeddings.data[0]
if (!values) throw new Error("Failed to generate vector embedding")
return values
})
await step.do(`insert vector: ${index}/${texts.length}`, async () => {
return env.VECTOR_INDEX.upsert([
{
id: record.id.toString(),
values: embedding,
}
]);
})
}
}
}
```
Now, when large pieces of text are submitted to the `/notes` endpoint, they will be split into smaller chunks, and each chunk will be processed by the workflow.
## 11. Deploy your project
If you did not deploy your Worker during [step 1](/workers/get-started/guide/#1-create-a-new-worker-project), deploy your Worker via Wrangler to a `*.workers.dev` subdomain or a [Custom Domain](/workers/configuration/routing/custom-domains/), if you have one configured. If you have not configured any subdomain or domain, Wrangler will prompt you during the publish process to set one up.
```sh
npx wrangler deploy
```
Preview your Worker at `<YOUR_WORKER>.<YOUR_SUBDOMAIN>.workers.dev`.
:::note[Note]
When pushing to your `*.workers.dev` subdomain for the first time, you may see [`523` errors](/support/troubleshooting/cloudflare-errors/troubleshooting-cloudflare-5xx-errors/#error-523-origin-is-unreachable) while DNS is propagating. These errors should resolve themselves after a minute or so.
:::
## Related resources
A full version of this codebase is available on GitHub. It includes a frontend UI for querying, adding, and deleting notes, as well as a backend API for interacting with the database and vector index. You can find it here: [github.com/kristianfreeman/cloudflare-retrieval-augmented-generation-example](https://github.com/kristianfreeman/cloudflare-retrieval-augmented-generation-example/).
To do more:
- Explore the reference diagram for a [Retrieval Augmented Generation (RAG) Architecture](/reference-architecture/diagrams/ai/ai-rag/).
- Review Cloudflare's [AI documentation](/workers-ai).
- Review [Tutorials](/workers/tutorials/) to build projects on Workers.
- Explore [Examples](/workers/examples/) to experiment with copy and paste Worker code.
- Understand how Workers works in [Reference](/workers/reference/).
- Learn about Workers features and functionality in [Platform](/workers/platform/).
- Set up [Wrangler](/workers/wrangler/install-and-update/) to programmatically create, test, and deploy your Worker projects.
---
# Build a Voice Notes App with auto transcriptions using Workers AI
URL: https://developers.cloudflare.com/workers-ai/tutorials/build-a-voice-notes-app-with-auto-transcription/
import { Render, PackageManagers, Tabs, TabItem } from "~/components";
In this tutorial, you will learn how to create a Voice Notes App with automatic transcriptions of voice recordings, and optional post-processing. The following tools will be used to build the application:
- Workers AI to transcribe the voice recordings, and for the optional post processing
- D1 database to store the notes
- R2 storage to store the voice recordings
- Nuxt framework to build the full-stack application
- Workers to deploy the project
## Prerequisites
To continue, you will need:
## 1. Create a new Worker project
Create a new Worker project using the `c3` CLI with the `nuxt` framework preset.
### Install additional dependencies
Change into the newly created project directory:
```sh
cd voice-notes
```
And install the following dependencies:
Then add the `@nuxt/ui` module to the `nuxt.config.ts` file:
```ts title="nuxt.config.ts"
export default defineNuxtConfig({
//..
modules: ['nitro-cloudflare-dev', '@nuxt/ui'],
//..
})
```
### [Optional] Move to Nuxt 4 compatibility mode
Moving to Nuxt 4 compatibility mode ensures that your application remains forward-compatible with upcoming updates to Nuxt.
Create a new `app` folder in the project's root directory and move the `app.vue` file to it. Also, add the following to your `nuxt.config.ts` file:
```ts title="nuxt.config.ts"
export default defineNuxtConfig({
//..
future: {
compatibilityVersion: 4,
},
//..
})
```
:::note
The rest of the tutorial will use the `app` folder for the client-side code. If you did not make this change, continue to use the project's root directory.
:::
### Start local development server
At this point you can test your application by starting a local development server using:
If everything is set up correctly, you should see a Nuxt welcome page at `http://localhost:3000`.
## 2. Create the transcribe API endpoint
This API makes use of Workers AI to transcribe the voice recordings. To use Workers AI within your project, you first need to bind it to the Worker.
Add the `AI` binding to the Wrangler file.
```toml title="wrangler.toml"
[ai]
binding = "AI"
```
Once the `AI` binding has been configured, run the `cf-typegen` command to generate the necessary Cloudflare type definitions. This makes the type definitions available in the server event contexts.
Create a transcribe `POST` endpoint by creating `transcribe.post.ts` file inside the `/server/api` directory.
```ts title="server/api/transcribe.post.ts"
export default defineEventHandler(async (event) => {
const { cloudflare } = event.context;
const form = await readFormData(event);
const blob = form.get('audio') as Blob;
if (!blob) {
throw createError({
statusCode: 400,
message: 'Missing audio blob to transcribe',
});
}
try {
const response = await cloudflare.env.AI.run('@cf/openai/whisper', {
audio: [...new Uint8Array(await blob.arrayBuffer())],
});
return response.text;
} catch (err) {
console.error('Error transcribing audio:', err);
throw createError({
statusCode: 500,
message: 'Failed to transcribe audio. Please try again.',
});
}
});
```
The above code does the following:
1. Extracts the audio blob from the event.
2. Transcribes the blob using the `@cf/openai/whisper` model and returns the transcription text as the response (a client-side usage sketch follows).
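For reference, here is a minimal client-side sketch of calling this endpoint. The `/api/transcribe` path follows Nuxt's file-based routing for `server/api/transcribe.post.ts`; the function name is illustrative:

```ts
// Send a recorded audio Blob to the transcribe endpoint and read back the transcription text.
async function transcribeRecording(blob: Blob): Promise<string> {
  const form = new FormData();
  form.append("audio", blob); // must match form.get('audio') on the server
  const res = await fetch("/api/transcribe", { method: "POST", body: form });
  return await res.text();
}
```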
## 3. Create an API endpoint for uploading audio recordings to R2
Before uploading the audio recordings to `R2`, you first need to create a bucket. You will also need to add the R2 binding to your Wrangler file and regenerate the Cloudflare type definitions.
Create an `R2` bucket.
```sh
npx wrangler r2 bucket create
```
```sh
yarn dlx wrangler r2 bucket create
```
```sh
pnpm dlx wrangler r2 bucket create
```
Add the storage binding to your Wrangler file.
```toml title="wrangler.toml"
[[r2_buckets]]
binding = "R2"
bucket_name = ""
```
Finally, generate the type definitions by rerunning the `cf-typegen` script.
Now you are ready to create the upload endpoint. Create a new `upload.put.ts` file in your `server/api` directory, and add the following code to it:
```ts title="server/api/upload.put.ts"
export default defineEventHandler(async (event) => {
const { cloudflare } = event.context;
const form = await readFormData(event);
const files = form.getAll('files') as File[];
if (!files) {
throw createError({ statusCode: 400, message: 'Missing files' });
}
const uploadKeys: string[] = [];
for (const file of files) {
const obj = await cloudflare.env.R2.put(`recordings/${file.name}`, file);
if (obj) {
uploadKeys.push(obj.key);
}
}
return uploadKeys;
});
```
The above code does the following:
1. Retrieves all files sent by the client using `form.getAll()`, which allows multiple uploads in a single request.
2. Uploads the files to the R2 bucket using the `R2` binding you created earlier. (A client-side usage sketch follows the note below.)
:::note
The `recordings/` prefix organizes uploaded files within a dedicated folder in your bucket. This will also come in handy when serving these recordings to the client (covered later).
:::
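A minimal client-side sketch of calling this endpoint (the `/api/upload` path follows Nuxt's file-based routing for `server/api/upload.put.ts`; the function name is illustrative):

```ts
// Upload one or more recordings; the endpoint responds with the stored R2 object keys.
async function uploadRecordings(files: File[]): Promise<string[]> {
  const form = new FormData();
  for (const file of files) form.append("files", file); // matches form.getAll('files') on the server
  const res = await fetch("/api/upload", { method: "PUT", body: form });
  return await res.json();
}
```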
## 4. Create an API endpoint to save notes entries
Before creating the endpoint, you will need to perform steps similar to those for the R2 bucket, with some additional steps to prepare a notes table.
Create a `D1` database.
```sh
npx wrangler d1 create
```
```sh
yarn dlx wrangler d1 create
```
```sh
pnpm dlx wrangler d1 create
```
Add the D1 bindings to the Wrangler file. You can get the `DB_ID` from the output of the `d1 create` command.
```toml title="wrangler.toml"
[[d1_databases]]
binding = "DB"
database_name = ""
database_id = ""
```
As before, rerun the `cf-typegen` command to generate the types.
Next, create a DB migration.
```sh
npx wrangler d1 migrations create "create notes table"
```
```sh
yarn dlx wrangler d1 migrations create "create notes table"
```
```sh
pnpm dlx wrangler d1 migrations create "create notes table"
```
This will create a new `migrations` folder in the project's root directory, and add an empty `0001_create_notes_table.sql` file to it. Replace the contents of this file with the code below.
```sql
CREATE TABLE IF NOT EXISTS notes (
id INTEGER PRIMARY KEY AUTOINCREMENT,
text TEXT NOT NULL,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
audio_urls TEXT
);
```
And then apply this migration to create the `notes` table.
```sh
npx wrangler d1 migrations apply
```
```sh
yarn dlx wrangler d1 migrations apply
```
```sh
pnpm dlx wrangler d1 migrations apply
```
:::note
The above command will create the notes table locally. To apply the migration on your remote production database, use the `--remote` flag.
:::
Now you can create the API endpoint. Create a new file `index.post.ts` in the `server/api/notes` directory, and change its content to the following:
```ts title="server/api/notes/index.post.ts"
export default defineEventHandler(async (event) => {
const { cloudflare } = event.context;
const { text, audioUrls } = await readBody(event);
if (!text) {
throw createError({
statusCode: 400,
message: 'Missing note text',
});
}
try {
await cloudflare.env.DB.prepare(
'INSERT INTO notes (text, audio_urls) VALUES (?1, ?2)'
)
.bind(text, audioUrls ? JSON.stringify(audioUrls) : null)
.run();
return setResponseStatus(event, 201);
} catch (err) {
console.error('Error creating note:', err);
throw createError({
statusCode: 500,
message: 'Failed to create note. Please try again.',
});
}
});
```
The above code does the following:
1. Extracts the `text` and the optional `audioUrls` from the request body.
2. Saves the note to the database, converting `audioUrls` to a `JSON` string first (see the example request below).
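A sketch of an example request to this endpoint (the object key shown is illustrative; in practice you would pass the keys returned by the upload endpoint):

```ts
// Save a note together with the R2 keys of its recordings.
await fetch("/api/notes", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    text: "Transcribed note text",
    audioUrls: ["recordings/my-recording.webm"], // optional
  }),
});
```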
## 5. Handle note creation on the client-side
Now you're ready to work on the client side. Let's start with the note creation flow.
### Recording user audio
Create a composable to handle audio recording using the MediaRecorder API. This will be used to record notes through the user's microphone.
Create a new file `useMediaRecorder.ts` in the `app/composables` folder, and add the following code to it:
```ts title="app/composables/useMediaRecorder.ts"
interface MediaRecorderState {
isRecording: boolean;
recordingDuration: number;
audioData: Uint8Array | null;
updateTrigger: number;
}
export function useMediaRecorder() {
const state = ref<MediaRecorderState>({
isRecording: false,
recordingDuration: 0,
audioData: null,
updateTrigger: 0,
});
let mediaRecorder: MediaRecorder | null = null;
let audioContext: AudioContext | null = null;
let analyser: AnalyserNode | null = null;
let animationFrame: number | null = null;
let audioChunks: Blob[] | undefined = undefined;
const updateAudioData = () => {
if (!analyser || !state.value.isRecording || !state.value.audioData) {
if (animationFrame) {
cancelAnimationFrame(animationFrame);
animationFrame = null;
}
return;
}
analyser.getByteTimeDomainData(state.value.audioData);
state.value.updateTrigger += 1;
animationFrame = requestAnimationFrame(updateAudioData);
};
const startRecording = async () => {
try {
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
audioContext = new AudioContext();
analyser = audioContext.createAnalyser();
const source = audioContext.createMediaStreamSource(stream);
source.connect(analyser);
mediaRecorder = new MediaRecorder(stream);
audioChunks = [];
mediaRecorder.ondataavailable = (e: BlobEvent) => {
audioChunks?.push(e.data);
state.value.recordingDuration += 1;
};
state.value.audioData = new Uint8Array(analyser.frequencyBinCount);
state.value.isRecording = true;
state.value.recordingDuration = 0;
state.value.updateTrigger = 0;
mediaRecorder.start(1000);
updateAudioData();
} catch (err) {
console.error('Error accessing microphone:', err);
throw err;
}
};
const stopRecording = async () => {
return await new Promise((resolve) => {
if (mediaRecorder && state.value.isRecording) {
mediaRecorder.onstop = () => {
const blob = new Blob(audioChunks, { type: 'audio/webm' });
audioChunks = undefined;
state.value.recordingDuration = 0;
state.value.updateTrigger = 0;
state.value.audioData = null;
resolve(blob);
};
state.value.isRecording = false;
mediaRecorder.stop();
mediaRecorder.stream.getTracks().forEach((track) => track.stop());
if (animationFrame) {
cancelAnimationFrame(animationFrame);
animationFrame = null;
}
audioContext?.close();
audioContext = null;
}
});
};
onUnmounted(() => {
stopRecording();
});
return {
state: readonly(state),
startRecording,
stopRecording,
};
}
```
The above code does the following:
1. Exposes functions to start and stop audio recordings in a Vue application.
2. Captures audio input from the user's microphone using MediaRecorder API.
3. Processes real-time audio data for visualization using AudioContext and AnalyserNode.
4. Stores recording state including duration and recording status.
5. Maintains chunks of audio data and combines them into a final audio blob when recording stops.
6. Updates audio visualization data continuously using animation frames while recording.
7. Automatically cleans up all audio resources when recording stops or component unmounts.
8. Returns audio recordings in webm format for further processing.
### Create a component for note creation
This component allows users to create notes by either typing or recording audio. It also handles audio transcription and uploading the recordings to the server.
Create a new file named `CreateNote.vue` inside the `app/components` folder. Add the following template code to the newly created file:
```vue title="app/components/CreateNote.vue"
Note transcript
Note recordings
Transcribing...
No recordings...
Clear
Save
```
The above template results in the following:
1. A panel with a `textarea` inside to type the note manually.
2. Another panel to manage start/stop of an audio recording, and show the recordings done already.
3. A bottom panel to reset or save the note (along with the recordings).
Now, add the following code below the template code in the same file:
```vue title="app/components/CreateNote.vue"
```
The above code does the following:
1. When a recording is stopped, the `handleRecordingStop` function sends the audio blob to the transcribe API endpoint.
2. The transcription response text is appended to the existing textarea content.
3. When the note is saved by calling the `saveNote` function, the audio recordings are first uploaded to R2 using the upload endpoint created earlier. Then, the note content, along with the `audioUrls` (the R2 object keys), is saved by calling the notes `POST` endpoint.
### Create a new page route for showing the component
You can use this component in a Nuxt page to show it to the user. But before that you need to modify your `app.vue` file. Update the content of your `app.vue` to the following:
```vue title="/app/app.vue"
New Note
```
The above code renders an app header and a navigation sidebar, and allows a Nuxt page to be shown to the user.
Next, add a new file named `new.vue` inside the `app/pages` folder, and add the following code to it:
```vue title="app/pages/new.vue"
Create note
```
The above code shows the `CreateNote` component inside a modal, and navigates back to the home page on successful note creation.
## 6. Showing the notes on the client side
To show the notes from the database on the client side, create an API endpoint first that will interact with the database.
### Create an API endpoint to fetch notes from the database
Create a new file named `index.get.ts` inside the `server/api/notes` directory, and add the following code to it:
```ts title="server/api/index.get.ts"
import type { Note } from '~~/types';
export default defineEventHandler(async (event) => {
const { cloudflare } = event.context;
const res = await cloudflare.env.DB.prepare(
`SELECT
id,
text,
audio_urls AS audioUrls,
created_at AS createdAt,
updated_at AS updatedAt
FROM notes
ORDER BY created_at DESC
LIMIT 50;`
).all<Omit<Note, 'audioUrls'> & { audioUrls: string | null }>();
return res.results.map((note) => ({
...note,
audioUrls: note.audioUrls ? JSON.parse(note.audioUrls) : undefined,
}));
});
```
The above code fetches the last 50 notes from the database, ordered by their creation date in descending order. The `audio_urls` field is stored as a string in the database, but it's converted to an array using `JSON.parse` to handle multiple audio files seamlessly on the client side.
Next, create a page named `index.vue` inside the `app/pages` directory. This will be the home page of the application. Add the following code to it:
```vue title="app/pages/index.vue"
No notes created
Get started by creating your first note
```
The above code fetches the notes from the database by calling the `/api/notes` endpoint you created just now, and renders them as note cards.
### Serving the saved recordings from R2
To be able to play the audio recordings of these notes, you need to serve the saved recordings from the R2 storage.
Create a new file named `[...pathname].get.ts` inside the `server/routes/recordings` directory, and add the following code to it:
:::note
The `...` prefix in the file name makes it a catch-all route. This allows it to receive all requests for paths starting with the `/recordings` prefix. This is where the `recordings/` prefix that was added earlier while saving the recordings becomes helpful.
:::
```ts title="server/routes/recordings/[...pathname].get.ts"
export default defineEventHandler(async (event) => {
const { cloudflare, params } = event.context;
const { pathname } = params || {};
return cloudflare.env.R2.get(`recordings/${pathname}`);
});
```
The above code extracts the path name from the event params, and serves the saved recording matching that object key from the R2 bucket.
## 7. [Optional] Post Processing the transcriptions
Even though speech-to-text transcription models perform satisfactorily, sometimes you may want to post-process the transcriptions: for example, to remove discrepancies, or to change the tone or style of the final text.
### Create a settings page
Create a new file named `settings.vue` in the `app/pages` folder, and add the following code to it:
```vue title="app/pages/settings.vue"
Post Processing
Configure post-processing of recording transcriptions with AI models.
Settings changes are auto-saved locally.
```
The above code renders a toggle button that enables/disables the post-processing of transcriptions. If enabled, users can change the prompt that will be used when post-processing the transcription with an AI model.
The transcription settings are saved using `useStorageAsync`, which utilizes the browser's local storage. This ensures that users' preferences are retained even after refreshing the page.
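As a sketch of what that can look like (assuming `useStorageAsync` is available, for example from `@vueuse/core`; the storage keys and default values here are illustrative):

```ts
import { useStorageAsync } from "@vueuse/core";

// Persist the post-processing settings in the browser's local storage
// so they survive page refreshes.
const isPostProcessingEnabled = useStorageAsync("post-processing-enabled", false);
const postProcessingPrompt = useStorageAsync("post-processing-prompt", "Clean up the transcription.");
```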
### Send the post processing prompt with recorded audio
Modify the `CreateNote` component to send the post processing prompt along with the audio blob, while calling the `transcribe` API endpoint.
```vue title="app/components/CreateNote.vue" ins={2, 6-9, 17-22}
```
The code added above checks for the saved post-processing setting. If it is enabled and a prompt is defined, the prompt is sent to the `transcribe` API endpoint along with the audio.
### Handle post processing in the transcribe API endpoint
Modify the transcribe API endpoint, and update it to the following:
```ts title="server/api/transcribe.post.ts" ins={9-20, 22}
export default defineEventHandler(async (event) => {
// ...
try {
const response = await cloudflare.env.AI.run('@cf/openai/whisper', {
audio: [...new Uint8Array(await blob.arrayBuffer())],
});
const postProcessingPrompt = form.get('prompt') as string;
if (postProcessingPrompt && response.text) {
const postProcessResult = await cloudflare.env.AI.run(
'@cf/meta/llama-3.1-8b-instruct',
{
temperature: 0.3,
prompt: `${postProcessingPrompt}.\n\nText:\n\n${response.text}\n\nResponse:`,
}
);
return (postProcessResult as { response?: string }).response;
} else {
return response.text;
}
} catch (err) {
// ...
}
});
```
The above code does the following:
1. Extracts the post processing prompt from the event FormData.
2. If present, it calls the Workers AI API to process the transcription text using the `@cf/meta/llama-3.1-8b-instruct` model.
3. Finally, it returns the response from Workers AI to the client.
## 8. Deploy the application
Now you are ready to deploy the project to a `.workers.dev` sub-domain by running the deploy command.
You can preview your application at `<YOUR_WORKER>.<YOUR_SUBDOMAIN>.workers.dev`.
:::note
If you used `pnpm` as your package manager, you may face build errors like `"stdin" is not exported by "node_modules/.pnpm/unenv@1.10.0/node_modules/unenv/runtime/node/process/index.mjs"`. To resolve it, you can try hoisting your node modules with the [`shamefully-hoist=true`](https://pnpm.io/npmrc) option.
:::
## Conclusion
In this tutorial, you have gone through the steps of building a voice notes application using Nuxt 3, Cloudflare Workers, D1, and R2 storage. You learnt to:
- Set up the backend to store and manage notes
- Create API endpoints to fetch and display notes
- Handle audio recordings
- Implement optional post-processing for transcriptions
- Deploy the application using the Cloudflare module syntax
The complete source code of the project is available on GitHub. You can go through it to see the code for various frontend components not covered in the article. You can find it here: [github.com/ra-jeev/vnotes](https://github.com/ra-jeev/vnotes).
---
# Build an interview practice tool with Workers AI
URL: https://developers.cloudflare.com/workers-ai/tutorials/build-ai-interview-practice-tool/
import { Render, PackageManagers } from "~/components";
Job interviews can be stressful, and practice is key to building confidence. While traditional mock interviews with friends or mentors are valuable, they are not always available when you need them. In this tutorial, you will learn how to build an AI-powered interview practice tool that provides real-time feedback to help improve interview skills.
By the end of this tutorial, you will have built a complete interview practice tool with the following core functionalities:
- A real-time interview simulation tool using WebSocket connections
- An AI-powered speech processing pipeline that converts audio to text
- An intelligent response system that provides interviewer-like interactions
- A persistent storage system for managing interview sessions and history using Durable Objects
### Prerequisites
This tutorial demonstrates how to use multiple Cloudflare products, and while many features are available in free tiers, some components of Workers AI may incur usage-based charges. Please review the pricing documentation for Workers AI before proceeding.
## 1. Create a new Worker project
Create a Cloudflare Workers project using the Create Cloudflare CLI (C3) tool and the Hono framework.
:::note
[Hono](https://hono.dev) is a lightweight web framework that helps build API endpoints and handle HTTP requests. This tutorial uses Hono to create and manage the application's routing and middleware components.
:::
Create a new Worker project by running the following commands, using `ai-interview-tool` as the Worker name:
To develop and test your Cloudflare Workers application locally:
1. Navigate to your Workers project directory in your terminal:
```sh
cd ai-interview-tool
```
2. Start the development server by running:
```sh
npx wrangler dev
```
When you run `wrangler dev`, the command starts a local development server and provides a `localhost` URL where you can preview your application.
You can now make changes to your code and see them reflected in real-time at the provided localhost address.
## 2. Define TypeScript types for the interview system
Now that the project is set up, create the TypeScript types that will form the foundation of the interview system. These types will help you maintain type safety and provide clear interfaces for the different components of your application.
Create a new file `types.ts` that will contain essential types and enums for:
- Interview skills that can be assessed (JavaScript, React, etc.)
- Different interview positions (Junior Developer, Senior Developer, etc.)
- Interview status tracking
- Message handling between user and AI
- Core interview data structure
```typescript title="src/types.ts"
import { Context } from "hono";
// Context type for API endpoints, including environment bindings and user info
export interface ApiContext {
Bindings: CloudflareBindings;
Variables: {
username: string;
};
}
export type HonoCtx = Context<ApiContext>;
// List of technical skills you can assess during mock interviews.
// This application focuses on popular web technologies and programming languages
// that are commonly tested in real interviews.
export enum InterviewSkill {
JavaScript = "JavaScript",
TypeScript = "TypeScript",
React = "React",
NodeJS = "NodeJS",
Python = "Python",
}
// Available interview types based on different engineering roles.
// This helps tailor the interview experience and questions to
// match the candidate's target position.
export enum InterviewTitle {
JuniorDeveloper = "Junior Developer Interview",
SeniorDeveloper = "Senior Developer Interview",
FullStackDeveloper = "Full Stack Developer Interview",
FrontendDeveloper = "Frontend Developer Interview",
BackendDeveloper = "Backend Developer Interview",
SystemArchitect = "System Architect Interview",
TechnicalLead = "Technical Lead Interview",
}
// Tracks the current state of an interview session.
// This will help you to manage the interview flow and show appropriate UI/actions
// at each stage of the process.
export enum InterviewStatus {
Created = "created", // Interview is created but not started
Pending = "pending", // Waiting for interviewer/system
InProgress = "in_progress", // Active interview session
Completed = "completed", // Interview finished successfully
Cancelled = "cancelled", // Interview terminated early
}
// Defines who sent a message in the interview chat
export type MessageRole = "user" | "assistant" | "system";
// Structure of individual messages exchanged during the interview
export interface Message {
messageId: string; // Unique identifier for the message
interviewId: string; // Links message to specific interview
role: MessageRole; // Who sent the message
content: string; // The actual message content
timestamp: number; // When the message was sent
}
// Main data structure that holds all information about an interview session.
// This includes metadata, messages exchanged, and the current status.
export interface InterviewData {
interviewId: string;
title: InterviewTitle;
skills: InterviewSkill[];
messages: Message[];
status: InterviewStatus;
createdAt: number;
updatedAt: number;
}
// Input format for creating a new interview session.
// Simplified interface that accepts basic parameters needed to start an interview.
export interface InterviewInput {
title: string;
skills: string[];
}
```
## 3. Configure error types for different services
Next, set up custom error types to handle different kinds of errors that may occur in your application. This includes:
- Database errors (for example, connection issues, query failures)
- Interview-related errors (for example, invalid input, transcription failures)
- Authentication errors (for example, invalid sessions)
Create the following `errors.ts` file:
```typescript title="src/errors.ts"
export const ErrorCodes = {
INVALID_MESSAGE: "INVALID_MESSAGE",
TRANSCRIPTION_FAILED: "TRANSCRIPTION_FAILED",
LLM_FAILED: "LLM_FAILED",
DATABASE_ERROR: "DATABASE_ERROR",
} as const;
export class AppError extends Error {
constructor(
message: string,
public statusCode: number,
) {
super(message);
this.name = this.constructor.name;
}
}
export class UnauthorizedError extends AppError {
constructor(message: string) {
super(message, 401);
}
}
export class BadRequestError extends AppError {
constructor(message: string) {
super(message, 400);
}
}
export class NotFoundError extends AppError {
constructor(message: string) {
super(message, 404);
}
}
export class InterviewError extends Error {
constructor(
message: string,
public code: string,
public statusCode: number = 500,
) {
super(message);
this.name = "InterviewError";
}
}
```
## 4. Configure authentication middleware and user routes
In this step, you will implement a basic authentication system to track and identify users interacting with your AI interview practice tool. The system uses HTTP-only cookies to store usernames, allowing you to identify both the request sender and their corresponding Durable Object. This straightforward authentication approach requires users to provide a username, which is then stored securely in a cookie. This approach allows you to:
- Identify users across requests
- Associate interview sessions with specific users
- Secure access to interview-related endpoints
### Create the Authentication Middleware
Create a middleware function that will check for the presence of a valid authentication cookie. This middleware will be used to protect routes that require authentication.
Create a new middleware file `middleware/auth.ts`:
```typescript title="src/middleware/auth.ts"
import { Context } from "hono";
import { getCookie } from "hono/cookie";
import { UnauthorizedError } from "../errors";
export const requireAuth = async (ctx: Context, next: () => Promise<void>) => {
// Get username from cookie
const username = getCookie(ctx, "username");
if (!username) {
throw new UnauthorizedError("User is not logged in");
}
// Make username available to route handlers
ctx.set("username", username);
await next();
};
```
This middleware:
- Checks for a `username` cookie
- Throws an `UnauthorizedError` if the cookie is missing
- Makes the username available to downstream handlers via the context (see the usage sketch below)
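As a usage sketch, you can protect a group of routes by registering the middleware ahead of them on a Hono router (the `/interviews` path here is illustrative; you will wire up real routes later):

```ts
// Assuming `api` is a Hono router instance (like the one created in src/index.ts below),
// any route registered under /interviews/* now requires the username cookie.
api.use("/interviews/*", requireAuth);
```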
### Create Authentication Routes
Next, create the authentication routes that will handle user login. Create a new file `routes/auth.ts`:
```typescript title="src/routes/auth.ts"
import { Context, Hono } from "hono";
import { setCookie } from "hono/cookie";
import { BadRequestError } from "../errors";
import { ApiContext } from "../types";
export const authenticateUser = async (ctx: Context) => {
// Extract username from request body
const { username } = await ctx.req.json();
// Make sure username was provided
if (!username) {
throw new BadRequestError("Username is required");
}
// Create a secure cookie to track the user's session
// This cookie will:
// - Be HTTP-only for security (no JS access)
// - Work across all routes via path="/"
// - Last for 24 hours
// - Only be sent in same-site requests to prevent CSRF
setCookie(ctx, "username", username, {
httpOnly: true,
path: "/",
maxAge: 60 * 60 * 24,
sameSite: "Strict",
});
// Let the client know login was successful
return ctx.json({ success: true });
};
// Set up authentication-related routes
export const configureAuthRoutes = () => {
const router = new Hono<ApiContext>();
// POST /login - Authenticate user and create session
router.post("/login", authenticateUser);
return router;
};
```
Finally, update main application file to include the authentication routes. Modify `src/index.ts`:
```typescript title="src/index.ts"
import { configureAuthRoutes } from "./routes/auth";
import { Hono } from "hono";
import { logger } from "hono/logger";
import type { ApiContext } from "./types";
import { requireAuth } from "./middleware/auth";
// Create our main Hono app instance with proper typing
const app = new Hono<ApiContext>();
// Create a separate router for API endpoints to keep things organized
const api = new Hono<ApiContext>();
// Set up global middleware that runs on every request
// - Logger gives us visibility into what is happening
app.use("*", logger());
// Wire up all our authentication routes (login, etc)
// These will be mounted under /api/v1/auth/
api.route("/auth", configureAuthRoutes());
// Mount all API routes under the version prefix (for example, /api/v1)
// This allows us to make breaking changes in v2 without affecting v1 users
app.route("/api/v1", api);
export default app;
```
Now we have a basic authentication system that:
1. Provides a login endpoint at `/api/v1/auth/login`
2. Securely stores the username in a cookie
3. Includes middleware to protect authenticated routes (see the example request below)
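To verify this works, you can send a login request against the local development server started with `npx wrangler dev` (which listens on `http://localhost:8787` by default):

```ts
// Log in as a test user; the response sets an HTTP-only "username" cookie.
const res = await fetch("http://localhost:8787/api/v1/auth/login", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ username: "testuser" }),
});
console.log(res.status); // 200
console.log(res.headers.get("set-cookie")); // the username cookie, HttpOnly and SameSite=Strict
```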
## 5. Create a Durable Object to manage interviews
Now that you have your authentication system in place, create a Durable Object to manage interview sessions. Durable Objects are perfect for this interview practice tool because they provide the following functionalities:
- Maintains state between connections, so users can reconnect without losing progress.
- Provides a SQLite database to store all interview Q&A, feedback and metrics.
- Enables smooth real-time interactions between the interviewer AI and candidate.
- Handles multiple interview sessions efficiently without performance issues.
- Creates a dedicated instance for each user, giving them their own isolated environment.
First, you will need to configure the Durable Object in Wrangler file. Add the following configuration:
```toml title="wrangler.toml"
[[durable_objects.bindings]]
name = "INTERVIEW"
class_name = "Interview"
[[migrations]]
tag = "v1"
new_sqlite_classes = ["Interview"]
```
Next, create a new file `interview.ts` to define our Interview Durable Object:
```typescript title="src/interview.ts"
import { DurableObject } from "cloudflare:workers";
export class Interview extends DurableObject {
// We will use it to keep track of all active WebSocket connections for real-time communication
private sessions: Map<WebSocket, any>;
constructor(state: DurableObjectState, env: CloudflareBindings) {
super(state, env);
// Initialize empty sessions map - we will add WebSocket connections as users join
this.sessions = new Map();
}
// Entry point for all HTTP requests to this Durable Object
// This will handle both initial setup and WebSocket upgrades
async fetch(request: Request) {
// For now, just confirm the object is working
// We'll add WebSocket upgrade logic and request routing later
return new Response("Interview object initialized");
}
// Broadcasts a message to all connected WebSocket clients.
private broadcast(message: string) {
this.ctx.getWebSockets().forEach((ws) => {
try {
if (ws.readyState === WebSocket.OPEN) {
ws.send(message);
}
} catch (error) {
console.error(
"Error broadcasting message to a WebSocket client:",
error,
);
}
});
}
}
```
Now we need to export the Durable Object in our main `src/index.ts` file:
```typescript title="src/index.ts"
import { Interview } from "./interview";
// ... previous code ...
export { Interview };
export default app;
```
Since the Worker code is written in TypeScript, you should run the following command to add the necessary type definitions:
```sh
npm run cf-typegen
```
### Set up SQLite database schema to store interview data
Now you will use SQLite at the Durable Object level for data persistence. This gives each user their own isolated database instance. You will need two main tables:
- `interviews`: Stores interview session data
- `messages`: Stores all messages exchanged during interviews
Before you create these tables, create a service class to handle your database operations. This encapsulates database logic and helps you:
- Manage database schema changes
- Handle errors consistently
- Keep database queries organized
Create a new file called `services/InterviewDatabaseService.ts`:
```typescript title="src/services/InterviewDatabaseService.ts"
import {
InterviewData,
Message,
InterviewStatus,
InterviewTitle,
InterviewSkill,
} from "../types";
import { InterviewError, ErrorCodes } from "../errors";
const CONFIG = {
database: {
tables: {
interviews: "interviews",
messages: "messages",
},
indexes: {
messagesByInterview: "idx_messages_interviewId",
},
},
} as const;
export class InterviewDatabaseService {
constructor(private sql: SqlStorage) {}
/**
* Sets up the database schema by creating tables and indexes if they do not exist.
* This is called when initializing a new Durable Object instance to ensure
* we have the required database structure.
*
* The schema consists of:
* - interviews table: Stores interview metadata like title, skills, and status
* - messages table: Stores the conversation history between user and AI
* - messages index: Helps optimize queries when fetching messages for a specific interview
*/
createTables() {
try {
// Get list of existing tables to avoid recreating them
const cursor = this.sql.exec(`PRAGMA table_list`);
const existingTables = new Set([...cursor].map((table) => table.name));
// The interviews table is our main table storing interview sessions.
// We only create it if it does not exist yet.
if (!existingTables.has(CONFIG.database.tables.interviews)) {
this.sql.exec(InterviewDatabaseService.QUERIES.CREATE_INTERVIEWS_TABLE);
}
// The messages table stores the actual conversation history.
// It references interviews table via foreign key for data integrity.
if (!existingTables.has(CONFIG.database.tables.messages)) {
this.sql.exec(InterviewDatabaseService.QUERIES.CREATE_MESSAGES_TABLE);
}
// Add an index on interviewId to speed up message retrieval.
// This is important since we will frequently query messages by interview.
this.sql.exec(InterviewDatabaseService.QUERIES.CREATE_MESSAGE_INDEX);
} catch (error: unknown) {
const message = error instanceof Error ? error.message : String(error);
throw new InterviewError(
`Failed to initialize database: ${message}`,
ErrorCodes.DATABASE_ERROR,
);
}
}
private static readonly QUERIES = {
CREATE_INTERVIEWS_TABLE: `
CREATE TABLE IF NOT EXISTS interviews (
interviewId TEXT PRIMARY KEY,
title TEXT NOT NULL,
skills TEXT NOT NULL,
createdAt INTEGER NOT NULL DEFAULT (strftime('%s','now') * 1000),
updatedAt INTEGER NOT NULL DEFAULT (strftime('%s','now') * 1000),
status TEXT NOT NULL DEFAULT 'pending'
)
`,
CREATE_MESSAGES_TABLE: `
CREATE TABLE IF NOT EXISTS messages (
messageId TEXT PRIMARY KEY,
interviewId TEXT NOT NULL,
role TEXT NOT NULL,
content TEXT NOT NULL,
timestamp INTEGER NOT NULL,
FOREIGN KEY (interviewId) REFERENCES interviews(interviewId)
)
`,
CREATE_MESSAGE_INDEX: `
CREATE INDEX IF NOT EXISTS idx_messages_interviewId ON messages(interviewId)
`,
};
}
```
Update the `Interview` Durable Object to use the database service by modifying `src/interview.ts`:
```typescript title="src/interview.ts"
import { InterviewDatabaseService } from "./services/InterviewDatabaseService";
export class Interview extends DurableObject {
// Database service for persistent storage of interview data and messages
private readonly db: InterviewDatabaseService;
private sessions: Map<WebSocket, { interviewId: string }> = new Map();
constructor(state: DurableObjectState, env: CloudflareBindings) {
// ... previous code ...
// Set up our database connection using the DO's built-in SQLite instance
this.db = new InterviewDatabaseService(state.storage.sql);
// First-time setup: ensure our database tables exist
// This is idempotent so safe to call on every instantiation
this.db.createTables();
}
}
```
Add methods to create and retrieve interviews in `services/InterviewDatabaseService.ts`:
```typescript title="src/services/InterviewDatabaseService.ts"
export class InterviewDatabaseService {
/**
* Creates a new interview session in the database.
*
* This is the main entry point for starting a new interview. It handles all the
* initial setup like:
* - Generating a unique ID using crypto.randomUUID() for reliable uniqueness
* - Recording the interview title and required skills
* - Setting up timestamps for tracking interview lifecycle
* - Setting the initial status to "Created"
*
*/
createInterview(title: InterviewTitle, skills: InterviewSkill[]): string {
try {
const interviewId = crypto.randomUUID();
const currentTime = Date.now();
this.sql.exec(
InterviewDatabaseService.QUERIES.INSERT_INTERVIEW,
interviewId,
title,
JSON.stringify(skills), // Store skills as JSON for flexibility
InterviewStatus.Created,
currentTime,
currentTime,
);
return interviewId;
} catch (error: unknown) {
const message = error instanceof Error ? error.message : String(error);
throw new InterviewError(
`Failed to create interview: ${message}`,
ErrorCodes.DATABASE_ERROR,
);
}
}
/**
* Fetches all interviews from the database, ordered by creation date.
*
* This is useful for displaying interview history and letting users
* resume previous sessions. We order by descending creation date since
* users typically want to see their most recent interviews first.
*
* Returns an array of InterviewData objects with full interview details
* including metadata and message history.
*/
getAllInterviews(): InterviewData[] {
try {
const cursor = this.sql.exec(
InterviewDatabaseService.QUERIES.GET_ALL_INTERVIEWS,
);
return [...cursor].map(this.parseInterviewRecord);
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
throw new InterviewError(
`Failed to retrieve interviews: ${message}`,
ErrorCodes.DATABASE_ERROR,
);
}
}
// Retrieves an interview and its messages by ID
getInterview(interviewId: string): InterviewData | null {
try {
const cursor = this.sql.exec(
InterviewDatabaseService.QUERIES.GET_INTERVIEW,
interviewId,
);
const record = [...cursor][0];
if (!record) return null;
return this.parseInterviewRecord(record);
} catch (error: unknown) {
const message = error instanceof Error ? error.message : String(error);
throw new InterviewError(
`Failed to retrieve interview: ${message}`,
ErrorCodes.DATABASE_ERROR,
);
}
}
addMessage(
interviewId: string,
role: Message["role"],
content: string,
messageId: string,
): Message {
try {
const timestamp = Date.now();
this.sql.exec(
InterviewDatabaseService.QUERIES.INSERT_MESSAGE,
messageId,
interviewId,
role,
content,
timestamp,
);
return {
messageId,
interviewId,
role,
content,
timestamp,
};
} catch (error: unknown) {
const message = error instanceof Error ? error.message : String(error);
throw new InterviewError(
`Failed to add message: ${message}`,
ErrorCodes.DATABASE_ERROR,
);
}
}
/**
* Transforms raw database records into structured InterviewData objects.
*
* This helper does the heavy lifting of:
* - Type checking critical fields to catch database corruption early
* - Converting stored JSON strings back into proper objects
* - Filtering out any null messages that might have snuck in
* - Ensuring timestamps are proper numbers
*
* If any required data is missing or malformed, it throws an error
* rather than returning partially valid data that could cause issues
* downstream.
*/
private parseInterviewRecord(record: any): InterviewData {
const interviewId = record.interviewId as string;
const createdAt = Number(record.createdAt);
const updatedAt = Number(record.updatedAt);
if (!interviewId || !createdAt || !updatedAt) {
throw new InterviewError(
"Invalid interview data in database",
ErrorCodes.DATABASE_ERROR,
);
}
return {
interviewId,
title: record.title as InterviewTitle,
skills: JSON.parse(record.skills as string) as InterviewSkill[],
messages: record.messages
? JSON.parse(record.messages)
.filter((m: any) => m !== null)
.map((m: any) => ({
messageId: m.messageId,
role: m.role,
content: m.content,
timestamp: m.timestamp,
}))
: [],
status: record.status as InterviewStatus,
createdAt,
updatedAt,
};
}
// Add these SQL queries to the QUERIES object
private static readonly QUERIES = {
// ... previous queries ...
INSERT_INTERVIEW: `
INSERT INTO ${CONFIG.database.tables.interviews}
(interviewId, title, skills, status, createdAt, updatedAt)
VALUES (?, ?, ?, ?, ?, ?)
`,
GET_ALL_INTERVIEWS: `
SELECT
interviewId,
title,
skills,
createdAt,
updatedAt,
status
FROM ${CONFIG.database.tables.interviews}
ORDER BY createdAt DESC
`,
INSERT_MESSAGE: `
INSERT INTO ${CONFIG.database.tables.messages}
(messageId, interviewId, role, content, timestamp)
VALUES (?, ?, ?, ?, ?)
`,
GET_INTERVIEW: `
SELECT
i.interviewId,
i.title,
i.skills,
i.status,
i.createdAt,
i.updatedAt,
COALESCE(
json_group_array(
CASE WHEN m.messageId IS NOT NULL THEN
json_object(
'messageId', m.messageId,
'role', m.role,
'content', m.content,
'timestamp', m.timestamp
)
END
),
'[]'
) as messages
FROM ${CONFIG.database.tables.interviews} i
LEFT JOIN ${CONFIG.database.tables.messages} m ON i.interviewId = m.interviewId
WHERE i.interviewId = ?
GROUP BY i.interviewId
`,
};
}
```
Add RPC methods to the `Interview` Durable Object to expose database operations through the API. Add this code to `src/interview.ts`:
```typescript title="src/interview.ts"
import {
InterviewData,
InterviewTitle,
InterviewSkill,
Message,
} from "./types";
export class Interview extends DurableObject {
// Creates a new interview session
createInterview(title: InterviewTitle, skills: InterviewSkill[]): string {
return this.db.createInterview(title, skills);
}
// Retrieves all interview sessions
getAllInterviews(): InterviewData[] {
return this.db.getAllInterviews();
}
// Adds a new message to the 'messages' table and broadcasts it to all connected WebSocket clients.
addMessage(
interviewId: string,
role: "user" | "assistant",
content: string,
messageId: string,
): Message {
const newMessage = this.db.addMessage(
interviewId,
role,
content,
messageId,
);
this.broadcast(
JSON.stringify({
...newMessage,
type: "message",
}),
);
return newMessage;
}
}
```
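The `addMessage` method above relies on a `broadcast` helper on the `Interview` class to push new messages to connected clients. If your class does not already define one from an earlier step, a minimal sketch could look like this (an illustrative implementation, not the tutorial's exact code):
```typescript title="src/interview.ts"
// ... previous code ...
/**
 * Sends a serialized payload to every tracked WebSocket session.
 * Sessions whose sockets can no longer accept messages are dropped.
 * Note: this is a minimal sketch; adapt it to your own session handling.
 */
private broadcast(message: string): void {
  for (const ws of this.sessions.keys()) {
    try {
      ws.send(message);
    } catch {
      // The socket is most likely closed; stop tracking it
      this.sessions.delete(ws);
    }
  }
}
```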
## 6. Create REST API endpoints
With your Durable Object and database service ready, create REST API endpoints to manage interviews. You will need endpoints to:
- Create new interviews
- Retrieve all interviews for a user
Create a new file for your interview routes at `routes/interview.ts`:
```typescript title="src/routes/interview.ts"
import { Hono } from "hono";
import { BadRequestError } from "../errors";
import {
InterviewInput,
ApiContext,
HonoCtx,
InterviewTitle,
InterviewSkill,
} from "../types";
import { requireAuth } from "../middleware/auth";
/**
* Gets the Interview Durable Object instance for a given user.
* We use the username as a stable identifier to ensure each user
* gets their own dedicated DO instance that persists across requests.
*/
const getInterviewDO = (ctx: HonoCtx) => {
const username = ctx.get("username");
const id = ctx.env.INTERVIEW.idFromName(username);
return ctx.env.INTERVIEW.get(id);
};
/**
* Validates the interview creation payload.
* Makes sure we have all required fields in the correct format:
* - title must be present
* - skills must be a non-empty array
* Throws an error if validation fails.
*/
const validateInterviewInput = (input: InterviewInput) => {
if (
!input.title ||
!input.skills ||
!Array.isArray(input.skills) ||
input.skills.length === 0
) {
throw new BadRequestError("Invalid input");
}
};
/**
* GET /interviews
* Retrieves all interviews for the authenticated user.
* The interviews are stored and managed by the user's DO instance.
*/
const getAllInterviews = async (ctx: HonoCtx) => {
const interviewDO = getInterviewDO(ctx);
const interviews = await interviewDO.getAllInterviews();
return ctx.json(interviews);
};
/**
* POST /interviews
* Creates a new interview session with the specified title and skills.
* Each interview gets a unique ID that can be used to reference it later.
* Returns the newly created interview ID on success.
*/
const createInterview = async (ctx: HonoCtx) => {
const body = await ctx.req.json();
validateInterviewInput(body);
const interviewDO = getInterviewDO(ctx);
const interviewId = await interviewDO.createInterview(
body.title as InterviewTitle,
body.skills as InterviewSkill[],
);
return ctx.json({ success: true, interviewId });
};
/**
* Sets up all interview-related routes.
* Currently supports:
* - GET / : List all interviews
* - POST / : Create a new interview
*/
export const configureInterviewRoutes = () => {
const router = new Hono();
router.use("*", requireAuth);
router.get("/", getAllInterviews);
router.post("/", createInterview);
return router;
};
```
The `getInterviewDO` helper function uses the username from the authentication cookie to create a Durable Object ID. Because `idFromName` always maps the same name to the same Durable Object instance, each user gets their own isolated interview state that persists across requests.
Update your main application file to include the routes and protect them with authentication middleware. Update `src/index.ts`:
```typescript title="src/index.ts"
import { configureAuthRoutes } from "./routes/auth";
import { configureInterviewRoutes } from "./routes/interview";
import { Hono } from "hono";
import { Interview } from "./interview";
import { logger } from "hono/logger";
import type { ApiContext } from "./types";
const app = new Hono();
const api = new Hono();
app.use("*", logger());
api.route("/auth", configureAuthRoutes());
api.route("/interviews", configureInterviewRoutes());
app.route("/api/v1", api);
export { Interview };
export default app;
```
Now you have two new API endpoints:
- `POST /api/v1/interviews`: Creates a new interview session
- `GET /api/v1/interviews`: Retrieves all interviews for the authenticated user
You can test these endpoints by running the following commands:
1. Create a new interview:
```sh
curl -X POST http://localhost:8787/api/v1/interviews \
-H "Content-Type: application/json" \
-H "Cookie: username=testuser; HttpOnly" \
-d '{"title":"Frontend Developer Interview","skills":["JavaScript","React","CSS"]}'
```
2. Get all interviews:
```sh
curl http://localhost:8787/api/v1/interviews \
-H "Cookie: username=testuser; HttpOnly"
```
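Based on the handlers above, a successful create request responds with a JSON body of the form `{"success":true,"interviewId":"<uuid>"}`, and the list request returns a JSON array of interview records (empty until you create your first interview).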
## 7. Set up WebSockets to handle real-time communication
With the basic interview management system in place, you will now extend the `Interview` Durable Object to handle real-time message processing and maintain WebSocket connections.
Update the `Interview` Durable Object to handle WebSocket connections by adding the following code to `src/interview.ts`:
```typescript title="src/interview.ts"
export class Interview extends DurableObject {
// Services for database operations and managing WebSocket sessions
private readonly db: InterviewDatabaseService;
private sessions: Map<WebSocket, { interviewId: string }> = new Map();
constructor(state: DurableObjectState, env: CloudflareBindings) {
// ... previous code ...
// Keep WebSocket connections alive by automatically responding to pings
// This prevents timeouts and connection drops
this.ctx.setWebSocketAutoResponse(
new WebSocketRequestResponsePair("ping", "pong"),
);
}
async fetch(request: Request): Promise<Response> {
// Check if this is a WebSocket upgrade request
const upgradeHeader = request.headers.get("Upgrade");
if (upgradeHeader?.toLowerCase().includes("websocket")) {
return this.handleWebSocketUpgrade(request);
}
// If it is not a WebSocket request, we don't handle it
return new Response("Not found", { status: 404 });
}
private async handleWebSocketUpgrade(request: Request): Promise<Response> {
// Extract the interview ID from the URL - it should be the last segment
const url = new URL(request.url);
const interviewId = url.pathname.split("/").pop();
if (!interviewId) {
return new Response("Missing interviewId parameter", { status: 400 });
}
// Create a new WebSocket connection pair - one for the client, one for the server
const pair = new WebSocketPair();
const [client, server] = Object.values(pair);
// Keep track of which interview this WebSocket is connected to
// This is important for routing messages to the right interview session
this.sessions.set(server, { interviewId });
// Tell the Durable Object to start handling this WebSocket
this.ctx.acceptWebSocket(server);
// Send the current interview state to the client right away
// This helps initialize their UI with the latest data
const interviewData = await this.db.getInterview(interviewId);
if (interviewData) {
server.send(
JSON.stringify({
type: "interview_details",
data: interviewData,
}),
);
}
// Return the client WebSocket as part of the upgrade response
return new Response(null, {
status: 101,
webSocket: client,
});
}
async webSocketClose(
ws: WebSocket,
code: number,
reason: string,
wasClean: boolean,
) {
// Clean up when a connection closes to prevent memory leaks
// This is especially important in long-running Durable Objects
console.log(
`WebSocket closed: Code ${code}, Reason: ${reason}, Clean: ${wasClean}`,
);
}
}
```
Next, update the interview routes to include a WebSocket endpoint. Add the following to `routes/interview.ts`:
```typescript title="src/routes/interview.ts"
// ... previous code ...
const streamInterviewProcess = async (ctx: HonoCtx) => {
const interviewDO = getInterviewDO(ctx);
return await interviewDO.fetch(ctx.req.raw);
};
export const configureInterviewRoutes = () => {
const router = new Hono();
router.use("*", requireAuth);
router.get("/", getAllInterviews);
router.post("/", createInterview);
// Add WebSocket route
router.get("/:interviewId", streamInterviewProcess);
return router;
};
```
The WebSocket system provides real-time communication features for the interview practice tool:
- Each interview session gets its own dedicated WebSocket connection, allowing seamless communication between the candidate and AI interviewer
- The Durable Object tracks connection state and persists every message to its SQLite database, so the conversation history is preserved even if a client temporarily disconnects
- To keep connections stable, it automatically responds to ping messages with pongs, preventing timeouts
- Candidates and interviewers receive instant updates as the interview progresses, creating a natural conversational flow
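To see these pieces from the browser's point of view, here is a minimal, hypothetical client-side sketch (not part of the tutorial code). It assumes the Worker is running locally on port 8787, that the `username` cookie from the auth step is already set in the browser, and that `YOUR-INTERVIEW-ID` is replaced with an ID returned by `POST /api/v1/interviews`:
```typescript
// Hypothetical browser-side client for the WebSocket route added above
const interviewId = "YOUR-INTERVIEW-ID"; // placeholder: use an ID returned by POST /api/v1/interviews
const ws = new WebSocket(`ws://localhost:8787/api/v1/interviews/${interviewId}`);

ws.addEventListener("open", () => {
  // The Durable Object auto-responds to "ping" with "pong",
  // so a periodic ping keeps the connection from idling out
  setInterval(() => ws.send("ping"), 30_000);
});

ws.addEventListener("message", (event) => {
  if (event.data === "pong") return; // keep-alive reply
  const payload = JSON.parse(event.data as string);
  switch (payload.type) {
    case "interview_details":
      console.log("Current interview state:", payload.data);
      break;
    case "message":
      if (payload.status === "processing") {
        console.log(`${payload.role} message is being processed...`);
      } else {
        console.log(`${payload.role}:`, payload.content);
      }
      break;
    case "error":
      console.error("Server error:", payload.message);
      break;
  }
});
```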
## 8. Add audio processing capabilities with Workers AI
Now that the WebSocket connection is set up, the next step is to add speech-to-text capabilities using Workers AI. Let's use Cloudflare's Whisper model to transcribe audio in real time during the interview.
The audio processing pipeline will work like this:
1. Client sends audio through the WebSocket connection
2. Our Durable Object receives the binary audio data
3. We pass the audio to Whisper for transcription
4. The transcribed text is saved as a new message
5. We immediately send the transcription back to the client
6. The client receives a notification that the AI interviewer is generating a response
### Create audio processing pipeline
In this step you will update the Interview Durable Object to handle the following:
1. Detect binary audio data sent through WebSocket
2. Create a unique message ID for tracking the processing status
3. Notify clients that audio processing has begun
4. Include error handling for failed audio processing
5. Broadcast status updates to all connected clients
First, update the `Interview` Durable Object to handle binary WebSocket messages. Add the following methods to your `src/interview.ts` file:
```typescript title="src/interview.ts"
// ... previous code ...
/**
* Handles incoming WebSocket messages, both binary audio data and text messages.
* This is the main entry point for all WebSocket communication.
*/
async webSocketMessage(ws: WebSocket, eventData: ArrayBuffer | string): Promise<void> {
try {
// Handle binary audio data from the client's microphone
if (eventData instanceof ArrayBuffer) {
await this.handleBinaryAudio(ws, eventData);
return;
}
// Text messages will be handled by other methods
} catch (error) {
this.handleWebSocketError(ws, error);
}
}
/**
* Processes binary audio data received from the client.
* Converts audio to text using Whisper and broadcasts processing status.
*/
private async handleBinaryAudio(ws: WebSocket, audioData: ArrayBuffer): Promise<void> {
try {
const uint8Array = new Uint8Array(audioData);
// Retrieve the associated interview session
const session = this.sessions.get(ws);
if (!session?.interviewId) {
throw new Error("No interview session found");
}
// Generate unique ID to track this message through the system
const messageId = crypto.randomUUID();
// Let the client know we're processing their audio
this.broadcast(
JSON.stringify({
type: "message",
status: "processing",
role: "user",
messageId,
interviewId: session.interviewId,
}),
);
// TODO: Implement Whisper transcription in next section
// For now, just log the received audio data size
console.log(`Received audio data of length: ${uint8Array.length}`);
} catch (error) {
console.error("Audio processing failed:", error);
this.handleWebSocketError(ws, error);
}
}
/**
* Handles WebSocket errors by logging them and notifying the client.
* Ensures errors are properly communicated back to the user.
*/
private handleWebSocketError(ws: WebSocket, error: unknown): void {
const errorMessage = error instanceof Error ? error.message : "An unknown error occurred.";
console.error("WebSocket error:", errorMessage);
if (ws.readyState === WebSocket.OPEN) {
ws.send(
JSON.stringify({
type: "error",
message: errorMessage,
}),
);
}
}
```
Your `handleBinaryAudio` method currently logs when it receives audio data. Next, you'll enhance it to transcribe speech using Workers AI's Whisper model.
### Configure speech-to-text
Now that the audio processing pipeline is set up, integrate Workers AI's Whisper model for speech-to-text transcription.
Configure the Workers AI binding in your Wrangler file by adding:
```toml
# ... previous configuration ...
[ai]
binding = "AI"
```
Next, generate TypeScript types for our AI binding. Run the following command:
```sh
npm run cf-typegen
```
You will need a new service class for AI operations. Create a new file called `services/AIService.ts`:
```typescript title="src/services/AIService.ts"
import { InterviewError, ErrorCodes } from "../errors";
export class AIService {
constructor(private readonly AI: Ai) {}
async transcribeAudio(audioData: Uint8Array): Promise<string> {
try {
// Call the Whisper model to transcribe the audio
const response = await this.AI.run("@cf/openai/whisper-tiny-en", {
audio: Array.from(audioData),
});
if (!response?.text) {
throw new Error("Failed to transcribe audio content.");
}
return response.text;
} catch (error) {
throw new InterviewError(
"Failed to transcribe audio content",
ErrorCodes.TRANSCRIPTION_FAILED,
);
}
}
}
```
You will need to update the `Interview` Durable Object to use this new AI service. To do this, update the `handleBinaryAudio` method in `src/interview.ts`:
```typescript title="src/interview.ts"
import { AIService } from "./services/AIService";
export class Interview extends DurableObject {
private readonly aiService: AIService;
constructor(state: DurableObjectState, env: CloudflareBindings) {
// ... previous code ...
// Initialize the AI service with the Workers AI binding
this.aiService = new AIService(this.env.AI);
}
private async handleBinaryAudio(ws: WebSocket, audioData: ArrayBuffer): Promise<void> {
try {
const uint8Array = new Uint8Array(audioData);
const session = this.sessions.get(ws);
if (!session?.interviewId) {
throw new Error("No interview session found");
}
// Create a message ID for tracking
const messageId = crypto.randomUUID();
// Send processing state to client
this.broadcast(
JSON.stringify({
type: "message",
status: "processing",
role: "user",
messageId,
interviewId: session.interviewId,
}),
);
// NEW: Use AI service to transcribe the audio
const transcribedText = await this.aiService.transcribeAudio(uint8Array);
// Store the transcribed message
await this.addMessage(session.interviewId, "user", transcribedText, messageId);
} catch (error) {
console.error("Audio processing failed:", error);
this.handleWebSocketError(ws, error);
}
}
}
```
:::note
The Whisper model `@cf/openai/whisper-tiny-en` is optimized for English speech recognition. If you need support for other languages, you can use different Whisper model variants available through Workers AI.
:::
When users speak during the interview, their audio will be automatically transcribed and stored as messages in the interview session. The transcribed text will be immediately available to both the user and the AI interviewer for generating appropriate responses.
## 9. Integrate AI response generation
Now that you have audio transcription working, let's implement AI interviewer response generation using Workers AI's LLM capabilities. You'll create an interview system that:
- Maintains context of the conversation
- Provides relevant follow-up questions
- Gives constructive feedback
- Stays in character as a professional interviewer
### Set up Workers AI LLM integration
First, update the `AIService` class to handle LLM interactions. You will need to add methods for:
- Processing interview context
- Generating appropriate responses
- Handling conversation flow
Update the `services/AIService.ts` class to include LLM functionality:
```typescript title="src/services/AIService.ts"
import { InterviewData, Message } from "../types";
export class AIService {
async processLLMResponse(interview: InterviewData): Promise<string> {
const messages = this.prepareLLMMessages(interview);
try {
const { response } = await this.AI.run("@cf/meta/llama-2-7b-chat-int8", {
messages,
});
if (!response) {
throw new Error("Failed to generate a response from the LLM model.");
}
return response;
} catch (error) {
throw new InterviewError("Failed to generate a response from the LLM model.", ErrorCodes.LLM_FAILED);
}
}
private prepareLLMMessages(interview: InterviewData) {
const messageHistory = interview.messages.map((msg: Message) => ({
role: msg.role,
content: msg.content,
}));
return [
{
role: "system",
content: this.createSystemPrompt(interview),
},
...messageHistory,
];
}
}
```
:::note
The `@cf/meta/llama-2-7b-chat-int8` model is optimized for chat-like interactions and provides good performance while maintaining reasonable resource usage.
:::
### Create the conversation prompt
Prompt engineering is crucial for getting high-quality responses from the LLM. Next, you will create a system prompt that:
- Sets the context for the interview
- Defines the interviewer's role and behavior
- Specifies the technical focus areas
- Guides the conversation flow
Add the following method to your `services/AIService.ts` class:
```typescript title="src/services/AIService.ts"
private createSystemPrompt(interview: InterviewData): string {
const basePrompt = "You are conducting a technical interview.";
const rolePrompt = `The position is for ${interview.title}.`;
const skillsPrompt = `Focus on topics related to: ${interview.skills.join(", ")}.`;
const instructionsPrompt = "Ask relevant technical questions and provide constructive feedback.";
return `${basePrompt} ${rolePrompt} ${skillsPrompt} ${instructionsPrompt}`;
}
```
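For example, for the sample interview created earlier with the title `Frontend Developer Interview` and the skills `["JavaScript", "React", "CSS"]`, this method produces the system prompt: "You are conducting a technical interview. The position is for Frontend Developer Interview. Focus on topics related to: JavaScript, React, CSS. Ask relevant technical questions and provide constructive feedback."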
### Implement response generation logic
Finally, integrate the LLM response generation into the interview flow. Update the `handleBinaryAudio` method in the `src/interview.ts` Durable Object to:
- Process transcribed user responses
- Generate appropriate AI interviewer responses
- Maintain conversation context
Update the `handleBinaryAudio` method in `src/interview.ts`:
```typescript title="src/interview.ts"
private async handleBinaryAudio(ws: WebSocket, audioData: ArrayBuffer): Promise<void> {
try {
// Convert raw audio buffer to uint8 array for processing
const uint8Array = new Uint8Array(audioData);
const session = this.sessions.get(ws);
if (!session?.interviewId) {
throw new Error("No interview session found");
}
// Generate a unique ID to track this message through the system
const messageId = crypto.randomUUID();
// Let the client know we're processing their audio
// This helps provide immediate feedback while transcription runs
this.broadcast(
JSON.stringify({
type: "message",
status: "processing",
role: "user",
messageId,
interviewId: session.interviewId,
}),
);
// Convert the audio to text using our AI transcription service
// This typically takes 1-2 seconds for normal speech
const transcribedText = await this.aiService.transcribeAudio(uint8Array);
// Save the user's message to our database so we maintain chat history
await this.addMessage(session.interviewId, "user", transcribedText, messageId);
// Look up the full interview context - we need this to generate a good response
const interview = await this.db.getInterview(session.interviewId);
if (!interview) {
throw new Error(`Interview not found: ${session.interviewId}`);
}
// Now it's the AI's turn to respond
// First generate an ID for the assistant's message
const assistantMessageId = crypto.randomUUID();
// Let the client know we're working on the AI response
this.broadcast(
JSON.stringify({
type: "message",
status: "processing",
role: "assistant",
messageId: assistantMessageId,
interviewId: session.interviewId,
}),
);
// Generate the AI interviewer's response based on the conversation history
const llmResponse = await this.aiService.processLLMResponse(interview);
await this.addMessage(session.interviewId, "assistant", llmResponse, assistantMessageId);
} catch (error) {
// Something went wrong processing the audio or generating a response
// Log it and let the client know there was an error
console.error("Audio processing failed:", error);
this.handleWebSocketError(ws, error);
}
}
```
## Conclusion
You have successfully built an AI-powered interview practice tool using Cloudflare's Workers AI. In summary, you have:
- Created a real-time WebSocket communication system using Durable Objects
- Implemented speech-to-text processing with Workers AI Whisper model
- Built an intelligent interview system using Workers AI LLM capabilities
- Designed a persistent storage system with SQLite in Durable Objects
The complete source code for this tutorial is available on GitHub:
[ai-interview-practice-tool](https://github.com/berezovyy/ai-interview-practice-tool)
---
# Explore Code Generation Using DeepSeek Coder Models
URL: https://developers.cloudflare.com/workers-ai/tutorials/explore-code-generation-using-deepseek-coder-models/
import { Stream } from "~/components"
A handy way to explore all of the models available on [Workers AI](/workers-ai) is to use a [Jupyter Notebook](https://jupyter.org/).
You can [download the DeepSeek Coder notebook](/workers-ai/static/documentation/notebooks/deepseek-coder-exploration.ipynb) or view the embedded notebook below.
[comment]: <> "The markdown below is auto-generated from https://github.com/craigsdennis/notebooks-cloudflare-workers-ai"
***
## Exploring Code Generation Using DeepSeek Coder
AI models being able to generate code unlocks all sorts of use cases. The [DeepSeek Coder](https://github.com/deepseek-ai/DeepSeek-Coder) models `@hf/thebloke/deepseek-coder-6.7b-base-awq` and `@hf/thebloke/deepseek-coder-6.7b-instruct-awq` are now available on [Workers AI](/workers-ai).
Let's explore them using the API!
```python
import sys
!{sys.executable} -m pip install requests python-dotenv
```
```
Requirement already satisfied: requests in ./venv/lib/python3.12/site-packages (2.31.0)
Requirement already satisfied: python-dotenv in ./venv/lib/python3.12/site-packages (1.0.1)
Requirement already satisfied: charset-normalizer<4,>=2 in ./venv/lib/python3.12/site-packages (from requests) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in ./venv/lib/python3.12/site-packages (from requests) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in ./venv/lib/python3.12/site-packages (from requests) (2.1.0)
Requirement already satisfied: certifi>=2017.4.17 in ./venv/lib/python3.12/site-packages (from requests) (2023.11.17)
```
```python
import os
from getpass import getpass
from IPython.display import display, Image, Markdown, Audio
import requests
```
```python
%load_ext dotenv
%dotenv
```
### Configuring your environment
To use the API you'll need your [Cloudflare Account ID](https://dash.cloudflare.com) (head to Workers & Pages > Overview > Account details > Account ID) and a [Workers AI enabled API Token](https://dash.cloudflare.com/profile/api-tokens).
If you want to add these values to your environment, you can create a new file named `.env`:
```bash
CLOUDFLARE_API_TOKEN="YOUR-TOKEN"
CLOUDFLARE_ACCOUNT_ID="YOUR-ACCOUNT-ID"
```
```python
if "CLOUDFLARE_API_TOKEN" in os.environ:
api_token = os.environ["CLOUDFLARE_API_TOKEN"]
else:
api_token = getpass("Enter your Cloudflare API Token")
```
```python
if "CLOUDFLARE_ACCOUNT_ID" in os.environ:
account_id = os.environ["CLOUDFLARE_ACCOUNT_ID"]
else:
account_id = getpass("Enter your account id")
```
### Generate code from a comment
A common use case is to complete the code for the user after they provide a descriptive comment.
````python
model = "@hf/thebloke/deepseek-coder-6.7b-base-awq"
prompt = "# A function that checks if a given word is a palindrome"
response = requests.post(
f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}",
headers={"Authorization": f"Bearer {api_token}"},
json={"messages": [
{"role": "user", "content": prompt}
]}
)
inference = response.json()
code = inference["result"]["response"]
display(Markdown(f"""
```python
{prompt}
{code.strip()}
```
"""))
````
```python
# A function that checks if a given word is a palindrome
def is_palindrome(word):
# Convert the word to lowercase
word = word.lower()
# Reverse the word
reversed_word = word[::-1]
# Check if the reversed word is the same as the original word
if word == reversed_word:
return True
else:
return False
# Test the function
print(is_palindrome("racecar")) # Output: True
print(is_palindrome("hello")) # Output: False
```
### Assist in debugging
We've all been there: bugs happen. Sometimes those stack traces can be very intimidating, and a great use case for code generation is to assist in explaining the problem.
```python
model = "@hf/thebloke/deepseek-coder-6.7b-instruct-awq"
system_message = "The user is going to give you code that isn't working. Explain to the user what might be wrong"
code = """# Welcomes our user
def hello_world(first_name="World"):
print(f"Hello, {name}!")
"""
response = requests.post(
f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}",
headers={"Authorization": f"Bearer {api_token}"},
json={"messages": [
{"role": "system", "content": system_message},
{"role": "user", "content": code},
]}
)
inference = response.json()
response = inference["result"]["response"]
display(Markdown(response))
```
The error in your code is that you are trying to use a variable `name` which is not defined anywhere in your function. The correct variable to use is `first_name`. So, you should change `f"Hello, {name}!"` to `f"Hello, {first_name}!"`.
Here is the corrected code:
```python
# Welcomes our user
def hello_world(first_name="World"):
print(f"Hello, {first_name}")
```
Now, when you call `hello_world()`, it will print "Hello, World" by default. If you call `hello_world("John")`, it will print "Hello, John".
### Write tests!
Writing unit tests is a common best practice. With enough context, the model can write them for you.
```python
model = "@hf/thebloke/deepseek-coder-6.7b-instruct-awq"
system_message = "The user is going to give you code and would like to have tests written in the Python unittest module."
code = """
class User:
def __init__(self, first_name, last_name=None):
self.first_name = first_name
self.last_name = last_name
if last_name is None:
self.last_name = "Mc" + self.first_name
def full_name(self):
return self.first_name + " " + self.last_name
"""
response = requests.post(
f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}",
headers={"Authorization": f"Bearer {api_token}"},
json={"messages": [
{"role": "system", "content": system_message},
{"role": "user", "content": code},
]}
)
inference = response.json()
response = inference["result"]["response"]
display(Markdown(response))
```
Here is a simple unittest test case for the User class:
```python
import unittest
class TestUser(unittest.TestCase):
def test_full_name(self):
user = User("John", "Doe")
self.assertEqual(user.full_name(), "John Doe")
def test_default_last_name(self):
user = User("Jane")
self.assertEqual(user.full_name(), "Jane McJane")
if __name__ == '__main__':
unittest.main()
```
In this test case, we have two tests:
* `test_full_name` tests the `full_name` method when the user has both a first name and a last name.
* `test_default_last_name` tests the `full_name` method when the user only has a first name and the last name is set to "Mc" + first name.
If all these tests pass, it means that the `full_name` method is working as expected. If any of these tests fail, it
### Fill-in-the-middle Code Completion
A common use case in Developer Tools is to autocomplete based on context. DeepSeek Coder provides the ability to submit existing code with a placeholder, so that the model can complete in context.
Warning: The tokens are prefixed with `<｜` and suffixed with `｜>`, so make sure to copy and paste them exactly.
````python
model = "@hf/thebloke/deepseek-coder-6.7b-base-awq"
code = """
<｜fim▁begin｜>import re
from jklol import email_service
def send_email(email_address, body):
    <｜fim▁hole｜>
    if not is_valid_email:
        raise InvalidEmailAddress(email_address)
    return email_service.send(email_address, body)<｜fim▁end｜>
"""
response = requests.post(
f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}",
headers={"Authorization": f"Bearer {api_token}"},
json={"messages": [
{"role": "user", "content": code}
]}
)
inference = response.json()
response = inference["result"]["response"]
display(Markdown(f"""
```python
{response.strip()}
```
"""))
````
```python
is_valid_email = re.match(r"[^@]+@[^@]+\.[^@]+", email_address)
```
### Experimental: Extract data into JSON
No need to threaten the model or bring grandma into the prompt. Get back JSON in the format you want.
````python
model = "@hf/thebloke/deepseek-coder-6.7b-instruct-awq"
# Learn more at https://json-schema.org/
json_schema = """
{
"title": "User",
"description": "A user from our example app",
"type": "object",
"properties": {
"firstName": {
"description": "The user's first name",
"type": "string"
},
"lastName": {
"description": "The user's last name",
"type": "string"
},
"numKids": {
"description": "Amount of children the user has currently",
"type": "integer"
},
"interests": {
"description": "A list of what the user has shown interest in",
"type": "array",
"items": {
"type": "string"
}
}
},
"required": [ "firstName" ]
}
"""
system_prompt = f"""
The user is going to discuss themselves and you should create a JSON object from their description to match the json schema below.
{json_schema}
Return JSON only. Do not explain or provide usage examples.
"""
prompt = """Hey there, I'm Craig Dennis and I'm a Developer Educator at Cloudflare. My email is craig@cloudflare.com.
I am very interested in AI. I've got two kids. I love tacos, burritos, and all things Cloudflare"""
response = requests.post(
f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}",
headers={"Authorization": f"Bearer {api_token}"},
json={"messages": [
{"role": "system", "content": system_prompt},
{"role": "user", "content": prompt}
]}
)
inference = response.json()
response = inference["result"]["response"]
display(Markdown(f"""
```json
{response.strip()}
```
"""))
````
```json
{
"firstName": "Craig",
"lastName": "Dennis",
"numKids": 2,
"interests": ["AI", "Cloudflare", "Tacos", "Burritos"]
}
```
---
# Explore Workers AI Models Using a Jupyter Notebook
URL: https://developers.cloudflare.com/workers-ai/tutorials/explore-workers-ai-models-using-a-jupyter-notebook/
import { Stream } from "~/components"
A handy way to explore all of the models available on [Workers AI](/workers-ai) is to use a [Jupyter Notebook](https://jupyter.org/).
You can [download the Workers AI notebook](/workers-ai-notebooks/cloudflare-workers-ai.ipynb) or view the embedded notebook below.
Or you can run this on [Google Colab](https://colab.research.google.com/github/craigsdennis/notebooks-cloudflare-workers-ai/blob/main/cloudflare-workers-ai.ipynb)
[comment]: <> "The markdown below is auto-generated from https://github.com/craigsdennis/notebooks-cloudflare-workers-ai the