# Architectures
URL: https://developers.cloudflare.com/vectorize/demos/
import { GlossaryTooltip, ResourcesBySelector } from "~/components"
Learn how you can use Vectorize within your existing architecture.
## Reference architectures
Explore the following reference architectures that use Vectorize:
---
# Overview
URL: https://developers.cloudflare.com/vectorize/
import { CardGrid, Description, Feature, LinkTitleCard, Plan, RelatedProduct, Render } from "~/components"
Build full-stack AI applications with Vectorize, Cloudflare's powerful vector database.
Vectorize is a globally distributed vector database that enables you to build full-stack, AI-powered applications with [Cloudflare Workers](/workers/). Vectorize makes querying embeddings — representations of values or objects like text, images, and audio that are designed to be consumed by machine learning models and semantic search algorithms — faster, easier, and more affordable.
For example, by storing the embeddings (vectors) generated by a machine learning model, including those built into [Workers AI](/workers-ai/) or your own from platforms like [OpenAI](#), you can build applications with powerful search, similarity, recommendation, classification and/or anomaly detection capabilities based on your own data.
The vectors returned can reference images stored in Cloudflare R2, documents in KV, and/or user profiles stored in D1 — enabling you to go from vector search result to concrete object all within the Workers platform, and without standing up additional infrastructure.
***
## Features
Learn how to create your first Vectorize database, upload vector embeddings, and query those embeddings from [Cloudflare Workers](/workers/).
Learn how to use Vectorize to generate vector embeddings using Workers AI.
***
## Related products
Run machine learning models, powered by serverless GPUs, on Cloudflare’s global network.
Store large amounts of unstructured data without the costly egress bandwidth fees associated with typical cloud storage services.
***
## More resources
Learn about Vectorize limits and how to work within them.
Learn how you can build and deploy ambitious AI applications to Cloudflare's global network.
Learn more about the storage and database options you can build on with Workers.
Connect with the Workers community on Discord to ask questions, join the `#vectorize` channel to show what you are building, and discuss the platform with other developers.
Follow @CloudflareDev on Twitter to learn about product announcements and what is new in the Cloudflare Developer Platform.
---
# Create indexes
URL: https://developers.cloudflare.com/vectorize/best-practices/create-indexes/
import { Render } from "~/components";
Indexes are the "atom" of Vectorize. Vectors are inserted into an index and enable you to query the index for similar vectors for a given input vector.
Creating an index requires three inputs:
- A name, for example `prod-search-index` or `recommendations-idx-dev`.
- The (fixed) [dimension size](#dimensions) of each vector, for example 384 or 1536.
- The (fixed) [distance metric](#distance-metrics) to use for calculating vector similarity.
An index cannot be created using the same name as an index that is currently active on your account. However, an index can be created with a name that belonged to an index that has been deleted.
The configuration of an index cannot be changed after creation.
## Create an index
### wrangler CLI
To create an index with `wrangler`:
```sh
npx wrangler vectorize create your-index-name --dimensions=NUM_DIMENSIONS --metric=SELECTED_METRIC
```
To create an index that can accept vector embeddings from Workers AI's [`@cf/baai/bge-base-en-v1.5`](/workers-ai/models/#text-embeddings) embedding model, which outputs vectors with 768 dimensions, use the following command:
```sh
npx wrangler vectorize create your-index-name --dimensions=768 --metric=cosine
```
### HTTP API
Vectorize also supports creating indexes via [REST API](/api/resources/vectorize/subresources/indexes/methods/create/).
For example, to create an index directly from a Python script:
```py
import requests

url = "https://api.cloudflare.com/client/v4/accounts/{}/vectorize/v2/indexes".format("your-account-id")

headers = {
    "Authorization": "Bearer "
}

body = {
    "name": "demo-index",
    "description": "some index description",
    "config": {
        "dimensions": 1024,
        "metric": "euclidean"
    },
}

resp = requests.post(url, headers=headers, json=body)

print('Status Code:', resp.status_code)
print('Response JSON:', resp.json())
```
This script should print a `201` status code, along with a JSON response body indicating the creation of an index with the provided configuration.
## Dimensions
Dimensions are determined from the output size of the machine learning (ML) model used to generate them, and are a function of how the model encodes and describes features into a vector embedding.
The number of output dimensions can determine vector search accuracy, search performance (latency), and the overall size of the index. Smaller output dimensions can be faster to search across, which can be useful for user-facing applications. Larger output dimensions can provide more accurate search, especially over larger datasets and/or datasets with substantially similar inputs.
The number of dimensions an index is created for cannot change. Indexes expect to receive dense vectors with the same number of dimensions.
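Because the dimension count is fixed per index, it can be worth guarding against mismatches before inserting. A minimal sketch — the `INDEX_DIMENSIONS` constant and `assertDimensions` helper are illustrative, not part of the Vectorize API:

```ts
// Illustrative guard: Vectorize rejects vectors whose length does not match
// the dimension size the index was created with.
const INDEX_DIMENSIONS = 768; // must match --dimensions at index creation

function assertDimensions(
	vector: number[],
	expected: number = INDEX_DIMENSIONS,
): void {
	if (vector.length !== expected) {
		throw new Error(
			`Vector has ${vector.length} dimensions; index expects ${expected}`,
		);
	}
}
```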
The following table highlights some example embeddings models and their output dimensions:
| Model / Embeddings API | Output dimensions | Use-case |
| ---------------------------------------- | ----------------- | -------------------------- |
| Workers AI - `@cf/baai/bge-base-en-v1.5` | 768 | Text |
| OpenAI - `ada-002` | 1536 | Text |
| Cohere - `embed-multilingual-v2.0` | 768 | Text |
| Google Cloud - `multimodalembedding` | 1408 | Multi-modal (text, images) |
:::note[Learn more about Workers AI]
Refer to the [Workers AI documentation](/workers-ai/models/#text-embeddings) to learn about its built-in embedding models.
:::
## Distance metrics
Distance metrics are functions that determine how close vectors are from each other. Vectorize indexes support the following distance metrics:
| Metric | Details |
| ------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `cosine` | Distance is measured between `-1` (most dissimilar) to `1` (identical). `0` denotes an orthogonal vector. |
| `euclidean` | Euclidean (L2) distance. `0` denotes identical vectors. The larger the positive number, the further the vectors are apart. |
| `dot-product` | Negative dot product. Larger negative values _or_ smaller positive values denote more similar vectors. A score of `-1000` is more similar than `-500`, and a score of `15` more similar than `50`. |
Determining the similarity between vectors can be subjective, based on how well the machine-learning model represents features in the resulting vector embeddings. For example, a score of `0.8511` when using a `cosine` metric means that two vectors are close in distance, but whether the data they represent is _similar_ is a function of how well the model is able to represent the original content.
When querying vectors, you can specify Vectorize to use either:
- High-precision scoring, which increases the precision of the query match scores as well as the accuracy of the query results.
- Approximate scoring for faster response times. Using approximate scoring, returned scores will be an approximation of the real distance/similarity between your query and the returned vectors. Refer to [Control over scoring precision and query accuracy](/vectorize/best-practices/query-vectors/#control-over-scoring-precision-and-query-accuracy).
Distance metrics cannot be changed after index creation, and each metric has a different scoring function.
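To build intuition for the scores in the table above, here is a sketch of how each metric could be computed for two raw, equal-length vectors. These helper functions are illustrative only; Vectorize computes scores for you at query time:

```ts
// Illustrative implementations of the three distance metrics.
function dot(a: number[], b: number[]): number {
	return a.reduce((sum, v, i) => sum + v * b[i], 0);
}

// cosine: 1 means identical direction, 0 orthogonal, -1 most dissimilar.
function cosine(a: number[], b: number[]): number {
	return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// euclidean (L2): 0 means identical; larger means further apart.
function euclidean(a: number[], b: number[]): number {
	return Math.sqrt(a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0));
}

// dot-product: the negated dot product, so lower scores mean more similar.
function dotProduct(a: number[], b: number[]): number {
	return -dot(a, b);
}
```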
---
# Best practices
URL: https://developers.cloudflare.com/vectorize/best-practices/
import { DirectoryListing } from "~/components"
---
# Insert vectors
URL: https://developers.cloudflare.com/vectorize/best-practices/insert-vectors/
import { Render } from "~/components";
Vectorize indexes allow you to insert vectors at any point: Vectorize will optimize the index behind the scenes to ensure that vector search remains efficient, even as new vectors are added or existing vectors updated.
:::note[Insert vs Upsert]
If the same vector id is _inserted_ twice in a Vectorize index, the index would reflect the vector that was added first.
If the same vector id is _upserted_ twice in a Vectorize index, the index would reflect the vector that was added last.
Use the upsert operation if you want to overwrite the vector value for a vector id that already exists in an index.
:::
## Supported vector formats
Vectorize supports vectors in three formats:
- An array of floating point numbers (converted into a JavaScript `number[]` array).
- A [Float32Array](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Float32Array).
- A [Float64Array](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Float64Array).
In most cases, a `number[]` array is the easiest when dealing with other APIs, and is the return type of most machine-learning APIs.
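If an embedding library hands you a typed array but a downstream API expects `number[]`, the conversion is a one-liner. A sketch (note that 32-bit values may pick up extra decimal places when widened to 64-bit numbers):

```ts
// Convert a Float32Array from an embedding library into a plain number[].
const raw = new Float32Array([32.4, 74.1, 3.2]);
const values: number[] = Array.from(raw);
```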
## Metadata
Metadata is an optional set of key-value pairs that can be attached to a vector on insert or upsert, and allows you to embed or co-locate data about the vector itself.
Metadata keys cannot be empty, contain the dot character (`.`), contain the double-quote character (`"`), or start with the dollar character (`$`).
Metadata can be used to:
- Include the object storage key, database UUID or other identifier to look up the content the vector embedding represents.
- Store JSON data (up to the [metadata limits](/vectorize/platform/limits/)), which can allow you to skip additional lookups for smaller content.
- Keep track of dates, timestamps, or other metadata that describes when the vector embedding was generated or how it was generated.
For example, a vector embedding representing an image could include the path to the [R2 object](/r2/) it was generated from, the format, and a category lookup:
```ts
{ id: '1', values: [32.4, 74.1, 3.2, ...], metadata: { path: 'r2://bucket-name/path/to/image.png', format: 'png', category: 'profile_image' } }
```
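The key restrictions above can be expressed as a small check. This `isValidMetadataKey` helper is an illustrative sketch, not part of the Vectorize API:

```ts
// A metadata key must be non-empty, contain no "." or '"' characters,
// and must not start with "$".
function isValidMetadataKey(key: string): boolean {
	return (
		key.length > 0 &&
		!key.includes(".") &&
		!key.includes('"') &&
		!key.startsWith("$")
	);
}
```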
## Namespaces
Namespaces provide a way to segment the vectors within your index, for example by customer, merchant, or store ID.
To associate vectors with a namespace, you can optionally provide a `namespace: string` value when performing an insert or upsert operation. When querying, you can pass the namespace to search within as an optional parameter to your query.
A namespace can be up to 64 characters (bytes) in length and you can have up to 1,000 namespaces per index. Refer to the [Limits](/vectorize/platform/limits/) documentation for more details.
When a namespace is specified in a query operation, only vectors within that namespace are used for the search. Namespace filtering is applied before vector search, increasing the precision of the matched results.
To insert vectors with a namespace:
```ts
// Mock vectors
// Vectors from a machine-learning model are typically ~100 to 1536 dimensions
// wide (or wider still).
const sampleVectors: Array<VectorizeVector> = [
{
id: "1",
values: [32.4, 74.1, 3.2, ...],
namespace: "text",
},
{
id: "2",
values: [15.1, 19.2, 15.8, ...],
namespace: "images",
},
{
id: "3",
values: [0.16, 1.2, 3.8, ...],
namespace: "pdfs",
},
];
// Insert your vectors, returning a count of the vectors inserted and their vector IDs.
let inserted = await env.TUTORIAL_INDEX.insert(sampleVectors);
```
To query vectors within a namespace:
```ts
// Your queryVector will be searched against vectors within the namespace (only)
let matches = await env.TUTORIAL_INDEX.query(queryVector, {
namespace: "images",
});
```
## Examples
### Workers API
Use the `insert()` and `upsert()` methods available on an index from within a Cloudflare Worker to insert vectors into the current index.
```ts
// Mock vectors
// Vectors from a machine-learning model are typically ~100 to 1536 dimensions
// wide (or wider still).
const sampleVectors: Array<VectorizeVector> = [
{
id: "1",
values: [32.4, 74.1, 3.2, ...],
metadata: { url: "/products/sku/13913913" },
},
{
id: "2",
values: [15.1, 19.2, 15.8, ...],
metadata: { url: "/products/sku/10148191" },
},
{
id: "3",
values: [0.16, 1.2, 3.8, ...],
metadata: { url: "/products/sku/97913813" },
},
];
// Insert your vectors, returning a count of the vectors inserted and their vector IDs.
let inserted = await env.TUTORIAL_INDEX.insert(sampleVectors);
```
Refer to [Vectorize API](/vectorize/reference/client-api/) for additional examples.
### wrangler CLI
:::note[Cloudflare API rate limit]
Please use a maximum of 5000 vectors per embeddings.ndjson file to avoid hitting the global [rate limit](/fundamentals/api/reference/limits/) for the Cloudflare API.
:::
You can bulk upload vector embeddings directly:
- The file must be in newline-delimited JSON (NDJSON) format: each complete vector must be newline separated, and not within an array or object.
- Vectors must be complete and include a unique string `id` per vector.
An example NDJSON formatted file:
```json
{ "id": "4444", "values": [175.1, 167.1, 129.9], "metadata": {"url": "/products/sku/918318313"}}
{ "id": "5555", "values": [158.8, 116.7, 311.4], "metadata": {"url": "/products/sku/183183183"}}
{ "id": "6666", "values": [113.2, 67.5, 11.2], "metadata": {"url": "/products/sku/717313811"}}
```
```sh
npx wrangler vectorize insert your-index-name --file=embeddings.ndjson
```
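If you are producing the file programmatically, NDJSON is simply one JSON object per line with no wrapping array. A sketch — the `VectorRecord` type and `toNDJSON` helper are illustrative:

```ts
// Serialize vector records to NDJSON: one JSON object per line, no array.
type VectorRecord = {
	id: string;
	values: number[];
	metadata?: Record<string, unknown>;
};

function toNDJSON(vectors: VectorRecord[]): string {
	return vectors.map((v) => JSON.stringify(v)).join("\n");
}
```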
### HTTP API
Vectorize also supports inserting vectors via the [REST API](/api/resources/vectorize/subresources/indexes/methods/insert/), which allows you to operate on a Vectorize index from existing machine-learning tooling and languages (including Python).
For example, to insert embeddings in [NDJSON format](#workers-api) directly from a Python script:
```py
import requests

url = "https://api.cloudflare.com/client/v4/accounts/{}/vectorize/v2/indexes/{}/insert".format("your-account-id", "index-name")

headers = {
    "Authorization": "Bearer "
}

with open('embeddings.ndjson', 'rb') as embeddings:
    resp = requests.post(url, headers=headers, files=dict(vectors=embeddings))
    print(resp)
```
This code would insert the vectors defined in `embeddings.ndjson` into the provided index. Python libraries, including Pandas, also support the NDJSON format via the built-in `read_json` method:
```py
import pandas as pd
data = pd.read_json('embeddings.ndjson', lines=True)
```
---
# Query vectors
URL: https://developers.cloudflare.com/vectorize/best-practices/query-vectors/
Querying an index, or vector search, enables you to search an index by providing an input vector and returning the nearest vectors based on the [configured distance metric](/vectorize/best-practices/create-indexes/#distance-metrics).
Optionally, you can apply [metadata filters](/vectorize/reference/metadata-filtering/) or a [namespace](/vectorize/best-practices/insert-vectors/#namespaces) to narrow the vector search space.
## Example query
To pass a vector as a query to an index, use the `query()` method on the index itself.
A query vector is either an array of JavaScript numbers, 32-bit floating point numbers, or 64-bit floating point numbers: `number[]`, `Float32Array`, or `Float64Array`. Unlike when [inserting vectors](/vectorize/best-practices/insert-vectors/), a query vector does not need an ID or metadata.
```ts
// query vector dimensions must match the Vectorize index dimension being queried
let queryVector = [54.8, 5.5, 3.1, ...];
let matches = await env.YOUR_INDEX.query(queryVector);
```
This would return a set of matches resembling the following, based on the distance metric configured for the Vectorize index. Example response with `cosine` distance metric:
```json
{
"count": 5,
"matches": [
{ "score": 0.999909486, "id": "5" },
{ "score": 0.789848214, "id": "4" },
{ "score": 0.720476967, "id": "4444" },
{ "score": 0.463884663, "id": "6" },
{ "score": 0.378282232, "id": "1" }
]
}
```
You can optionally change the number of results returned and/or whether results should include metadata and values:
```ts
// query vector dimensions must match the Vectorize index dimension being queried
let queryVector = [54.8, 5.5, 3.1, ...];
// topK defaults to 5; returnValues defaults to false; returnMetadata defaults to "none"
let matches = await env.YOUR_INDEX.query(queryVector, {
topK: 1,
returnValues: true,
returnMetadata: "all",
});
```
This would return a set of matches resembling the following, based on the distance metric configured for the Vectorize index. Example response with `cosine` distance metric:
```json
{
"count": 1,
"matches": [
{
"score": 0.999909486,
"id": "5",
"values": [58.79999923706055, 6.699999809265137, 3.4000000953674316, ...],
"metadata": { "url": "/products/sku/55519183" }
}
]
}
```
Refer to [Vectorize API](/vectorize/reference/client-api/) for additional examples.
## Query by vector identifier
Vectorize now offers the ability to search for vectors similar to a vector that is already present in the index using the `queryById()` operation. This can be considered a single operation that combines the `getById()` and `query()` operations.
```ts
// the query operation would yield results if a vector with id `some-vector-id` is already present in the index.
let matches = await env.YOUR_INDEX.queryById("some-vector-id");
```
## Control over scoring precision and query accuracy
When querying vectors, you can specify either high-precision scoring, which increases the precision of the query match scores as well as the accuracy of the query results, or approximate scoring for faster response times.
Using approximate scoring, returned scores will be an approximation of the real distance/similarity between your query and the returned vectors; approximate scoring is the default, as it offers a good trade-off between accuracy and latency.
High-precision scoring is enabled by setting `returnValues: true` on your query. This setting tells Vectorize to use the original vector values for your matches, allowing the computation of exact match scores and increasing the accuracy of the results. Because it processes more data, though, high-precision scoring will increase the latency of queries.
## Workers AI
If you are generating embeddings from a [Workers AI](/workers-ai/models/#text-embeddings) text embedding model, the response type from `env.AI.run()` is an object that includes both the `shape` of the response vector - e.g. `[1,768]` - and the vector `data` as an array of vectors:
```ts
interface EmbeddingResponse {
shape: number[];
data: number[][];
}
let userQuery = "a query from a user or service";
const queryVector: EmbeddingResponse = await env.AI.run(
"@cf/baai/bge-base-en-v1.5",
{
text: [userQuery],
},
);
```
When passing the vector to the `query()` method of a Vectorize index, pass only the vector embedding itself, taken from the `.data` property, and not the top-level response.
For example:
```ts
let matches = await env.TEXT_EMBEDDINGS.query(queryVector.data[0], { topK: 1 });
```
Passing `queryVector` or `queryVector.data` will cause `query()` to return an error.
## OpenAI
When using OpenAI's [JavaScript client API](https://github.com/openai/openai-node) and [Embeddings API](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings), the response type from `embeddings.create` is an object that includes the model, usage information and the requested vector embedding.
```ts
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: env.YOUR_OPENAI_KEY });
let userQuery = "a query from a user or service";
let embeddingResponse = await openai.embeddings.create({
input: userQuery,
model: "text-embedding-ada-002",
});
```
Similar to Workers AI, you will need to provide the vector embedding itself (`.data[0].embedding`) and not the full response wrapper when querying a Vectorize index:
```ts
let matches = await env.TEXT_EMBEDDINGS.query(embeddingResponse.data[0].embedding, {
topK: 1,
});
```
---
# Examples
URL: https://developers.cloudflare.com/vectorize/examples/
import { GlossaryTooltip, DirectoryListing } from "~/components"
Explore the following examples for Vectorize.
---
# Get started
URL: https://developers.cloudflare.com/vectorize/get-started/
import { DirectoryListing } from "~/components"
---
# Vectorize and Workers AI
URL: https://developers.cloudflare.com/vectorize/get-started/embeddings/
import { Render, PackageManagers, WranglerConfig } from "~/components";
Vectorize allows you to generate [vector embeddings](/vectorize/reference/what-is-a-vector-database/) using a machine-learning model, including the models available in [Workers AI](/workers-ai/).
:::note[New to Vectorize?]
If this is your first time using Vectorize or a vector database, start with the [Vectorize Get started guide](/vectorize/get-started/intro/).
:::
This guide will instruct you through:
- Creating a Vectorize index.
- Connecting a [Cloudflare Worker](/workers/) to your index.
- Using [Workers AI](/workers-ai/) to generate vector embeddings.
- Using Vectorize to query those vector embeddings.
## Prerequisites
To continue:
1. Sign up for a [Cloudflare account](https://dash.cloudflare.com/sign-up/workers-and-pages) if you have not already.
2. Install [`npm`](https://docs.npmjs.com/getting-started).
3. Install [`Node.js`](https://nodejs.org/en/). Use a Node version manager like [Volta](https://volta.sh/) or [nvm](https://github.com/nvm-sh/nvm) to avoid permission issues and change Node.js versions. [Wrangler](/workers/wrangler/install-and-update/) requires a Node version of `16.17.0` or later.
## 1. Create a Worker
You will create a new project that will contain a Worker script, which will act as the client application for your Vectorize index.
Open your terminal and create a new project named `embeddings-tutorial` by running the following command:
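A minimal form of the command — assuming the standard `create-cloudflare` scaffolding flow; the note below lists the full set of non-interactive flags — is:

```sh
npm create cloudflare@latest embeddings-tutorial
```

When prompted, select the `"Hello World"` Worker template with TypeScript so that the files described below are generated.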
This will create a new `embeddings-tutorial` directory. Your new `embeddings-tutorial` directory will include:
- A `"Hello World"` [Worker](/workers/get-started/guide/#3-write-code) at `src/index.ts`.
- A [`wrangler.jsonc`](/workers/wrangler/configuration/) configuration file. `wrangler.jsonc` is how your `embeddings-tutorial` Worker will access your index.
:::note
If you are familiar with Cloudflare Workers, or initializing projects in a Continuous Integration (CI) environment, initialize a new project non-interactively by setting `CI=true` as an environment variable when running `create cloudflare@latest`.
For example: `CI=true npm create cloudflare@latest embeddings-tutorial --type=simple --git --ts --deploy=false` will create a basic "Hello World" project ready to build on.
:::
## 2. Create an index
A vector database is distinct from a traditional SQL or NoSQL database. A vector database is designed to store vector embeddings, which are representations of data, but not the original data itself.
To create your first Vectorize index, change into the directory you just created for your Workers project:
```sh
cd embeddings-tutorial
```
To create an index, use the `wrangler vectorize create` command and provide a name for the index. A good index name is:
- A combination of lowercase and/or numeric ASCII characters, shorter than 32 characters, starts with a letter, and uses dashes (-) instead of spaces.
- Descriptive of the use-case and environment. For example, "production-doc-search" or "dev-recommendation-engine".
- Only used for describing the index, and is not directly referenced in code.
In addition, define both the `dimensions` of the vectors you will store in the index, as well as the distance `metric` used to determine similar vectors when creating the index. **This configuration cannot be changed later**, as a vector database is configured for a fixed vector configuration.
Run the following `wrangler vectorize` command, ensuring that the `dimensions` are set to `768`: this is important, as the Workers AI model used in this tutorial outputs vectors with 768 dimensions.
```sh
npx wrangler vectorize create embeddings-index --dimensions=768 --metric=cosine
```
```sh output
✅ Successfully created index 'embeddings-index'
[[vectorize]]
binding = "VECTORIZE" # available in your Worker on env.VECTORIZE
index_name = "embeddings-index"
```
This will create a new vector database, and output the [binding](/workers/runtime-apis/bindings/) configuration needed in the next step.
## 3. Bind your Worker to your index
You must create a binding for your Worker to connect to your Vectorize index. [Bindings](/workers/runtime-apis/bindings/) allow your Workers to access resources, like Vectorize or R2, from Cloudflare Workers. You create bindings by updating your Wrangler file.
To bind your index to your Worker, add the following to the end of your Wrangler file:
```toml
[[vectorize]]
binding = "VECTORIZE" # available in your Worker on env.VECTORIZE
index_name = "embeddings-index"
```
Specifically:
- The value (string) you set for `binding` will be used to reference this database in your Worker. In this tutorial, name your binding `VECTORIZE`.
- The binding must be [a valid JavaScript variable name](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Grammar_and_types#variables). For example, `binding = "MY_INDEX"` or `binding = "PROD_SEARCH_INDEX"` would both be valid names for the binding.
- Your binding is available in your Worker on `env` (for example, `env.VECTORIZE`), and the Vectorize [client API](/vectorize/reference/client-api/) is exposed on this binding for use within your Workers application.
## 4. Set up Workers AI
Before you deploy your embedding example, ensure your Worker can use the Workers AI model catalog, including the built-in [text embedding models](/workers-ai/models/#text-embeddings).
From within the `embeddings-tutorial` directory, open your Wrangler file in your editor and add the new `[[ai]]` binding to make Workers AI's models available in your Worker:
```toml
[[vectorize]]
binding = "VECTORIZE" # available in your Worker on env.VECTORIZE
index_name = "embeddings-index"
[ai]
binding = "AI" # available in your Worker on env.AI
```
With Workers AI ready, you can write code in your Worker.
## 5. Write code in your Worker
To write code in your Worker, go to your `embeddings-tutorial` Worker and open the `src/index.ts` file. The `index.ts` file is where you configure your Worker's interactions with your Vectorize index.
Clear the content of `index.ts` and paste the following code snippet into the file. The snippet uses `VECTORIZE` as the binding name on the `env` parameter, matching the binding you configured earlier:
```typescript
export interface Env {
VECTORIZE: Vectorize;
AI: Ai;
}
interface EmbeddingResponse {
shape: number[];
data: number[][];
}
export default {
async fetch(request, env, ctx): Promise<Response> {
let path = new URL(request.url).pathname;
if (path.startsWith("/favicon")) {
return new Response("", { status: 404 });
}
// You only need to generate vector embeddings once (or as
// data changes), not on every request
if (path === "/insert") {
// In a real-world application, you could read content from R2 or
// a SQL database (like D1) and pass it to Workers AI
const stories = [
"This is a story about an orange cloud",
"This is a story about a llama",
"This is a story about a hugging emoji",
];
const modelResp: EmbeddingResponse = await env.AI.run(
"@cf/baai/bge-base-en-v1.5",
{
text: stories,
},
);
// Convert the vector embeddings into a format Vectorize can accept.
// Each vector needs an ID, a value (the vector) and optional metadata.
// In a real application, your ID would be bound to the ID of the source
// document.
let vectors: VectorizeVector[] = [];
let id = 1;
modelResp.data.forEach((vector) => {
vectors.push({ id: `${id}`, values: vector });
id++;
});
let inserted = await env.VECTORIZE.upsert(vectors);
return Response.json(inserted);
}
// Your query: expect this to match vector ID 1 in this example
let userQuery = "orange cloud";
const queryVector: EmbeddingResponse = await env.AI.run(
"@cf/baai/bge-base-en-v1.5",
{
text: [userQuery],
},
);
let matches = await env.VECTORIZE.query(queryVector.data[0], {
topK: 1,
});
return Response.json({
// Expect vector ID 1 to be your top match with a score of
// ~0.89693683
// This tutorial uses a cosine distance metric, where the closer to one,
// the more similar.
matches: matches,
});
},
} satisfies ExportedHandler;
```
## 6. Deploy your Worker
Before deploying your Worker globally, log in with your Cloudflare account by running:
```sh
npx wrangler login
```
You will be directed to a web page asking you to log in to the Cloudflare dashboard. After you have logged in, you will be asked if Wrangler can make changes to your Cloudflare account. Scroll down and select **Allow** to continue.
From here, deploy your Worker to make your project accessible on the Internet. To deploy your Worker, run:
```sh
npx wrangler deploy
```
Preview your Worker at `https://embeddings-tutorial.<YOUR_SUBDOMAIN>.workers.dev`.
## 7. Query your index
You can now visit the URL for your newly created project to insert vectors and then query them.
With the URL for your deployed Worker (for example, `https://embeddings-tutorial.<YOUR_SUBDOMAIN>.workers.dev/`), open your browser and:
1. Insert your vectors first by visiting `/insert`.
2. Query your index by visiting the index route - `/`.
This should return the following JSON:
```json
{
"matches": {
"count": 1,
"matches": [
{
"id": "1",
"score": 0.89693683
}
]
}
}
```
Extend this example by:
- Adding more inputs and generating a larger set of vectors.
- Accepting a custom query parameter passed in the URL, for example via `URL.searchParams`.
- Creating a new index with a different [distance metric](/vectorize/best-practices/create-indexes/#distance-metrics) and observing how your scores change in response to your inputs.
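For the second extension, reading a custom query from the request URL can be sketched as a small helper (the `getUserQuery` function name and the default text are illustrative):

```ts
// Read an optional ?query= parameter from the request URL, falling back to
// the tutorial's hard-coded text when none is provided.
function getUserQuery(requestUrl: string): string {
	const url = new URL(requestUrl);
	return url.searchParams.get("query") ?? "orange cloud";
}
```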
By finishing this tutorial, you have successfully created a Vectorize index, used Workers AI to generate vector embeddings, and deployed your project globally.
## Next steps
- Build a [generative AI chatbot](/workers-ai/tutorials/build-a-retrieval-augmented-generation-ai/) using Workers AI and Vectorize.
- Learn more about [how vector databases work](/vectorize/reference/what-is-a-vector-database/).
- Read [examples](/vectorize/reference/client-api/) on how to use the Vectorize API from Cloudflare Workers.
---
# Introduction to Vectorize
URL: https://developers.cloudflare.com/vectorize/get-started/intro/
import { Render, PackageManagers, WranglerConfig } from "~/components";
Vectorize is Cloudflare's vector database. Vector databases allow you to use machine learning (ML) models to perform semantic search, recommendation, classification and anomaly detection tasks, as well as provide context to LLMs (Large Language Models).
This guide will instruct you through:
- Creating your first Vectorize index.
- Connecting a [Cloudflare Worker](/workers/) to your index.
- Inserting and performing a similarity search by querying your index.
## Prerequisites
:::note[Workers Free or Paid plans required]
Vectorize is available to all users on the [Workers Free or Paid plans](/workers/platform/pricing/#workers).
:::
To continue, you will need:
1. Sign up for a [Cloudflare account](https://dash.cloudflare.com/sign-up/workers-and-pages) if you have not already.
2. Install [`npm`](https://docs.npmjs.com/getting-started).
3. Install [`Node.js`](https://nodejs.org/en/). Use a Node version manager like [Volta](https://volta.sh/) or [nvm](https://github.com/nvm-sh/nvm) to avoid permission issues and change Node.js versions. [Wrangler](/workers/wrangler/install-and-update/) requires a Node version of `16.17.0` or later.
## 1. Create a Worker
:::note[New to Workers?]
Refer to [How Workers works](/workers/reference/how-workers-works/) to learn how the Workers serverless execution model works. Go to the [Workers Get started guide](/workers/get-started/guide/) to set up your first Worker.
:::
You will create a new project that will contain a Worker, which will act as the client application for your Vectorize index.
Create a new project named `vectorize-tutorial` by running:
This will create a new `vectorize-tutorial` directory. Your new `vectorize-tutorial` directory will include:
- A `"Hello World"` [Worker](/workers/get-started/guide/#3-write-code) at `src/index.ts`.
- A [`wrangler.jsonc`](/workers/wrangler/configuration/) configuration file. `wrangler.jsonc` is how your `vectorize-tutorial` Worker will access your index.
:::note
If you are familiar with Cloudflare Workers, or initializing projects in a Continuous Integration (CI) environment, initialize a new project non-interactively by setting `CI=true` as an environment variable when running `create cloudflare@latest`.
For example: `CI=true npm create cloudflare@latest vectorize-tutorial --type=simple --git --ts --deploy=false` will create a basic "Hello World" project ready to build on.
:::
## 2. Create an index
A vector database is distinct from a traditional SQL or NoSQL database. A vector database is designed to store vector embeddings, which are representations of data, but not the original data itself.
To create your first Vectorize index, change into the directory you just created for your Workers project:
```sh
cd vectorize-tutorial
```
To create an index, you will need to use the `wrangler vectorize create` command and provide a name for the index. A good index name is:
- A combination of lowercase and/or numeric ASCII characters, shorter than 32 characters, starts with a letter, and uses dashes (-) instead of spaces.
- Descriptive of the use-case and environment. For example, "production-doc-search" or "dev-recommendation-engine".
- Only used for describing the index, and is not directly referenced in code.
In addition, you will need to define both the `dimensions` of the vectors you will store in the index, as well as the distance `metric` used to determine similar vectors when creating the index. A `metric` can be euclidean, cosine, or dot product. **This configuration cannot be changed later**, as a vector database is configured for a fixed vector configuration.
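As a rough intuition for the three metrics, here is an illustrative sketch only; Vectorize computes scores server-side, and these helper functions are not part of any Vectorize API:

```typescript
// Illustrative implementations of the three distance/similarity metrics
// Vectorize supports. For intuition only - Vectorize computes scores for you.
function dotProduct(a: number[], b: number[]): number {
	return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function euclideanDistance(a: number[], b: number[]): number {
	return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

function cosineSimilarity(a: number[], b: number[]): number {
	const magnitude = (v: number[]) => Math.sqrt(dotProduct(v, v));
	return dotProduct(a, b) / (magnitude(a) * magnitude(b));
}

// With euclidean, lower scores mean closer vectors (0 = identical).
// With cosine, scores closer to 1 mean more similar directions.
console.log(euclideanDistance([0, 0], [3, 4])); // 5
console.log(cosineSimilarity([1, 2], [2, 4])); // ~1 (same direction)
```

Which metric fits best depends on the embedding model that produced your vectors; many text embedding models are trained with cosine similarity in mind, so check your model's documentation before choosing.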
Run the following `wrangler vectorize` command:
```sh
npx wrangler vectorize create tutorial-index --dimensions=32 --metric=euclidean
```
```sh output
🚧 Creating index: 'tutorial-index'
✅ Successfully created a new Vectorize index: 'tutorial-index'
📋 To start querying from a Worker, add the following binding configuration into 'wrangler.toml':
[[vectorize]]
binding = "VECTORIZE" # available in your Worker on env.VECTORIZE
index_name = "tutorial-index"
```
The command above will create a new vector database, and output the [binding](/workers/runtime-apis/bindings/) configuration needed in the next step.
## 3. Bind your Worker to your index
You must create a binding for your Worker to connect to your Vectorize index. [Bindings](/workers/runtime-apis/bindings/) allow your Workers to access resources, like Vectorize or R2, from Cloudflare Workers. You create bindings by updating the worker's Wrangler file.
To bind your index to your Worker, add the following to the end of your Wrangler file:
```toml
[[vectorize]]
binding = "VECTORIZE" # available in your Worker on env.VECTORIZE
index_name = "tutorial-index"
```
Specifically:
- The value (string) you set for `binding` will be used to reference this database in your Worker. In this tutorial, name your binding `VECTORIZE`.
- The binding must be [a valid JavaScript variable name](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Grammar_and_types#variables). For example, `binding = "MY_INDEX"` or `binding = "PROD_SEARCH_INDEX"` would both be valid names for the binding.
- Your binding is available in your Worker at `env.VECTORIZE`, and the Vectorize [client API](/vectorize/reference/client-api/) is exposed on this binding for use within your Workers application.
## 4. [Optional] Create metadata indexes
Vectorize allows you to add up to 10KiB of metadata per vector into your index, and also provides the ability to filter on that metadata while querying vectors. To do so, you need to specify a metadata field as a "metadata index" for your Vectorize index.
:::note[When to create metadata indexes?]
As of today, the metadata fields on which vectors can be filtered need to be specified before the vectors are inserted, and it is recommended that these metadata fields are specified right after the creation of a Vectorize index.
:::
To enable vector filtering on a metadata field during a query, use a command like:
```sh
npx wrangler vectorize create-metadata-index tutorial-index --property-name=url --type=string
```
```sh output
📋 Creating metadata index...
✅ Successfully enqueued metadata index creation request. Mutation changeset identifier: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.
```
Here `url` is the metadata field on which filtering would be enabled. The `--type` parameter defines the data type for the metadata field; `string`, `number` and `boolean` types are supported.
It typically takes a few seconds for the metadata index to be created. You can check the list of metadata indexes for your Vectorize index by running:
```sh
npx wrangler vectorize list-metadata-index tutorial-index
```
```sh output
📋 Fetching metadata indexes...
┌──────────────┬────────┐
│ propertyName │ type │
├──────────────┼────────┤
│ url │ String │
└──────────────┴────────┘
```
You can create up to 10 metadata indexes per Vectorize index.
For metadata indexes of type `number`, the indexed number precision is that of float64.
For metadata indexes of type `string`, each vector indexes the first 64B of the string data, truncated on UTF-8 character boundaries to the longest well-formed UTF-8 substring within that limit. Vectors are therefore filterable on the first 64B of their value for each indexed property.
See [Vectorize Limits](/vectorize/platform/limits/) for a complete list of limits.
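Once the `url` metadata index exists and vectors with `url` metadata are inserted, queries can filter on it. The sketch below shows the shape of the query options involved; the `filter` value shown is a simple equality match (confirm the exact operator support against the [client API](/vectorize/reference/client-api/) reference):

```typescript
// Query options combining topK with a metadata filter on the indexed
// `url` property. Inside a Worker, this object would be passed as:
//   const matches = await env.VECTORIZE.query(queryVector, queryOptions);
const queryOptions = {
	topK: 3,
	returnMetadata: "indexed", // indexed metadata adds no latency overhead
	filter: { url: "/products/sku/13913913" }, // equality match on the metadata index
};

console.log(JSON.stringify(queryOptions.filter)); // {"url":"/products/sku/13913913"}
```

Filtering happens server-side before the `topK` matches are selected, so a filtered query returns the closest vectors among those whose metadata passes the filter.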
## 5. Insert vectors
Before you can query a vector database, you need to insert vectors for it to query against. These vectors would be generated from data (such as text or images) you pass to a machine learning model. However, this tutorial will define static vectors to illustrate how vector search works on its own.
First, go to your `vectorize-tutorial` Worker and open the `src/index.ts` file. The `index.ts` file is where you configure your Worker's interactions with your Vectorize index.
Clear the content of `index.ts`, and paste the following code snippet into your `index.ts` file. The `VECTORIZE` property on the `env` parameter must match the `binding` name you configured in the previous step:
```typescript
export interface Env {
// This makes your vector index methods available on env.VECTORIZE.*
// For example, env.VECTORIZE.insert() or query()
VECTORIZE: Vectorize;
}
// Sample vectors: 32 dimensions wide.
//
// Vectors from popular machine-learning models are typically ~100 to 1536 dimensions
// wide (or wider still).
const sampleVectors: Array<VectorizeVector> = [
{
id: "1",
values: [
0.12, 0.45, 0.67, 0.89, 0.23, 0.56, 0.34, 0.78, 0.12, 0.9, 0.24, 0.67,
0.89, 0.35, 0.48, 0.7, 0.22, 0.58, 0.74, 0.33, 0.88, 0.66, 0.45, 0.27,
0.81, 0.54, 0.39, 0.76, 0.41, 0.29, 0.83, 0.55,
],
metadata: { url: "/products/sku/13913913" },
},
{
id: "2",
values: [
0.14, 0.23, 0.36, 0.51, 0.62, 0.47, 0.59, 0.74, 0.33, 0.89, 0.41, 0.53,
0.68, 0.29, 0.77, 0.45, 0.24, 0.66, 0.71, 0.34, 0.86, 0.57, 0.62, 0.48,
0.78, 0.52, 0.37, 0.61, 0.69, 0.28, 0.8, 0.53,
],
metadata: { url: "/products/sku/10148191" },
},
{
id: "3",
values: [
0.21, 0.33, 0.55, 0.67, 0.8, 0.22, 0.47, 0.63, 0.31, 0.74, 0.35, 0.53,
0.68, 0.45, 0.55, 0.7, 0.28, 0.64, 0.71, 0.3, 0.77, 0.6, 0.43, 0.39, 0.85,
0.55, 0.31, 0.69, 0.52, 0.29, 0.72, 0.48,
],
metadata: { url: "/products/sku/97913813" },
},
{
id: "4",
values: [
0.17, 0.29, 0.42, 0.57, 0.64, 0.38, 0.51, 0.72, 0.22, 0.85, 0.39, 0.66,
0.74, 0.32, 0.53, 0.48, 0.21, 0.69, 0.77, 0.34, 0.8, 0.55, 0.41, 0.29,
0.7, 0.62, 0.35, 0.68, 0.53, 0.3, 0.79, 0.49,
],
metadata: { url: "/products/sku/418313" },
},
{
id: "5",
values: [
0.11, 0.46, 0.68, 0.82, 0.27, 0.57, 0.39, 0.75, 0.16, 0.92, 0.28, 0.61,
0.85, 0.4, 0.49, 0.67, 0.19, 0.58, 0.76, 0.37, 0.83, 0.64, 0.53, 0.3,
0.77, 0.54, 0.43, 0.71, 0.36, 0.26, 0.8, 0.53,
],
metadata: { url: "/products/sku/55519183" },
},
];
export default {
async fetch(request, env, ctx): Promise<Response> {
let path = new URL(request.url).pathname;
if (path.startsWith("/favicon")) {
return new Response("", { status: 404 });
}
// You only need to insert vectors into your index once
if (path.startsWith("/insert")) {
// Insert some sample vectors into your index
// In a real application, these vectors would be the output of a machine learning (ML) model,
// such as Workers AI, OpenAI, or Cohere.
const inserted = await env.VECTORIZE.insert(sampleVectors);
// Return the mutation identifier for this insert operation
return Response.json(inserted);
}
return Response.json({ text: "nothing to do... yet" }, { status: 404 });
},
} satisfies ExportedHandler<Env>;
```
In the code above, you:
1. Define a binding to your Vectorize index from your Workers code. This binding matches the `binding` value you set in the `wrangler.jsonc` file under the `"vectorize"` key.
2. Specify a set of example vectors that you will query against in the next step.
3. Insert those vectors into the index and confirm it was successful.
In the next step, you will expand the Worker to query the index and the vectors you insert.
## 6. Query vectors
In this step, you will take a vector representing an incoming query and use it to search your index.
First, go to your `vectorize-tutorial` Worker and open the `src/index.ts` file. The `index.ts` file is where you configure your Worker's interactions with your Vectorize index.
Clear the content of `index.ts`. Paste the following code snippet into your `index.ts` file. As before, the `VECTORIZE` property on the `env` parameter must match the `binding` name in your Wrangler configuration:
```typescript
export interface Env {
// This makes your vector index methods available on env.VECTORIZE.*
// For example, env.VECTORIZE.insert() or query()
VECTORIZE: Vectorize;
}
// Sample vectors: 32 dimensions wide.
//
// Vectors from popular machine-learning models are typically ~100 to 1536 dimensions
// wide (or wider still).
const sampleVectors: Array<VectorizeVector> = [
{
id: "1",
values: [
0.12, 0.45, 0.67, 0.89, 0.23, 0.56, 0.34, 0.78, 0.12, 0.9, 0.24, 0.67,
0.89, 0.35, 0.48, 0.7, 0.22, 0.58, 0.74, 0.33, 0.88, 0.66, 0.45, 0.27,
0.81, 0.54, 0.39, 0.76, 0.41, 0.29, 0.83, 0.55,
],
metadata: { url: "/products/sku/13913913" },
},
{
id: "2",
values: [
0.14, 0.23, 0.36, 0.51, 0.62, 0.47, 0.59, 0.74, 0.33, 0.89, 0.41, 0.53,
0.68, 0.29, 0.77, 0.45, 0.24, 0.66, 0.71, 0.34, 0.86, 0.57, 0.62, 0.48,
0.78, 0.52, 0.37, 0.61, 0.69, 0.28, 0.8, 0.53,
],
metadata: { url: "/products/sku/10148191" },
},
{
id: "3",
values: [
0.21, 0.33, 0.55, 0.67, 0.8, 0.22, 0.47, 0.63, 0.31, 0.74, 0.35, 0.53,
0.68, 0.45, 0.55, 0.7, 0.28, 0.64, 0.71, 0.3, 0.77, 0.6, 0.43, 0.39, 0.85,
0.55, 0.31, 0.69, 0.52, 0.29, 0.72, 0.48,
],
metadata: { url: "/products/sku/97913813" },
},
{
id: "4",
values: [
0.17, 0.29, 0.42, 0.57, 0.64, 0.38, 0.51, 0.72, 0.22, 0.85, 0.39, 0.66,
0.74, 0.32, 0.53, 0.48, 0.21, 0.69, 0.77, 0.34, 0.8, 0.55, 0.41, 0.29,
0.7, 0.62, 0.35, 0.68, 0.53, 0.3, 0.79, 0.49,
],
metadata: { url: "/products/sku/418313" },
},
{
id: "5",
values: [
0.11, 0.46, 0.68, 0.82, 0.27, 0.57, 0.39, 0.75, 0.16, 0.92, 0.28, 0.61,
0.85, 0.4, 0.49, 0.67, 0.19, 0.58, 0.76, 0.37, 0.83, 0.64, 0.53, 0.3,
0.77, 0.54, 0.43, 0.71, 0.36, 0.26, 0.8, 0.53,
],
metadata: { url: "/products/sku/55519183" },
},
];
export default {
async fetch(request, env, ctx): Promise<Response> {
let path = new URL(request.url).pathname;
if (path.startsWith("/favicon")) {
return new Response("", { status: 404 });
}
// You only need to insert vectors into your index once
if (path.startsWith("/insert")) {
// Insert some sample vectors into your index
// In a real application, these vectors would be the output of a machine learning (ML) model,
// such as Workers AI, OpenAI, or Cohere.
let inserted = await env.VECTORIZE.insert(sampleVectors);
// Return the mutation identifier for this insert operation
return Response.json(inserted);
}
// return Response.json({text: "nothing to do... yet"}, { status: 404 })
// In a real application, you would take a user query. For example, "what is a
// vector database" - and transform it into a vector embedding first.
//
// In this example, you will construct a vector that should
// match vector id #4
const queryVector: Array<number> = [
0.13, 0.25, 0.44, 0.53, 0.62, 0.41, 0.59, 0.68, 0.29, 0.82, 0.37, 0.5,
0.74, 0.46, 0.57, 0.64, 0.28, 0.61, 0.73, 0.35, 0.78, 0.58, 0.42, 0.32,
0.77, 0.65, 0.49, 0.54, 0.31, 0.29, 0.71, 0.57,
]; // vector of dimensions 32
// Query your index and return the three (topK = 3) most similar vector
// IDs with their similarity score.
//
// By default, vector values are not returned, as in many cases the
// vector id and scores are sufficient to map the vector back to the
// original content it represents.
const matches = await env.VECTORIZE.query(queryVector, {
topK: 3,
returnValues: true,
returnMetadata: "all",
});
return Response.json({
// This will return the closest vectors: the vectors are arranged according
// to their scores. Vectors that are more similar would show up near the top.
// In this example, Vector id #4 would turn out to be the most similar to the queried vector.
// You return the full set of matches so you can check the possible scores.
matches: matches,
});
},
} satisfies ExportedHandler<Env>;
```
You can also use the Vectorize `queryById()` operation to search for vectors similar to a vector that is already present in the index.
## 7. Deploy your Worker
Before deploying your Worker globally, log in with your Cloudflare account by running:
```sh
npx wrangler login
```
You will be directed to a web page asking you to log in to the Cloudflare dashboard. After you have logged in, you will be asked if Wrangler can make changes to your Cloudflare account. Scroll down and select **Allow** to continue.
From here, you can deploy your Worker to make your project accessible on the Internet. To deploy your Worker, run:
```sh
npx wrangler deploy
```
Once deployed, preview your Worker at `https://vectorize-tutorial.<YOUR_SUBDOMAIN>.workers.dev`.
## 8. Query your index
To insert vectors and then query them, use the URL for your deployed Worker, such as `https://vectorize-tutorial.<YOUR_SUBDOMAIN>.workers.dev/`. Open your browser and:
1. Insert your vectors first by visiting `/insert`. This should return the below JSON:
```json
// https://vectorize-tutorial.<YOUR_SUBDOMAIN>.workers.dev/insert
{
"mutationId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
```
The `mutationId` here refers to a unique identifier that corresponds to this asynchronous insert operation. It typically takes a few seconds for inserted vectors to be available for querying.
You can use the index info operation to check the last processed mutation:
```sh
npx wrangler vectorize info tutorial-index
```
```sh output
📋 Fetching index info...
┌────────────┬─────────────┬──────────────────────────────────────┬──────────────────────────┐
│ dimensions │ vectorCount │ processedUpToMutation │ processedUpToDatetime │
├────────────┼─────────────┼──────────────────────────────────────┼──────────────────────────┤
│ 32 │ 5 │ xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx │ YYYY-MM-DDThh:mm:ss.SSSZ │
└────────────┴─────────────┴──────────────────────────────────────┴──────────────────────────┘
```
Subsequent inserts using the same vector ids will return a mutation id, but will not change the index vector count, since the same vector ids cannot be inserted twice. You will need to use an `upsert` operation instead to update the vector values for an id that already exists in an index.
2. Query your index - expect your query vector of `[0.13, 0.25, 0.44, ...]` to be closest to vector ID `4` by visiting the root path of `/`. This query will return the three (`topK: 3`) closest matches, as well as their vector values and metadata.
You will notice that `id: 4` has a `score` of `0.46348256`. Because you are using `euclidean` as your distance metric, the closer the score is to `0.0`, the closer your vectors are.
```json
// https://vectorize-tutorial.<YOUR_SUBDOMAIN>.workers.dev/
{
"matches": {
"count": 3,
"matches": [
{
"id": "4",
"score": 0.46348256,
"values": [
0.17, 0.29, 0.42, 0.57, 0.64, 0.38, 0.51, 0.72, 0.22, 0.85, 0.39,
0.66, 0.74, 0.32, 0.53, 0.48, 0.21, 0.69, 0.77, 0.34, 0.8, 0.55, 0.41,
0.29, 0.7, 0.62, 0.35, 0.68, 0.53, 0.3, 0.79, 0.49
],
"metadata": {
"url": "/products/sku/418313"
}
},
{
"id": "3",
"score": 0.52920616,
"values": [
0.21, 0.33, 0.55, 0.67, 0.8, 0.22, 0.47, 0.63, 0.31, 0.74, 0.35, 0.53,
0.68, 0.45, 0.55, 0.7, 0.28, 0.64, 0.71, 0.3, 0.77, 0.6, 0.43, 0.39,
0.85, 0.55, 0.31, 0.69, 0.52, 0.29, 0.72, 0.48
],
"metadata": {
"url": "/products/sku/97913813"
}
},
{
"id": "2",
"score": 0.6337869,
"values": [
0.14, 0.23, 0.36, 0.51, 0.62, 0.47, 0.59, 0.74, 0.33, 0.89, 0.41,
0.53, 0.68, 0.29, 0.77, 0.45, 0.24, 0.66, 0.71, 0.34, 0.86, 0.57,
0.62, 0.48, 0.78, 0.52, 0.37, 0.61, 0.69, 0.28, 0.8, 0.53
],
"metadata": {
"url": "/products/sku/10148191"
}
}
]
}
}
```
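The euclidean scores above are reproducible by hand. The sketch below recomputes the distance between the query vector and sample vector `4`; the small difference from the reported `0.46348256` is assumed to come down to floating-point precision differences in the index:

```typescript
// Recompute the top match's euclidean score locally.
const queryVector = [
	0.13, 0.25, 0.44, 0.53, 0.62, 0.41, 0.59, 0.68, 0.29, 0.82, 0.37, 0.5,
	0.74, 0.46, 0.57, 0.64, 0.28, 0.61, 0.73, 0.35, 0.78, 0.58, 0.42, 0.32,
	0.77, 0.65, 0.49, 0.54, 0.31, 0.29, 0.71, 0.57,
];
const vector4 = [
	0.17, 0.29, 0.42, 0.57, 0.64, 0.38, 0.51, 0.72, 0.22, 0.85, 0.39, 0.66,
	0.74, 0.32, 0.53, 0.48, 0.21, 0.69, 0.77, 0.34, 0.8, 0.55, 0.41, 0.29,
	0.7, 0.62, 0.35, 0.68, 0.53, 0.3, 0.79, 0.49,
];

// Euclidean distance: square root of the summed squared differences.
const distance = Math.sqrt(
	queryVector.reduce((sum, x, i) => sum + (x - vector4[i]) ** 2, 0),
);
console.log(distance.toFixed(4)); // 0.4634 - lower means closer
```

Running the same calculation against vectors `3` and `2` yields larger distances, which is why they rank second and third in the response.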
From here, experiment by passing a different `queryVector` and observe the results: the matches and the `score` should change based on the change in distance between the query vector and the vectors in your index.
In a real-world application, the `queryVector` would be the vector embedding representation of a query from a user or system, and your `sampleVectors` would be generated from real content. To build on this example, read the [vector search tutorial](/vectorize/get-started/embeddings/) that combines Workers AI and Vectorize to build an end-to-end application with Workers.
By finishing this tutorial, you have successfully created and queried your first Vectorize index, a Worker to access that index, and deployed your project globally.
## Related resources
- [Build an end-to-end vector search application](/vectorize/get-started/embeddings/) using Workers AI and Vectorize.
- Learn more about [how vector databases work](/vectorize/reference/what-is-a-vector-database/).
- Read [examples](/vectorize/reference/client-api/) on how to use the Vectorize API from Cloudflare Workers.
- [Euclidean Distance vs Cosine Similarity](https://www.baeldung.com/cs/euclidean-distance-vs-cosine-similarity).
- [Dot product](https://en.wikipedia.org/wiki/Dot_product).
---
# Changelog
URL: https://developers.cloudflare.com/vectorize/platform/changelog/
import { ProductReleaseNotes } from "~/components";
{/* */}
---
# Platform
URL: https://developers.cloudflare.com/vectorize/platform/
import { DirectoryListing } from "~/components"
---
# Limits
URL: https://developers.cloudflare.com/vectorize/platform/limits/
The following limits apply to accounts, indexes and vectors (as specified):
| Feature | Current Limit |
| ------------------------------------------------------------- | ----------------------------------- |
| Indexes per account | 50,000 (Workers Paid) / 100 (Free) |
| Maximum dimensions per vector | 1536 dimensions |
| Maximum vector ID length | 64 bytes |
| Metadata per vector | 10KiB |
| Maximum returned results (`topK`) with values or metadata | 20 |
| Maximum returned results (`topK`) without values and metadata | 100 |
| Maximum upsert batch size (per batch) | 1000 (Workers) / 5000 (HTTP API) |
| Maximum index name length | 64 bytes |
| Maximum vectors per index | 5,000,000 |
| Maximum namespaces per index | 50,000 (Workers Paid) / 1000 (Free) |
| Maximum namespace name length | 64 bytes |
| Maximum vectors upload size | 100 MB |
| Maximum metadata indexes per Vectorize index | 10 |
| Maximum indexed data per metadata index per vector | 64 bytes |
## Limits V1 (deprecated)
The following limits apply to accounts, indexes and vectors (as specified):
| Feature | Current Limit |
| ------------------------------------- | -------------------------------- |
| Indexes per account | 100 indexes |
| Maximum dimensions per vector | 1536 dimensions |
| Maximum vector ID length | 64 bytes |
| Metadata per vector | 10KiB |
| Maximum returned results (`topK`) | 20 |
| Maximum upsert batch size (per batch) | 1000 (Workers) / 5000 (HTTP API) |
| Maximum index name length | 63 bytes |
| Maximum vectors per index | 200,000 |
| Maximum namespaces per index | 1000 namespaces |
| Maximum namespace name length | 63 bytes |
---
# Pricing
URL: https://developers.cloudflare.com/vectorize/platform/pricing/
import { Render } from "~/components";
Vectorize bills are based on:
- **Queried Vector Dimensions**: The total number of vector dimensions queried. If you have 10,000 vectors with 384-dimensions in an index, and make 100 queries against that index, your total queried vector dimensions would sum to 3.878 million (`(10000 + 100) * 384`).
- **Stored Vector Dimensions**: The total number of vector dimensions stored. If you have 1,000 vectors with 1536-dimensions in an index, your stored vector dimensions would sum to 1.536 million (`1000 * 1536`).
You are not billed for CPU, memory, "active index hours", or the number of indexes you create. If you are not issuing queries against your indexes, you are not billed for queried vector dimensions.
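The two example sums above can be checked with a quick calculation (a sketch of the arithmetic only, not a billing calculator):

```typescript
// Queried vector dimensions for the example above: an index of 10,000
// stored 384-dimension vectors, with 100 queries issued against it.
const queriedDimensions = (10_000 + 100) * 384;
console.log(queriedDimensions); // 3878400, i.e. ~3.878 million

// Stored vector dimensions: 1,000 vectors of 1536 dimensions each.
const storedDimensions = 1_000 * 1536;
console.log(storedDimensions); // 1536000, i.e. 1.536 million
```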
## Billing metrics
### Usage examples
The following table defines a number of example use-cases and the estimated monthly cost for querying a Vectorize index. These estimates do not include the Vectorize usage that is part of the Workers Free and Paid plans.
| Workload   | Dimensions per vector | Stored vectors  | Queries per month | Calculation                                                               | Estimated total         |
| ---------- | --------------------- | --------------- | ----------------- | ------------------------------------------------------------------------- | ----------------------- |
| Experiment | 384                   | 5,000 vectors   | 10,000            | `((10000+5000)*384*(0.01/1000000)) + (5000*384*(0.05/100000000))`         | $0.06 / mo <sup>1</sup> |
| Scaling    | 768                   | 25,000 vectors  | 50,000            | `((50000+25000)*768*(0.01/1000000)) + (25000*768*(0.05/100000000))`       | $0.59 / mo <sup>2</sup> |
| Production | 768                   | 50,000 vectors  | 200,000           | `((200000+50000)*768*(0.01/1000000)) + (50000*768*(0.05/100000000))`      | $1.94 / mo              |
| Large      | 768                   | 250,000 vectors | 500,000           | `((500000+250000)*768*(0.01/1000000)) + (250000*768*(0.05/100000000))`    | $5.86 / mo              |
| XL         | 1536                  | 500,000 vectors | 1,000,000         | `((1000000+500000)*1536*(0.01/1000000)) + (500000*1536*(0.05/100000000))` | $23.42 / mo             |

<sup>1</sup> All of this usage would fall into the Vectorize usage included in the Workers Free or Paid plan.

<sup>2</sup> Most of this usage would fall into the Vectorize usage included within the Workers Paid plan.
## Frequently Asked Questions
Frequently asked questions related to Vectorize pricing:
- Will Vectorize always have a free tier?
Yes, the [Workers free tier](/workers/platform/pricing/#workers) will always include the ability to prototype and experiment with Vectorize for free.
- What happens if I exceed the monthly included reads, writes and/or storage on the paid tier?
You will be billed for the additional reads, writes and storage according to [Vectorize's pricing](#billing-metrics).
- Does Vectorize charge for data transfer / egress?
No.
- Do queries I issue from the HTTP API or the Wrangler command-line count as billable usage?
Yes: any queries you issue against your index, including from the Workers API, HTTP API and CLI all count as usage.
- Does an empty index, with no vectors, contribute to storage?
No. Empty indexes do not count as stored vector dimensions.
---
# Tutorials
URL: https://developers.cloudflare.com/vectorize/tutorials/
import { GlossaryTooltip, ListTutorials } from "~/components"
View tutorials to help you get started with Vectorize.
---
# Vectorize API
URL: https://developers.cloudflare.com/vectorize/reference/client-api/
import { Render, WranglerConfig } from "~/components";
This page covers the Vectorize API available within [Cloudflare Workers](/workers/), including usage examples.
## Operations
### Insert vectors
```ts
let vectorsToInsert = [
{ id: "123", values: [32.4, 6.5, 11.2, 10.3, 87.9] },
{ id: "456", values: [2.5, 7.8, 9.1, 76.9, 8.5] },
];
let inserted = await env.YOUR_INDEX.insert(vectorsToInsert);
```
Inserts vectors into the index. Vectorize inserts are asynchronous and the insert operation returns a mutation identifier unique for that operation. It typically takes a few seconds for inserted vectors to be available for querying in a Vectorize index.
If vectors with the same vector ID already exist in the index, only the vectors with new IDs will be inserted.
If you need to update existing vectors, use the [upsert](#upsert-vectors) operation.
### Upsert vectors
```ts
let vectorsToUpsert = [
{ id: "123", values: [32.4, 6.5, 11.2, 10.3, 87.9] },
{ id: "456", values: [2.5, 7.8, 9.1, 76.9, 8.5] },
{ id: "768", values: [29.1, 5.7, 12.9, 15.4, 1.1] },
];
let upserted = await env.YOUR_INDEX.upsert(vectorsToUpsert);
```
Upserts vectors into an index. Vectorize upserts are asynchronous and the upsert operation returns a mutation identifier unique for that operation. It typically takes a few seconds for upserted vectors to be available for querying in a Vectorize index.
An upsert operation will insert vectors into the index if vectors with the same ID do not exist, and overwrite vectors with the same ID.
Upserting does not merge or combine the values or metadata of an existing vector with the upserted vector: the upserted vector replaces the existing vector in full.
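The replace-in-full semantics can be illustrated with a local stand-in for an index (a hypothetical `Map`-based sketch, not the Vectorize implementation):

```typescript
// A local stand-in for an index, illustrating that an upsert replaces the
// stored vector in full: values and metadata are not merged.
type Vec = { id: string; values: number[]; metadata?: { [key: string]: string } };

const index = new Map<string, Vec>();

function upsert(v: Vec) {
	index.set(v.id, v); // full replacement, as with Vectorize's upsert
}

upsert({ id: "123", values: [1, 2], metadata: { url: "/a", lang: "en" } });
upsert({ id: "123", values: [3, 4], metadata: { url: "/b" } });

const current = index.get("123")!;
// The "lang" key from the first write is gone - nothing was merged in.
console.log(Object.keys(current.metadata!).join(",")); // "url"
```

If you need to preserve existing metadata, read the vector first with `getByIds()`, merge in your application code, and upsert the combined record.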
### Query vectors
```ts
let queryVector = [32.4, 6.55, 11.2, 10.3, 87.9];
let matches = await env.YOUR_INDEX.query(queryVector);
```
Query an index with the provided vector, returning the score(s) of the closest vectors based on the configured distance metric.
- Configure the number of returned matches by setting `topK` (default: 5)
- Return vector values by setting `returnValues: true` (default: false)
- Return vector metadata by setting `returnMetadata: 'indexed'` or `returnMetadata: 'all'` (default: 'none')
```ts
let matches = await env.YOUR_INDEX.query(queryVector, {
topK: 5,
returnValues: true,
returnMetadata: "all",
});
```
#### topK
The `topK` parameter specifies the number of matches returned by the query operation. Vectorize supports an upper limit of `100` for the `topK` value. However, for a query operation with `returnValues` set to `true` or `returnMetadata` set to `all`, `topK` is limited to a maximum value of `20`.
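These constraints can be captured in a small helper (a hypothetical convenience function, not part of the Vectorize API):

```typescript
// The effective topK ceiling: 100 normally, but 20 when vector values
// or full metadata are requested alongside the matches.
function maxTopK(
	returnValues: boolean,
	returnMetadata: "none" | "indexed" | "all",
): number {
	return returnValues || returnMetadata === "all" ? 20 : 100;
}

console.log(maxTopK(false, "indexed")); // 100
console.log(maxTopK(true, "none")); // 20
```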
#### returnMetadata
The `returnMetadata` field provides three ways to fetch vector metadata while querying:
1. `none`: Do not fetch metadata.
2. `indexed`: Fetch metadata only for the indexed metadata fields. There is no latency overhead with this option, but long text fields may be truncated.
3. `all`: Fetch all metadata associated with a vector. Queries may run slower with this option, and `topK` is limited to 20.
:::note[`topK` and `returnMetadata` for legacy Vectorize indexes]
For legacy Vectorize (V1) indexes, `topK` is limited to 20, and the `returnMetadata` is a boolean field.
:::
### Query vectors by ID
```ts
let matches = await env.YOUR_INDEX.queryById("some-vector-id");
```
Query an index using a vector that is already present in the index.
Query options remain the same as the query operation described above.
```ts
let matches = await env.YOUR_INDEX.queryById("some-vector-id", {
topK: 5,
returnValues: true,
returnMetadata: "all",
});
```
### Get vectors by ID
```ts
let ids = ["11", "22", "33", "44"];
const vectors = await env.YOUR_INDEX.getByIds(ids);
```
Retrieves the specified vectors by their ID, including values and metadata.
### Delete vectors by ID
```ts
let idsToDelete = ["11", "22", "33", "44"];
const deleted = await env.YOUR_INDEX.deleteByIds(idsToDelete);
```
Deletes the vector IDs provided from the current index. Vectorize deletes are asynchronous and the delete operation returns a mutation identifier unique for that operation. It typically takes a few seconds for vectors to be removed from the Vectorize index.
### Retrieve index details
```ts
const details = await env.YOUR_INDEX.describe();
```
Retrieves the configuration of a given index directly, including its configured `dimensions` and distance `metric`.
### Create Metadata Index
Enable metadata filtering on the specified property. Limited to 10 properties.
Run the following `wrangler vectorize` command:
```sh
wrangler vectorize create-metadata-index <INDEX_NAME> --property-name='some-prop' --type='string'
```
### Delete Metadata Index
Allow Vectorize to delete the specified metadata index.
Run the following `wrangler vectorize` command:
```sh
wrangler vectorize delete-metadata-index <INDEX_NAME> --property-name='some-prop'
```
### List Metadata Indexes
List metadata properties on which metadata filtering is enabled.
Run the following `wrangler vectorize` command:
```sh
wrangler vectorize list-metadata-index <INDEX_NAME>
```
### Get Index Info
Get additional details about the index.
Run the following `wrangler vectorize` command:
```sh
wrangler vectorize info <INDEX_NAME>
```
## Vectors
A vector represents the vector embedding output from a machine learning model.
- `id` - a unique `string` identifying the vector in the index. This should map back to the ID of the document, object or database identifier that the vector values were generated from.
- `namespace` - an optional partition key within an index. Operations are performed per-namespace, so this can be used to create isolated segments within a larger index.
- `values` - an array of `number`, `Float32Array`, or `Float64Array` as the vector embedding itself. This must be a dense array, and the length of this array must match the `dimensions` configured on the index.
- `metadata` - an optional set of key-value pairs that can be used to store additional metadata alongside a vector.
```ts
let vectorExample = {
id: "12345",
values: [32.4, 6.55, 11.2, 10.3, 87.9],
metadata: {
key: "value",
hello: "world",
url: "r2://bucket/some/object.json",
},
};
```
## Binding to a Worker
[Bindings](/workers/runtime-apis/bindings/) allow you to attach resources, including Vectorize indexes or R2 buckets, to your Worker.
Bindings are defined in either the [Wrangler configuration file](/workers/wrangler/configuration/) associated with your Workers project, or via the Cloudflare dashboard for your project.
Vectorize indexes are bound by name. A binding for an index named `production-doc-search` would resemble the below:
```toml
[[vectorize]]
binding = "PROD_SEARCH" # the index will be available as env.PROD_SEARCH in your Worker
index_name = "production-doc-search"
```
Refer to the [bindings documentation](/workers/wrangler/configuration/#vectorize-indexes) for more details.
## TypeScript Types
New Workers projects created via `npm create cloudflare@latest` automatically include the relevant TypeScript types for Vectorize.
Older projects, or non-Workers projects looking to use Vectorize's [REST API](https://developers.cloudflare.com/api/resources/vectorize/subresources/indexes/methods/list/) in a TypeScript project, should ensure `@cloudflare/workers-types` version `4.20230922.0` or later is installed.
---
# Reference
URL: https://developers.cloudflare.com/vectorize/reference/
import { DirectoryListing } from "~/components"
---
# Transition legacy Vectorize indexes
URL: https://developers.cloudflare.com/vectorize/reference/transition-vectorize-legacy/
Legacy Vectorize (V1) indexes are on a deprecation path as of Aug 15, 2024. Your Vectorize index may be a legacy index if it fulfills any of the following criteria:
1. It was created with a Wrangler version lower than `v3.71.0`.
2. It was created with the `--deprecated-v1` flag enabled.
3. It was created using the legacy REST API.
This document provides details around any transition steps that may be needed to move away from legacy Vectorize indexes.
## Why should I transition?
Legacy Vectorize (V1) indexes are on a deprecation path. Support for these indexes is limited, and their usage is not recommended for any production workloads.
Furthermore, you will no longer be able to create legacy Vectorize indexes by December 2024. Other operations will be unaffected and will remain functional.
Additionally, the new Vectorize (V2) indexes can operate at a significantly larger scale (with a capacity for multi-million vectors), and provide faster performance. Please review the [Limits](/vectorize/platform/limits/) page to understand the latest capabilities supported by Vectorize.
## Notable changes
In addition to supporting significantly larger indexes with multi-million vectors, and faster performance, these are some of the changes that need to be considered when transitioning away from legacy Vectorize indexes:
1. The new Vectorize (V2) indexes now support asynchronous mutations. Any vector inserts or deletes, and any metadata index creations or deletions, may take a few seconds to be reflected.
2. Vectorize (V2) indexes support metadata and namespace filtering for much larger indexes with significantly lower latencies. However, the fields on which metadata filtering can be applied need to be specified before vectors are inserted. Refer to the [metadata index creation](/vectorize/reference/client-api/#create-metadata-index) page for more details.
3. The Vectorize (V2) [query operation](/vectorize/reference/client-api/#query-vectors) now supports searching for and returning up to the 100 most similar vectors.
4. Vectorize (V2) query operations provide more granular control for querying metadata along with vectors. Refer to the [query operation](/vectorize/reference/client-api/#query-vectors) page for more details.
5. Vectorize (V2) expands the Vectorize capabilities that are available via Wrangler (version `v3.71.0` or later).
## Transition
:::note[Automated Migration]
Watch this space for the upcoming capability to migrate legacy (V1) indexes to the new Vectorize (V2) indexes automatically.
:::
1. Wrangler now supports operations on the new version of Vectorize (V2) indexes by default. To use Wrangler commands for legacy (V1) indexes, the `--deprecated-v1` flag must be set. Note that this flag only supports creating, getting, listing, and deleting indexes, and inserting vectors.
2. Refer to the [REST API](/api/resources/vectorize/subresources/indexes/methods/create/) page for details on the routes and payload types for the new Vectorize (V2) indexes.
3. To use the new version of Vectorize indexes in Workers, the environment binding must be defined as a `Vectorize` interface.
```typescript
export interface Env {
// This makes your vector index methods available on env.VECTORIZE.*
// For example, env.VECTORIZE.insert() or query()
VECTORIZE: Vectorize;
}
```
The `Vectorize` interface includes the type changes and the capabilities supported by new Vectorize (V2) indexes.
For legacy Vectorize (V1) indexes, use the `VectorizeIndex` interface.
```typescript
export interface Env {
// This makes your vector index methods available on env.VECTORIZE.*
// For example, env.VECTORIZE.insert() or query()
VECTORIZE: VectorizeIndex;
}
```
4. With the new Vectorize (V2) version, the `returnMetadata` option for the [query operation](/vectorize/reference/client-api/#query-vectors) now expects one of the string values `all`, `indexed`, or `none`. For legacy Vectorize (V1), the `returnMetadata` option was a boolean field.
5. With the new Vectorize (V2) indexes, all index and vector mutations are asynchronous and return a `mutationId` in the response as a unique identifier for that mutation operation.
These mutation operations are: [Vector Inserts](/vectorize/reference/client-api/#insert-vectors), [Vector Upserts](/vectorize/reference/client-api/#upsert-vectors), [Vector Deletes](/vectorize/reference/client-api/#delete-vectors-by-id), [Metadata Index Creation](/vectorize/reference/client-api/#create-metadata-index), [Metadata Index Deletion](/vectorize/reference/client-api/#delete-metadata-index).
To check the identifier and the timestamp of the last mutation processed, use the Vectorize [Info command](/vectorize/reference/client-api/#get-index-info).
---
# Metadata filtering
URL: https://developers.cloudflare.com/vectorize/reference/metadata-filtering/
import { Render, PackageManagers } from "~/components";
In addition to providing an input vector to your query, you can also filter by [vector metadata](/vectorize/best-practices/insert-vectors/#metadata) associated with every vector. Query results will only include vectors that match the `filter` criteria, meaning that `filter` is applied first, and the `topK` results are taken from the filtered set.
By using metadata filtering to limit the scope of a query, you can filter by specific customer IDs, tenant, product category or any other metadata you associate with your vectors.
## Metadata indexes
Vectorize supports [namespace](/vectorize/best-practices/insert-vectors/#namespaces) filtering by default, but to filter on another metadata property of your vectors, you'll need to create a metadata index. You can create up to 10 metadata indexes per Vectorize index.
Metadata indexes for properties of type `string`, `number` and `boolean` are supported. Please refer to [Create metadata indexes](/vectorize/get-started/intro/#4-optional-create-metadata-indexes) for details.
You can store up to 10KiB of metadata per vector. See [Vectorize Limits](/vectorize/platform/limits/) for a complete list of limits.
For metadata indexes of type `number`, the indexed number precision is that of float64.
For metadata indexes of type `string`, only the first 64B of the string value is indexed, truncated on UTF-8 character boundaries to the longest well-formed prefix that fits within that limit. Vectors are therefore filterable only on the first 64B of each indexed string property.
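The truncation rule can be sketched as follows. This helper is purely illustrative (it is not part of the Vectorize API) and shows how a value is cut to 64 bytes without splitting a multi-byte UTF-8 character:

```typescript
// Illustrative only: truncate a string to at most `maxBytes` of UTF-8,
// backing off to the nearest character boundary so the result stays well-formed.
function truncateUtf8(value: string, maxBytes = 64): string {
  const bytes = new TextEncoder().encode(value);
  if (bytes.length <= maxBytes) return value;
  let end = maxBytes;
  // A UTF-8 continuation byte has the bit pattern 10xxxxxx; step back past them.
  while (end > 0 && (bytes[end] & 0b11000000) === 0b10000000) end--;
  return new TextDecoder().decode(bytes.slice(0, end));
}

truncateUtf8("netflix"); // short ASCII values are unaffected
truncateUtf8("x".repeat(100)).length; // 64: only the 64B prefix is filterable
truncateUtf8("\u00e9".repeat(40)).length; // 32: two-byte characters, no split
```

Two strings that share the same 64-byte prefix are indistinguishable to a metadata filter on that property.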
:::note[Enable metadata filtering]
Vectors upserted before a metadata index was created are not reflected in that index. To have them indexed as expected, upsert (or re-upsert) those vectors after the metadata index has been created. Please refer to [Create metadata indexes](/vectorize/get-started/intro/#4-optional-create-metadata-indexes) for details.
:::
## Supported operations
An optional `filter` property on the `query()` method specifies metadata filters:
| Operator | Description |
| -------- | ------------------------ |
| `$eq` | Equals |
| `$ne` | Not equals |
| `$in` | In |
| `$nin` | Not in |
| `$lt` | Less than |
| `$lte` | Less than or equal to |
| `$gt` | Greater than |
| `$gte` | Greater than or equal to |
- `filter` must be a non-empty object whose compact JSON representation is less than 2048 bytes.
- `filter` object keys cannot be empty, contain the characters `"` or `|`, start with `$`, or be longer than 512 characters. The `.` character is reserved for nesting.
- For `$eq` and `$ne`, `filter` object non-nested values can be `string`, `number`, `boolean`, or `null` values.
- For `$in` and `$nin`, `filter` object values can be arrays of `string`, `number`, `boolean`, or `null` values.
- Upper-bound range queries (i.e. `$lt` and `$lte`) can be combined with lower-bound range queries (i.e. `$gt` and `$gte`) within the same filter. Other combinations are not allowed.
- For range queries (i.e. `$lt`, `$lte`, `$gt`, `$gte`), `filter` object non-nested values can be `string` or `number` values. Strings are ordered lexicographically.
- Range queries involving a large number of vectors (~10M and above) may experience reduced accuracy.
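The rules above can be checked client-side before a query is issued. The following sketch is not part of the Vectorize API; it simply encodes the documented limits (the exact forbidden-character set for keys is an assumption based on the list above):

```typescript
// Illustrative pre-flight validation of a metadata filter object.
// Not part of the Vectorize API -- it only mirrors the documented limits.
function validateFilter(filter: Record<string, unknown>): string[] {
  const errors: string[] = [];
  const keys = Object.keys(filter);
  if (keys.length === 0) errors.push("filter must be a non-empty object");
  // The compact JSON representation must be less than 2048 bytes.
  if (new TextEncoder().encode(JSON.stringify(filter)).length >= 2048) {
    errors.push("filter JSON must be less than 2048 bytes");
  }
  for (const key of keys) {
    if (key.length === 0 || key.length > 512) errors.push(`bad key length: "${key}"`);
    if (key.startsWith("$")) errors.push(`key cannot start with $: "${key}"`);
    if (/["|]/.test(key)) errors.push(`key contains forbidden character: "${key}"`);
  }
  return errors;
}

validateFilter({ streaming_platform: "netflix" }); // [] -- valid
validateFilter({ $bad: 1 }); // one error about the $ prefix
```

Running a check like this locally surfaces invalid filters before the query round-trip, but the server-side limits remain authoritative.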
### Namespace versus metadata filtering
Both [namespaces](/vectorize/best-practices/insert-vectors/#namespaces) and metadata filtering narrow the vector search space for a query. Consider the following when evaluating both filter types:
- A namespace filter is applied before metadata filter(s).
- A vector can only be part of a single namespace with the documented [limits](/vectorize/platform/limits/). Vector metadata can contain multiple key-value pairs up to [metadata per vector limits](/vectorize/platform/limits/). Metadata values support different types (`string`, `boolean`, and others), therefore offering more flexibility.
### Valid `filter` examples
#### Implicit `$eq` operator
```json
{ "streaming_platform": "netflix" }
```
#### Explicit operator
```json
{ "someKey": { "$ne": "hbo" } }
```
#### `$in` operator
```json
{ "someKey": { "$in": ["hbo", "netflix"] } }
```
#### `$nin` operator
```json
{ "someKey": { "$nin": ["hbo", "netflix"] } }
```
#### Range query involving numbers
```json
{ "timestamp": { "$gte": 1734242400, "$lt": 1734328800 } }
```
#### Range query involving strings
Range queries can be used to implement prefix searching on string metadata fields.
For example, the following filter matches all values starting with "net":
```json
{ "someKey": { "$gte": "net", "$lt": "neu" } }
```
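The `"neu"` upper bound above is simply the prefix with its final character incremented. A small helper (illustrative only; it ignores edge cases such as a final character at the top of the code-point range) can derive such a range filter from any prefix:

```typescript
// Illustrative only: build a { $gte, $lt } range that matches all strings
// starting with `prefix`, by incrementing the prefix's last character.
function prefixFilter(prefix: string): { $gte: string; $lt: string } {
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { $gte: prefix, $lt: upper };
}

const range = prefixFilter("net"); // $gte: "net", $lt: "neu"
```

The resulting object can be passed directly as the value of an indexed string property in a `filter`.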
#### Implicit logical `AND` with multiple keys
```json
{ "pandas.nice": 42, "someKey": { "$ne": "someValue" } }
```
#### Keys define nesting with `.` (dot)
```json
{ "pandas.nice": 42 }
// looks for { "pandas": { "nice": 42 } }
```
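To make the nesting rule concrete, the sketch below (not part of the Vectorize API) flattens a nested metadata object into the dotted keys a filter would use to address it:

```typescript
// Illustrative only: map nested metadata to the dotted filter-key form.
function flatten(
  obj: Record<string, unknown>,
  prefix = "",
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      // Recurse into nested objects, extending the dotted path.
      Object.assign(out, flatten(value as Record<string, unknown>, path));
    } else {
      out[path] = value;
    }
  }
  return out;
}

// The metadata { pandas: { nice: 42 } } is addressed by the key "pandas.nice":
flatten({ pandas: { nice: 42 } })["pandas.nice"]; // 42
```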
## Examples
### Add metadata
With the following index definition:
```sh
npx wrangler vectorize create tutorial-index --dimensions=32 --metric=cosine
```
Create metadata indexes:
```sh
npx wrangler vectorize create-metadata-index tutorial-index --property-name=url --type=string
```
```sh
npx wrangler vectorize create-metadata-index tutorial-index --property-name=streaming_platform --type=string
```
Metadata can be added when [inserting or upserting vectors](/vectorize/best-practices/insert-vectors/#examples).
```ts
const newMetadataVectors: VectorizeVector[] = [
{
id: "1",
values: [32.4, 74.1, 3.2, ...],
metadata: { url: "/products/sku/13913913", streaming_platform: "netflix" },
},
{
id: "2",
values: [15.1, 19.2, 15.8, ...],
metadata: { url: "/products/sku/10148191", streaming_platform: "hbo" },
},
{
id: "3",
values: [0.16, 1.2, 3.8, ...],
metadata: { url: "/products/sku/97913813", streaming_platform: "amazon" },
},
{
id: "4",
values: [75.1, 67.1, 29.9, ...],
metadata: { url: "/products/sku/418313", streaming_platform: "netflix" },
},
{
id: "5",
values: [58.8, 6.7, 3.4, ...],
metadata: { url: "/products/sku/55519183", streaming_platform: "hbo" },
},
];
// Upsert vectors with added metadata, returning a count of the vectors upserted and their vector IDs
let upserted = await env.YOUR_INDEX.upsert(newMetadataVectors);
```
### Query examples
Use the `query()` method:
```ts
let queryVector: number[] = [54.8, 5.5, 3.1, ...];
let originalMatches = await env.YOUR_INDEX.query(queryVector, {
topK: 3,
returnValues: true,
returnMetadata: 'all',
});
```
Results without metadata filtering:
```json
{
"count": 3,
"matches": [
{
"id": "5",
"score": 0.999909486,
"values": [58.79999923706055, 6.699999809265137, 3.4000000953674316],
"metadata": {
"url": "/products/sku/55519183",
"streaming_platform": "hbo"
}
},
{
"id": "4",
"score": 0.789848214,
"values": [75.0999984741211, 67.0999984741211, 29.899999618530273],
"metadata": {
"url": "/products/sku/418313",
"streaming_platform": "netflix"
}
},
{
"id": "2",
"score": 0.611976262,
"values": [15.100000381469727, 19.200000762939453, 15.800000190734863],
"metadata": {
"url": "/products/sku/10148191",
"streaming_platform": "hbo"
}
}
]
}
```
The same `query()` method with a `filter` property supports metadata filtering.
```ts
let queryVector: number[] = [54.8, 5.5, 3.1, ...];
let metadataMatches = await env.YOUR_INDEX.query(queryVector, {
topK: 3,
filter: { streaming_platform: "netflix" },
returnValues: true,
returnMetadata: 'all',
});
```
Results with metadata filtering:
```json
{
"count": 2,
"matches": [
{
"id": "4",
"score": 0.789848214,
"values": [75.0999984741211, 67.0999984741211, 29.899999618530273],
"metadata": {
"url": "/products/sku/418313",
"streaming_platform": "netflix"
}
},
{
"id": "1",
"score": 0.491185264,
"values": [32.400001525878906, 74.0999984741211, 3.200000047683716],
"metadata": {
"url": "/products/sku/13913913",
"streaming_platform": "netflix"
}
}
]
}
```
## Limitations
- Currently, metadata indexes need to be created for a Vectorize index _before_ vectors are inserted in order to support metadata filtering.
- Only indexes created on or after 2023-12-06 support metadata filtering. Previously created indexes cannot be migrated to support metadata filtering.
---
# Vector databases
URL: https://developers.cloudflare.com/vectorize/reference/what-is-a-vector-database/
Vector databases are a key part of building scalable AI-powered applications. Vector databases provide long-term memory on top of an existing machine learning model.
Without a vector database, you would need to train your model (or models) or re-run your dataset through a model before making a query, which would be slow and expensive.
## Why is a vector database useful?
A vector database determines what other data (represented as vectors) is near your input query. This allows you to build different use-cases on top of a vector database, including:
* Semantic search, used to return results similar to the input of the query.
* Classification, used to return the grouping (or groupings) closest to the input query.
* Recommendation engines, used to return content similar to the input based on different criteria (for example previous product sales, or user history).
* Anomaly detection, used to identify whether specific data points are similar to existing data, or different.
Vector databases can also power [Retrieval Augmented Generation](https://arxiv.org/abs/2005.11401) (RAG) tasks, which allow you to bring additional context to LLMs (Large Language Models) by using the context from a vector search to augment the user prompt.
### Vector search
In a traditional vector search use-case, queries are made against a vector database by passing it a query vector, and having the vector database return a configurable list of vectors with the shortest distance ("most similar") to the query vector.
The step-by-step workflow resembles the below:
1. A developer converts their existing dataset (documentation, images, logs stored in R2) into a set of vector embeddings (a one-way representation) by passing them through a machine learning model that is trained for that data type.
2. The output embeddings are inserted into a Vectorize database index.
3. A search query, classification request or anomaly detection query is also passed through the same ML model, returning a vector embedding representation of the query.
4. Vectorize is queried with this embedding, and returns a set of the most similar vector embeddings to the provided query.
5. The returned embeddings are used to retrieve the original source objects from dedicated storage (for example, R2, KV, and D1) and returned back to the user.
In a workflow without a vector database, you would need to pass your entire dataset alongside your query each time, which is not practical (models have limits on input size) and would consume significant resources and time.
### Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) is an approach used to improve the context provided to an LLM (Large Language Model) in generative AI use-cases, including chatbot and general question-answer applications. The vector database is used to enhance the prompt passed to the LLM by adding additional context alongside the query.
Instead of passing the prompt directly to the LLM, in the RAG approach you:
1. Generate vector embeddings from an existing dataset or corpus (for example, the dataset you want to use to add additional context to the LLM's response). An existing dataset or corpus could be product documentation, research data, technical specifications, or your product catalog and descriptions.
2. Store the output embeddings in a Vectorize database index.
When a user initiates a prompt, instead of passing it (without additional context) to the LLM, you *augment* it with additional context:
1. The user prompt is passed into the same ML model used for your dataset, returning a vector embedding representation of the query.
2. This embedding is used as the query (semantic search) against the vector database, which returns similar vectors.
3. These vectors are used to look up the content they relate to (if not embedded directly alongside the vectors as metadata).
4. This content is provided as context alongside the original user prompt, providing additional context to the LLM and allowing it to return an answer that is likely to be far more contextual than the standalone prompt.
Refer to the [RAG using Workers AI tutorial](/workers-ai/tutorials/build-a-retrieval-augmented-generation-ai/) to learn how to combine Workers AI and Vectorize for generative AI use-cases.
You can learn more about the theory behind RAG by reading the [RAG paper](https://arxiv.org/abs/2005.11401).
## Terminology
### Databases and indexes
In Vectorize, a database and an index are the same concept. Each index you create is separate from other indexes you create. Vectorize automatically manages optimizing and re-generating the index for you when you insert new data.
### Vector Embeddings
Vector embeddings represent the features of a machine learning model as a numerical vector (array of numbers). They are a one-way representation that encodes how a machine learning model understands the input(s) provided to it, based on how the model was originally trained and its internal structure.
For example, a [text embedding model](/workers-ai/models/#text-embeddings) available in Workers AI is able to take text input and represent it as a 768-dimension vector. The text `This is a story about an orange cloud`, when represented as a vector embedding, resembles the following:
```json
[-0.019273685291409492,-0.01913292706012726,<764 dimensions here>,0.0007094172760844231,0.043409910053014755]
```
When a model considers the features of an input as "similar" (based on its understanding), the distance between the vector embeddings for those two inputs will be short.
### Dimensions
Vector dimensions describe the width of a vector embedding. The width of a vector embedding is the number of floating point elements that comprise a given vector.
The number of dimensions is defined by the machine learning model used to generate the vector embeddings, and by how it represents input features based on its internal model and complexity. More dimensions ("wider" vectors) may provide more accuracy at the cost of compute and memory resources, as well as latency (speed) of vector search.
Refer to the [dimensions](/vectorize/best-practices/create-indexes/#dimensions) documentation to learn how to configure the accepted vector dimension size when creating a Vectorize index.
### Distance metrics
The distance metric defines how vector search determines how close your query vector is to other vectors within the index.
* Distance metrics determine how the vector search engine assesses similarity between vectors.
* Cosine, Euclidean (L2), and Dot Product are the most commonly used distance metrics in vector search.
* The machine learning model and type of embedding you use will determine which distance metric is best suited for your use-case.
* Different metrics determine different scoring characteristics. For example, the `cosine` distance metric is well suited to text, sentence similarity and/or document search use-cases. `euclidean` can be better suited for image or speech recognition use-cases.
Refer to the [distance metrics](/vectorize/best-practices/create-indexes/#distance-metrics) documentation to learn how to configure a distance metric when creating a Vectorize index.
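For intuition, the three common metrics named above can be sketched in a few lines. Vectorize computes scores server-side, so this is only an illustration of how each metric behaves, not how Vectorize implements them:

```typescript
// Dot product: sum of pairwise element products; sensitive to magnitude.
const dot = (a: number[], b: number[]): number =>
  a.reduce((sum, x, i) => sum + x * b[i], 0);

const magnitude = (a: number[]): number => Math.sqrt(dot(a, a));

// Cosine similarity: 1 for vectors pointing in the same direction,
// regardless of their magnitude.
const cosine = (a: number[], b: number[]): number =>
  dot(a, b) / (magnitude(a) * magnitude(b));

// Euclidean (L2) distance: 0 for identical vectors; smaller means more similar.
const euclidean = (a: number[], b: number[]): number =>
  Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));

cosine([1, 0], [2, 0]); // 1: same direction, different magnitude
euclidean([1, 0], [1, 0]); // 0: identical vectors
```

Note how `cosine` ignores magnitude while `euclidean` and `dot` do not, which is one reason the best metric depends on the embedding model used.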
---