Insert vectors

Date: 2024-01-17

Vectorize indexes allow you to insert vectors at any point: Vectorize will optimize the index behind the scenes to ensure that vector search remains efficient, even as new vectors are added or existing vectors updated.

Supported vector formats

Vectorize supports vectors in three formats: a number[] array of floating-point numbers, a Float32Array, or a Float64Array.

In most cases, a number[] array is the easiest when dealing with other APIs, and is the return type of most machine-learning APIs.

Metadata

Metadata is an optional set of key-value pairs that can be attached to a vector on insert or upsert. It lets you embed or co-locate data about the vector itself.

Metadata keys cannot be empty, contain the dot character (.), contain the double-quote character ("), or start with the dollar character ($).
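When vectors are prepared outside a Worker (for example, in a script that builds an upload file), the key rules above can be checked before anything is sent. The following is a minimal sketch: the function name invalid_metadata_keys is hypothetical and not part of any Vectorize client, and it assumes every key is a string.

```python
from typing import Any, Dict, List


def invalid_metadata_keys(metadata: Dict[str, Any]) -> List[str]:
    """Return the metadata keys that break the key rules described above.

    Hypothetical helper for illustration; assumes all keys are strings.
    """
    invalid = []
    for key in metadata:
        if (
            key == ""                 # keys cannot be empty
            or "." in key             # keys cannot contain a dot
            or '"' in key             # keys cannot contain a double quote
            or key.startswith("$")    # keys cannot start with a dollar sign
        ):
            invalid.append(key)
    return invalid


print(invalid_metadata_keys({"url": "/products/sku/13913913", "$score": 0.9}))
# prints ['$score']
```

An empty list means every key passed the four checks. Value-size limits are not covered here.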

Metadata can be used to:

Include the object storage key, database UUID or other identifier to look up the content the vector embedding represents.

Include the raw content (up to the metadata limits), which can allow you to skip additional lookups for smaller content.

Record dates, timestamps, or other metadata that describes when or how the vector embedding was generated.

For example, a vector embedding representing an image could include the path to the R2 object it was generated from, the format, and a category lookup:

    {
      id: '1',
      values: [32.4, 74.1, 3.2],
      metadata: {
        path: 'r2://bucket-name/path/to/image.png',
        format: 'png',
        category: 'profile_image'
      }
    }

Namespaces

Namespaces provide a way to segment the vectors within your index, for example by customer, merchant, or store ID.

To associate vectors with a namespace, you can optionally provide a namespace: string value when performing an insert or upsert operation. When querying, you can pass the namespace to search within as an optional parameter to your query.

A namespace can be up to 63 characters (bytes) in length and you can have up to 1000 namespaces per index. Refer to the Limits documentation for more details.
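These limits can be checked on the client before a batch is uploaded. The sketch below is illustrative only: check_namespaces is a hypothetical helper, the byte length is measured as UTF-8 (an assumption, since the text says only "characters (bytes)"), and the count covers only the namespaces in the batch, not those already present in the index.

```python
MAX_NAMESPACE_BYTES = 63          # from the limit stated above
MAX_NAMESPACES_PER_INDEX = 1000   # from the limit stated above


def check_namespaces(vectors):
    """Return a list of problems found in the namespaces of a batch.

    Hypothetical helper; each vector is a dict with an "id" and an
    optional "namespace" string.
    """
    problems = []
    seen = set()
    for vector in vectors:
        namespace = vector.get("namespace")
        if namespace is None:
            continue  # namespaces are optional
        seen.add(namespace)
        size = len(namespace.encode("utf-8"))  # assumption: UTF-8 bytes
        if size > MAX_NAMESPACE_BYTES:
            problems.append(
                f"vector {vector['id']}: namespace is {size} bytes "
                f"(limit {MAX_NAMESPACE_BYTES})"
            )
    if len(seen) > MAX_NAMESPACES_PER_INDEX:
        problems.append(
            f"batch uses {len(seen)} namespaces "
            f"(limit {MAX_NAMESPACES_PER_INDEX} per index)"
        )
    return problems


batch = [
    {"id": "1", "values": [32.4, 74.1, 3.2], "namespace": "text"},
    {"id": "2", "values": [15.1, 19.2, 15.8], "namespace": "images"},
]
print(check_namespaces(batch))
# prints []
```

Measuring bytes rather than characters matters for non-ASCII namespaces: a name of 32 two-byte characters is already over the limit.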

When a namespace is provided, only vectors within that namespace are used for the search. Namespace filtering is applied before vector search, not after.
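The difference between filtering before and after the search can be seen with a toy in-memory example. This is a conceptual sketch only, not how Vectorize is implemented: the function names are hypothetical, the search is a brute-force scan, and Euclidean distance is assumed as the metric.

```python
import math

sample_vectors = [
    {"id": "1", "values": [32.4, 74.1, 3.2], "namespace": "text"},
    {"id": "2", "values": [15.1, 19.2, 15.8], "namespace": "images"},
    {"id": "3", "values": [0.16, 1.2, 3.8], "namespace": "pdfs"},
]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def query_prefilter(vectors, query, namespace, top_k):
    """Restrict to the namespace first, then take the top_k nearest."""
    candidates = [v for v in vectors if v.get("namespace") == namespace]
    candidates.sort(key=lambda v: euclidean(v["values"], query))
    return [v["id"] for v in candidates[:top_k]]


def query_postfilter(vectors, query, namespace, top_k):
    """Take the top_k nearest overall, then drop other namespaces."""
    nearest = sorted(vectors, key=lambda v: euclidean(v["values"], query))
    return [v["id"] for v in nearest[:top_k] if v.get("namespace") == namespace]


query = [30.0, 70.0, 5.0]
print(query_prefilter(sample_vectors, query, "images", 1))
# prints ['2']
print(query_postfilter(sample_vectors, query, "images", 1))
# prints []
```

With filtering applied first, the query still returns the closest vector inside the namespace. Filtering afterwards can return fewer results than requested, or none, because nearer vectors from other namespaces take up the top-k slots.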

To insert vectors with a namespace:

    const sampleVectors: Array<VectorizeVector> = [
      { id: "1", values: [32.4, 74.1, 3.2], namespace: "text" },
      { id: "2", values: [15.1, 19.2, 15.8], namespace: "images" },
      { id: "3", values: [0.16, 1.2, 3.8], namespace: "pdfs" },
    ];

    let inserted = await env.TUTORIAL_INDEX.insert(sampleVectors);

To query vectors within a namespace:

    let matches = await env.TUTORIAL_INDEX.query(queryVector, { namespace: "images" });

Workers API

Use the insert() and upsert() methods available on an index from within a Cloudflare Worker to insert vectors into the current index.

    const sampleVectors: Array<VectorizeVector> = [
      { id: "1", values: [32.4, 74.1, 3.2], metadata: { url: "/products/sku/13913913" } },
      { id: "2", values: [15.1, 19.2, 15.8], metadata: { url: "/products/sku/10148191" } },
      { id: "3", values: [0.16, 1.2, 3.8], metadata: { url: "/products/sku/97913813" } },
    ];

    let inserted = await env.TUTORIAL_INDEX.insert(sampleVectors);

Refer to the Workers Client API documentation for additional examples.

wrangler CLI

You can bulk upload vector embeddings directly:

The file must be in newline-delimited JSON (NDJSON) format: each complete vector must be on its own line, and not wrapped in an array or object.

Vectors must be complete and include a unique string id per vector.

An example NDJSON formatted file:

embeddings.ndjson

    { "id": "4444", "values": [175.1, 167.1, 129.9], "metadata": { "url": "/products/sku/918318313" } }
    { "id": "5555", "values": [158.8, 116.7, 311.4], "metadata": { "url": "/products/sku/183183183" } }
    { "id": "6666", "values": [113.2, 67.5, 11.2], "metadata": { "url": "/products/sku/717313811" } }
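A file in this shape can be generated from in-memory vectors with the Python standard library. The sketch below is one possible approach, not an official tool: to_ndjson is a hypothetical helper, and it enforces only the two requirements stated above (one complete vector per line, and a unique string id per vector).

```python
import json


def to_ndjson(vectors):
    """Serialize vectors to NDJSON text, one JSON object per line.

    Hypothetical helper; raises ValueError if an id is not a string,
    an id is repeated, or a vector has no values.
    """
    seen_ids = set()
    lines = []
    for vector in vectors:
        vector_id = vector.get("id")
        if not isinstance(vector_id, str):
            raise ValueError(f"id must be a string, got {vector_id!r}")
        if vector_id in seen_ids:
            raise ValueError(f"duplicate id: {vector_id}")
        if not vector.get("values"):
            raise ValueError(f"vector {vector_id} has no values")
        seen_ids.add(vector_id)
        # json.dumps escapes embedded newlines, so each vector stays on one line
        lines.append(json.dumps(vector))
    return "\n".join(lines) + "\n"


vectors = [
    {"id": "4444", "values": [175.1, 167.1, 129.9],
     "metadata": {"url": "/products/sku/918318313"}},
    {"id": "5555", "values": [158.8, 116.7, 311.4],
     "metadata": {"url": "/products/sku/183183183"}},
]

with open("embeddings.ndjson", "w", encoding="utf-8") as f:
    f.write(to_ndjson(vectors))
```

Validating before writing means a bad record fails locally rather than partway through an upload.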

Note: Vectorize requires wrangler version 3.10 or later. Ensure you have the latest version of wrangler installed, or use npx wrangler@latest vectorize to always use the latest version.

$ wrangler vectorize insert <your-index-name> --file=embeddings.ndjson

HTTP API

Vectorize also supports inserting vectors via the HTTP API, which allows you to operate on a Vectorize index from existing machine-learning tooling and languages (including Python).

For example, to insert embeddings in NDJSON format directly from a Python script:

    import requests

    url = "https://api.cloudflare.com/client/v4/accounts/{}/vectorize/indexes/{}/insert".format(
        "your-account-id", "index-name"
    )

    headers = {"Authorization": "Bearer <your-api-token>"}

    with open('embeddings.ndjson', 'rb') as embeddings:
        resp = requests.post(url, headers=headers, files=dict(vectors=embeddings))
        print(resp)

This code would insert the vectors defined in embeddings.ndjson into the provided index. Python libraries, including Pandas, also support the NDJSON format via the built-in read_json method: