Metadata filtering
In addition to providing an input vector to your query, you can also filter by the metadata associated with each vector. Query results only include vectors that match the `filter` criteria, meaning that the `filter` is applied first, and the `topK` results are taken from the filtered set.
By using metadata filtering to limit the scope of a query, you can filter by specific customer IDs, tenant, product category or any other metadata you associate with your vectors.
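To make the ordering concrete, here is a minimal in-memory sketch of "filter first, then take `topK`". It is an illustration only and not how Vectorize is implemented: the `StoredVector` type, the `filteredTopK` helper, the `tenant` key, and the dot-product score are all assumptions made for this example.

```typescript
// Illustrative in-memory model only; not the Vectorize implementation.
type Metadata = Record<string, string | number | boolean>;

interface StoredVector {
  id: string;
  values: number[];
  metadata: Metadata;
}

// Dot product, used here as a stand-in similarity score.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Step 1: keep only vectors whose metadata equals every key in `filter`.
// Step 2: rank the survivors by similarity and keep the first `topK`.
function filteredTopK(
  vectors: StoredVector[],
  query: number[],
  filter: Metadata,
  topK: number
): string[] {
  return vectors
    .filter((v) => Object.keys(filter).every((k) => v.metadata[k] === filter[k]))
    .map((v) => ({ id: v.id, score: dot(v.values, query) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((m) => m.id);
}

const sampleVectors: StoredVector[] = [
  { id: "a", values: [1, 0], metadata: { tenant: "acme" } },
  { id: "b", values: [0.9, 0.1], metadata: { tenant: "globex" } },
  { id: "c", values: [0.5, 0.5], metadata: { tenant: "acme" } },
  { id: "d", values: [0, 1], metadata: { tenant: "acme" } },
];

// "b" scores second highest overall but is excluded before ranking.
const top2 = filteredTopK(sampleVectors, [1, 0], { tenant: "acme" }, 2);
```

Because the filter runs before ranking, a close match in another tenant never takes up one of the `topK` slots.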
Vectorize supports namespace filtering by default, but to filter on another metadata property of your vectors, you'll need to create a metadata index. You can create up to 10 metadata indexes per Vectorize index.
Metadata indexes are supported for properties of type `string`, `number`, and `boolean`. Refer to Create metadata indexes for details.
You can store up to 10KiB of metadata per vector. See Vectorize Limits for a complete list of limits.
For metadata indexes of type `number`, the indexed number precision is that of float64.
For metadata indexes of type `string`, each vector indexes the first 64B of the string data, truncated on UTF-8 character boundaries to the longest well-formed UTF-8 substring within that limit, so vectors are filterable on the first 64B of their value for each indexed property.
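The 64B truncation rule can be previewed client-side with a small helper. This is a sketch under assumptions: `indexedPrefix` and `utf8Length` are hypothetical names, the input is assumed to be a well-formed string (no lone surrogates), and the code only approximates the behavior described above rather than reproducing Vectorize's own logic.

```typescript
// Number of bytes one Unicode code point occupies in UTF-8.
function utf8Length(codePoint: number): number {
  if (codePoint < 0x80) return 1;
  if (codePoint < 0x800) return 2;
  if (codePoint < 0x10000) return 3;
  return 4;
}

// Longest prefix of `value` that fits in `limitBytes` of UTF-8
// without splitting a character.
function indexedPrefix(value: string, limitBytes: number = 64): string {
  let used = 0;
  let out = "";
  // Array.from iterates by code point, so surrogate pairs stay together.
  for (const ch of Array.from(value)) {
    const size = utf8Length(ch.codePointAt(0)!);
    if (used + size > limitBytes) break;
    used += size;
    out += ch;
  }
  return out;
}

// 100 ASCII characters: only the first 64 bytes (64 characters) are kept.
const asciiPrefix = indexedPrefix("x".repeat(100));

// "a" (1 byte) followed by 40 two-byte characters: 1 + 31 * 2 = 63 bytes fit,
// and the next character would cross the 64-byte limit.
const accentedPrefix = indexedPrefix("a" + "\u00e9".repeat(40));
```

Two values that differ only after their first 64 bytes would look identical to a filter on that property, so it may be worth keeping distinguishing information near the start of indexed strings.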
An optional `filter` property on the `query()` method specifies metadata filters:
Operator | Description |
---|---|
$eq | Equals |
$ne | Not equals |
$in | In |
$nin | Not in |
$lt | Less than |
$lte | Less than or equal to |
$gt | Greater than |
$gte | Greater than or equal to |
- `filter` must be a non-empty object whose compact JSON representation must be less than 2048 bytes.
- `filter` object keys cannot be empty, contain `" | .` (dot is reserved for nesting), start with `$`, or be longer than 512 characters.
- For `$eq` and `$ne`, `filter` object non-nested values can be `string`, `number`, `boolean`, or `null` values.
- For `$in` and `$nin`, `filter` object values can be arrays of `string`, `number`, `boolean`, or `null` values.
- Upper-bound range queries (i.e. `$lt` and `$lte`) can be combined with lower-bound range queries (i.e. `$gt` and `$gte`) within the same filter. Other combinations are not allowed.
- For range queries (i.e. `$lt`, `$lte`, `$gt`, `$gte`), `filter` object non-nested values can be `string` or `number` values. Strings are ordered lexicographically.
- Range queries involving a large number of vectors (~10M and above) may experience reduced accuracy.
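Some of these constraints can be checked before a query is sent. The following is a partial sketch, not an official validator: `checkFilter` is a hypothetical helper, it skips the key character restrictions and value type rules, and it reads the "other combinations are not allowed" rule as applying to the operators listed under a single key, which is an assumption.

```typescript
// UTF-8 byte length of a string (assumes well-formed input).
function utf8Bytes(s: string): number {
  let n = 0;
  for (const ch of Array.from(s)) {
    const cp = ch.codePointAt(0)!;
    n += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return n;
}

// Returns a list of human-readable problems; an empty list means no issue was found.
function checkFilter(filter: Record<string, unknown>): string[] {
  const problems: string[] = [];
  const keys = Object.keys(filter);

  if (keys.length === 0) problems.push("filter must be a non-empty object");
  // JSON.stringify without spacing arguments yields the compact representation.
  if (utf8Bytes(JSON.stringify(filter)) >= 2048) {
    problems.push("compact JSON must be less than 2048 bytes");
  }

  for (const key of keys) {
    if (key.length === 0) problems.push("keys cannot be empty");
    if (key.charAt(0) === "$") problems.push(`key "${key}" cannot start with $`);
    if (key.length > 512) problems.push("keys cannot be longer than 512 characters");

    const cond = filter[key];
    if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
      const ops = Object.keys(cond as object);
      const uppers = ops.filter((op) => op === "$lt" || op === "$lte").length;
      const lowers = ops.filter((op) => op === "$gt" || op === "$gte").length;
      // Assumed reading: the only multi-operator form is one upper bound plus one lower bound.
      const isBoundedRange = ops.length === 2 && uppers === 1 && lowers === 1;
      if (ops.length > 1 && !isBoundedRange) {
        problems.push(`key "${key}" combines operators in an unsupported way`);
      }
    }
  }
  return problems;
}

// Hypothetical `price` property: a lower bound combined with an upper bound.
const rangeProblems = checkFilter({ price: { $gte: 10, $lt: 20 } });
```

A check like this only catches shape errors early; the service remains the authority on what it accepts.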
Both namespaces and metadata filtering narrow the vector search space for a query. Consider the following when evaluating both filter types:
- A namespace filter is applied before metadata filter(s).
- A vector can only be part of a single namespace with the documented limits. Vector metadata can contain multiple key-value pairs, up to the metadata per vector limits. Metadata values support different types (`string`, `boolean`, and others), therefore offering more flexibility.
Range queries can be used to implement prefix searching on string metadata fields. For example, the following filter matches all values starting with "net":
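A prefix filter of this kind pairs `$gte` on the prefix with `$lt` on the prefix whose last character is incremented ("net" becomes "neu"). The sketch below builds such a filter; `prefixFilter` and the `someKey` property name are hypothetical, and the helper is deliberately limited to prefixes ending in an ASCII character so that incrementing the last character is safe.

```typescript
type PrefixFilter = Record<string, { $gte: string; $lt: string }>;

// Builds a range filter matching every string that starts with `prefix`.
function prefixFilter(key: string, prefix: string): PrefixFilter {
  if (prefix.length === 0) throw new Error("prefix must not be empty");
  const last = prefix.charCodeAt(prefix.length - 1);
  // Simplification for this sketch: only bump ASCII characters below DEL.
  if (last >= 0x7f) throw new Error("prefix must end in an ASCII character");
  const upperBound = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { [key]: { $gte: prefix, $lt: upperBound } };
}

// Matches values such as "net", "network", or "netflix", but not "neu" or "new".
const netFilter = prefixFilter("someKey", "net");
```

Since strings are ordered lexicographically, every value beginning with "net" sorts at or after "net" and strictly before "neu".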
With the following index definition:
Create metadata indexes:
Metadata can be added when inserting or upserting vectors.
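As a rough sketch, a vector carrying metadata can be modeled as an object with `id`, `values`, and `metadata` fields. The `VectorRecord` type and `withMetadata` helper below are hypothetical names for this example, the `url` property and its value are made up, and the size check assumes the 10KiB limit is measured on the compact JSON encoding, which the text above does not confirm.

```typescript
type MetadataValue = string | number | boolean;

// Simplified local shape for this sketch; not the full Vectorize vector type.
interface VectorRecord {
  id: string;
  values: number[];
  metadata: Record<string, MetadataValue>;
}

// 10KiB of metadata per vector, as stated earlier on this page.
const METADATA_LIMIT_BYTES = 10 * 1024;

// UTF-8 byte length of a string (assumes well-formed input).
function byteLength(s: string): number {
  let n = 0;
  for (const ch of Array.from(s)) {
    const cp = ch.codePointAt(0)!;
    n += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return n;
}

// Attaches metadata to a vector, rejecting payloads over the assumed limit.
function withMetadata(
  id: string,
  values: number[],
  metadata: Record<string, MetadataValue>
): VectorRecord {
  // Assumption: size is approximated by the compact JSON encoding.
  if (byteLength(JSON.stringify(metadata)) > METADATA_LIMIT_BYTES) {
    throw new Error(`metadata for vector ${id} exceeds ${METADATA_LIMIT_BYTES} bytes`);
  }
  return { id, values, metadata };
}

const record = withMetadata("1", [0.1, 0.2, 0.3], { url: "/products/sci-fi" });
```

In a Worker, records shaped like this would then be passed to the index binding when inserting or upserting.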
Use the `query()` method:
Results without metadata filtering:
The same `query()` method with a `filter` property supports metadata filtering.
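A filtered call might look like the following sketch. `QueryableIndex` is a hypothetical structural type declared only so the example is self-contained; it models just the `topK` and `filter` options discussed on this page, not the full binding type. The `category` property and the `searchByCategory` helper are also assumptions.

```typescript
// Minimal stand-in for an index binding (hypothetical type for this sketch).
interface QueryableIndex {
  query(
    vector: number[],
    options: { topK: number; filter?: Record<string, unknown> }
  ): Promise<{ matches: { id: string; score: number }[] }>;
}

// Returns the nearest vectors whose indexed `category` metadata equals the given value.
async function searchByCategory(
  index: QueryableIndex,
  queryVector: number[],
  category: string
) {
  return index.query(queryVector, {
    topK: 3,
    // Only vectors matching this filter are considered for the topK results.
    filter: { category: { $eq: category } },
  });
}
```

The filtered property needs a metadata index for the filter to take effect, so `category` here is assumed to have been indexed beforehand.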
Results with metadata filtering:
- As of now, metadata indexes need to be created for Vectorize indexes before vectors can be inserted to support metadata filtering.
- Only indexes created on or after 2023-12-06 support metadata filtering. Previously created indexes cannot be migrated to support metadata filtering.