Metadata
Use metadata to filter documents before retrieval and provide context to guide AI responses. This page covers how to apply filters and attach optional context metadata to your files.
Metadata filtering narrows down search results based on metadata, so only relevant content is retrieved. The filter narrows down results prior to retrieval, so that you only query the scope of documents that matter.
Here is an example of metadata filtering using Workers Binding but it can be easily adapted to use the REST API instead.
const answer = await env.AI.autorag("my-autorag").search({ query: "How do I train a llama to deliver coffee?", filters: { type: "and", filters: [ { type: "eq", key: "folder", value: "llama/logistics/", }, { type: "gte", key: "timestamp", value: "1735689600000", // unix timestamp for 2025-01-01 }, ], },});
Attribute | Description | Example |
---|---|---|
filename | The name of the file. | dog.png or animals/mammals/cat.png |
folder | The folder or prefix to the object. | For the object animals/mammals/cat.png , the folder is animals/mammals/ |
timestamp | The timestamp for when the object was last modified. Comparisons are supported using a 13-digit Unix timestamp (milliseconds), but values will be rounded down to 10 digits (seconds). | The timestamp 2025-01-01 00:00:00.999 UTC is 1735689600999 and it will be rounded down to 1735689600000 , corresponding to 2025-01-01 00:00:00 UTC |
You can create simple comparison filters or an array of comparison filters using a compound filter.
You can compare a metadata attribute (for example, folder
or timestamp
) with a target value using a comparison filter.
filters: { type: "operator", key: "metadata_attribute", value: "target_value"}
The available operators for the comparison are:
Operator | Description |
---|---|
eq | Equals |
ne | Not equals |
gt | Greater than |
gte | Greater than or equals to |
lt | Less than |
lte | Less than or equals to |
You can use a compound filter to combine multiple comparison filters with a logical operator.
filters: { type: "compound_operator", filters: [...]}
The available compound operators are: and
, or
.
Note the following limitations with the compound operators:
- No nesting combinations of
and
's andor
's, meaning you can only pick 1and
or 1or
. - When using
or
:- Only the
eq
operator is allowed. - All conditions must filter on the same key (for example, all on
folder
)
- Only the
You can use "starts with" filtering on the folder
metadata attribute to search for all files and subfolders within a specific path.
For example, consider this file structure:
Directorycustomer-a
- profile.md
Directorycontracts
Directoryproperty
- contract-1.pdf
If you were to filter using an eq
(equals) operator with value: "customer-a/"
, it would only match files directly within that folder, like profile.md
. It would not include files in subfolders like customer-a/contracts/
.
To recursively filter for all items starting with the path customer-a/
, you can use the following compound filter:
filters: { type: "and", filters: [ { type: "gt", key: "folder", value: "customer-a//", }, { type: "lte", key: "folder", value: "customer-a/z", }, ], },
This filter identifies paths starting with customer-a/
by using:
- The
and
condition to combine the effects of thegt
andlte
conditions. - The
gt
condition to include paths greater than the/
ASCII character. - The
lte
condition to include paths less than and including the lower casez
ASCII character.
Together, these conditions effectively select paths that begin with the provided path value.
You can optionally include a custom metadata field named context
when uploading an object to your R2 bucket.
The context
field is attached to each chunk and passed to the LLM during an /ai-search
query. It does not affect retrieval but helps the LLM interpret and frame the answer.
The field can be used for providing document summaries, source links, or custom instructions without modifying the file content.
You can add custom metadata to an object in the /PUT
operation when uploading the object to your R2 bucket. For example if you are using the Workers binding with R2:
await env.MY_BUCKET.put("cat.png", file, { customMetadata: { context: "This is a picture of Joe's cat. His name is Max." }});
During /ai-search
, this context appears in the response under attributes.file.context
, and is included in the data passed to the LLM for generating a response.
You can see the metadata attributes of your retrieved data in the response under the property attributes
for each retrieved chunk. For example:
"data": [ { "file_id": "llama001", "filename": "llama/logistics/llama-logistics.md", "score": 0.45, "attributes": { "timestamp": 1735689600000, // unix timestamp for 2025-01-01 "folder": "llama/logistics/", "file": { "url": "www.llamasarethebest.com/logistics" "context": "This file contains information about how llamas can logistically deliver coffee." } }, "content": [ { "id": "llama001", "type": "text", "text": "Llamas can carry 3 drinks max." } ] }]
Was this helpful?
- Resources
- API
- New to Cloudflare?
- Products
- Sponsorships
- Open Source
- Support
- Help Center
- System Status
- Compliance
- GDPR
- Company
- cloudflare.com
- Our team
- Careers
- 2025 Cloudflare, Inc.
- Privacy Policy
- Terms of Use
- Report Security Issues
- Trademark
-