Metadata

Use metadata to filter documents before retrieval and provide context to guide AI responses. This page covers how to apply filters and attach optional context metadata to your files.

Metadata filtering

Metadata filtering narrows down search results based on metadata, so only relevant content is retrieved. The filter narrows down results prior to retrieval, so that you only query the scope of documents that matter.

Here is an example of metadata filtering using Workers Binding but it can be easily adapted to use the REST API instead.

const answer = await env.AI.autorag("my-autorag").search({
  query: "How do I train a llama to deliver coffee?",
  filters: {
    type: "and",
    filters: [
      {
        type: "eq",
        key: "folder",
        value: "llama/logistics/",
      },
      {
        type: "gte",
        key: "timestamp",
        value: "1735689600000", // unix timestamp for 2025-01-01
      },
    ],
  },
});

Metadata attributes

Attribute	Description	Example
`filename`	The name of the file.	`dog.png` or `animals/mammals/cat.png`
`folder`	The folder or prefix to the object.	For the object `animals/mammals/cat.png`, the folder is `animals/mammals/`
`timestamp`	The timestamp for when the object was last modified. Comparisons are supported using a 13-digit Unix timestamp (milliseconds), but values will be rounded down to 10 digits (seconds).	The timestamp `2025-01-01 00:00:00.999 UTC` is `1735689600999` and it will be rounded down to `1735689600000`, corresponding to `2025-01-01 00:00:00 UTC`

Filter schema

You can create simple comparison filters or an array of comparison filters using a compound filter.

Comparison filter

You can compare a metadata attribute (for example, folder or timestamp) with a target value using a comparison filter.

filters: {
  type: "operator",
  key: "metadata_attribute",
  value: "target_value"
}

The available operators for the comparison are:

Operator	Description
`eq`	Equals
`ne`	Not equals
`gt`	Greater than
`gte`	Greater than or equals to
`lt`	Less than
`lte`	Less than or equals to

Compound filter

You can use a compound filter to combine multiple comparison filters with a logical operator.

filters: {
  type: "compound_operator",
  filters: [...]
}

The available compound operators are: and, or.

Note the following limitations with the compound operators:

No nesting combinations of and's and or's, meaning you can only pick 1 and or 1 or.
When using or:
- Only the eq operator is allowed.
- All conditions must filter on the same key (for example, all on folder)

"Starts with" filter for folders

You can use "starts with" filtering on the folder metadata attribute to search for all files and subfolders within a specific path.

For example, consider this file structure:

Directorycustomer-a
- profile.md
- Directorycontracts
  - Directoryproperty
    contract-1.pdf

If you were to filter using an eq (equals) operator with value: "customer-a/", it would only match files directly within that folder, like profile.md. It would not include files in subfolders like customer-a/contracts/.

To recursively filter for all items starting with the path customer-a/, you can use the following compound filter:

filters: {
    type: "and",
    filters: [
      {
        type: "gt",
        key: "folder",
        value: "customer-a//",
      },
      {
        type: "lte",
        key: "folder",
        value: "customer-a/z",
      },
    ],
  },

This filter identifies paths starting with customer-a/ by using:

The and condition to combine the effects of the gt and lte conditions.
The gt condition to include paths greater than the / ASCII character.
The lte condition to include paths less than and including the lower case z ASCII character.

Together, these conditions effectively select paths that begin with the provided path value.

Add `context` field to guide AI Search

You can optionally include a custom metadata field named context when uploading an object to your R2 bucket.

The context field is attached to each chunk and passed to the LLM during an /ai-search query. It does not affect retrieval but helps the LLM interpret and frame the answer.

The field can be used for providing document summaries, source links, or custom instructions without modifying the file content.

You can add custom metadata to an object in the /PUT operation when uploading the object to your R2 bucket. For example if you are using the Workers binding with R2:

await env.MY_BUCKET.put("cat.png", file, {
  customMetadata: {
    context: "This is a picture of Joe's cat. His name is Max."
  }
});

During /ai-search, this context appears in the response under attributes.file.context, and is included in the data passed to the LLM for generating a response.

Response

You can see the metadata attributes of your retrieved data in the response under the property attributes for each retrieved chunk. For example:

"data": [
  {
    "file_id": "llama001",
    "filename": "llama/logistics/llama-logistics.md",
    "score": 0.45,
    "attributes": {
      "timestamp": 1735689600000,   // unix timestamp for 2025-01-01
      "folder": "llama/logistics/",
      "file": {
        "url": "www.llamasarethebest.com/logistics"
        "context": "This file contains information about how llamas can logistically deliver coffee."
      }
    },
    "content": [
      {
        "id": "llama001",
        "type": "text",
        "text": "Llamas can carry 3 drinks max."
      }
    ]
  }
]

Was this helpful?

Community
X
Discord
YouTube
GitHub