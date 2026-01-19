
Path filtering

Path filtering allows you to control which files or URLs are indexed by defining include and exclude patterns. Use this to limit indexing to specific content or to skip files you do not want searchable.

Path filtering works with both website and R2 data sources.

Configuration

You can configure path filters when creating or editing an AI Search instance. In the dashboard, open Path Filters and add your include or exclude rules. You can also update path filters at any time from the Settings page of your instance.

When using the API, specify include_items and exclude_items in the source_params of your configuration:

| Parameter | Type | Limit | Description |
| --- | --- | --- | --- |
| include_items | string[] | Maximum 10 patterns | Only index items matching at least one of these patterns |
| exclude_items | string[] | Maximum 10 patterns | Skip items matching any of these patterns |

Both parameters are optional. If neither is specified, all items from the data source are indexed.
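
The limits in the table can be checked on the client before a configuration is submitted. The following is a minimal sketch, not part of the AI Search API: the function name validate_source_params and its return format are hypothetical, and only the two parameter names and the 10-pattern limit come from the table above.

```python
MAX_PATTERNS = 10  # per-parameter limit taken from the table above


def validate_source_params(source_params: dict) -> list:
    """Return a list of problems; an empty list means nothing was flagged.

    Hypothetical helper: it only checks the shape and limits described
    in this section, not anything the service may enforce beyond that.
    """
    problems = []
    for key in ("include_items", "exclude_items"):
        if key not in source_params:
            continue  # both parameters are optional
        value = source_params[key]
        is_string_list = isinstance(value, list) and all(
            isinstance(p, str) for p in value
        )
        if not is_string_list:
            problems.append(f"{key} must be a list of strings")
        elif len(value) > MAX_PATTERNS:
            problems.append(
                f"{key} has {len(value)} patterns; the maximum is {MAX_PATTERNS}"
            )
    return problems


if __name__ == "__main__":
    print(validate_source_params({"include_items": ["/docs/**"]}))
    print(validate_source_params({"exclude_items": ["**/*.tmp"] * 11}))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.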

Filtering behavior

Wildcard rules

Exclude rules take precedence over include rules. Filtering is applied in this order:

  1. Exclude check: If the item matches any exclude pattern, it is skipped.
  2. Include check: If include patterns are defined and the item does not match any of them, it is skipped.
  3. Index: The item proceeds to indexing.

| Scenario | Behavior |
| --- | --- |
| No rules defined | All items are indexed |
| Only exclude_items defined | All items except those matching exclude patterns are indexed |
| Only include_items defined | Only items matching at least one include pattern are indexed |
| Both defined | Exclude patterns are checked first, then remaining items must match an include pattern |
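
The evaluation order above can be sketched as a small decision function. This illustrates the ordering only and is not the service's implementation. The name should_index is hypothetical, and the pattern matcher is passed in as a parameter so the ordering logic stays separate from wildcard semantics. The default, Python's fnmatch.fnmatchcase, is only a stand-in: its * also matches /, which differs from the rules under Pattern syntax, so the example sticks to ** patterns where the two agree.

```python
from fnmatch import fnmatchcase


def should_index(item, include_items=(), exclude_items=(), matches=fnmatchcase):
    """Apply the three documented steps in order and return True to index.

    `matches(item, pattern)` is a pluggable predicate. The fnmatchcase
    default is an assumption used for illustration only.
    """
    # 1. Exclude check: any matching exclude pattern skips the item.
    if any(matches(item, pattern) for pattern in exclude_items):
        return False
    # 2. Include check: applies only when include patterns are defined.
    if include_items and not any(matches(item, p) for p in include_items):
        return False
    # 3. Index: the item passed both checks.
    return True


if __name__ == "__main__":
    include = ["/docs/**"]
    exclude = ["/docs/drafts/**"]
    for item in ("/docs/guide.md", "/docs/drafts/wip.md", "/images/logo.png"):
        print(item, should_index(item, include, exclude))
```

Because the exclude check runs first, an item under /docs/drafts/ is skipped even though it also matches the include pattern /docs/**.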

Pattern syntax

Patterns use a case-sensitive wildcard syntax based on micromatch:

| Wildcard | Meaning |
| --- | --- |
| * | Matches any characters except path separators (/) |
| ** | Matches any characters including path separators (/) |

Patterns can contain:

  • Letters, numbers, and underscores (a-z, A-Z, 0-9, _)
  • Hyphens (-) and dots (.)
  • Path separators (/)
  • URL characters (?, :, =, &, %)
  • Wildcards (*, **)
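
A quick client-side check of the character set can catch typos such as stray spaces before a pattern is saved. This is a minimal sketch that assumes the list above is exhaustive; the source does not say how the service treats other characters. The name is_valid_pattern is hypothetical.

```python
import re

# Letters, digits, underscore, hyphen, dot, slash, the listed URL
# characters (? : = & %), and the * wildcard. Assumes the documented
# list is the complete set of permitted characters.
ALLOWED_PATTERN_CHARS = re.compile(r"[A-Za-z0-9_\-./?:=&%*]+")


def is_valid_pattern(pattern: str) -> bool:
    """Return True if every character in the pattern is in the listed set."""
    return ALLOWED_PATTERN_CHARS.fullmatch(pattern) is not None


if __name__ == "__main__":
    for candidate in ("**/docs/*.pdf", "/search?q=*&lang=en", "docs/my file.pdf"):
        print(repr(candidate), is_valid_pattern(candidate))
```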

Indexing job status

Items skipped by filtering rules are recorded in job logs with the reason:

  • Exclude match: Skipped by rule: {pattern}
  • No include match: Skipped by Include Rules

You can view these in the Jobs tab of your AI Search instance to verify your filters are working as expected.
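
If you copy skip reasons out of the Jobs tab, the two documented strings are easy to classify programmatically. The sketch below assumes each reason appears exactly as written above. How job logs are retrieved or exported is not covered here, and the name classify_skip_reason is hypothetical.

```python
from typing import Optional, Tuple

EXCLUDE_PREFIX = "Skipped by rule: "       # followed by the matching pattern
INCLUDE_REASON = "Skipped by Include Rules"


def classify_skip_reason(reason: str) -> Tuple[str, Optional[str]]:
    """Map a skip reason to ("exclude", pattern), ("include", None),
    or ("other", None) for anything not described in this section."""
    reason = reason.strip()
    if reason.startswith(EXCLUDE_PREFIX):
        return ("exclude", reason[len(EXCLUDE_PREFIX):])
    if reason == INCLUDE_REASON:
        return ("include", None)
    return ("other", None)


if __name__ == "__main__":
    print(classify_skip_reason("Skipped by rule: **/*.tmp"))
    print(classify_skip_reason("Skipped by Include Rules"))
```

Grouping the "exclude" results by pattern shows which rule is responsible for each skipped item.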

Important notes

  • Case sensitivity: Pattern matching is case-sensitive. /Blog/* does not match /blog/post.html.
  • Full path matching: Patterns match the entire path or URL, not a substring of it. Start a pattern with ** to match at any depth. For example, docs/* matches docs/file.pdf but not site/docs/file.pdf, while **/docs/* matches both.
  • Single * does not cross directories: Use ** to match across path separators. For example, docs/* matches docs/file.pdf but not docs/sub/file.pdf, while docs/** matches both.
  • Trailing slashes matter: URLs are matched as-is without normalization. /blog/ does not match /blog.
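
To try patterns locally before running an indexing job, the rules above can be approximated with a small regex translation. This is a sketch under stated assumptions, not micromatch itself. It implements only * and ** as described in this section and treats every other character, including ?, as a literal. A leading or embedded **/ is allowed to match zero directories, which is needed for /docs/**/*.pdf to match /docs/guide.pdf as in the examples. The names glob_to_regex and matches are hypothetical.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern":
    """Translate a path-filter pattern into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")   # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")         # anything, including "/"
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")      # anything except "/"
            i += 1
        else:
            parts.append(re.escape(pattern[i]))  # literal, case-sensitive
            i += 1
    return re.compile("".join(parts))


def matches(pattern: str, path: str) -> bool:
    """Full-path, case-sensitive match with no normalization of the path."""
    return glob_to_regex(pattern).fullmatch(path) is not None


if __name__ == "__main__":
    checks = [
        ("/Blog/*", "/blog/post.html"),       # case sensitivity
        ("docs/*", "site/docs/file.pdf"),     # full path matching
        ("**/docs/*", "site/docs/file.pdf"),
        ("docs/*", "docs/sub/file.pdf"),      # * stops at "/"
        ("docs/**", "docs/sub/file.pdf"),
        ("/blog/", "/blog"),                  # trailing slash
    ]
    for pattern, path in checks:
        print(f"{pattern!r} vs {path!r}: {matches(pattern, path)}")
```

Using fullmatch rather than search is what enforces the full-path rule: a pattern has to account for every character of the path or URL.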

Examples

R2 data source

| Use case | Pattern | Indexed | Skipped |
| --- | --- | --- | --- |
| Index only PDFs in docs | Include: /docs/**/*.pdf | /docs/guide.pdf, /docs/api/ref.pdf | /docs/guide.md, /images/logo.png |
| Exclude temp and backup files | Exclude: **/*.tmp, **/*.bak | /docs/guide.md | /data/cache.tmp, /old.bak |
| Exclude temp and backup folders | Exclude: /temp/**, /backup/** | /docs/guide.md | /temp/file.txt, /backup/data.json |
| Index docs but exclude drafts | Include: /docs/**, Exclude: /docs/drafts/** | /docs/guide.md | /docs/drafts/wip.md |

Website data source

| Use case | Pattern | Indexed | Skipped |
| --- | --- | --- | --- |
| Index only blog pages | Include: **/blog/** | example.com/blog/post, example.com/en/blog/article | example.com/about |
| Exclude admin pages | Exclude: **/admin/** | example.com/blog/post | example.com/admin/settings |
| Exclude login pages | Exclude: **/login* | example.com/blog/post | example.com/login, example.com/auth/login-form |
| Index docs but exclude drafts | Include: **/docs/**, Exclude: **/docs/drafts/** | example.com/docs/guide | example.com/docs/drafts/wip |

API format

When using the API, specify patterns in source_params:

{
  "source_params": {
    "include_items": ["<PATTERN_1>", "<PATTERN_2>"],
    "exclude_items": ["<PATTERN_1>", "<PATTERN_2>"]
  }
}
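
As an illustration, the fragment above can be assembled in Python from the "Index docs but exclude drafts" example. This sketch builds only the source_params portion. The rest of the request (endpoint, authentication, other configuration fields) is not covered in this section, and the helper name build_source_params is hypothetical.

```python
import json


def build_source_params(include_items=None, exclude_items=None) -> dict:
    """Build the source_params fragment, omitting parameters left unset.

    Both parameters are optional, so empty or missing lists are dropped.
    """
    params = {}
    if include_items:
        params["include_items"] = list(include_items)
    if exclude_items:
        params["exclude_items"] = list(exclude_items)
    return {"source_params": params}


if __name__ == "__main__":
    payload = build_source_params(
        include_items=["/docs/**"],
        exclude_items=["/docs/drafts/**"],
    )
    print(json.dumps(payload, indent=2))
```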