Path filtering
Date: 2026-01-19
Path filtering allows you to control which files or URLs are indexed by defining include and exclude patterns. Use this to limit indexing to specific content or to skip files you do not want searchable.
Path filtering works with both website and R2 data sources.
You can configure path filters when creating or editing an AI Search instance. In the dashboard, open Path Filters and add your include or exclude rules. You can also update path filters at any time from the Settings page of your instance.
When using the API, specify `include_items` and `exclude_items` in the `source_params` of your configuration:
| Parameter | Type | Limit | Description |
| --- | --- | --- | --- |
| `include_items` | `string[]` | Maximum 10 patterns | Only index items matching at least one of these patterns |
| `exclude_items` | `string[]` | Maximum 10 patterns | Skip items matching any of these patterns |
Both parameters are optional. If neither is specified, all items from the data source are indexed.
Exclude rules take precedence over include rules. Filtering is applied in this order:
- Exclude check: If the item matches any exclude pattern, it is skipped.
- Include check: If include patterns are defined and the item does not match any of them, it is skipped.
- Index: The item proceeds to indexing.
| Scenario | Behavior |
| --- | --- |
| No rules defined | All items are indexed |
| Only `exclude_items` defined | All items except those matching exclude patterns are indexed |
| Only `include_items` defined | Only items matching at least one include pattern are indexed |
| Both defined | Exclude patterns are checked first, then remaining items must match an include pattern |
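
The evaluation order above can be sketched in a few lines of Python. This is an illustration only, not the service's implementation: `filter_decision` is a hypothetical helper, the `"Indexed"` reason string is invented for the example, and the default matcher (`fnmatch.fnmatchcase` from the standard library) is a stand-in whose `*` also crosses `/`, unlike the wildcard syntax this page documents. The two skip reasons mirror the job log messages listed on this page.

```python
from fnmatch import fnmatchcase
from typing import Callable, Sequence

# (item, pattern) -> bool; any matcher with this shape can be plugged in.
Matcher = Callable[[str, str], bool]


def filter_decision(
    item: str,
    include_items: Sequence[str] = (),
    exclude_items: Sequence[str] = (),
    matches: Matcher = fnmatchcase,  # stand-in matcher, NOT the service's
) -> tuple[bool, str]:
    """Return (indexed, reason), applying exclude rules before include rules."""
    # 1. Exclude check: any matching exclude pattern skips the item.
    for pattern in exclude_items:
        if matches(item, pattern):
            return False, f"Skipped by rule: {pattern}"
    # 2. Include check: only applies when include patterns are defined.
    if include_items and not any(matches(item, p) for p in include_items):
        return False, "Skipped by Include Rules"
    # 3. Index: the item proceeds to indexing ("Indexed" is a made-up label).
    return True, "Indexed"


include = ["/docs/**"]
exclude = ["/docs/drafts/**"]
for path in ["/docs/guide.md", "/docs/drafts/wip.md", "/images/logo.png"]:
    print(path, filter_decision(path, include, exclude))
```

Keeping the matcher as a parameter separates the ordering logic from the wildcard semantics, so the same function covers all four scenarios in the table.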
Patterns use a case-sensitive wildcard syntax based on micromatch:
| Wildcard | Meaning |
| --- | --- |
| `*` | Matches any characters except path separators (`/`) |
| `**` | Matches any characters including path separators (`/`) |
Patterns can contain:
- Letters, numbers, and underscores (`a-z`, `A-Z`, `0-9`, `_`)
- Hyphens (`-`) and dots (`.`)
- Path separators (`/`)
- URL characters (`?`, `:`, `=`, `&`, `%`)
- Wildcards (`*`, `**`)
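
A client-side pre-check can catch unsupported patterns before they are submitted. The sketch below is an assumption-laden illustration: `validate_patterns` is a hypothetical helper, and how the service itself reports invalid patterns is not stated on this page. It checks only the two documented constraints, the allowed character set and the limit of 10 patterns per parameter.

```python
import re

MAX_PATTERNS = 10  # documented limit for include_items and exclude_items

# Letters, digits, underscore, hyphen, dot, "/", URL characters, and "*".
ALLOWED = re.compile(r"[A-Za-z0-9_\-./?:=&%*]+")


def validate_patterns(patterns: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the patterns pass."""
    problems = []
    if len(patterns) > MAX_PATTERNS:
        problems.append(f"too many patterns: {len(patterns)} > {MAX_PATTERNS}")
    for pattern in patterns:
        # fullmatch rejects empty patterns and any character outside the set.
        if not ALLOWED.fullmatch(pattern):
            problems.append(f"invalid pattern: {pattern!r}")
    return problems


print(validate_patterns(["/docs/**/*.pdf", "**/login*"]))
print(validate_patterns(["/docs/{a,b}/*"]))
```

Brace expressions such as `{a,b}` are flagged here only because braces are not in the character list above.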
Items skipped by filtering rules are recorded in job logs with the reason:
- Exclude match: `Skipped by rule: {pattern}`
- No include match: `Skipped by Include Rules`
You can view these in the Jobs tab of your AI Search instance to verify your filters are working as expected.
- Case sensitivity: Pattern matching is case-sensitive. `/Blog/*` does not match `/blog/post.html`.
- Full path matching: Patterns match the entire path or URL. Use `**` at the beginning for partial matching. For example, `docs/*` matches `docs/file.pdf` but not `site/docs/file.pdf`, while `**/docs/*` matches both.
- Single `*` does not cross directories: Use `**` to match across path separators. For example, `docs/*` matches `docs/file.pdf` but not `docs/sub/file.pdf`, while `docs/**` matches both.
- Trailing slashes matter: URLs are matched as-is without normalization. `/blog/` does not match `/blog`.
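
To test patterns locally against the behaviors listed above, one option is to translate them into regular expressions. The following is a rough approximation, not micromatch itself: `pattern_to_regex` and `matches` are hypothetical helpers, and treating a `**/` segment as "zero or more directories" is an assumption inferred from the `**/docs/*` example on this page.

```python
import re


def pattern_to_regex(pattern: str) -> str:
    """Translate a path-filter pattern into a regular expression string."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # assumption: zero or more directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # any characters, including "/"
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # any characters except "/"
            i += 1
        else:
            out.append(re.escape(pattern[i]))  # literal character
            i += 1
    return "".join(out)


def matches(pattern: str, path: str) -> bool:
    # fullmatch: the pattern must cover the entire path or URL, with no
    # case folding and no trailing-slash normalization.
    return re.fullmatch(pattern_to_regex(pattern), path) is not None


print(matches("/Blog/*", "/blog/post.html"))        # False: case-sensitive
print(matches("docs/*", "site/docs/file.pdf"))      # False: full path match
print(matches("**/docs/*", "site/docs/file.pdf"))   # True
print(matches("docs/*", "docs/sub/file.pdf"))       # False: * stops at "/"
print(matches("docs/**", "docs/sub/file.pdf"))      # True
print(matches("/blog/", "/blog"))                   # False: trailing slash
```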
| Use case | Pattern | Indexed | Skipped |
| --- | --- | --- | --- |
| Index only PDFs in docs | Include: `/docs/**/*.pdf` | `/docs/guide.pdf`, `/docs/api/ref.pdf` | `/docs/guide.md`, `/images/logo.png` |
| Exclude temp and backup files | Exclude: `**/*.tmp`, `**/*.bak` | `/docs/guide.md` | `/data/cache.tmp`, `/old.bak` |
| Exclude temp and backup folders | Exclude: `/temp/**`, `/backup/**` | `/docs/guide.md` | `/temp/file.txt`, `/backup/data.json` |
| Index docs but exclude drafts | Include: `/docs/**`, Exclude: `/docs/drafts/**` | `/docs/guide.md` | `/docs/drafts/wip.md` |

| Use case | Pattern | Indexed | Skipped |
| --- | --- | --- | --- |
| Index only blog pages | Include: `**/blog/**` | `example.com/blog/post`, `example.com/en/blog/article` | `example.com/about` |
| Exclude admin pages | Exclude: `**/admin/**` | `example.com/blog/post` | `example.com/admin/settings` |
| Exclude login pages | Exclude: `**/login*` | `example.com/blog/post` | `example.com/login`, `example.com/auth/login-form` |
| Index docs but exclude drafts | Include: `**/docs/**`, Exclude: `**/docs/drafts/**` | `example.com/docs/guide` | `example.com/docs/drafts/wip` |
When using the API, specify patterns in `source_params`.
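
As a minimal sketch, the fragment below builds a `source_params` object in Python and prints it as JSON. Only the `include_items` and `exclude_items` keys come from this page. Placing them under a top-level `source_params` key, and everything else about the request (endpoint, authentication, other fields), is assumed and not shown here.

```python
import json

# Index everything under /docs, but skip drafts and temp files.
source_params = {
    "include_items": ["/docs/**"],
    "exclude_items": ["/docs/drafts/**", "**/*.tmp"],
}

# Assumption: the fragment sits under a "source_params" key in the
# instance configuration; the rest of the request body is omitted.
payload = json.dumps({"source_params": source_params}, indent=2)
print(payload)
```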
