Syncing

AI Search automatically indexes your content for search. How indexing works depends on your data source.

External data sources

For instances connected to a website or an R2 bucket, AI Search creates sync jobs that scan your data source. Jobs run automatically on a schedule (every 6 hours by default) and process new, modified, or deleted files to keep your search index up to date.

You can view job status and history in the Jobs tab of the dashboard or through the Instances API.

Sync interval

By default, AI Search runs a sync job every 6 hours. To change how often scheduled syncs run, use the Sync interval setting in the dashboard, or set the sync_interval field when you create or update an instance through the Workers binding or REST API.

The interval can be 1, 2, 4, 6, 12, or 24 hours. In the API, sync_interval is specified in seconds, so the allowed values are 3600 (1 hour), 7200 (2 hours), 14400 (4 hours), 21600 (6 hours, the default), 43200 (12 hours), and 86400 (24 hours).
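Because the dashboard uses hours and the API uses seconds, a script that sets sync_interval needs the conversion above. The sketch below shows it in bash. The function name sync_interval_seconds is hypothetical and not part of AI Search or Wrangler; only the allowed hour values and the seconds unit come from this page.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of AI Search or Wrangler): converts a sync
# interval in hours to the seconds value used by the sync_interval field and
# rejects intervals AI Search does not accept.
sync_interval_seconds() {
  local hours="$1"
  case "$hours" in
    1|2|4|6|12|24)
      echo $(( hours * 3600 ))
      ;;
    *)
      echo "unsupported sync interval: ${hours}h (allowed: 1, 2, 4, 6, 12, 24)" >&2
      return 1
      ;;
  esac
}

sync_interval_seconds 6    # 21600, the default
sync_interval_seconds 12   # 43200
```

Rejecting other values locally surfaces a typo such as 3 hours before the request reaches the API.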

Trigger syncs from automated pipelines

Sync jobs normally run on a schedule, but you can also start one programmatically whenever your source content changes. This is useful for connecting AI Search to a CMS or a content pipeline: when a publish event or a build step completes, have it trigger a sync so the index reflects the change without waiting for the next scheduled run.

Trigger a sync job with the Wrangler CLI, for example from a CI/CD step or deploy hook:

npx wrangler ai-search jobs create <INSTANCE_NAME>

Alternatively, call the Create job REST API from a CMS webhook or a Worker. Sync jobs can be triggered at most once every 30 seconds, so space out triggers from pipelines that publish in quick succession.
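A deploy hook that fires several times in a row can hit the 30-second limit. The bash sketch below wraps the Wrangler command from this page with a simple wait. The state file path, the STATE_FILE variable, and the function names are illustrative assumptions. The state file only records triggers made from the same machine, so triggers from the dashboard or other pipelines are not accounted for, and on ephemeral CI runners the file does not persist between jobs.

```bash
#!/usr/bin/env bash
# Minimal sketch of a deploy-hook step that triggers an AI Search sync job.
# The state file and function names are illustrative assumptions; only the
# `npx wrangler ai-search jobs create` command comes from the AI Search docs.
MIN_TRIGGER_GAP=30   # sync jobs can be triggered at most once every 30 seconds
STATE_FILE="${STATE_FILE:-/tmp/ai-search-last-sync}"

# Print how many seconds to wait before the next trigger is allowed.
# Args: epoch seconds of the last trigger, current epoch seconds.
seconds_until_allowed() {
  local last="$1" now="$2"
  local remaining=$(( MIN_TRIGGER_GAP - (now - last) ))
  if (( remaining > 0 )); then
    echo "$remaining"
  else
    echo 0
  fi
}

# Wait out the 30-second limit if needed, then trigger a sync job.
# Records the trigger time only when Wrangler exits successfully.
trigger_sync() {
  local instance="$1" last=0 remaining
  if [[ -f "$STATE_FILE" ]]; then
    last="$(cat "$STATE_FILE")"
  fi
  remaining="$(seconds_until_allowed "$last" "$(date +%s)")"
  if (( remaining > 0 )); then
    echo "waiting ${remaining}s before triggering a sync" >&2
    sleep "$remaining"
  fi
  npx wrangler ai-search jobs create "$instance" && date +%s > "$STATE_FILE"
}

# Usage from a CI/CD step or deploy hook:
#   trigger_sync <INSTANCE_NAME>
```

Waiting instead of failing keeps the pipeline simple; a pipeline that publishes very frequently may prefer to skip redundant triggers, since one sync picks up all changes made before it runs.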

Built-in storage

Files uploaded to built-in storage are indexed as soon as they are uploaded. There are no sync jobs: each file is processed individually on upload.

Controls

Action	Description
Trigger sync	Manually start a sync job to scan your external data source for changes. Can be triggered at most once every 30 seconds.
Cancel job	Cancel a running sync job.
Pause indexing	Temporarily stop all scheduled sync jobs.
Resume indexing	Resume scheduled sync jobs, including jobs paused automatically after inactivity.
Sync individual file	Re-index a specific file.

You can perform these actions from the dashboard, the REST API, or the Workers binding.

Performance

The total time to index depends on the number and type of files. Factors that affect performance include:

Total number of files and their sizes
File formats (for example, images take longer than plain text)
Latency of Workers AI models used for embedding and image processing

Automatic pausing for inactive instances

If an instance receives no search requests for 31 days, AI Search automatically pauses its scheduled sync jobs. This applies only to external data sources (website or R2), since built-in storage has no sync jobs. Pausing avoids unnecessary requests to rescan your data source while the instance is not being used.

A paused instance stays fully searchable, but source changes are not picked up while sync jobs are paused. After the instance receives search or chat traffic again, AI Search automatically resumes scheduled sync jobs during its activity checks. You can also resume manually with the Resume indexing control. Refer to Controls.
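If you track when your application last queried an instance, a monitoring script can flag when scheduled syncs are likely paused, so you know to resume indexing before relying on fresh content. The bash sketch below only applies the 31-day rule described above. It is hypothetical: AI Search does not expose this function, the last-search timestamp is something you record yourself, and treating exactly 31 days as paused is an assumption about the boundary.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not an AI Search API: it only applies the documented
# 31-day inactivity rule to a last-search timestamp that you record yourself.
INACTIVITY_LIMIT=$(( 31 * 24 * 60 * 60 ))   # 31 days = 2678400 seconds

# Succeeds (exit status 0) if scheduled syncs are likely paused.
# Args: epoch seconds of the last search request, current epoch seconds.
# Treating exactly 31 days as "paused" is an assumption about the boundary.
sync_likely_paused() {
  local last_search="$1" now="$2"
  (( now - last_search >= INACTIVITY_LIMIT ))
}

now="$(date +%s)"
last_search=$(( now - 40 * 86400 ))   # example: last search 40 days ago
if sync_likely_paused "$last_search" "$now"; then
  echo "scheduled syncs are likely paused; source changes are not being indexed"
fi
```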

Best practices

To ensure smooth and reliable indexing:

Make sure your files are within the size limit and in a supported format so they are not skipped.
For R2-backed instances, keep your service API token valid to prevent indexing failures.
Regularly clean up outdated or unnecessary content to stay within instance limits.
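To catch files that would be skipped for size before they reach your R2 bucket or built-in storage, you can scan the content directory first. The bash sketch below takes the limit as a parameter rather than assuming a value; check the current AI Search limits for the actual number. The function name find_oversized_files is hypothetical.

```bash
#!/usr/bin/env bash
# Minimal sketch: list files larger than a byte limit before uploading them.
# The limit is passed in because this sketch does not assume a value; check
# the current AI Search limits.
# Args: directory to scan, maximum allowed file size in bytes.
find_oversized_files() {
  local dir="$1" max_bytes="$2"
  # -size +Nc matches regular files strictly larger than N bytes.
  find "$dir" -type f -size +"${max_bytes}"c
}

# Usage: find_oversized_files ./content <MAX_FILE_SIZE_BYTES>
```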