Syncing
AI Search automatically indexes your content for search. How indexing works depends on your data source.
For instances connected to a website or R2 bucket, AI Search creates jobs to sync your data source. Jobs run automatically every 6 hours and process new, modified, or deleted files to keep your search index up to date.
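To make "new, modified, or deleted" concrete, here is a minimal sketch of how a sync pass could classify files. It is an illustration only: how AI Search actually detects changes is not described here, and `SyncPlan`, `plan_sync`, and the use of a per-file fingerprint (for example an ETag or content hash) are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    """Files grouped by the work a sync pass would need to do (hypothetical type)."""
    new: list[str] = field(default_factory=list)
    modified: list[str] = field(default_factory=list)
    deleted: list[str] = field(default_factory=list)


def plan_sync(indexed: dict[str, str], source: dict[str, str]) -> SyncPlan:
    """Compare fingerprints recorded at the last sync with the source's current ones.

    Both arguments map a file key to a fingerprint. The fingerprint scheme is
    an assumption for this sketch, not documented AI Search behavior.
    """
    plan = SyncPlan()
    for key, fingerprint in source.items():
        if key not in indexed:
            plan.new.append(key)        # present in the source, never indexed
        elif indexed[key] != fingerprint:
            plan.modified.append(key)   # indexed before, content has changed
    # Anything indexed earlier that no longer exists in the source is deleted.
    plan.deleted = [key for key in indexed if key not in source]
    return plan
```

Files whose fingerprint is unchanged appear in none of the three lists, which is why a sync over a mostly static source has little to process.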
You can view job status and history in the Jobs tab in the dashboard or using the Instances API.
Files uploaded to built-in storage are indexed immediately. There are no sync jobs. Each file is processed individually as it is uploaded.
| Action | Description |
|---|---|
| Trigger sync | Manually start a sync job to scan your external data source for changes. You can trigger a sync at most once every 30 seconds. |
| Cancel job | Cancel a running sync job. |
| Pause indexing | Temporarily stop all scheduled sync jobs. |
| Resume indexing | Resume scheduled sync jobs, including jobs paused automatically after inactivity. |
| Sync individual file | Re-index a specific file. |
You can perform these actions from the dashboard, the REST API, or the Workers binding.
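If you trigger syncs from your own tooling, a small client-side guard can help you stay within the 30-second interval for Trigger sync. The sketch below assumes you supply the function that starts the sync: `start_sync` is a hypothetical callable standing in for whatever REST API or Workers binding call you use, and `SyncTrigger` is not part of any AI Search SDK.

```python
import time
from typing import Callable, Optional

MIN_TRIGGER_INTERVAL_SECONDS = 30.0  # interval stated for the Trigger sync action


class SyncTrigger:
    """Client-side guard that skips manual sync requests made too soon."""

    def __init__(
        self,
        start_sync: Callable[[], None],               # hypothetical: your own API call
        clock: Callable[[], float] = time.monotonic,  # injectable for testing
    ) -> None:
        self._start_sync = start_sync
        self._clock = clock
        self._last_triggered: Optional[float] = None

    def trigger(self) -> bool:
        """Start a sync if the interval has elapsed. Returns True if one was started."""
        now = self._clock()
        if (
            self._last_triggered is not None
            and now - self._last_triggered < MIN_TRIGGER_INTERVAL_SECONDS
        ):
            return False  # too soon since the last trigger, skip this request
        self._start_sync()
        # Recorded only after start_sync returns without raising.
        self._last_triggered = now
        return True
```

The guard only tracks triggers made through this object, so it does not account for syncs started from the dashboard or by another process.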
The total time to index depends on the number and type of files. Factors that affect performance include:
- Total number of files and their sizes
- File formats (for example, images take longer than plain text)
- Latency of Workers AI models used for embedding and image processing
If an instance receives no search requests for 31 days, AI Search automatically pauses its scheduled sync jobs. Pausing avoids unnecessary requests to rescan your data source while the instance is not being used. This applies only to external data sources (website or R2), since built-in storage has no sync jobs.
A paused instance stays fully searchable, but source changes are not picked up while sync jobs are paused. After the instance receives search or chat traffic again, AI Search automatically resumes scheduled sync jobs during its activity checks. You can also resume manually with the Resume indexing control. Refer to Controls.
To ensure smooth and reliable indexing:
- Make sure your files are within the size limit and in a supported format so that they are not skipped.
- For R2-backed instances, keep your service API token valid to prevent indexing failures.
- Regularly clean up outdated or unnecessary content to stay within instance limits.
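The first practice above can be checked before you upload or sync. This is a minimal sketch under stated assumptions: `find_files_likely_skipped` is a hypothetical helper, and the size limit and supported extensions are parameters you must fill in from the AI Search limits documentation, since this page does not state them.

```python
from pathlib import PurePosixPath


def find_files_likely_skipped(
    files: dict[str, int],
    max_bytes: int,
    supported_extensions: set[str],
) -> dict[str, str]:
    """Map each problematic file key to the reason it would likely be skipped.

    `files` maps a file key to its size in bytes. `max_bytes` and
    `supported_extensions` (lowercase, with the leading dot) are supplied by
    the caller; no real limits are built into this sketch.
    """
    problems: dict[str, str] = {}
    for key, size in files.items():
        extension = PurePosixPath(key).suffix.lower()
        if extension not in supported_extensions:
            problems[key] = f"unsupported format: {extension or '(none)'}"
        elif size > max_bytes:
            problems[key] = f"too large: {size} bytes"
    return problems
```

For example, calling it with placeholder values such as `max_bytes=1_000_000` and `supported_extensions={".md", ".pdf"}` returns only the files that fail either check, leaving compliant files out of the result.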