Similarity cache
Similarity-based caching in AutoRAG lets you serve responses from Cloudflare’s cache for queries that are similar to previous requests, rather than creating new, unique responses for every request. This speeds up response times and cuts costs by reusing answers for questions that are close in meaning.
Instead of generating a new response for every request, similarity-based caching handles each incoming request as follows:
- AutoRAG checks if a similar prompt (based on your chosen threshold) has been answered before.
- If a match is found, it returns the cached response instantly.
- If no match is found, it generates a new response and caches it.
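The three-step lookup flow above can be sketched as a minimal in-memory cache. Everything in this sketch is hypothetical: SimilarityCache, answer, and word_jaccard are illustrative names, the simple word-overlap score stands in for AutoRAG's MinHash comparison, and the linear scan over stored prompts is exactly the work that AutoRAG's LSH buckets are meant to avoid.

```python
from dataclasses import dataclass, field
from typing import Callable


def word_jaccard(a: str, b: str) -> float:
    """Stand-in similarity (assumption): shared words divided by all distinct words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 1.0


@dataclass
class SimilarityCache:
    """Hypothetical in-memory model of the lookup flow; not AutoRAG's implementation."""
    similarity: Callable[[str, str], float]  # score in [0, 1]
    threshold: float                         # minimum score that counts as "similar"
    entries: dict[str, str] = field(default_factory=dict)  # prompt -> response

    def answer(self, prompt: str, generate: Callable[[str], str]) -> tuple[str, str]:
        # 1. Check whether a similar prompt has been answered before.
        for cached_prompt, response in self.entries.items():
            if self.similarity(prompt, cached_prompt) >= self.threshold:
                # 2. Match found: return the cached response.
                return response, "HIT"
        # 3. No match: generate a new response and cache it.
        response = generate(prompt)
        self.entries[prompt] = response
        return response, "MISS"
```

For example, with a threshold of 0.5, after "what is the weather today" has been answered, the prompt "what is the weather like today" scores 5/6 with word_jaccard and gets the earlier answer back with HIT.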
To see whether a response came from the cache, check the cf-aig-cache-status header: HIT for a cached response and MISS for a newly generated one.
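Checking the header from client code only requires a case-insensitive lookup, because HTTP header names are case-insensitive. A minimal sketch; cache_status and served_from_cache are hypothetical helper names, and headers can be any mapping of header names to values, such as dict(response.headers) from your HTTP client.

```python
from typing import Mapping, Optional

CACHE_STATUS_HEADER = "cf-aig-cache-status"


def cache_status(headers: Mapping[str, str]) -> Optional[str]:
    """Return the normalized header value (HIT or MISS), or None if the header is absent."""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() == CACHE_STATUS_HEADER:
            return value.strip().upper()
    return None


def served_from_cache(headers: Mapping[str, str]) -> bool:
    """True only when the response was reused from the similarity cache."""
    return cache_status(headers) == "HIT"
```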
Consider these behaviors when using similarity caching:
- Volatile Cache: If two similar requests arrive at the same time, the first response might not be cached in time for the second request to use it, resulting in a MISS.
- 30-Day Cache: Cached responses last 30 days, then expire automatically. Custom cache durations are not currently supported.
- Data Dependency: Cached responses are tied to specific document chunks. If those chunks change or get deleted, the cache clears to keep answers fresh.
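The expiry and data-dependency rules can be pictured as a validity check on each cache entry. This is a sketch under assumptions: AutoRAG manages its cache internally, so CacheEntry, chunk_ids, and is_valid are hypothetical, and passing chunk changes in as a set of changed chunk IDs is purely illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

CACHE_TTL = timedelta(days=30)  # fixed lifetime; custom durations are not supported


@dataclass(frozen=True)
class CacheEntry:
    """Hypothetical cache record; real entries are managed by AutoRAG."""
    response: str
    created_at: datetime
    chunk_ids: frozenset[str]  # hypothetical IDs of the chunks the answer was built from

    def is_valid(self, now: datetime, changed_chunks: set[str]) -> bool:
        # 30-Day Cache: the entry expires once its lifetime has passed.
        if now - self.created_at >= CACHE_TTL:
            return False
        # Data Dependency: any changed or deleted source chunk invalidates the entry.
        if self.chunk_ids & changed_chunks:
            return False
        return True
```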
AutoRAG’s similarity cache uses MinHash and Locality-Sensitive Hashing (LSH) to find and reuse responses for prompts that are worded similarly.
Here’s how it works when a new prompt comes in:
- The prompt is split into small overlapping chunks of words (called shingles), like “what’s the” or “the weather.”
- These shingles are turned into a “fingerprint” using MinHash. The more overlap two prompts have, the more similar their fingerprints will be.
- Fingerprints are placed into LSH buckets, which help AutoRAG quickly find similar prompts without comparing every single one.
- If a past prompt in the same bucket is similar enough (based on your configured threshold), AutoRAG reuses its cached response.
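The four steps above map onto a small MinHash LSH index. The sketch below is a generic implementation of the technique, not AutoRAG's code: the word-bigram shingles, 64 hash functions, 16 bands of 4 rows, and the names minhash, LSHIndex, and query are all assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict
from typing import Iterator, Optional

NUM_HASHES = 64        # signature length (assumption)
BANDS, ROWS = 16, 4    # BANDS * ROWS == NUM_HASHES (assumption)


def shingles(prompt: str, k: int = 2) -> set[str]:
    """Step 1: overlapping k-word shingles, e.g. "what's the", "the weather"."""
    words = prompt.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed: int, shingle: str) -> int:
    """Deterministic 64-bit hash; each seed acts as a separate hash function."""
    digest = hashlib.blake2b(f"{seed}:{shingle}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(prompt: str) -> tuple[int, ...]:
    """Step 2: keep the minimum hash per function to build the fingerprint."""
    sh = shingles(prompt)
    return tuple(min(_hash(seed, s) for s in sh) for seed in range(NUM_HASHES))


def estimated_similarity(a: tuple[int, ...], b: tuple[int, ...]) -> float:
    """Share of matching positions, an estimate of the shingle sets' Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


class LSHIndex:
    """Step 3: bucket signatures band by band so lookups skip most stored prompts."""

    def __init__(self) -> None:
        self.buckets: dict[tuple, list[str]] = defaultdict(list)
        self.signatures: dict[str, tuple[int, ...]] = {}

    def _band_keys(self, sig: tuple[int, ...]) -> Iterator[tuple]:
        # Each band of ROWS consecutive values becomes one bucket key.
        for band in range(BANDS):
            yield (band, sig[band * ROWS:(band + 1) * ROWS])

    def add(self, prompt: str) -> None:
        sig = minhash(prompt)
        self.signatures[prompt] = sig
        for key in self._band_keys(sig):
            self.buckets[key].append(prompt)

    def query(self, prompt: str, threshold: float) -> Optional[str]:
        """Step 4: return the most similar stored prompt if it clears the threshold."""
        sig = minhash(prompt)
        candidates = {p for key in self._band_keys(sig) for p in self.buckets.get(key, [])}
        if not candidates:
            return None
        best = max(candidates, key=lambda p: estimated_similarity(sig, self.signatures[p]))
        return best if estimated_similarity(sig, self.signatures[best]) >= threshold else None
```

In this sketch a stored prompt is only compared against the new one if at least one band of their signatures is identical, and the final check uses the MinHash estimate of Jaccard similarity between the two shingle sets.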
The similarity threshold decides how close two prompts need to be to reuse a cached response. Here are the available thresholds:
Threshold | Description | Example Match |
---|---|---|
Exact | Near-identical matches only | "What’s the weather like today?" matches with "What is the weather like today?" |
Strong (default) | High semantic similarity | "What’s the weather like today?" matches with "How’s the weather today?" |
Broad | Moderate match, more hits | "What’s the weather like today?" matches with "Tell me today’s weather" |
Loose | Low similarity, max reuse | "What’s the weather like today?" matches with "Give me the forecast" |
Test these values to see which works best with your application.
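AutoRAG does not document how the Exact, Strong, Broad, and Loose levels are implemented. As general background on MinHash LSH only (not a description of those levels), the probability that two prompts with Jaccard similarity s share at least one bucket, using b bands of r rows each, is 1 - (1 - s^r)^b. The hypothetical helpers below compute that curve and its approximate tipping point, (1/b)^(1/r).

```python
def candidate_probability(similarity: float, bands: int, rows: int) -> float:
    """Chance that two prompts with this Jaccard similarity share at least one LSH bucket."""
    # A band matches with probability s^r; at least one of b bands matches with 1 - (1 - s^r)^b.
    return 1.0 - (1.0 - similarity ** rows) ** bands


def approximate_threshold(bands: int, rows: int) -> float:
    """Similarity near which the candidate probability rises most steeply, about (1/b)^(1/r)."""
    return (1.0 / bands) ** (1.0 / rows)
```

For a fixed signature length, more rows per band raises the tipping point (fewer, closer matches), while more bands lowers it (more hits, looser matches), which mirrors the trade-off the table describes.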