Reranking
Reranking can help improve the quality of AI Search results by reordering retrieved documents based on semantic relevance to the user's query. It applies a secondary model after retrieval to rerank the top results before they are returned.
By default, reranking is disabled for all AI Search instances. You can enable it during creation or later from the settings page.
When enabled, AI Search will:
- Retrieve a set of relevant results from your index, constrained by your max_num_results and score_threshold parameters.
- Pass those results through a reranking model.
- Return the reranked results, which the text generation model can use for answer generation.
Reranking helps improve accuracy, especially for large or noisy datasets where vector similarity alone may not produce the optimal ordering.
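The retrieve, rerank, and return flow can be sketched in a few lines of TypeScript. This is an illustrative sketch only, not AI Search's internal implementation: the `Retrieved` type, the `retrieveThenRerank` function, and the `wordOverlap` scorer are hypothetical names, the word-overlap scorer is a toy stand-in for a real reranking model, and treating the threshold as an inclusive lower bound is an assumption.

```typescript
// Illustrative sketch only: hypothetical types and functions, not AI Search internals.
interface Retrieved {
  id: string;
  text: string;
  score: number; // retrieval (vector similarity) score
}

// Hypothetical reranker signature: returns a relevance score for (query, text).
type Reranker = (query: string, text: string) => number;

function retrieveThenRerank(
  query: string,
  candidates: Retrieved[],
  maxNumResults: number,
  scoreThreshold: number,
  rerank: Reranker,
): Retrieved[] {
  // Step 1: keep results that meet the threshold (assumed inclusive),
  // ordered by retrieval score and capped at maxNumResults.
  const retrieved = candidates
    .filter((c) => c.score >= scoreThreshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, maxNumResults);

  // Step 2: score each retained result with the reranker.
  const rescored = retrieved.map((c) => ({
    doc: c,
    relevance: rerank(query, c.text),
  }));

  // Step 3: return the results in reranked order.
  return rescored
    .sort((a, b) => b.relevance - a.relevance)
    .map((r) => r.doc);
}

// Toy reranker: counts query words that also appear in the text.
const wordOverlap: Reranker = (query, text) => {
  const words = new Set(text.toLowerCase().split(/\W+/));
  return query
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length > 0 && words.has(w)).length;
};

const candidates: Retrieved[] = [
  { id: "a", text: "Pricing for storage products", score: 0.82 },
  { id: "b", text: "What Cloudflare is and how it works", score: 0.74 },
  { id: "c", text: "Unrelated changelog entry", score: 0.31 },
];

const reranked = retrieveThenRerank(
  "What is Cloudflare?",
  candidates,
  10,
  0.5,
  wordOverlap,
);
```

In this toy data, document "a" has the higher retrieval score, but the reranker places "b" first because it matches the query more closely. Document "c" never reaches the reranker because it falls below the threshold.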
When you make a /search or /chat/completions request using the Workers binding or REST API, you can enable or disable reranking per request and specify the reranking model.
const instance = env.AI_SEARCH.get("my-instance");
const results = await instance.search({
  messages: [{ role: "user", content: "What is Cloudflare?" }],
  ai_search_options: {
    reranking: {
      enabled: true,
      model: "@cf/baai/bge-reranker-base",
    },
  },
});

Reranking adds an extra step to each query, so request latency may increase.
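Because reranking is set per request, it can be convenient to build the request object in one place and decide there whether to rerank. The following is a minimal sketch under stated assumptions: `buildSearchRequest`, `SearchRequest`, and `RerankingOptions` are hypothetical names, the object shape mirrors the request shown on this page, and omitting `model` when reranking is disabled is an assumption rather than documented behavior.

```typescript
// Hypothetical helper types; the shape mirrors the request shown on this page.
interface RerankingOptions {
  enabled: boolean;
  model?: string;
}

interface SearchRequest {
  messages: { role: "user"; content: string }[];
  ai_search_options: { reranking: RerankingOptions };
}

// Builds a search request. Passing a model name enables reranking;
// omitting it disables reranking for this request.
// Assumption: a disabled request does not need a `model` field.
function buildSearchRequest(query: string, rerankModel?: string): SearchRequest {
  const reranking: RerankingOptions = rerankModel
    ? { enabled: true, model: rerankModel }
    : { enabled: false };

  return {
    messages: [{ role: "user", content: query }],
    ai_search_options: { reranking },
  };
}

const withReranking = buildSearchRequest(
  "What is Cloudflare?",
  "@cf/baai/bge-reranker-base",
);
const withoutReranking = buildSearchRequest("What is Cloudflare?");
```

Either object could then be passed to instance.search(...), for example to skip reranking on latency-sensitive requests.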