How AutoRAG works
AutoRAG sets up and manages your RAG pipeline for you. It connects the tools needed for indexing, retrieval, and generation, and keeps everything up to date by regularly syncing your data with the index. Once set up, AutoRAG indexes your content in the background and responds to queries in real time.
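For example, once an instance is set up, a Worker can query it through the Workers AI binding. This is a minimal sketch: it assumes a Worker with an AI binding named `AI`, and the instance name `my-autorag` and the query text are placeholders.

```ts
export interface Env {
  AI: Ai; // Workers AI binding, which also exposes AutoRAG instances
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Retrieve relevant content and generate a context-aware answer
    // in a single call to the AI Search endpoint.
    const answer = await env.AI.autorag("my-autorag").aiSearch({
      query: "How do I connect a data source?",
    });
    return Response.json(answer);
  },
} satisfies ExportedHandler<Env>;
```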
AutoRAG consists of two core processes:
- Indexing: An asynchronous background process that monitors your data source for changes and converts your data into vectors for search.
- Querying: A synchronous process triggered by user queries. It retrieves the most relevant content and generates context-aware responses.
Indexing
Indexing begins automatically when you create an AutoRAG instance and connect a data source.
Here is what happens during indexing:
- Data ingestion: AutoRAG reads from your connected data source.
- Markdown conversion: AutoRAG uses Workers AI’s Markdown Conversion to convert supported data types into structured Markdown, ensuring consistency across diverse file types. For images, Workers AI performs object detection followed by vision-to-language transformation to produce Markdown descriptions.
- Chunking: The extracted text is chunked into smaller pieces to improve retrieval granularity.
- Embedding: Each chunk is embedded using Workers AI’s embedding model to transform the content into vectors.
- Vector storage: The resulting vectors, along with metadata like the file name, are stored in the Vectorize database created on your Cloudflare account (a simplified sketch of these indexing steps follows this list).
After the initial data set is indexed, AutoRAG regularly checks your data source for changes (e.g. additions, updates, or deletions) and indexes them to keep your vector database up to date.
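The sketch below approximates the indexing steps above by hand, using a Workers AI embedding model and a Vectorize binding. It is an illustration, not AutoRAG’s actual implementation: the fixed-size chunking is deliberately naive, `@cf/baai/bge-base-en-v1.5` is just one example embedding model, and the binding names are assumptions. AutoRAG runs the real pipeline for you in the background.

```ts
export interface Env {
  AI: Ai;               // Workers AI binding (embedding model)
  VECTORIZE: Vectorize; // Vectorize index binding
}

// Naive fixed-size chunking; AutoRAG's actual chunking strategy
// is managed for you and is more sophisticated than this.
function chunk(text: string, size = 512): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

export async function indexDocument(env: Env, fileName: string, markdown: string) {
  const chunks = chunk(markdown);

  // Embed every chunk in one call to a Workers AI embedding model.
  const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
    text: chunks,
  });

  // Store one vector per chunk, with the file name as metadata so
  // the original content can be located at query time.
  await env.VECTORIZE.upsert(
    data.map((values, i) => ({
      id: `${fileName}-${i}`,
      values,
      metadata: { fileName, chunk: i },
    }))
  );
}
```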

Querying
Once indexing is complete, AutoRAG is ready to respond to end-user queries in real time.
Here is how the querying pipeline works:
- Receive query from AutoRAG API: The query workflow begins when you send a request to your AutoRAG’s AI Search or Search endpoint.
- Query rewriting (optional): AutoRAG can rewrite the input query using one of Workers AI’s LLMs, transforming it into a more effective search query to improve retrieval quality.
- Embedding the query: The rewritten (or original) query is transformed into a vector using the same embedding model that indexed your data, so it can be compared against your vectorized content to find the most relevant matches.
- Querying Vectorize index: The query vector is matched against the stored vectors in the Vectorize database associated with your AutoRAG.
- Content retrieval: Vectorize returns the metadata of the most relevant chunks, and the original content is retrieved from the R2 bucket. If you are using the Search endpoint, the content is returned at this point.
- Response generation: If you are using the AI Search endpoint, a text-generation model from Workers AI generates a response from the retrieved content and the original user query, combined via a system prompt. The context-aware response from the model is returned (a simplified sketch of these querying steps follows this list).
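To make the pipeline concrete, here is a hand-rolled approximation of the retrieval and generation steps using Workers AI and Vectorize bindings. AutoRAG performs the equivalent work for you when you call AI Search; the model names and binding names here are illustrative assumptions.

```ts
export interface Env {
  AI: Ai;               // Workers AI binding (embedding + generation models)
  VECTORIZE: Vectorize; // Vectorize index binding
}

export async function answerQuery(env: Env, query: string): Promise<string> {
  // 1. Embed the (optionally rewritten) query with the same model
  //    used at indexing time so the vectors are comparable.
  const embedding = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
    text: [query],
  });

  // 2. Match the query vector against the stored vectors.
  const results = await env.VECTORIZE.query(embedding.data[0], {
    topK: 5,
    returnMetadata: "all",
  });

  // 3. Build context from the matches. AutoRAG retrieves the original
  //    chunk content from R2 at this step; this sketch uses the
  //    vector metadata instead.
  const context = results.matches
    .map((match) => JSON.stringify(match.metadata))
    .join("\n");

  // 4. Combine context and query via a system prompt and generate.
  const answer = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      { role: "system", content: `Answer using only this context:\n${context}` },
      { role: "user", content: query },
    ],
  });
  return answer.response ?? "";
}
```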
