R2
You can use Cloudflare R2 to store data for indexing. To get started, configure an R2 bucket containing your data.
AutoRAG will automatically scan and process supported files stored in that bucket. Files that are unsupported or exceed the size limit will be skipped during indexing and logged as errors.
AutoRAG applies a file size limit to each file type:
- Plain text files: Up to 4 MB
- Rich format files: Up to 4 MB
Files that exceed these limits will not be indexed and will show up in the error logs.
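The limits above can be checked before uploading, so oversized files never reach the bucket. The following is a minimal sketch, not part of AutoRAG. The names `MAX_BYTES` and `within_size_limit` are hypothetical, and it assumes "4 MB" means 4 × 1024 × 1024 bytes, which the limits above do not specify.

```python
# Assumption: "4 MB" is read as 4 * 1024 * 1024 bytes; the documented limits
# do not say whether the unit is binary or decimal.
MAX_BYTES = {
    "plain_text": 4 * 1024 * 1024,
    "rich_format": 4 * 1024 * 1024,
}


def within_size_limit(size_bytes: int, kind: str) -> bool:
    """Return True if a file of this size and kind fits the documented limit."""
    if kind not in MAX_BYTES:
        raise ValueError(f"unknown file kind: {kind!r}")
    return 0 <= size_bytes <= MAX_BYTES[kind]
```

For a local file, `os.path.getsize(path)` supplies the `size_bytes` argument.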
AutoRAG can ingest a variety of file types to power your RAG. Both plain text files and rich format files are supported.
AutoRAG supports the following plain text file types:
| Format | File extensions | MIME type |
| --- | --- | --- |
| Text | .txt, .rst | text/plain |
| Log | .log | text/plain |
| Config | .ini, .conf, .env, .properties, .gitignore, .editorconfig, .toml | text/plain, text/toml |
| Markdown | .markdown, .md, .mdx | text/markdown |
| LaTeX | .tex, .latex | application/x-tex, application/x-latex |
| Script | .sh, .bat, .ps1 | application/x-sh, application/x-msdos-batch, text/x-powershell |
| SGML | .sgml | text/sgml |
| JSON | .json | application/json |
| YAML | .yaml, .yml | application/x-yaml |
| CSS | .css | text/css |
| JavaScript | .js | application/javascript |
| PHP | .php | application/x-httpd-php |
| Python | .py | text/x-python |
| Ruby | .rb | text/x-ruby |
| Java | .java | text/x-java-source |
| C | .c | text/x-c |
| C++ | .cpp, .cxx | text/x-c++ |
| C Header | .h, .hpp | text/x-c-header |
| Go | .go | text/x-go |
| Rust | .rs | text/rust |
| Swift | .swift | text/swift |
| Dart | .dart | text/dart |
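To predict which objects will be treated as plain text, a client can compare each object key against the extensions listed above. This is a rough sketch under stated assumptions. `PLAIN_TEXT_EXTENSIONS` and `is_supported_plain_text` are hypothetical names, and the case-insensitive matching is an assumption, since AutoRAG's actual detection logic is not described here.

```python
from pathlib import PurePosixPath

# Extensions copied from the plain text file types listed above.
PLAIN_TEXT_EXTENSIONS = frozenset({
    ".txt", ".rst", ".log",
    ".ini", ".conf", ".env", ".properties", ".gitignore", ".editorconfig", ".toml",
    ".markdown", ".md", ".mdx",
    ".tex", ".latex",
    ".sh", ".bat", ".ps1",
    ".sgml", ".json", ".yaml", ".yml", ".css", ".js", ".php",
    ".py", ".rb", ".java", ".c", ".cpp", ".cxx", ".h", ".hpp",
    ".go", ".rs", ".swift", ".dart",
})


def is_supported_plain_text(key: str) -> bool:
    """Return True if the object key ends in a listed plain text extension."""
    # Assumption: matching is case-insensitive; the source does not say.
    name = PurePosixPath(key).name.lower()
    suffix = PurePosixPath(name).suffix
    # Dotfiles such as ".gitignore" have an empty suffix, so the whole
    # file name is compared against the list as well.
    return suffix in PLAIN_TEXT_EXTENSIONS or name in PLAIN_TEXT_EXTENSIONS
```

Comparing the whole file name as well as the suffix matters for entries such as .gitignore and .env, which `pathlib` treats as names without an extension.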
AutoRAG uses Markdown Conversion to convert rich format files to Markdown. The following table lists the supported rich formats:
| Format | File extensions | MIME types |
| --- | --- | --- |
| PDF Documents | | |
| Images¹ | | |
| HTML Documents | | |
| XML Documents | | |
| Microsoft Office Documents | | |
| Open Document Format | | |
| CSV | | |
| Apple Documents | | |

¹ Image conversion uses two Workers AI models for object detection and summarization. See Workers AI pricing for more details.
