Indexes enable D1 to improve query performance over the indexed columns for common (popular) queries by reducing the amount of data (number of rows) the database has to scan when running a query.
When is an index useful?
Indexes are useful:
- When you want to improve the read performance over columns that are regularly used in predicates - for example, a
WHERE email_address = ?or
WHERE user_id = 'a793b483-df87-43a8-a057-e5286d3537c5'- email addresses, usernames, user IDs and/or dates are good choices for columns to index in typical web applications or services.
- For enforcing uniqueness constraints on a column or columns - for example, an email address or user ID via the
CREATE UNIQUE INDEX.
- In cases where you query over multiple columns together -
Indexes are automatically updated when the table and column(s) they reference are inserted, updated or deleted. You do not need to manually update an index after you write to the table it references.
Create an index
To create an index on a D1 table, use the
CREATE INDEX SQL command and specify the table and column(s) to create the index over.
For example, given the following
orders table, you may want to create an index on
customer_id. Nearly all of your queries against that table filter on
customer_id, and you would see a performance improvement by creating an index for it.
CREATE TABLE IF NOT EXISTS orders ( order_id INTEGER PRIMARY KEY, customer_id STRING NOT NULL, -- for example, a unique ID aba0e360-1e04-41b3-91a0-1f2263e1e0fb order_date STRING NOT NULL, status INTEGER NOT NULL, last_updated_date STRING NOT NULL
To create the index on the
customer_id column, execute the below statement against your database:
CREATE INDEX IF NOT EXISTS idx_orders_customer_id ON orders(customer_id)
Queries that reference the
customer_id column will now benefit from the index:
-- Uses the index: the indexed column is referenced by the query.
SELECT * FROM orders WHERE customer_id = ?
-- Does not use the index: customer_id is not in the query.
SELECT * FROM orders WHERE order_date = '2023-05-01'
List the indexes on a database, as well as the SQL definition, by querying the
sqlite_schema system table:
SELECT name, type, sql FROM sqlite_schema WHERE type IN ('index');
This will return output resembling the below:
┌──────────────────────────────────┬───────┬────────────────────────────────────────┐│ name │ type │ sql │├──────────────────────────────────┼───────┼────────────────────────────────────────┤│ idx_users_id │ index │ CREATE INDEX idx_users_id ON users(id) │└──────────────────────────────────┴───────┴────────────────────────────────────────┘
Test an index
For example, if you assume the
users table has an
email_address TEXT column and you created an index
CREATE UNIQUE INDEX idx_email_address ON users(email_address), any query with a predicate on
email_address should use your index.
EXPLAIN QUERY PLAN SELECT * FROM users WHERE email_address = '[email protected]';QUERY PLAN
`--SEARCH users USING INDEX idx_email_address (email_address=?)
USING INDEX <INDEX_NAME> output from the query planner, confirming the index was used.
This is also a fairly common use-case for an index. Finding a user based on their email address is often a very common query type for login (authentication) systems.
For a multi-column index (an index that specifies multiple columns), queries will only use the index if they specify either all of the columns, or a subset of the columns provided all columns to the “left” are also within the query.
Given an index of
CREATE INDEX idx_customer_date_transaction_date ON transactions(customer_id, transaction_date), the following table shows when the index is used (or not):
SELECT * FROM transactions WHERE customer_id = '1234' AND transaction_date = '2023-03-25'
|Yes: specifies both columns in the index.
SELECT * FROM transactions WHERE transaction_date = '2023-03-28'
|No: only specifies
transaction_date, and does not include other leftmost columns from the index.
SELECT * FROM transactions WHERE customer_id = '56789'
customer_id, which is the leftmost column in the index.
- If you created an index over three columns instead —
shipping_status— a query that uses both
transaction_datewould use the index, as you are including all columns “to the left”.
- With the same index, a query that uses only
shipping_statuswould not use the index, as you have not used
customer_id(the leftmost column) in the query.
Partial indexes are indexes over a subset of rows in a table. Partial indexes are defined by the use of a
WHERE clause when creating the index. A partial index can be useful to omit certain rows, such as those where values are
NULL or where rows with a specific value are present across queries.
- A concrete example of a partial index would be on a table with a
order_status INTEGERcolumn, where
"order complete"in your application code.
- This would allow queries against orders that are yet to be fulfilled, shipped or are in-progress, which are likely to be some of the most common users (users checking their order status).
- Partial indexes also keep the index from growing unbounded over time. The index does not need to keep a row for every completed order, and completed orders are likely to be queried far fewer times than in-progress orders.
A partial index that filters out completed orders from the index would resemble the following:
CREATE INDEX idx_order_status_not_complete ON orders(order_status) WHERE order_status != 6
DROP INDEX to remove an index. Dropped indexes cannot be restored.
Take note of the following considerations when creating indexes:
- Indexes are not always a free performance boost. You should create indexes only on columns that reflect your most-queried columns. Indexes themselves need to be maintained. When you write to an indexed column, the database needs to write to the table and the index. The performance benefit of an index and reduction in rows read will, in nearly all cases, offset this additional write.
- You cannot create indexes that reference other tables or use non-deterministic functions, since the index would not be stable.
- Indexes cannot be updated. To add or remove a column from an index, the index and then with the new columns.
- Indexes contribute to the overall storage required by your database: an index is effectively a table itself.