Database Indexing: Boost Your Query Performance
Your database query that once took milliseconds is now taking seconds. Your application is slowing down, users are complaining, and you're staring at a SELECT statement wondering what went wrong. The answer, more often than not, lies in database indexing—or the lack thereof.
In this comprehensive guide, we'll demystify database indexes, explore how they work under the hood, learn when to use different types of indexes, and discover best practices that will transform your slow queries into lightning-fast operations.
On large tables, a well-chosen index can make a selective query 100x to 1000x faster, because the database reads a handful of index pages instead of every row. A query that takes 10 seconds without an index might take just 10 milliseconds with one. Learning indexing is one of the highest-ROI skills for any backend developer.
What is a Database Index?
Think of a database index like the index at the back of a textbook. Instead of reading every page to find a topic, you look it up in the index and jump directly to the relevant pages. A database index works the same way—it's a separate data structure that helps the database find rows quickly without scanning every single row in a table.
Without an index, the database performs a full table scan, examining every row to find matches. With an index, it can jump directly to the relevant rows, dramatically reducing the work required.
Without index: full table scan on 1M rows
With index: index lookup + row fetch
How Indexes Work: B-Tree Deep Dive
Most database indexes use a data structure called a B-Tree, a self-balancing search tree (MySQL's InnoDB engine uses the B+Tree variant). Understanding how B-Trees work will help you make better indexing decisions.
B-Tree Structure Explained
A B-Tree organizes data in a hierarchical structure with multiple levels. Each node contains keys (indexed values) and pointers to either child nodes or actual data rows. The tree is kept balanced, ensuring consistent lookup times.
B-Tree structure showing how index lookups traverse from root to leaf nodes
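To make the root-to-leaf traversal concrete, here is a minimal sketch of a lookup in a hand-built two-level tree. The `Node` class, the `search` function, and the six sample keys are all made up for illustration. Real engines store nodes as disk pages and also handle inserts, splits, and rebalancing, which this sketch leaves out.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                    # sorted keys in this node
    children: list = field(default_factory=list)  # child nodes (internal nodes only)
    rows: list = field(default_factory=list)      # row pointers, one per key (leaves only)


def search(root, key):
    """Walk from root to leaf; return (row_pointer_or_None, nodes_visited)."""
    node, visited = root, 1
    while node.children:
        # Internal keys are separators: child i holds keys >= keys[i-1] and < keys[i].
        node = node.children[bisect_right(node.keys, key)]
        visited += 1
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.rows[i], visited
    return None, visited


# Hypothetical two-level tree over six indexed values.
leaves = [
    Node(keys=[10, 20], rows=["row#1", "row#2"]),
    Node(keys=[30, 40], rows=["row#3", "row#4"]),
    Node(keys=[50, 60], rows=["row#5", "row#6"]),
]
root = Node(keys=[30, 50], children=leaves)

print(search(root, 40))  # ('row#4', 2)
print(search(root, 35))  # (None, 2)
```

Every lookup visits the same number of nodes because all leaves sit at the same depth, which is what "balanced" buys you.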
Why B-Trees Are So Fast
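The speed of B-Trees comes from their logarithmic time complexity. In a table with 1 million rows, finding a key among sorted values takes about 20 comparisons (log₂ 1,000,000 ≈ 20), compared to potentially 1 million comparisons with a full table scan. Because each B-Tree node holds many keys, those comparisons are spread over only a few node reads, typically three or four levels for a table of this size.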
O(log n) - As your table grows, lookup time grows logarithmically, not linearly.
- 1,000 rows: ~10 comparisons
- 1,000,000 rows: ~20 comparisons
- 1,000,000,000 rows: ~30 comparisons
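The figures above can be reproduced with a few lines of arithmetic. This sketch also estimates how many tree levels those comparisons are spread across. The fanout of 500 keys per node is an assumed, illustrative value and not a MySQL constant; the real number depends on page size and key width.

```python
import math


def binary_comparisons(n_rows):
    """Approximate comparisons needed to find one key among n_rows sorted keys."""
    return math.ceil(math.log2(n_rows))


def btree_levels(n_rows, fanout):
    """Approximate number of levels when every node holds `fanout` keys."""
    levels, capacity = 1, fanout
    while capacity < n_rows:
        capacity *= fanout
        levels += 1
    return levels


for n in (1_000, 1_000_000, 1_000_000_000):
    # fanout=500 is an assumption for illustration only
    print(n, binary_comparisons(n), btree_levels(n, fanout=500))
# 1000 10 2
# 1000000 20 3
# 1000000000 30 4
```

Growing the table a thousandfold adds roughly ten comparisons and a single extra level, which is why index lookups stay fast as data grows.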
Types of Database Indexes
Different scenarios call for different types of indexes. Let's explore the most common ones and when to use each.
Primary Key Index
Automatically created on the primary key column. Unique and clustered (in InnoDB).
PRIMARY KEY (id)
Unique Index
Ensures all values in the indexed column(s) are unique. Great for email, username.
UNIQUE INDEX (email)
Regular (Secondary) Index
Standard B-Tree index for frequently queried columns. Most common type.
INDEX (status)
Composite Index
Index on multiple columns. Order matters! Follows the leftmost prefix rule.
INDEX (country, city, zip)
Full-Text Index
Optimized for searching text content. Supports natural language queries.
FULLTEXT INDEX (content)
Spatial Index
For geographic data types. Used in location-based queries.
SPATIAL INDEX (location)
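Of these types, the unique index is the one whose effect is easiest to observe directly, because it changes which writes are accepted. The sketch below uses Python's built-in sqlite3 module rather than MySQL so that it runs anywhere. The table and the `try_insert` helper are hypothetical, and MySQL reports the violation with its own duplicate-entry error rather than `sqlite3.IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_email ON users (email)")
conn.execute("INSERT INTO users (email) VALUES ('user@example.com')")


def try_insert(email):
    """Return True if the row was inserted, False if the unique index rejected it."""
    try:
        conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("other@example.com"))  # True
print(try_insert("user@example.com"))   # False: duplicate rejected by idx_email
```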
Creating Indexes in MySQL
Let's look at the practical syntax for creating different types of indexes in MySQL.
Basic Index Creation
-- Create index when creating table
CREATE TABLE users (
id INT PRIMARY KEY AUTO_INCREMENT,
email VARCHAR(255) UNIQUE, -- Unique index created automatically
username VARCHAR(100),
status ENUM('active', 'inactive', 'banned'),
country VARCHAR(50),
city VARCHAR(100),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_status (status), -- Single column index
INDEX idx_location (country, city), -- Composite index
INDEX idx_created (created_at) -- Index for date range queries
);
-- Add index to existing table
CREATE INDEX idx_username ON users (username);
-- Add unique index (shown for syntax; email already has one from the UNIQUE constraint above)
CREATE UNIQUE INDEX idx_email ON users (email);
-- Add composite index
CREATE INDEX idx_status_country ON users (status, country);
-- Drop an index
DROP INDEX idx_username ON users;
-- Show indexes on a table
SHOW INDEX FROM users;
Analyzing Query Performance with EXPLAIN
The EXPLAIN command is your best friend for understanding how MySQL executes queries and whether indexes are being used.
-- Basic EXPLAIN
EXPLAIN SELECT * FROM users WHERE email = 'user@example.com';
-- EXPLAIN with FORMAT=JSON for detailed info
EXPLAIN FORMAT=JSON SELECT * FROM users WHERE status = 'active' AND country = 'US';
-- EXPLAIN ANALYZE (MySQL 8.0.18+) - Actually runs the query
EXPLAIN ANALYZE SELECT * FROM users WHERE created_at > '2024-01-01';
Understanding EXPLAIN Output
| Column | Meaning | What to Look For |
|---|---|---|
| type | Join type / access method | const, ref, range = good; ALL = full table scan |
| possible_keys | Indexes that could be used | Should list relevant indexes |
| key | Index actually used | Should not be NULL for indexed queries |
| rows | Estimated rows to examine | Lower is better |
| Extra | Additional information | Using index = covering index; Using filesort = extra sorting needed |
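If you don't have a MySQL server at hand, the same before-and-after workflow can be tried with Python's built-in sqlite3 module. This is only a sketch: SQLite's `EXPLAIN QUERY PLAN` has a different output format from MySQL's `EXPLAIN`, and the `plan` helper and sample rows are made up. The idea carries over, though: a SCAN corresponds roughly to `type: ALL`, and a SEARCH using an index corresponds roughly to `type: ref`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, status TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO users (email, status, country) VALUES (?, ?, ?)",
    [(f"user{i}@example.com", "active", "US") for i in range(1000)],
)


def plan(sql, params=()):
    """Return the 'detail' column of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


query = "SELECT * FROM users WHERE email = ?"
before = plan(query, ("user42@example.com",))
conn.execute("CREATE INDEX idx_email ON users (email)")
after = plan(query, ("user42@example.com",))

print(before)  # a SCAN of the whole table
print(after)   # a SEARCH that names idx_email
```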
Composite Indexes: The Leftmost Prefix Rule
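Composite indexes (indexes on multiple columns) are powerful but come with an important rule: the leftmost prefix rule. MySQL can use a composite index efficiently only for conditions that cover a leading (leftmost) run of its columns.
For an index on (country, city, zip), queries that filter on country, on country and city, or on all three columns can use the index. A query that filters only on city or only on zip generally cannot, because it skips the leftmost column.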
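When creating composite indexes, put columns compared with equality (=) before columns used in range conditions (>, <, BETWEEN), because columns after the first range column cannot narrow the index search. Among equality columns, place the most selective one (the one that filters out the most rows) first, unless query patterns suggest otherwise. Consider which WHERE clauses you'll use most often.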
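One way to see why the leftmost prefix rule holds is to treat a composite index on (country, city, zip) as a list of tuples sorted by country, then city, then zip. The sketch below is a simplified model with made-up entries, not how MySQL stores data. Entries that share a leftmost prefix sit next to each other, so a binary search finds them. Entries that share only a city are scattered across the list.

```python
from bisect import bisect_left

# Stand-in for a composite index on (country, city, zip); the data is invented.
index = sorted([
    ("CA", "Toronto", "M5H"),
    ("CA", "Vancouver", "V5K"),
    ("US", "Austin", "73301"),
    ("US", "Boston", "02108"),
    ("US", "Boston", "02109"),
])


def prefix_lookup(prefix):
    """Find entries starting with `prefix` using binary search plus a short walk."""
    start = bisect_left(index, prefix)
    matches = []
    for entry in index[start:]:
        if entry[:len(prefix)] != prefix:
            break  # sorted order guarantees no later entry can match
        matches.append(entry)
    return matches


def city_only(city):
    """No leftmost column given: matches are scattered, so every entry is checked."""
    return [entry for entry in index if entry[1] == city]


print(prefix_lookup(("US",)))           # leftmost column: one contiguous range
print(prefix_lookup(("US", "Boston")))  # leftmost two columns: one contiguous range
print(city_only("Boston"))              # skips country: full pass over the index
```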
Covering Indexes
A covering index is an index that contains all the columns needed to satisfy a query. When a query can be answered entirely from the index without accessing the actual table rows, it's significantly faster.
-- Create a covering index for a common query
CREATE INDEX idx_covering ON orders (customer_id, status, total, created_at);
-- This query is "covered" - no table access needed!
SELECT customer_id, status, total, created_at
FROM orders
WHERE customer_id = 123 AND status = 'completed';
-- EXPLAIN will show "Using index" in the Extra column
EXPLAIN SELECT customer_id, status, total, created_at
FROM orders
WHERE customer_id = 123 AND status = 'completed';
-- This query is NOT covered (needs 'shipping_address')
SELECT customer_id, status, total, shipping_address
FROM orders
WHERE customer_id = 123;
-- MySQL must look up rows in the table for shipping_address
Covering indexes eliminate the need for additional table lookups
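The covered and non-covered cases can also be checked locally with Python's built-in sqlite3 module. Treat this as a sketch: SQLite is not MySQL, the `plan` helper is made up, and SQLite reports a covering index with the words COVERING INDEX where MySQL shows "Using index" in the Extra column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT,"
    " total REAL, created_at TEXT, shipping_address TEXT)"
)
conn.execute(
    "CREATE INDEX idx_covering ON orders (customer_id, status, total, created_at)"
)


def plan(sql):
    """Return the 'detail' column of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


covered = plan(
    "SELECT customer_id, status, total, created_at FROM orders"
    " WHERE customer_id = 123 AND status = 'completed'"
)
not_covered = plan(
    "SELECT customer_id, status, total, shipping_address FROM orders"
    " WHERE customer_id = 123"
)
print(covered)      # mentions COVERING INDEX: answered from the index alone
print(not_covered)  # uses the index, then fetches shipping_address from the table
```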
Index Trade-offs: The Cost of Indexes
Indexes aren't free. Every index you create has costs that you need to consider.
Storage Space
Each index consumes disk space. A table with many indexes can have index data larger than the actual table data.
Write Performance
Every INSERT, UPDATE, and DELETE must also update all relevant indexes. More indexes = slower writes.
Maintenance Overhead
Indexes can become fragmented over time, requiring periodic optimization. More indexes = more maintenance.
A common mistake is adding indexes for every column "just in case." This slows down writes significantly and wastes storage. Only create indexes for columns that are:
- Frequently used in WHERE clauses
- Used in JOIN conditions
- Used in ORDER BY or GROUP BY
- Used in queries that need optimization
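The write cost is easy to model. In the toy sketch below, a table is a list of rows plus one dictionary per indexed column, and each insert must update all of them. `ToyTable` and `writes_for` are invented for illustration. Real engines do far more work per index update (page splits, logging, locking), so this shows only how the cost scales, not its actual size.

```python
from collections import defaultdict


class ToyTable:
    """A deliberately simplified table: a row list plus one dict per indexed column."""

    def __init__(self, indexed_columns):
        self.rows = []
        self.indexes = {col: defaultdict(list) for col in indexed_columns}
        self.structure_writes = 0  # row-store writes plus index writes

    def insert(self, row):
        row_id = len(self.rows)
        self.rows.append(row)
        self.structure_writes += 1
        for col, idx in self.indexes.items():
            idx[row[col]].append(row_id)  # every index must learn about the new row
            self.structure_writes += 1


def writes_for(indexed_columns, n_rows=100):
    """Insert n_rows sample rows and return how many structures were written."""
    table = ToyTable(indexed_columns)
    for i in range(n_rows):
        table.insert({"id": i, "status": "active", "country": "US", "city": "Austin"})
    return table.structure_writes


print(writes_for([]))                             # 100
print(writes_for(["status"]))                     # 200
print(writes_for(["status", "country", "city"]))  # 400
```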
Best Practices for Database Indexing
1. Index Columns in WHERE Clauses
-- If you frequently run this query:
SELECT * FROM orders WHERE customer_id = ? AND status = 'pending';
-- Create an index to support it:
CREATE INDEX idx_customer_status ON orders (customer_id, status);
2. Index Foreign Key Columns
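Make sure foreign key columns are indexed. They're used in JOINs and can significantly speed up related queries. InnoDB creates an index on the referencing column automatically if none exists, but declaring it explicitly lets you choose its name and extend it with other columns.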
CREATE TABLE order_items (
id INT PRIMARY KEY AUTO_INCREMENT,
order_id INT NOT NULL,
product_id INT NOT NULL,
quantity INT,
price DECIMAL(10,2),
-- Always index foreign keys!
INDEX idx_order_id (order_id),
INDEX idx_product_id (product_id),
FOREIGN KEY (order_id) REFERENCES orders(id),
FOREIGN KEY (product_id) REFERENCES products(id)
);
3. Avoid Indexing Low-Cardinality Columns
Cardinality refers to the number of unique values in a column. Indexes are most effective on high-cardinality columns.
| Column | Cardinality | Index Value |
|---|---|---|
| email (unique per user) | High (1M unique values) | Excellent |
| created_at (timestamps) | High (many unique values) | Excellent |
| country (~200 values) | Medium | Good for filtering subsets |
| status (active/inactive) | Low (2 values) | Poor alone |
| is_deleted (0/1) | Very Low (2 values) | Usually not worth it |
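A quick way to compare columns is their selectivity: the number of distinct values divided by the number of rows. The sketch below computes it for made-up sample data that mirrors the table above. The `selectivity` function and the 10,000-row sample are assumptions for illustration. On a real table you would compute the same ratio with `COUNT(DISTINCT col) / COUNT(*)`.

```python
def selectivity(values):
    """Distinct values divided by total values: 1.0 is unique, near 0 is low cardinality."""
    values = list(values)
    return len(set(values)) / len(values) if values else 0.0


n = 10_000  # made-up sample size
emails = [f"user{i}@example.com" for i in range(n)]
countries = [f"country_{i % 200}" for i in range(n)]
is_deleted = [i % 2 for i in range(n)]

print(selectivity(emails))      # 1.0
print(selectivity(countries))   # 0.02
print(selectivity(is_deleted))  # 0.0002
```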
4. Use Prefix Indexes for Large Text Columns
-- For long text columns, index only a prefix
CREATE INDEX idx_title_prefix ON articles (title(50));
-- For URLs, the first 100 chars are usually enough
CREATE INDEX idx_url_prefix ON links (url(100));
-- This reduces index size while maintaining usefulness
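How long should the prefix be? A common approach is to pick the shortest length whose distinct-prefix ratio comes close to that of the full column. The sketch below shows that calculation on hypothetical URLs. The function names, the 0.95 target, and the sample data are all assumptions. Against a real table, the equivalent measurement would be `COUNT(DISTINCT LEFT(url, n)) / COUNT(*)` for several values of n.

```python
def prefix_selectivity(values, length):
    """Share of distinct prefixes of the given length among all values."""
    return len({v[:length] for v in values}) / len(values)


def shortest_useful_prefix(values, target=0.95, max_length=255):
    """Smallest prefix length whose selectivity reaches `target` of the full column's."""
    full = len(set(values)) / len(values)
    for length in range(1, max_length + 1):
        if prefix_selectivity(values, length) >= target * full:
            return length
    return max_length


# Hypothetical URLs that share a long common prefix
urls = [f"https://example.com/articles/{i:03d}" for i in range(1000)]

print(prefix_selectivity(urls, 10))  # 0.001: every URL starts the same way
print(shortest_useful_prefix(urls))  # 32
```

In this sample the distinguishing characters come last, so a short prefix is useless. That is a reminder to measure on your own data rather than pick a length by habit.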
5. Monitor and Remove Unused Indexes
-- Check index usage statistics
SELECT
object_schema AS database_name,
object_name AS table_name,
index_name,
count_star AS total_accesses,
count_read AS reads,
count_write AS writes
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE object_schema = 'your_database'
AND index_name IS NOT NULL
ORDER BY count_star ASC;
-- Indexes with 0 reads might be candidates for removal
-- These counters reset when the server restarts, so collect them over a representative period
-- And verify first - they might be used by rare but important queries!
Real-World Optimization Example
Let's walk through a complete optimization scenario for an e-commerce orders table.
Step 1: Analyze the Schema and Queries
CREATE TABLE orders (
id INT PRIMARY KEY AUTO_INCREMENT,
customer_id INT NOT NULL,
status ENUM('pending', 'processing', 'shipped', 'delivered', 'cancelled'),
total DECIMAL(10,2),
shipping_country VARCHAR(50),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
-- Common queries that are running slow:
-- 1. Get all orders for a customer
SELECT * FROM orders WHERE customer_id = 12345;
-- 2. Get recent pending orders
SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at DESC;
-- 3. Get orders by country and status for reporting
SELECT shipping_country, COUNT(*), SUM(total)
FROM orders
WHERE status = 'delivered' AND created_at > '2024-01-01'
GROUP BY shipping_country;
Step 2: Use EXPLAIN to Identify Problems
EXPLAIN SELECT * FROM orders WHERE customer_id = 12345;
-- Output shows:
-- type: ALL (full table scan!)
-- rows: 1000000 (scanning all rows)
-- Extra: Using where
-- This is BAD - every query scans the entire table!
Step 3: Add Strategic Indexes
-- For customer lookups
CREATE INDEX idx_customer ON orders (customer_id);
-- For status + date queries and for reporting: one composite index serves both,
-- because (status, created_at) is a leftmost prefix of it, so a separate
-- (status, created_at) index would be redundant
CREATE INDEX idx_status_created_country ON orders (status, created_at, shipping_country);
-- Verify indexes are created
SHOW INDEX FROM orders;
Step 4: Verify Improvement
EXPLAIN SELECT * FROM orders WHERE customer_id = 12345;
-- Output now shows:
-- type: ref (using index!)
-- possible_keys: idx_customer
-- key: idx_customer
-- rows: 47 (only scanning matching rows!)
-- Extra: NULL
-- Query time improved from 2.4s to 3ms - 800x faster!
Common Indexing Mistakes to Avoid
Mistake 1: Functions on Indexed Columns
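Wrapping an indexed column in a function prevents MySQL from using a regular index on that column, because the index stores the raw column values and not the function's results. (MySQL 8.0.13+ supports functional indexes, but rewriting the condition is usually simpler.)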
-- BAD: Index on created_at won't be used
SELECT * FROM orders WHERE YEAR(created_at) = 2024;
-- GOOD: Rewrite to allow index usage
SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
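The same effect shows up in other engines. The sketch below uses Python's built-in sqlite3 module, with `strftime('%Y', ...)` standing in for MySQL's `YEAR()`. The `plan` helper and the table are made up, and SQLite's plan output differs from MySQL's `EXPLAIN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,"
    " total REAL, created_at TEXT)"
)
conn.execute("CREATE INDEX idx_created ON orders (created_at)")


def plan(sql):
    """Return the 'detail' column of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# strftime('%Y', ...) plays the role of MySQL's YEAR() here
wrapped = plan("SELECT * FROM orders WHERE strftime('%Y', created_at) = '2024'")
ranged = plan(
    "SELECT * FROM orders"
    " WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'"
)
print(wrapped)  # SCAN: the index on created_at is not used for the lookup
print(ranged)   # SEARCH using idx_created
```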
Mistake 2: Leading Wildcards in LIKE
Wildcards at the start of a LIKE pattern prevent index usage:
-- BAD: Index won't be used (leading wildcard)
SELECT * FROM users WHERE email LIKE '%@gmail.com';
-- GOOD: Trailing wildcard can use index
SELECT * FROM users WHERE email LIKE 'john%';
-- For email domain searches, consider a separate indexed column
-- or use a full-text index
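To see why the position of the wildcard matters, picture the index as a sorted list of emails. The sketch below uses made-up addresses and hypothetical helper names. A prefix search can binary-search to the first match, while a suffix search has to check every entry. It also models the "separate indexed column" idea with a second sorted list keyed on the domain.

```python
from bisect import bisect_left

# Made-up data; the sorted list stands in for an index on email.
emails = sorted([
    "alice@gmail.com", "bob@yahoo.com", "john.doe@gmail.com",
    "john.smith@outlook.com", "zoe@gmail.com",
])


def starts_with(prefix):
    """LIKE 'prefix%': binary search to the first match, then read a contiguous run."""
    i = bisect_left(emails, prefix)
    out = []
    while i < len(emails) and emails[i].startswith(prefix):
        out.append(emails[i])
        i += 1
    return out


def ends_with_scan(suffix):
    """LIKE '%suffix': sort order does not help, so every entry is checked."""
    return [e for e in emails if e.endswith(suffix)]


# A second "index" keyed on the domain, as a separate indexed column would be
by_domain = sorted((e.split("@", 1)[1], e) for e in emails)


def by_domain_lookup(domain):
    """Equality lookup on the extracted domain: binary search again."""
    i = bisect_left(by_domain, (domain,))
    out = []
    while i < len(by_domain) and by_domain[i][0] == domain:
        out.append(by_domain[i][1])
        i += 1
    return out


print(starts_with("john"))            # two contiguous entries
print(ends_with_scan("@gmail.com"))   # found only by checking all five entries
print(by_domain_lookup("gmail.com"))  # same rows, found via the domain index
```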
Mistake 3: Implicit Type Conversion
When the query value type doesn't match the column type, indexes may not be used:
-- If phone_number is VARCHAR:
-- BAD: Numeric comparison causes type conversion
SELECT * FROM users WHERE phone_number = 5551234567;
-- GOOD: String comparison matches column type
SELECT * FROM users WHERE phone_number = '5551234567';
Mistake 4: OR Conditions Without Proper Indexes
OR conditions can be tricky for index optimization:
-- This might not use indexes efficiently
SELECT * FROM orders WHERE customer_id = 123 OR status = 'pending';
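-- Solution 1: Ensure both columns are indexed so MySQL can combine them (index merge)
CREATE INDEX idx_customer ON orders (customer_id); -- skip if this index already exists
CREATE INDEX idx_status ON orders (status);
-- Solution 2: Use UNION for complex cases (unlike UNION ALL, it removes rows that match both conditions)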
SELECT * FROM orders WHERE customer_id = 123
UNION
SELECT * FROM orders WHERE status = 'pending';
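Before swapping an OR for a UNION, it's worth confirming that the two forms return the same rows. The sketch below does that with Python's built-in sqlite3 module and four made-up orders, one of which matches both conditions. It checks only that the results are equivalent, not which form is faster; for that, compare the EXPLAIN output on your own MySQL data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, status) VALUES (?, ?)",
    [(123, "pending"), (123, "shipped"), (456, "pending"), (789, "delivered")],
)
conn.execute("CREATE INDEX idx_customer ON orders (customer_id)")
conn.execute("CREATE INDEX idx_status ON orders (status)")

or_rows = conn.execute(
    "SELECT * FROM orders WHERE customer_id = 123 OR status = 'pending' ORDER BY id"
).fetchall()
union_rows = conn.execute(
    "SELECT * FROM orders WHERE customer_id = 123"
    " UNION"
    " SELECT * FROM orders WHERE status = 'pending'"
    " ORDER BY id"
).fetchall()

print(or_rows == union_rows)  # True
print(len(union_rows))        # 3: the row matching both conditions appears once
```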
Key Takeaways
Indexes Trade Write Speed for Read Speed
Every index speeds up reads but slows down writes. Find the right balance for your workload.
Use EXPLAIN Before and After
Always measure query performance with EXPLAIN. Don't guess—verify that your indexes are actually being used.
Composite Index Order Matters
The leftmost prefix rule determines which queries benefit. Plan your composite indexes based on your query patterns.
Covering Indexes Are Powerful
When an index contains all columns a query needs, performance can improve dramatically.
Monitor and Maintain
Regularly review index usage statistics. Remove unused indexes and add new ones as query patterns evolve.
The best index strategy depends on your specific workload. Profile your actual queries, understand your data distribution, and iterate. There's no one-size-fits-all solution—but with these fundamentals, you're equipped to make informed decisions that dramatically improve your database performance.
