SQL Query Optimization: Make Queries Fast
Why Queries Are Slow
Your query returns correct results but takes 10 minutes. Users complain. Dashboards timeout. Pipelines miss SLAs.
Slow queries have predictable causes:
- No indexes
- Wrong indexes
- Full table scans
- Inefficient joins
- Bad query structure
Fix these, queries get fast.
Finding Slow Queries
PostgreSQL tracks query performance:
-- Enable query stats (run once)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Find slowest queries
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
max_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
This shows what’s actually slow in production.
Understanding EXPLAIN
EXPLAIN shows how PostgreSQL executes your query:
EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE customer_id = 123;
Output:
Seq Scan on orders (cost=0.00..1000.00 rows=50 width=100) (actual time=0.05..150.23 rows=47 loops=1)
Filter: (customer_id = 123)
Rows Removed by Filter: 999953
Planning Time: 0.134 ms
Execution Time: 150.421 ms
Seq Scan = full table scan = slow.
Rows Removed by Filter: 999953 = checked 1 million rows, kept 47 = very inefficient.
Fix 1: Add an Index
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
Run EXPLAIN ANALYZE again:
Index Scan using idx_orders_customer_id on orders (cost=0.42..8.44 rows=50 width=100) (actual time=0.03..0.15 rows=47 loops=1)
Index Cond: (customer_id = 123)
Planning Time: 0.085 ms
Execution Time: 0.189 ms
Index Scan = used index = fast. Execution Time: 0.189 ms (was 150ms) = 800x faster.
Common Slow Query Patterns
Pattern 1: Missing Index on Foreign Key
-- Slow: No index on customer_id
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.country = 'France';
Fix:
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_customers_country ON customers(country);
Pattern 2: Function on Indexed Column
-- Slow: Function prevents index usage
SELECT * FROM orders WHERE DATE(created_at) = '2025-01-01';
This full scans orders even if created_at is indexed.
Fix:
-- Fast: Range comparison uses index
SELECT * FROM orders
WHERE created_at >= '2025-01-01'
AND created_at < '2025-01-02';
Pattern 3: OR Conditions
-- Slow: OR prevents efficient index usage
SELECT * FROM orders WHERE customer_id = 123 OR status = 'pending';
Fix (if both WHERE clauses are selective):
-- Fast: UNION uses indexes on both
SELECT * FROM orders WHERE customer_id = 123
UNION
SELECT * FROM orders WHERE status = 'pending';
Pattern 4: SELECT *
-- Slow: Fetches unnecessary columns
SELECT * FROM orders WHERE customer_id = 123;
Fix:
-- Fast: Only fetch needed columns
SELECT id, amount, created_at FROM orders WHERE customer_id = 123;
Smaller result sets = less memory = faster.
Pattern 5: Implicit Type Conversion
-- Slow: customer_id is INTEGER but '123' is string
SELECT * FROM orders WHERE customer_id = '123';
PostgreSQL converts customer_id to text for each row = index not used.
Fix:
-- Fast: Correct type
SELECT * FROM orders WHERE customer_id = 123;
Composite Indexes
When you filter on multiple columns:
-- Query filters on customer_id AND created_at
SELECT * FROM orders
WHERE customer_id = 123
AND created_at >= '2025-01-01';
Single column indexes help, but composite index is better:
CREATE INDEX idx_orders_customer_created ON orders(customer_id, created_at);
Order matters: (customer_id, created_at) works for:
WHERE customer_id = XWHERE customer_id = X AND created_at > Y
But not for:
WHERE created_at > Y(doesn’t use index)
Indexes Have a Cost
Benefits:
- Fast SELECT queries
- Fast JOIN operations
Costs:
- INSERT/UPDATE/DELETE slower (index must update)
- Disk space
- Maintenance overhead
Don’t index everything. Index what your queries actually use.
Analyzing a Slow Query: Real Example
-- This query takes 30 seconds
SELECT
c.name,
COUNT(o.id) as order_count,
SUM(o.amount) as total_spent
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE c.country = 'France'
AND o.created_at >= '2025-01-01'
GROUP BY c.id, c.name;
Run EXPLAIN ANALYZE:
Hash Join (cost=50000.00..100000.00 rows=10000 width=50) (actual time=5000.23..29542.18 rows=8437 loops=1)
Hash Cond: (o.customer_id = c.id)
-> Seq Scan on orders o (cost=0.00..45000.00 rows=500000 width=20) (actual time=0.05..15234.12 rows=500000 loops=1)
Filter: (created_at >= '2025-01-01'::date)
Rows Removed by Filter: 1500000
-> Hash (cost=35000.00..35000.00 rows=5000 width=30) (actual time=250.45..250.45 rows=4972 loops=1)
-> Seq Scan on customers c (cost=0.00..35000.00 rows=5000 width=30) (actual time=0.12..245.67 rows=4972 loops=1)
Filter: (country = 'France'::text)
Rows Removed by Filter: 95028
Problems:
- Seq Scan on orders (checked 2M rows, kept 500k)
- Seq Scan on customers (checked 100k rows, kept 5k)
Fix:
CREATE INDEX idx_customers_country ON customers(country);
CREATE INDEX idx_orders_customer_created ON orders(customer_id, created_at);
Run again:
Hash Join (cost=250.00..1500.00 rows=10000 width=50) (actual time=12.34..145.67 rows=8437 loops=1)
Hash Cond: (o.customer_id = c.id)
-> Index Scan using idx_orders_customer_created on orders o (cost=0.56..1200.00 rows=500000 width=20) (actual time=0.03..98.45 rows=500000 loops=1)
Index Cond: (created_at >= '2025-01-01'::date)
-> Hash (cost=200.00..200.00 rows=5000 width=30) (actual time=2.45..2.45 rows=4972 loops=1)
-> Index Scan using idx_customers_country on customers c (cost=0.42..200.00 rows=5000 width=30) (actual time=0.02..1.89 rows=4972 loops=1)
Index Cond: (country = 'France'::text)
Result: 29s → 0.15s (200x faster).
When to Analyze
-- After bulk inserts/updates, update statistics
ANALYZE orders;
PostgreSQL’s query planner uses statistics to choose execution plans. Outdated statistics = bad plans = slow queries.
Partitioning Large Tables
For tables with time-series data:
-- Partition by month
CREATE TABLE orders (
id SERIAL,
customer_id INTEGER,
amount DECIMAL,
created_at DATE
) PARTITION BY RANGE (created_at);
CREATE TABLE orders_2025_01 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE orders_2025_02 PARTITION OF orders
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
Queries filtering on created_at only scan relevant partitions.
Summary
Slow queries have predictable fixes:
- Find slow queries (
pg_stat_statements) - Understand execution plan (
EXPLAIN ANALYZE) - Add indexes where full scans happen
- Rewrite queries to use indexes
- Update statistics (
ANALYZE)
Most queries can be 10-100x faster with correct indexes.
Need help implementing this in your company?
For delivery-focused missions (Data Engineering, Architecture Data, Data Product Owner), visit ISData Consulting.