What changed
- From Rules to Cost:Early databases followed simple rules. Modern ones use 'cost-based' models that look at actual data sizes.
- Smart Indexing:We moved beyond simple lists to complex structures like B-trees and bitmap indexes that speed up searches.
- Better Guesses:Computers have become much better at 'cardinality estimation,' which means they are better at guessing the size of a result set.
- Parallel Processing:Databases can now split a single query into parts and run them at the same time.
The Join Ordering Problem
One of the hardest things a database has to do is decide the order in which to 'join' tables together. If you are joining three tables—Users, Orders, and Products—the order matters a lot. If you join Users and Orders first, you might end up with a million rows. But if you filter the Products first, you might only have ten rows to deal with. This is called 'join ordering dependencies.' The database tries thousands of combinations in a split second to find the one that uses the least memory. It is a bit like playing a high-speed game of Tetris where the pieces are millions of rows of data.The Statistics Trap
To make these choices, the database relies on statistics. It keeps a small summary of what the data looks like—how many unique values there are, or how the numbers are spread out. But here is the catch: if the statistics are old, the database makes bad choices. Imagine using a map from 1950 to drive through a modern city. You are going to get lost. In the database world, 'lost' means a query that should take one second takes ten minutes. This is why keeping 'statistical estimator accuracy' high is a major part of a database admin's job. They have to make sure the computer always has a fresh map of the data.The Legacy of Selinger
'The goal of the optimizer is to find a plan that minimizes the total cost of the query, where cost is a combination of I/O, CPU, and other resources.'This idea comes from the 1970s, specifically from the work of P.G. Selinger at IBM. Her work laid the foundation for how every modern SQL engine works. She introduced the idea of 'dynamic programming' for query optimization. This means the computer breaks a big problem into smaller pieces, solves those, and remembers the answers. It is why your favorite apps can show you your notifications, your friends' posts, and your local weather all at once without crashing.