What changed
Back in the early days of computers, people had to write out every step the computer took. It was like giving a chef a recipe that told them exactly how many times to stir the pot. But in the late 1970s, a researcher named Patricia Selinger and her team changed everything. They figured out that the computer could actually figure out the best path on its own. Instead of following a rigid script, the database could look at several different 'recipes' for the same request and pick the one that cost the least.
The Art of the Pushdown
One of the smartest tricks a database uses is called a 'predicate pushdown.' Imagine you are making a salad and you only want the red tomatoes. Would you chop up every single tomato first and then throw away the green ones? Of course not. You would pick out the red ones first and only chop those. In a database, a 'predicate' is just a filter, like 'where the price is over fifty dollars.' A pushdown means the database applies that filter as early as possible. It avoids loading data it knows it won't need. This saves a massive amount of work for the hard drive and the CPU. It seems simple, but when you are dealing with billions of rows, these little shortcuts are what keep systems from crashing. Have you ever tried to find one specific email in a folder with ten thousand messages? You'd probably search for the sender first, right? That is basically a pushdown in action.
Counting the Cost
How does the database know which plan is 'cheapest'? It doesn't use money. It uses a 'cost model.' This is a mathematical way of guessing how much time a task will take. It looks at I/O, which is how many times it has to read from the disk, and CPU cycles, which is how much brainpower the computer needs. The goal is always to minimize these two things. The engine builds a 'query graph,' which is like a flowchart of the request. It looks at all the different ways to join tables together. It's a bit like a puzzle. If you have three tables—A, B, and C—you could join A and B first, then add C. Or you could join B and C first, then add A. The results are the same, but one way might be a thousand times faster depending on how much data is in each one. The optimizer runs the numbers and picks the winner.
The Accuracy Problem
Even the best chef can fail if they don't know how many guests are coming. This is the biggest challenge in query optimization: cardinality estimation. This is just a fancy way of saying 'guessing how many rows will come back.' If the database thinks only ten people are coming to the party, it might use a small, fast method. If ten thousand show up instead, that method will fail miserably. This is why database experts spend so much time looking at statistics. They want to make sure the engine has a good idea of what the data looks like. They look at things like histograms, which show how the data is spread out. If the data is weird—like if almost everyone in your database lives in the same town—the optimizer needs to know that so it doesn't make a bad guess. In the end, query optimization is a blend of math, logic, and a bit of guesswork. It is a fascinating field that stays hidden from most people, but it is what makes our high-speed world possible.