Reverse Engineering the Key Lookup Cost Formula

During a recent investigation into Key Lookup costing, I discovered something interesting. Perhaps a warning is due: I’m a nerd, and things I find “interesting” may cure insomnia in others. But, dear reader, if you’re taking time to read a database blog, you just might be a nerd as well.

Examining key lookups, they appear to cost 0.0001581 CPU and 0.003125 IO each. However, running the math over several experiments showed that the per-lookup cost decreased as the number of lookups grew. But by how much? What’s the formula?

Gathering Data

I didn’t know if I would need a complicated curve-fitting approach, but I did know the more data points the better. I also knew that working through hundreds of queries and manually recording query-buck costs would suck, and I didn’t want to do it. But how could I automate pulling costs from the execution plans of hundreds of queries? Oh. Right. It’s PowerShell time, isn’t it?

Step 1 of course had to be SQL setup. I used a thousand-page table with one row per page.

CREATE TABLE Testable (
    ID1 INT IDENTITY PRIMARY KEY,
    ID2 INT,
    filler CHAR(8000),
    INDEX IX_testable_ptooie (ID2)
)

INSERT Testable
SELECT TOP(1000)
ROW_NUMBER() OVER(ORDER BY 1/0),
'junk'
FROM sys.columns

Step 2 was getting the estimated execution plan using PowerShell. Step 3 was getting the PowerShell to actually work. Step 4 was using XPath inside a PowerShell script, so you can guess that steps 5-20 were debugging. Blech…XML schemas. Special thanks to Rob Farley and Jonathan Kehayias for sharing useful code.

function Get-SqlPlan([string] $query){
    # Invoke-Sqlcmd comes from the SqlServer (or older SQLPS) module
    return ([xml](Invoke-Sqlcmd -MaxCharLength 9999999 -ServerInstance "." -Database "TestDB" -Query ";set showplan_xml on;`ngo`n $query;`ngo").Item(0));
}

$results = @{}

for ($i = 1; $i -le 1000; $i++) {

    # Force the nonclustered index so every matching row needs a key lookup
    $query = "SELECT *
    FROM Testable WITH(INDEX(IX_testable_ptooie))
    WHERE ID2 <= $i
    OPTION(MAXDOP 1)"

    # Pull the Clustered Index Seek (the key lookup) subtree cost out of the estimated plan
    $t = Get-SqlPlan($query)
    $ns = New-Object 'System.Xml.XmlNamespaceManager' $t.NameTable
    $ns.AddNamespace("ns", "http://schemas.microsoft.com/sqlserver/2004/07/showplan")
    $results.Add($i, $t.SelectNodes('//ns:RelOp[@LogicalOp="Clustered Index Seek"]', $ns).EstimatedTotalSubtreeCost)
}

$results.GetEnumerator() | Export-Csv -NoTypeInformation -Path "C:\Temp\Key Test.csv"

Reverse Engineering

I was left with a CSV of costing info for queries performing one through a thousand key lookups.

Time for some Excel analysis. (Look Ma, I’m a Data Scientist!) First, I calculated the additional cost of each key lookup. Quick tests confirmed that the additional CPU cost stayed constant; only the IO cost decreased. This makes sense, because as key lookups continue, there’s a growing chance that the page needed was already pulled into the buffer pool.
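If you’d rather stay in PowerShell than hop to Excel, here’s a minimal sketch of that marginal-cost step. It assumes the "Key Test.csv" written above and the Key/Value columns the hashtable enumerator exports; adjust the names if your export differs.

# Read the costs back in and compute the additional cost of each successive lookup.
# Assumes Key = lookup count and Value = estimated subtree cost from the CSV above.
$rows = Import-Csv "C:\Temp\Key Test.csv" | Sort-Object { [int]$_.Key }

$previous = 0.0
foreach ($row in $rows) {
    $cost = [double]$row.Value
    $marginal = $cost - $previous   # extra cost contributed by the Nth lookup
    "{0,4}  {1:N7}" -f $row.Key, $marginal
    $previous = $cost
}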

Yup, weird results for the second through fourth lookups

The decrease obviously wasn’t linear, so I decided to try an approach from statistics. There’s a class of related problems including balls and bins and matching birthdays. In the case of a thousand-page table, the second lookup should land on a random page with a 999/1000 chance of being different from the first lookup’s page. The third lookup should have a 999/1000 chance of being different from the first, and a 999/1000 chance of being different from the second, so (999/1000)^2. Generalizing, I’d expect the Nth lookup to have a (999/1000)^(N-1) chance of needing a new page (and thus an IO).
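To make the balls-and-bins reasoning concrete, here’s a quick PowerShell sanity check of that probability for a thousand-page table. This is just my argument spelled out, not anything pulled from SQL Server.

# Probability that the Nth lookup lands on a page none of the previous lookups touched.
$pages = 1000
foreach ($n in 1..10) {
    $pNewPage = [math]::Pow(($pages - 1) / $pages, $n - 1)
    "Lookup {0,2}: chance of needing a new page = {1:N4}" -f $n, $pNewPage
}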

Let’s plug in the formula and see…

Almost…but for some reason, the empirical exponent turned out to be N-2 instead of N-1. I can’t figure out why, but whenever I see weird exponents in statistics, the explanation is something something *handwaving* degrees of freedom.

There are still two other quirks. The first is that lookups 2, 3, and 4 don’t follow the pattern – not even close. I have no idea why. The other quirk shows up when comparing formula and empirical lookup costs at high lookup counts.

Marginal IO costs only approximately follow the formula here, with a lot of repeated values. My guess is that doing a heap of math isn’t very efficient when SQL Server is trying to quickly determine a plan, so it uses saved values, approximations, or other tricks.

Math

Anyways, here’s what I learned about Key Lookup costs: the CPU component is always 0.0001581 per lookup. The total IO cost is capped at 0.003125 times the number of pages in the table. The extra IO cost of the Nth lookup is approximately (see prior caveats) 0.003125 * ((PageCount-1)/PageCount)^(N-2). Or, using my fancy math font program:

Cost(Lookups) ≈ 0.0001581 * Lookups + min( 0.003125 * PageCount, Σ from N=1 to Lookups of 0.003125 * ((PageCount-1)/PageCount)^(N-2) )

(The fancy program ran out of room to write the part about the max page cost, so the cap is spelled out above.)
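Putting it all together, here’s a hedged PowerShell sketch of the formula as I understand it – exponent N-2, IO capped at 0.003125 times the page count – with the same caveats as above about lookups 2 through 4 and the approximations at high lookup counts.

# Approximate Key Lookup operator cost for a given lookup count and table page count.
# This mirrors my reverse-engineered formula, not any documented SQL Server behavior.
function Get-ApproxKeyLookupCost([int] $lookups, [int] $pageCount) {
    $cpu = 0.0001581 * $lookups
    $io  = 0.0
    foreach ($n in 1..$lookups) {
        $io += 0.003125 * [math]::Pow(($pageCount - 1) / $pageCount, $n - 2)
    }
    $io = [math]::Min($io, 0.003125 * $pageCount)   # IO cost never exceeds one read per page
    return $cpu + $io
}

# Example: a thousand lookups against the thousand-page test table above.
Get-ApproxKeyLookupCost -lookups 1000 -pageCount 1000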

So there you have it, something about SQL Server with no practical value. Can you believe I spent vacation time doing this? Actually, maybe I can troll some MS folks by saying they got their estimation math wrong. I do believe the people who wrote the estimation (and who are much smarter than I am) had some really good reasons, but hey, why let that get in the way of inflammatory tweeting?

One thought on “Reverse Engineering the Key Lookup Cost Formula”

  1. I made a CLR that’s handy for this sort of thing – it just sets showplan_xml, runs the query, and returns the plan in an xml column. I use it to find performance regressions and sometimes to test a thousand variations on some query:

    select n.value('@EstimatedTotalSubtreeCost', 'float')
    from ExecuteSqlProfile('select * from testable …') p
    cross apply p.query_plan.nodes('//*:RelOp[@LogicalOp="Clustered Index Seek"]') x(n)

    …hit me up if you’re interested and I can send you the code. Great post btw!
