Reasoning about Indexed View Locking

Edit: Paul has pointed out some significant gaps, so I will be reworking this post in the future. The locks shown are accurate for this particular scenario, but apparently there are other behaviors to consider.

Sometimes I hit a weird query where I wonder if, just maybe, an indexed view would help. Then a tingling sensation crawls down my spine and I run screaming in terror, because indexed views have a reputation. Ghastly deadlocks, phantom issues, and sinister edge cases haunt my DBA sensibilities, scaring me away.

But I prefer understanding – I prefer science. So I investigated and experimented, poked and prodded. Great thanks to Paul White and Erik Darling. Erik, somewhere or other, spelled out the key idea: one-to-many and many-to-many joins in the view cause serializable (range) locks.

My setup script (full script at the bottom) creates two identical tables A and B holding the even numbers 2 through 10, with an indexed view AB that joins them on A.ID > B.ID in a one-to-many relationship.

CREATE OR ALTER VIEW dbo.AB
WITH SCHEMABINDING
AS
SELECT A.ID AS AID, COUNT_BIG(*) AS bigcount
FROM dbo.A
JOIN dbo.B
	ON B.ID < A.ID
GROUP BY A.ID
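
For concreteness, here’s what the view holds once the setup script runs – each A.ID pairs with every smaller B.ID, so 2 gets no row at all:

SELECT AID, bigcount
FROM dbo.AB WITH (NOEXPAND) --read the view's own index directly
ORDER BY AID

/* Results:
AID  bigcount
4    1
6    2
8    3
10   4
*/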

A modification to a table here involves three objects, each with associated locks: the modified table, the joined table, and the view. First come the locks for the original modification; then locks on the joined table as the engine figures out which view rows are affected; and finally locks on the view rows themselves.

But that was hard to keep all in my head, so I drew it out. Here are the objects (keys only).

A is the “one” side of the one-to-many, so a modification there involves “normal” locks. But a modification to B, since it can be referenced by multiple rows of A, will involve the super special magical serializable locks.

Let’s look at an insert:
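
(If you want to watch it happen yourself, a minimal sketch: run the insert in an open transaction and peek at the locks before rolling back.)

BEGIN TRAN

INSERT B VALUES (5, 'data') --an odd number, landing between existing keys

--inspect the locks this session is holding
SELECT resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID

ROLLBACK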

The row inserted into B gets an X lock. Next, the engine needs to search through A to find the affected rows of AB, meaning transient S locks in A.

The S locks are released after the rows are read, but when the engine arrives at AB, we see range locks (note that the order of S and range locks can be interleaved; I separate them for simplicity).

But why range locks? Because the many-ness of the relationship means there are potentially conflicting concurrent inserts that plain key locks wouldn’t cover.

Yikes, no wonder these produce deadlocks! Also, the engine takes more range locks on the view than it strictly needs – probably to avoid complicated programming around edge cases.
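
To sketch how that plays out (hypothetical values against the setup below – with unlucky timing, two overlapping inserts end up waiting on each other mid-statement):

/* Session 1 */
BEGIN TRAN
INSERT A VALUES (9, 'data')
--holds X on the new A key, then must read B for every B.ID < 9...

/* Session 2, overlapping */
BEGIN TRAN
INSERT B VALUES (7, 'data')
--holds X on the new B key, then must read A for every A.ID > 7...

--Session 1 hits Session 2's locked B key (7), Session 2 hits Session 1's locked A key (9):
--a cycle, and one of them dies as the deadlock victim.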

Maybe it’s just how my brain works, but drawing these out is what finally made the locking strategy click for me. I don’t consider indexed view locking as mysterious anymore (yay science!), but I do find it monstrous. Be warned!

--Setup
CREATE TABLE A (ID INT PRIMARY KEY, junk CHAR(10))
CREATE TABLE B (ID INT PRIMARY KEY, junk CHAR(10))

--Fill each table with the even numbers 2-10 (ORDER BY 1/0 is just a trick for an arbitrary order)
INSERT A
SELECT TOP (5) 
(ROW_NUMBER() OVER(ORDER BY 1/0))*2,
'data'
FROM sys.columns

INSERT B
SELECT TOP (5) 
(ROW_NUMBER() OVER(ORDER BY 1/0))*2,
'data'
FROM sys.columns
GO

CREATE OR ALTER VIEW dbo.AB
WITH SCHEMABINDING
AS
SELECT A.ID AS AID, COUNT_BIG(*) AS bigcount
FROM dbo.A
JOIN dbo.B
	ON B.ID < A.ID
GROUP BY A.ID
GO

CREATE UNIQUE CLUSTERED INDEX CX_AIDs ON dbo.AB (AID)
GO

Reverse Engineering the Key Lookup Cost Formula

During a recent investigation into Key Lookup costing, I discovered something interesting. Perhaps a warning is due: I’m a nerd, and things I find “interesting” may cure insomnia in others. But, dear reader, if you’re taking time to read a database blog, you just might be a nerd as well.

Examined individually, key lookups appear to have a cost of 0.0001581 CPU and 0.003125 IO. However, running the math over several experiments showed that the per-lookup cost decreases with the number of lookups. But by how much? What’s the formula?

Gathering Data

I didn’t know if I would need a complicated curve-fitting approach, but I did know that more data points would be better. I also knew that working through hundreds of queries and manually recording query buck costs would suck, and I didn’t want to do it. But how could I automate pulling costs from the execution plans of hundreds of queries? Oh. Right. It’s PowerShell time, isn’t it?

Step 1 of course had to be SQL setup. I used a thousand-page table with one row per page.

CREATE TABLE Testable (
ID1 INT IDENTITY PRIMARY KEY,
ID2 INT,
filler CHAR(8000), --pads each row to fill its own page
INDEX IX_testable_ptooie (ID2)
)

INSERT Testable
SELECT TOP(1000)
ROW_NUMBER() OVER(ORDER BY 1/0),
'junk'
FROM sys.columns
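
To sanity-check the one-row-per-page claim, a quick peek at the physical stats (index_id 1 is the clustered index):

SELECT page_count, record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('Testable'), 1, NULL, 'DETAILED')
WHERE index_level = 0 --leaf level: expect 1000 pages, 1000 records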

Step 2 was getting the estimated execution plan using PowerShell. Step 3 was getting the PowerShell to actually work. Step 4 was using XPath inside a PowerShell script, so you can guess that steps 5-20 were debugging. Blech…XML schemas. Special thanks to Rob Farley and Jonathan Kehayias for sharing useful code.

function Get-SqlPlan([string] $query){
    # Turn on SHOWPLAN_XML, run the query, and parse the estimated plan XML that comes back
    return ([xml](Invoke-Sqlcmd -MaxCharLength 9999999 -ServerInstance "." -Database "TestDB" -Query ";set showplan_xml on;`ngo`n $query;`ngo").Item(0));
}

$results = @{}

for ($i = 1; $i -le 1000; $i++) {

    # Force the nonclustered index so every qualifying row drives a key lookup
    $query = "SELECT *
    FROM Testable WITH(INDEX(IX_testable_ptooie))
    WHERE ID2 <= $i
    OPTION(MAXDOP 1)"

    $t = Get-SqlPlan($query)

    # Showplan XML lives in its own namespace, and XPath has to be told about it
    $ns = New-Object 'System.Xml.XmlNamespaceManager' $t.NameTable
    $ns.AddNamespace("ns", "http://schemas.microsoft.com/sqlserver/2004/07/showplan")

    # The key lookup appears as a Clustered Index Seek; grab its estimated subtree cost
    $results.Add($i, $t.SelectNodes('//ns:RelOp[@LogicalOp="Clustered Index Seek"]',$ns).EstimatedTotalSubtreeCost)
}

$results.GetEnumerator() | Export-Csv -NoTypeInformation -Path "C:\Temp\Key Test.csv"

Reverse Engineering

I was left with a CSV full of costing info for a thousand key lookups.

Time for some Excel analysis. (Look Ma, I’m a Data Scientist!) First, I calculated the additional cost of each key lookup. Quick tests confirmed that the additional CPU cost was constant; only the IO cost decreased. This makes sense, because as key lookups continue, there’s a growing chance that the needed page was already pulled into the buffer pool.

Yup, weird results for the second through fourth lookups

The decrease obviously wasn’t linear, so I decided to try an approach from statistics. There’s a class of related problems involving balls in buckets and matching birthdays. In the case of a thousand-page table, the second lookup hits a random page with a 999/1000 chance of being different from the first lookup’s page. The third lookup has a 999/1000 chance of differing from the first, and a 999/1000 chance of differing from the second, so (999/1000)^2. Generalizing, I’d expect the Nth lookup to have a (999/1000)^(N-1) chance of needing a new page (and thus an IO).
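
In symbols, with P pages in the table, that argument gives:

$$\Pr[\text{new page on lookup } N] = \left(\frac{P-1}{P}\right)^{N-1}$$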

Let’s plug in the formula and see…

Almost…but for some reason, the empirical exponent turned out to be N-2 instead of N-1. I can’t come up with why, but whenever I see weird exponents in statistics the explanation is something something *handwaving* degrees of freedom.

There are still two other quirks. The first is that lookups 2, 3, and 4 don’t follow the pattern – not even close. I have no idea why this is the case. The other quirk shows up when comparing formula versus empirical lookup costs at high lookup counts.

Marginal IO costs only approximately follow the formula here, with a lot of repeated values. My guess is that doing a heap of math isn’t very efficient when SQL Server is trying to quickly determine a plan, so it uses saved values, approximations, or other tricks.

Math

Anyways, here’s what I learned about Key Lookup costs: the CPU component is always 0.0001581 per lookup. Total IO cost is capped at 0.003125 times the number of pages in the table. The extra IO cost of the Nth lookup is approximately (see prior caveats) 0.003125 × ((PageCount − 1)/PageCount)^(N−2).
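My fancy math font program ran out of room for the part about max page costs, so here’s the full version reconstructed in LaTeX (my notation: P = pages in the table, n = number of lookups; the max(N−2, 0) exponent just makes the first two lookups pay full price):

$$\mathrm{Cost}(n) \approx 0.0001581\,n \;+\; \min\!\left(0.003125\,P,\;\; 0.003125\sum_{N=1}^{n}\left(\frac{P-1}{P}\right)^{\max(N-2,\,0)}\right)$$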

So there you have it, something about SQL Server with no practical value. Can you believe I spent vacation time doing this? Actually, maybe I can troll some MS folks by saying they got their estimation math wrong. I do believe the people who wrote the estimation (and who are much smarter than I am) had some really good reasons, but hey, why let that get in the way of inflammatory tweeting?