The Curious Case of the Insert Deadlock

My favorite deadlock story actually isn’t about SQL Server, but comes from the American Airlines pilot who taught me math. Slightly embellished, it goes like this.

In the distant past some freshly minted MBAs claimed they could save money by implementing a new scheduling system, and for some reason the company’s leadership listened to them. Several months later this pilot lands at his destination, but his gate is occupied by another plane and the control tower tells him to wait – its pilot is running late. Time passes with no progress, and he gets more and more frustrated about the delay. Determined to chew out the tardy pilot, he radios in again to find out the name of the offender holding him up. The pilot they were waiting on…was him!

This was a deadlock. He couldn’t dock his plane until the other one moved, and the other one couldn’t move until he docked and transferred. SQL Server deadlocks follow the same pattern of two (or more) processes needing the other to progress before they can continue, but only sometimes have angry passengers.

A lot of deadlocks are simple, but I found an unusual one recently – two INSERTs deadlocking – that forced me to dig deeper. As with most things horrible in SQL Server, it began with Entity Framework/LINQ, and was compounded by developer ignorance. As much as I hate LINQ for performance, it *is* a perpetual fountain of dumb, i.e., blog post material.

Here’s my recreation of the underlying table:

CREATE TABLE dbo.DLtest1 (ID INT identity, junk char(4)) 
CREATE CLUSTERED INDEX CX_DLtest1_ID ON dbo.DLtest1(ID)

And here’s what the query looked like:

INSERT dbo.DLtest1 (junk) VALUES ('junk') 

SELECT ID FROM dbo.DLtest1 
WHERE ID = SCOPE_IDENTITY()

Guess what isolation level this was running in? That’s right, the good ol’ EF TransactionScope() special, Serializable. Besides that, note the SELECT clause not stopping at SCOPE_IDENTITY, but actually joining to the table on it. Yeah…

Eagle-eyed readers might have noticed another problem: the clustered index wasn’t defined as unique. Making it unique is actually enough to fix the issue, and what I did in production. Functional code is boring though, so let’s keep investigating the deadlock as is.

The first mystery was the sequence of locks. Running an extended event on the batch, I saw LAST_MODE lock released…but it had never been acquired! Special thanks to Paul White for pointing out that the events for lock_acquired are incorrect. First, LAST_MODE is RX-X, and second (mind-blowingly), SQL Server may request a certain lock, but the lock granted could be different!

A quick aside on range locks in SQL Server. We mostly see these with Serializable isolation level. They have two components: the range and the key it is attached to. The range is the potential space between the key and next lower key. I find it helps to imagine these on a number line. An RX-S lock on 5 would be an X lock on the range between 4 (assuming that’s the next lower value) and 5, with an S lock on the value of 5 itself.

Pretty pictures!

After Paul’s answer (and some XE wrangling), I was finally able to piece together the sequence of locks in an INSERT + SELECT. The INSERT begins with the RI-N lock (the I-lock preventing S and X on the range section, but with no lock on the key). This transitions to an X lock on the key when it is inserted. With the SELECT section, RS-S locks are taken on the key and the next higher key (in this case, usually infinity). But with the existing X lock on the key, the requested RS-S lock is instead granted as RX-X.

Easy, right? Who am I kidding, this took me days of poking and prodding. Here, have an animation:

Time to tackle the deadlock…here’s what I ran in two different sessions to reproduce it:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
SET NOCOUNT ON

DECLARE @stop datetime = dateadd(second,6,GETDATE())
DECLARE @trash int


WHILE @stop > GETDATE()
BEGIN
	BEGIN TRAN
	INSERT dbo.DLtest1 (junk)
	VALUES ('junk')

	SELECT @trash = ID
	FROM dbo.DLtest1
	WHERE ID = SCOPE_IDENTITY()
	COMMIT
END

But there was something weird still going on – many of the deadlock graphs (did I mention this batch produces different types of deadlocks? hurray for LINQ!) had RI-N involved. No matter what I modeled on the whiteboard, I couldn’t see how that was possible.

I finally managed to catch one by using an XE tracking both locks and waits, linking back to the table key by %%lockres%%.

Well, that was fun to read through. I had to map everything out on the whiteboard, and eventually figured out the source of the RI-N deadlock. It all starts while the second session is blocked on its INSERT. The first session releases its locks and continues to its next INSERT, but the value it gets from the cache leapfrogs the second session’s value. Eventually, the second session actually takes a second RI-N lock, holding onto the original RI-N lock all the while!

Blah blah blah, it would take a lot more words to explain. Here, have an animation instead, it’s probably a lot more helpful:

There you have it, another deadlock courtesy of freshly-minted MBAs, err, well-meaning developers. I’m still curious why the RI-N ∞ isn’t released, but perhaps now’s the time to stop and let my readers disembark. Hope you enjoyed the ride!