I had an interesting conversation with a developer the other day – it went something like this:
Developer: “Why does it take three minutes to delete a single row from this table?”
Me: “Hold on while I look at the plan. Oh, it’s because SQL Server is checking the 460 foreign keys that reference that table, 447 of which don’t have a supporting index.”
I know the pains of foreign keys without indexes. Oh, do I know. So when I saw a cascading single-row delete plan with a scan on the child table, I thought, “Aha! I know how to fix this, an index!”
But the index was already there, a perfect one for supporting that foreign key with the abominable ON DELETE CASCADE . Something was obviously wrong, and I immediately suspected our recent switch to 2016 compatibility level – meaning a new cardinality estimator.
Sure enough, in 2012 mode the query optimizer produced a seek.
I was able to come up with a repro of the issue (which is what the above plans are from).
--set up two tables, one with a foreign key with cascading delete DROP TABLE IF EXISTS dbo.Child DROP TABLE IF EXISTS dbo.Parent GO CREATE TABLE dbo.Parent ( ID INT IDENTITY PRIMARY KEY CLUSTERED, junk CHAR(1) ) INSERT dbo.Parent SELECT TOP 1000000 'a' FROM master..spt_values a CROSS JOIN master..spt_values b CREATE TABLE dbo.Child ( ID INT IDENTITY PRIMARY KEY CLUSTERED, morejunk CHAR(5), ParentID INT ) ALTER TABLE dbo.Child ADD CONSTRAINT [💩] FOREIGN KEY (ParentID) REFERENCES dbo.Parent (ID) ON DELETE CASCADE ON UPDATE CASCADE CREATE INDEX IX_agggghhhhhh ON dbo.Child(ParentID) INSERT dbo.Child SELECT TOP 1100000 'bah!', 42 --repro also works with random numbers for FK column FROM master..spt_values a CROSS JOIN master..spt_values b GO ----------------------------------------------------------------- --Just need estimated plan here. Does it scan on Child? DELETE dbo.Parent WHERE ID = -1 --any value here too low or too high will cause a scan, often including values that do exist in Parent!
The short of it is, with any value too far outside of the histogram of the parent table, the new cardinality estimator assumes that EVERY SINGLE ROW of the child table will get deleted.
This can be seen even when a seek is forced through a USE PLAN hint.
Interestingly enough, running the new cardinality estimator in 2014 compatibility level with Query Optimizer Fixes off will also produce a seek plan. I found a few other ways to get the good plan, and stuck them in a bonus script.
DELETE dbo.Parent WHERE ID = -1 OPTION( USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION') ) DELETE dbo.Parent WHERE ID = -1 OPTION( USE HINT('ASSUME_JOIN_PREDICATE_DEPENDS_ON_FILTERS') ) --legacy CE with base containment (TF2301) still seeks though --Repro is dependent on statistics SELECT * FROM sys.dm_db_stats_properties_internal(OBJECT_ID('dbo.Parent'),1) SELECT * FROM sys.dm_db_stats_histogram(OBJECT_ID('dbo.Parent'),1) SELECT * FROM sys.dm_db_stats_histogram(OBJECT_ID('dbo.Child'),2) UPDATE STATISTICS dbo.Parent WITH SAMPLE 5 PERCENT --allows problem UPDATE STATISTICS dbo.Parent WITH FULLSCAN --changes histogram and prevents problem UPDATE STATISTICS dbo.Parent WITH SAMPLE 99 PERCENT --still allows problem
There’s something I don’t understand happening with the cardinality estimator here (as is usual). It seems when the value being deleted from the Parent table is within its histogram, the optimizer uses stats from the Child. Otherwise, it assumes the entire Child table matches. Ouch.
Remember how my demo only has two tables? The real scenario I encountered was a cascading delete hitting forty tables, and, you guessed it, scanning each one. Block party!
Forcing plans with the help of Query Store only goes so far, especially when it’s playing whack-a-mole with ORM-generated SQL. The best solution for now might be to retreat to 2012 compatibility and complain on the internet. Bah!
Great Post and great timing, because we are about to switch to compatibility level 2016.
But I am not able to reproduce the issue with your example.
– SQL Server 2016
– Compat lvl 130
– Legacy Cardinality Estimation OFF
– Query Optimizer Fixes OFF
I always get a seek, never a scan. Am I missing something?
Bonus Question:
We have a lot of tables using uniqueidentifier columns as primary key / clustered index (I know). I am worried that a delete ‘random guid’ on a parent table will very likely be outside the histogram and cause the described issue.
Am I right in assuming that guid-columns could be affected by this a lot more?
– Daniel L
I only see this scan plan with sufficiently high instance memory – 52GB+ or so. The extra memory means SQL Server gives the optimizer a larger search budget, and this is when it finds the scan/merge plan. Cardinality is messed up either way though. I’m still investigating what’s going on in my spare time.
Bonus answer:
My guess would actually be the opposite. For my situation in prod, the identity-column values needing to be deleted are usually recent (and therefore somehow far enough outside the histogram to choke the estimator). With random GUIDs and a large table, I think it becomes less and less likely that a recent value will be higher than max. I don’t know though…this will make for an interesting demo!
Recently noticed this problem in SQL Server 2017. End up turning on the legacy cardinality estimator. Until microsoft comes up with a fix for it. It is a problem with cardinality estimator when you upgrade.
I stumbled into the very same problem on the following setup.
– SQL Server 2019 CU 17
– Compat lvl 150
– Legacy Cardinality Estimation OFF
– Query Optimizer Fixes OFF
This delete is currently the slowest operation in the application and really drives me crazy.
Instead of switching to a legacy cardinality estimator, I’ll try the “turn off cascade delete and manually send all commands from the bottom up inside a transaction”-route and see if that performs better.
Child row count is no that bad, so I don’t expect to hit table locks anywhere.