Who knew cascading deletes could be so horribly entertaining (emphasis on horrible)? The problems I wrote about traced back to cardinality, but I wasn’t ready to write about it. So, uh, I might have spent most of a perfectly good weekend reading Dmitry and Microsoft patents.
The scan plan was coming from a really bad cardinality assumption: given a Parent ID sufficiently outside the stats histogram, *every* Child table row would join to it. But why was I getting such a ridiculous estimate?
Time for an undocumented trace flag (gosh, I love these)! Dmitry’s post introduced me to trace flag 2363, which outputs cardinality estimation info for the new estimator. I tossed in a QUERYTRACEON, and got the following.
Coooooooool! The optimizer considers stats on Parent.ID to be a badly-formed histogram, whatever that is. Moreover, it just gives up, moves to CSelCalcGuessComparisonJoin, and gives selectivity of 1 (aka everything). Nice guess.
What happens when the ID is within range?
Some huge amount of testing later, I’m able to determine that the bad bailout only occurs when 1) Parent stats are “badly-formed” and 2) the ID is out of range. Stats updated with a FULLSCAN aren’t badly-formed, which seems to be why the problem disappears in that circumstance.
Also interesting is what happens in 2014 compatibility with QO Fixes off.
It ejects, but uses CSelCalcSimpleJoin, with a much more reasonable guess.
I kind of have my answer at this point, which that the reason for crappy cardinality is buried deep within the proprietary bowels of SQL Server’s estimator design. Luckily there was a holiday sale on unsatisfying answers, and I got a new question for free.
What the heck is a “badly-formed histogram?”
Spelunking with windbg left me in the dark, so I resorted to empirical testing with stats donating. The best I can tell, a badly-formed histogram only occurs for a column that is unique and has decimal-valued range_rows. In turn, decimal-valued range_rows only seem to occur for large-enough tables with sampled statistics. How’s that for a summary of weeks of investigation?
To be honest, I’m not really happy with the state of my understanding here. My answers are more anecdote than explanation, only looking at a lower layer of turtle. Maybe if I booze up a Microsoft employee enough I can get them to share info outside of NDA…
Anyways, I’m off to read more patents.
I have the same issue with Cascading Deletings and Indexed View updates on SQL Server 2017 (TRM-CU17), compatability 140, Query Optimizer Hotfixes on. I have not seen this issue with select or delete statements though, only something occurring as part of a sequence.
I am having the same problem – I submitted it as a bug to MS: https://feedback.azure.com/forums/908035-sql-server/suggestions/39359128-cascade-deletes-and-indexed-view-updates-causing-f
This has been patched in SQL Server 2017 CU22: https://support.microsoft.com/en-us/help/4577933/fix-cascade-delete-on-key-values-outside-of-leading-table-histogram-bo