Gotta Cache ‘Em All: Query Store Survey

I have a friend who’s really into Pokemon. Really into Pokemon. I had no idea there was such a thing as a “shiny” pokemon until he, uh, graciously spent a whole evening lecturing me on their value and capture.

It’s a passion I don’t really get, but have to tolerate, maybe even respect. It turns out I’m the same, just with different monsters. You see, I really like plan operators. I like knowing their behavior, I like learning their natural habitat, and I love hunting for exotics. If I could find a way to stuff and mount Flow Distinct on my wall, I probably would.

There a handful of ways to see what operators roam your production environment. If you’re lucky enough to be working with 2016 or later
(I’ve seen the stats though, condolences for those of you still on 2005), you can peek into Query Store. Actually, with how long the query takes to run, it might be more of an expedition than a peek. Caveat executor. 

Here’s the script – again, mind that this can be an expensive query.

The most common operator for me, at a whopping quarter-million occurrences, is the Clustered Index Seek. That’s probably a good thing, since I do OLTP. I still see plenty of scans, because developers, and a decent range of not-rare-but-not-common operators, like the Table Spool (hmm, I should probably look into that query…).

Then there’s the esoteric bottom of the list, where operators are only seen a couple of places instead of a quarter-million. One of these was new to me: Inserted Scan, which comes from the Inserted pseudo-table in a trigger. I have a couple oddities like Parameter Table Scan and Hash Match for a Full Outer Join.

Of course, hiding down there are three shiny Flow Distincts. Time for me to hunt down a repro!

Cascading Deletes – Cardinality

Who knew cascading deletes could be so horribly entertaining (emphasis on horrible)? The problems I wrote about traced back to cardinality, but I wasn’t ready to write about it. So, uh, I might have spent most of a perfectly good weekend reading Dmitry and Microsoft patents.

The scan plan was coming from a really bad cardinality assumption: given a Parent ID sufficiently outside the stats histogram, *every* Child table row would join to it. But why was I getting such a ridiculous estimate?

Time for an undocumented trace flag (gosh, I love these)! Dmitry’s post introduced me to trace flag 2363, which outputs cardinality estimation info for the new estimator. I tossed in a QUERYTRACEON, and got the following.

Coooooooool! The optimizer considers stats on Parent.ID to be a badly-formed histogram, whatever that is. Moreover, it just gives up, moves to CSelCalcGuessComparisonJoin, and gives selectivity of 1 (aka everything). Nice guess.

What happens when the ID is within range?

Some huge amount of testing later, I’m able to determine that the bad bailout only occurs when 1) Parent stats are “badly-formed” and 2) the ID is out of range. Stats updated with a FULLSCAN aren’t badly-formed, which seems to be why the problem disappears in that circumstance.

Also interesting is what happens in 2014 compatibility with QO Fixes off.

It ejects, but uses CSelCalcSimpleJoin, with a much more reasonable guess.

I kind of have my answer at this point, which that the reason for crappy cardinality is buried deep within the proprietary bowels of SQL Server’s estimator design. Luckily there was a holiday sale on unsatisfying answers, and I got a new question for free.

What the heck is a “badly-formed histogram?”

Spelunking with windbg left me in the dark, so I resorted to empirical testing with stats donating. The best I can tell, a badly-formed histogram only occurs for a column that is unique and has decimal-valued range_rows. In turn, decimal-valued range_rows only seem to occur for large-enough tables with sampled statistics. How’s that for a summary of weeks of investigation?

To be honest, I’m not really happy with the state of my understanding here. My answers are more anecdote than explanation, only looking at a lower layer of turtle. Maybe if I booze up a Microsoft employee enough I can get them to share info outside of NDA…

Anyways, I’m off to read more patents.

Cascading Delete Troubles

I had an interesting conversation with a developer the other day – it went something like this:

Developer: “Why does it take three minutes to delete a single row from this table?”

Me: “Hold on while I look at the plan. Oh, it’s because SQL Server is checking the 460 foreign keys that reference that table, 447 of which don’t have a supporting index.”

I know the pains of foreign keys without indexes. Oh, do I know. So when I saw a cascading single-row delete plan with a scan on the child table, I thought, “Aha! I know how to fix this, an index!”

But the index was already there, a perfect one for supporting that foreign key with the abominable ON DELETE CASCADE . Something was obviously wrong, and I immediately suspected our recent switch to 2016 compatibility level – meaning a new cardinality estimator.

Sure enough, in 2012 mode the query optimizer produced a seek.

I was able to come up with a repro of the issue (which is what the above plans are from).

The short of it is, with any value too far outside of the histogram of the parent table, the new cardinality estimator assumes that EVERY SINGLE ROW of the child table will get deleted.

This can be seen even when a seek is forced through a USE PLAN hint.

Interestingly enough, running the new cardinality estimator in 2014 compatibility level with Query Optimizer Fixes off will also produce a seek plan. I found a few other ways to get the good plan, and stuck them in a bonus script.

There’s something I don’t understand happening with the cardinality estimator here (as is usual). It seems when the value being deleted from the Parent table is within its histogram, the optimizer uses stats from the Child. Otherwise, it assumes the entire Child table matches. Ouch.

Remember how my demo only has two tables? The real scenario I encountered was a cascading delete hitting forty tables, and, you guessed it, scanning each one. Block party!

Forcing plans with the help of Query Store only goes so far, especially when it’s playing whack-a-mole with ORM-generated SQL. The best solution for now might be to retreat to 2012 compatibility and complain on the internet. Bah!

Debugging for Trace Flags

Good thing I have a test machine, because I just crashed SQL Server again. Prepare yourselves for (yet another) post about doing horrible things to the database for fun and profit more fun.

Obligatory “blah blah serious warning blah blah DON’T DO THIS IN PRODUCTION blah blah or you’ll go blind blah blah.” Ok, let’s get started! I have a documented obsession with undocumented trace flags and have long wanted to use the debugger to discover new ones. I just didn’t comprehend what I saw in windbg until I learned about registers and how a program actually runs. Thanks, Crash Course Computing!

Armed with an incomplete understanding of fundamentals and a googled list of windbg commands, I hypothesized that if I turned on some trace flags for a query, they would eventually be passed through one of the registers. From there, I could work out the mechanisms involved and maybe even how to find new flags!

Running a query with a ridiculously long compile time let me capture optimization call stacks with PerfView. The common root of those seemed to be sqllang!CSQLSource::Transform, so I set a breakpoint on it. F11 let me step into the various classes [I don’t even know if I’m using the right terms here] called, and I eventually hit a lucky break.

Behold, sqldk!CSessionTraceFlags::CheckSessionTraceInternal

Clues just don’t get more obvious than that.

Setting a breakpoint on that function [module? class?] revealed that it gets called a lot. I kept F11-ing through the function while watching the registers. I had turned on several trace flags and was looking for their hex forms, but didn’t see them. Eventually, I ran across a call to CheckSessionTraceInternal originating from sqllang!CExecLvlIntCtxt::FDontExpandPersistedCC. “Aha!” I thought (and might have said out loud) “I know what trace flag prevents expanding persisted computed columns.” A glance at my favorite list of trace flags confirmed what I remembered; I was looking at TF 176. The windows calculator has a handy hex converter in its “Progammer” setting, and I quickly determined I would be looking for “b0” in a register.


The pattern soon became obvious. CheckSessionTraceInternal is called whenever a function checks whether a trace flag is turned on. Luckily, when CheckSessionTraceInternal is called, the value of the trace flag it’s checking is passed through register rdx (I think technically edx, but don’t ask me the difference yet) and then stored in r10.

I spent a dozen-plus hours hitting breakpoints on CheckSessionTraceInternal, and made a long list of interesting flags along with the function [method?] checking the flag. I even saw my new friend 8672.

Well, that was the long rambling version of how I discovered this. TLDR is that a breakpoint on sqldk!CSessionTraceFlags ::CheckSessionTraceInternal exposes the trace flag being checked in a register. And of course, Paul White found it first.

Too Many TempDB Datafiles?

Maybe you’ve seen the Microsoft recommendations on tempdb, that multiple datafiles help reduce contention. They don’t simply say 1 file per CPU though, but to stop at 8 and slowly increase from there. Of course, my curiosity was piqued. Why increase slowly and carefully? Can we run into problems with too many datafiles? How many is too many?

Since there’s no official Microsoft guidance on breaking tempdb (geez, guys…), I looked to the wisdom of an ancient meme, and modified tempdb to have over 9000 datafiles.

That should be enough.

40 minutes later, I’m ready to see if anything breaks.

The obvious starting point is creating a temp table, and sure enough, it’s slower than before, about 30X slower.

Also interesting is that scans are really slow. Realllllllly slow.

Even though there’s obviously a problem here, nothing shows up in the wait stats. (Below is a simple differential script I use).

What’s SQL Server burning CPU on? To figure this out I turn to PerfView (ht Josh).

GetNextScanIAM? The engine appears to be spending most of its time traversing the IAM chain, which means it’s using an Allocation Order Scan. In short, SQL Server has multiple access methods. An Allocation Order Scan is only available under certain conditions, which which my query on a large local temp table apparently qualifies for. The pages are located from the mapping pages (IAMs), instead of going page-to-page along the leaf level. In normal circumstances, I find an allocation order scan to be faster, but here, it chokes, because there are 5k+ IAMs.

I want to test my hypothesis though. I know that allocation order scans don’t occur when there’s an ORDER BY, so what happens when I add that?

Wow. From 22387ms of CPU down to 60.

The fun doesn’t stop here of course (by “fun” I mean the horrible performance problems you might expect from having thousands of datafiles).

It turns out MAX-length variables start having issues after a certain size, since they can be stored in tempdb. Again, the wait stats are silent, but PerfView makes me suspect these variables are written to tempdb as part of a LOB.

The version store (used for optimistic isolation levels) lives in tempdb, so I thought I’d test it out. I crafted a decent workload, confirmed it added significantly to the version store, and then watched it have no problems whatsoever. In fact, it appears to be faster! Oh well…I did learn that the version store usually (for my test) doesn’t cause physical writes to tempdb, but just hangs out in memory until cleaned up.

Spills get written to tempdb, but I didn’t see any performance issues there either, well, other than the fact that there’s a spill happening.

Oh hey, remember how SQL Server rebuilds tempdb every time it restarts? And that I mentioned creating over 9000 files took 40 minutes? You can guess what happens next. Though the SQL Server service restarted within seconds, logins failed for a half hour while tempdb was rebuilding. Fun!

Even a developer would have recognized (probably) that adding thousands of datafiles is a Bad Idea. But this is a Hold My Beer demo, and ended up teaching me a few things about SQL Server. I hope it helped you too!

Powershell to Search for Trace Flags

I think of trace flags like buttons in a Bond car. Find the right one and you have a missile launcher. Press a different one and you’re now frantically trying to stop the self-destruct mechanism. Still, I don’t think I’d be able to to resist pushing buttons, and that’s what I get to do with SQL Server.

There are some really interesting trace flags that expose the internal workings of SQL Server. And there are many more that change the internal workings. Although nobody’s found the self-destruct flag yet, there are some that are pretty close.

So of course, I’m going to try every trace flag.

I’m doing this on a test server of course, one that I don’t mind crashing and burning. My goal is to find flags that change output to the message tab. If you want to see an impressive search method that looks for plan-affecting flags, my SQL senpai Joe Obbish has a post here.

Obviously, testing 11k+ flags one at a time in SSMS is not performant, so I needed a way to grab message output programmatically. Stealing code from here and here, I hacked together a powershell script that I fully expect someone to critique. Even I, a DBA, know that global variables smell funny.

There’s a really cool trace flag that adds info on hash joins to the message output, so maybe there’s something that adds info on spills? I’ll use my spilling query (slightly modified) from a previous post to check.

Alas, though a number of interesting trace flags show up, none of them seem to include spill details. :'(

Some trace flags modify the output of other trace flags. I’m really interested in 8615, which shows the final memo. Let’s see if anything modifies its output (I get the length of the plain 8615 message, then see if any flag changes the length of message output).

Success! I get a number of known trace flags, but also 8672, which I can’t find good info on. That one’s going in the backlog of things to blog about (a backblog?).

I have quite a few variations I want to try with this method, as well as potential improvements. Using DBCC TRACEON instead of QUERYTRACEON seems an obvious one, along with only getting estimated plans (when applicable) to save time.

Still looking for the missile launcher though.

Do Spills Get Logged?

Ever see a SELECT query with WRITELOG waits? I did, and exclaimed, “Oooh, interesting!” to the annoyance of all my cube neighbors. There was no data changing to require logging, so where was it coming from? After a little thought, I hypothesized that a spill to tempdb generates log activity.

But can I prove it?

The first ingredient for this experiment is a spill, and probably the most common operator to spill would be the sort, so let’s build a table to sort:

Ok, now we need a query that will spill. A spill occurs when SQL Server doesn’t provide enough memory to an operator, and it has to use space in tempdb to complete the operation. So we need a way to get SQL Server to under-provision memory. The most common way this occurs in the wild is when the actual number of rows passed to a sort is far higher than what the optimizer estimated. But this is SCIENCE[!] so let’s screw with the memory grant instead.

Now we need to read the log generated. Start with a CHECKPOINT command. Since tempdb is in simple recovery (and there are no other ongoing transactions because this is a test instance…right?), this will clear the logs. Then run the spilling query (I use a trash variable to prevent displaying results) and follow it up with the handy-dandy undocumented fn_dblog().



I see a lot of activity on PFS, GAM, and IAM pages that makes me think this is minimally efficiently logged – only the allocations without the content of the spill. It’s even possible to find one of these sort pages by checking allocated pages mentioned in the description column. You can go digging for one yourself, if it’s the kind of thing to make you exclaim, “Oooh, interesting!”

The Longest Row

It’s hard to ask a question about SQL Server to which the answer isn’t, “It depends.” Unless of course you’re asking if you should use Auto-Shrink (the answer is NO! and an angry glare).  Edge cases are everywhere, and hidden gotchas lurk around the corner even for well-known features.

But what about the maximum size of a row on page? Microsoft and even error messages say that you can’t go past 8060 bytes. Surely we can trust that this is an accurate limit?

Hold my beer.

Start with a basic setup, then try to go past 8060 bytes.

Yup, there’s the error message.

Right-size a column to get exactly 8060 bytes. We can even use DBCC PAGE to confirm its length.

Make a couple changes, add another row…

And now we have 8082 bytes!

Once you stop applauding this incredible feat, you might stop to ask how I managed it. What’s going on is combination of two tricks. The first has to do with enabling Snapshot Isolation. Snapshot is an isolation level that allows users to view data consistent to a point in time without locking the row. SQL Server manages this by storing old versions of the row in tempdb, and keeping a pointer to those versions in a 14-byte versioning tag added to the row.

The last 8 bytes come from the clustered index I added. Notice how there’s a clustered index, but no primary key? It’s how I was able to add two rows with the same ID of “1”. SQL Server has to uniquely identify rows though (it’s how it points back to the source row from non-clustered indexes), so it adds a “uniquefier” taking up a total of 8 bytes to the second row.

Like any good Hold My Beer stunt, this one ends with a big error. Watch what happens if you try to update that second row.

I wouldn’t be surprised if there’s a way to push this limit even higher. Spend some time investigating row length for yourself, and see if you can break my record of 8082 bytes. Then, when someone asks how you maintain your grand DBA physique, just say “competitive rowing.”

USE PLAN and Compile Time

Joe Obbish shared an especially interesting problem with me recently. He had a query on a table with a single column, but introducing a USE PLAN hint would cause a compilation time of 40+ seconds. Here’s a simplified repro of his issue in case you want to stop reading here and work on this puzzle yourself.


On my 2016 test instance, I see a compilation time of 4 ms without the USE PLAN hint, and 3708 ms with it. Adding more to the SELECT statement quickly increases the time taken, but only with USE PLAN. What’s going on?

Before diving in, a quick note. While I try to use simple explanations, I’m also going to get down and nerdy here, using some technical terms and undocumented trace flags. Pretty much everything I know comes from Paul the Magnificent’s series here, so if you don’t know what a “memo group” or “input tree” is and actually want to learn, start there.

The first clue comes from Microsoft, who says that “You can only force plans with USE PLAN that can be produced by the optimizer’s normal search strategy.” So, apparently the optimizer is used to validate the hint. Let’s prove this.

Start with a basic setup and query that would be trivial (a query so simple that SQL Server says “Screw it, I’m not optimizing that. Have an obvious plan.”)

Add a USE PLAN hint and it gets a full optimization! We can even use a trace flag like 8619 to see that SQL Server really is doing the optimization work it bypassed earlier.

OK, so if USE PLAN needs the optimizer for validation, does it at least change the starting point of the search that the optimizer does? Nope!

Using the example of changing table order from here with 8605 and 8608, we can see that the input tree and initial memo remain the same. (Again, these are undocumented trace flags, which means everyone gets to point and laugh at you for causing a production outage if you use them there.)

Alright, so the optimizer always runs to validate the hint, but why does it take so much longer for Joe’s query?

That last bit of information comes from trace flag 8675, which exposes information about phases. Basically, SQL Server has to make tradeoffs between finding an excellent plan and actually getting it to you before the eventual heat death of the universe. Because of this, it does cost-based optimization in three phases. Each phase adds new options for SQL Server to test out, and it only goes to the next phase if the query looks expensive and complicated enough.

Here’s a query with no hint.

And here’s the same query with USE PLAN.

You can see this occur for any plan with a USE PLAN hint, even one that would be otherwise Trivial. They go straight into Search(2) optimization, the most comprehensive search space. This makes sense, because the goal is to check that the plan you supply is at all possible, even if it takes some strange rules to get there.

So what does the optimizer spend so much time on in Joe’s query? It’s testing hundreds of thousands of different ways to order the case statements.

How to Use the USE PLAN Hint

There’s a lot of good SQL Server information out there, so much so that I was very surprised I couldn’t find a good write-up of using the USE PLAN hint. I even checked the second page of Google results! So, in hopes of being the first to write a guide, or more likely to join the other dozen bloggers who wrote about this but didn’t bother with SEO, I present to you How to Use the USE PLAN Hint.

Say you have a simple query: you want to join two single-row heaps on their ID columns (I was going to say something about this being an unrealistic demo, then I realized that you probably work with developers too).


Simple Query:

As you can see, this produces a hash join, even with such a low rowcount.

“But wait,” you say, “I can use the exotic USE PLAN hint to force a nested loop join! This kind of skill will finally demonstrate to my boss that I’m ready to be a Senior DBA!”

So here’s what you do: add the LOOP hint and get the estimated plan XML from right-clicking the plan. (Actual plan XML will work too).

Add the following to the end of your query, and paste the XML inside the single quotes. Mind the whitespace though, a linebreak at the beginning causes an error.

Voila! You have a loop join!

You can even manually edit the XML to switch the order of the table join!

There are a number of important caveats and gotchas with USE PLAN, but who cares? Go impress your boss!