Do Spills Get Logged?

Ever see a SELECT query with WRITELOG waits? I did, and exclaimed, “Oooh, interesting!” to the annoyance of all my cube neighbors. There was no data changing to require logging, so where was it coming from? After a little thought, I hypothesized that a spill to tempdb generates log activity.

But can I prove it?

The first ingredient for this experiment is a spill, and probably the most common operator to spill would be the sort, so let’s build a table to sort:

Ok, now we need a query that will spill. A spill occurs when SQL Server doesn’t provide enough memory to an operator, and it has to use space in tempdb to complete the operation. So we need a way to get SQL Server to under-provision memory. The most common way this occurs in the wild is when the actual number of rows passed to a sort is far higher than what the optimizer estimated. But this is SCIENCE[!] so let’s screw with the memory grant instead.

Now we need to read the log generated. Start with a CHECKPOINT command. Since tempdb is in simple recovery (and there are no other ongoing transactions because this is a test instance…right?), this will clear the logs. Then run the spilling query (I use a trash variable to prevent displaying results) and follow it up with the handy-dandy undocumented fn_dblog().

 

QED

I see a lot of activity on PFS, GAM, and IAM pages that makes me think this is minimally efficiently logged – only the allocations without the content of the spill. It’s even possible to find one of these sort pages by checking allocated pages mentioned in the description column. You can go digging for one yourself, if it’s the kind of thing to make you exclaim, “Oooh, interesting!”

The Longest Row

It’s hard to ask a question about SQL Server to which the answer isn’t, “It depends.” Unless of course you’re asking if you should use Auto-Shrink (the answer is NO! and an angry glare).  Edge cases are everywhere, and hidden gotchas lurk around the corner even for well-known features.

But what about the maximum size of a row on page? Microsoft and even error messages say that you can’t go past 8060 bytes. Surely we can trust that this is an accurate limit?

Hold my beer.

Start with a basic setup, then try to go past 8060 bytes.

Yup, there’s the error message.

Right-size a column to get exactly 8060 bytes. We can even use DBCC PAGE to confirm its length.

Make a couple changes, add another row…

And now we have 8082 bytes!

Once you stop applauding this incredible feat, you might stop to ask how I managed it. What’s going on is combination of two tricks. The first has to do with enabling Snapshot Isolation. Snapshot is an isolation level that allows users to view data consistent to a point in time without locking the row. SQL Server manages this by storing old versions of the row in tempdb, and keeping a pointer to those versions in a 14-byte versioning tag added to the row.

The last 8 bytes come from the clustered index I added. Notice how there’s a clustered index, but no primary key? It’s how I was able to add two rows with the same ID of “1”. SQL Server has to uniquely identify rows though (it’s how it points back to the source row from non-clustered indexes), so it adds a “uniquefier” taking up a total of 8 bytes to the second row.

Like any good Hold My Beer stunt, this one ends with a big error. Watch what happens if you try to update that second row.

I wouldn’t be surprised if there’s a way to push this limit even higher. Spend some time investigating row length for yourself, and see if you can break my record of 8082 bytes. Then, when someone asks how you maintain your grand DBA physique, just say “competitive rowing.”