Do Spills Get Logged?

Ever see a SELECT query with WRITELOG waits? I did, and exclaimed, “Oooh, interesting!” to the annoyance of all my cube neighbors. There was no data changing to require logging, so where was it coming from? After a little thought, I hypothesized that a spill to tempdb generates log activity.

But can I prove it?

The first ingredient for this experiment is a spill, and probably the most common operator to spill would be the sort, so let’s build a table to sort:

DROP TABLE IF EXISTS #Valdez
CREATE TABLE #Valdez (
ID INT IDENTITY PRIMARY KEY,
gooey BINARY(16)
)

INSERT #Valdez
SELECT TOP 10000 NEWID()
FROM master..spt_values a
CROSS JOIN master..spt_values b

Ok, now we need a query that will spill. A spill occurs when SQL Server doesn’t provide enough memory to an operator, and it has to use space in tempdb to complete the operation. So we need a way to get SQL Server to under-provision memory. The most common way this occurs in the wild is when the actual number of rows passed to a sort is far higher than what the optimizer estimated. But this is SCIENCE[!] so let’s screw with the memory grant instead.

SELECT ID
FROM #Valdez
ORDER BY gooey
OPTION(
MAX_GRANT_PERCENT = .000001
)

Now we need to read the log generated. Start with a CHECKPOINT command. Since tempdb is in simple recovery (and there are no other ongoing transactions because this is a test instance…right?), this will clear the logs. Then run the spilling query (I use a trash variable to prevent displaying results) and follow it up with the handy-dandy undocumented fn_dblog().

USE tempdb
GO

CHECKPOINT
GO

DECLARE @trash int

SELECT @trash = ID
FROM #Valdez
ORDER BY gooey
OPTION(
MAX_GRANT_PERCENT = .000001
)

SELECT Operation, Context, [Page ID], [Transaction Name], Description, [Log Record Length]
FROM sys.fn_dblog(NULL,NULL)

 

QED

I see a lot of activity on PFS, GAM, and IAM pages that makes me think this is minimally efficiently logged – only the allocations without the content of the spill. It’s even possible to find one of these sort pages by checking allocated pages mentioned in the description column. You can go digging for one yourself, if it’s the kind of thing to make you exclaim, “Oooh, interesting!”

The Longest Row

It’s hard to ask a question about SQL Server to which the answer isn’t, “It depends.” Unless of course you’re asking if you should use Auto-Shrink (the answer is NO! and an angry glare).  Edge cases are everywhere, and hidden gotchas lurk around the corner even for well-known features.

But what about the maximum size of a row on page? Microsoft and even error messages say that you can’t go past 8060 bytes. Surely we can trust that this is an accurate limit?

Hold my beer.

Start with a basic setup, then try to go past 8060 bytes.

USE master
GO

DROP DATABASE IF EXISTS ForrestSaysHi
CREATE DATABASE ForrestSaysHi
GO

USE ForrestSaysHi
GO

--too wide
CREATE TABLE dbo.TooWideTable (
ID int,
junk CHAR(8000),
filler CHAR(100)
)

Yup, there’s the error message.

Right-size a column to get exactly 8060 bytes. We can even use DBCC PAGE to confirm its length.

--just right
CREATE TABLE dbo.WideTable (
ID int,
junk CHAR(8000),
filler CHAR(49)
)
INSERT dbo.WideTable
VALUES (1,REPLICATE('a',8000),REPLICATE('b',49))
GO

--check the row size with DBCC PAGE
DECLARE @page INT

SELECT TOP 1 @page = allocated_page_page_id
FROM sys.dm_db_database_page_allocations(DB_ID('ForrestSaysHi'),OBJECT_ID('ForrestSaysHi.dbo.WideTable'),NULL,NULL,'Detailed')
WHERE is_allocated = 1
AND is_iam_page = 0

DBCC TRACEON(3604)

DBCC PAGE('ForrestSaysHi',1,@page,1)

Make a couple changes, add another row…

USE master
GO

ALTER DATABASE ForrestSaysHi
SET ALLOW_SNAPSHOT_ISOLATION ON
GO

USE ForrestSaysHi
GO

CREATE CLUSTERED INDEX CX_blahblah ON dbo.WideTable(ID)
GO

INSERT dbo.WideTable
VALUES (1,REPLICATE('a',8000),REPLICATE('b',49))
GO

--check the row size with DBCC PAGE
DECLARE @page INT

--grab the last row
SELECT TOP 1 @page = allocated_page_page_id
FROM sys.dm_db_database_page_allocations(DB_ID('ForrestSaysHi'),OBJECT_ID('ForrestSaysHi.dbo.WideTable'),NULL,NULL,'Detailed')
WHERE is_allocated = 1
AND is_iam_page = 0
ORDER BY allocated_page_page_id DESC

DBCC TRACEON(3604)

DBCC PAGE('ForrestSaysHi',1,@page,1)

And now we have 8082 bytes!

Once you stop applauding this incredible feat, you might stop to ask how I managed it. What’s going on is combination of two tricks. The first has to do with enabling Snapshot Isolation. Snapshot is an isolation level that allows users to view data consistent to a point in time without locking the row. SQL Server manages this by storing old versions of the row in tempdb, and keeping a pointer to those versions in a 14-byte versioning tag added to the row.

The last 8 bytes come from the clustered index I added. Notice how there’s a clustered index, but no primary key? It’s how I was able to add two rows with the same ID of “1”. SQL Server has to uniquely identify rows though (it’s how it points back to the source row from non-clustered indexes), so it adds a “uniquefier” taking up a total of 8 bytes to the second row.

Like any good Hold My Beer stunt, this one ends with a big error. Watch what happens if you try to update that second row.

--succeeds
UPDATE TOP (1) ForrestSaysHi.dbo.WideTable
SET filler = REPLICATE('z',49)

--fails (don't try this at home, kids!)
UPDATE ForrestSaysHi.dbo.WideTable
SET filler = REPLICATE('z',49)

I wouldn’t be surprised if there’s a way to push this limit even higher. Spend some time investigating row length for yourself, and see if you can break my record of 8082 bytes. Then, when someone asks how you maintain your grand DBA physique, just say “competitive rowing.”