Included Columns Cause Log Bloat

Did you know that indexes with included columns have less efficient logging? Let’s update a single column in a single row.

Here’s what the log record for the clustered index looks like:

And here’s the log record for a nonclustered index. Notice a difference?

I’ll present the conclusion up front: included columns in a nonclustered index always get written to the log record, even if they weren’t modified. The same is not true of the clustered index, which only logs the bytes that actually changed. This can really matter if log generation is important in your environment, e.g. if you’re using an Availability Group (AG).

Proof

There are two ways to prove this: by examining individual log records, and by comparing total log growth after a pile of changes. We’ll start with the fn_dblog method.

USE TestDB
GO

ALTER DATABASE TestDB SET RECOVERY SIMPLE --so CHECKPOINT can clear the log between tests
GO
CREATE TABLE dbo.AllInclusive
(
    ID INT IDENTITY,
    Num INT NOT NULL,
    JunkA BINARY(4) NOT NULL,
    JunkB BINARY(4) NOT NULL,
    JunkC BINARY(4) NOT NULL,
    JunkD BINARY(4) NOT NULL,
    JunkE BINARY(4) NOT NULL,
    JunkF BINARY(4) NOT NULL,
    INDEX PK_AllInclusive UNIQUE CLUSTERED (ID)
);

CREATE INDEX IX_LotsaJunk ON dbo.AllInclusive (ID)
INCLUDE(Num,JunkA,JunkB,JunkC,JunkD,JunkE,JunkF)

INSERT dbo.AllInclusive (Num,JunkA,JunkB,JunkC,JunkD,JunkE,JunkF)
VALUES (42,0xAAAAAAAA,0xBBBBBBBB,0xCCCCCCCC,0xDDDDDDDD,0xEEEEEEEE,0xFFFFFFFF)
--Tab 1: run this, and leave the transaction uncommitted
CHECKPOINT
GO

BEGIN TRAN

UPDATE dbo.AllInclusive
SET Num += 1
WHERE ID = 1

--ROLLBACK --when you're done looking
--Tab 2: query the log from a second session while the transaction is still open
SELECT * --you can see the binary contents in RowLog Contents 0
FROM sys.fn_dblog(NULL,NULL)

And within the messy guts of undocumented output, you can see how uninvolved columns show up for the nonclustered index but not for the clustered.
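To cut through the noise, you can filter to just the relevant records. Here’s a sketch (the operation names are the ones I’d expect for this kind of in-place update, and the LIKE pattern leans on how fn_dblog names allocation units, so adjust if your output differs):

SELECT [Current LSN], Operation, AllocUnitName,
    [Log Record Length], [RowLog Contents 0]
FROM sys.fn_dblog(NULL,NULL)
WHERE Operation IN ('LOP_MODIFY_ROW','LOP_MODIFY_COLUMNS')
AND AllocUnitName LIKE 'dbo.AllInclusive%'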

But maybe divining details from an undocumented function isn’t enough for you. Empiricism, you declare, not haruspicy. Ok then: let’s run a million updates and measure log size, with and without included columns.

Run the setup again, but with larger columns to make the difference clear.

DROP TABLE IF EXISTS dbo.AllInclusive

CREATE TABLE dbo.AllInclusive
(
    ID INT IDENTITY,
    Num INT NOT NULL,
    JunkA BINARY(400) NOT NULL,
    JunkB BINARY(400) NOT NULL,
    JunkC BINARY(400) NOT NULL,
    JunkD BINARY(400) NOT NULL,
    JunkE BINARY(400) NOT NULL,
    JunkF BINARY(400) NOT NULL,
    INDEX PK_AllInclusive UNIQUE CLUSTERED (ID)
);

CREATE INDEX IX_LotsaJunk ON dbo.AllInclusive (ID)
INCLUDE(Num,JunkA,JunkB,JunkC,JunkD,JunkE,JunkF)

INSERT dbo.AllInclusive (Num,JunkA,JunkB,JunkC,JunkD,JunkE,JunkF)
VALUES (42,0xAAAAAAAA,0xBBBBBBBB,0xCCCCCCCC,0xDDDDDDDD,0xEEEEEEEE,0xFFFFFFFF)
--the short values get zero-padded out to the full 400 bytes per column
CHECKPOINT --step 2
GO

SET NOCOUNT ON

BEGIN TRAN

DECLARE @i INT = 0

WHILE @i < 1000000
BEGIN
	UPDATE dbo.AllInclusive
	SET Num += 1
	WHERE ID = 1

	SET @i += 1
END

--COMMIT --step 4, don't forget this

Make sure to check log size before and after, and only commit after measuring.

--step 1 and 3
SELECT *
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Log File(s) Used Size (KB)'
AND instance_name = 'TestDB'
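
If you’d rather not eyeball the subtraction, here’s a sketch that snapshots the counter into a temp table and computes the difference for you. The #LogSnap name is mine, not part of the demo, and both queries need to run in the same session for the temp table to survive.

--run before the loop
SELECT cntr_value AS used_kb
INTO #LogSnap
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Log File(s) Used Size (KB)'
AND instance_name = 'TestDB'

--run after the loop, before the commit
SELECT (pc.cntr_value - ls.used_kb) / 1024.0 / 1024.0 AS log_generated_gb
FROM sys.dm_os_performance_counters AS pc
CROSS JOIN #LogSnap AS ls
WHERE pc.counter_name = 'Log File(s) Used Size (KB)'
AND pc.instance_name = 'TestDB'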

Now change the index to skip the unchanged included columns, and run the test again (checkpoint, update loop, and both measurements).

DROP INDEX IX_LotsaJunk ON dbo.AllInclusive

CREATE INDEX IX_NoJunk ON dbo.AllInclusive (ID)
INCLUDE(Num)

I get 7.8GB of log for the index full of included columns, and 0.75GB for IX_NoJunk. Note that these numbers include space the log reserves for potential rollback, so this isn’t a precise measurement of impact, just proof of a difference.
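If you want to separate the log bytes actually written from the space reserved for rollback, there’s a documented DMV for that. A sketch, not part of the original demo; run it while the big transaction is still open:

SELECT database_transaction_log_record_count,
    database_transaction_log_bytes_used,
    database_transaction_log_bytes_reserved
FROM sys.dm_tran_database_transactions
WHERE database_id = DB_ID('TestDB')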

This Matters

Of all the potential bottlenecks for SQL Server, I would say that log generation is easily in the top 5. It matters because logging = disk writes, and it really matters with AGs, where every log record also has to be shipped to the secondaries. Did you know that Azure SQL DB will throttle your log throughput unless you pay (even more) money?
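If you’re on Azure SQL DB and suspect log rate governance is what’s slowing you down, sys.dm_db_resource_stats (which only exists there) shows how close you’ve been running to the cap:

SELECT TOP (10) end_time, avg_log_write_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC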

So here you go, a reason to choose key lookups over covering indexes, another reason to say “it depends” when designing a database, and yet another reason to yell at missing index suggestions.