Included Columns Cause Log Bloat

Did you know that indexes with included columns have less efficient logging? Let’s update a single column in a single row.

Here’s what the log record for the clustered index looks like:

And here’s the log record for a nonclustered index. Notice a difference?

I’ll present the conclusion now: included columns in a nonclustered index always get written into the log record, even if they weren’t modified. The same is not true for the clustered index. This can really matter if log generation is important to your environment, e.g. when you’re using an Availability Group (AG).

Proof

There are two ways to prove this: by examining individual log records and by comparing log growth after a pile of changes. We’ll start with the fn_dblog method.

USE TestDB
GO

ALTER DATABASE TestDB SET RECOVERY SIMPLE
GO
CREATE TABLE dbo.AllInclusive
(
    ID INT IDENTITY,
    Num INT NOT NULL,
    JunkA BINARY(4) NOT NULL,
    JunkB BINARY(4) NOT NULL,
    JunkC BINARY(4) NOT NULL,
    JunkD BINARY(4) NOT NULL,
    JunkE BINARY(4) NOT NULL,
    JunkF BINARY(4) NOT NULL,
    INDEX PK_AllInclusive UNIQUE CLUSTERED (ID)
);

CREATE INDEX IX_LotsaJunk ON dbo.AllInclusive (ID)
INCLUDE(Num,JunkA,JunkB,JunkC,JunkD,JunkE,JunkF)

INSERT dbo.AllInclusive (Num,JunkA,JunkB,JunkC,JunkD,JunkE,JunkF)
VALUES (42,0xAAAAAAAA,0xBBBBBBBB,0xCCCCCCCC,0xDDDDDDDD,0xEEEEEEEE,0xFFFFFFFF)
--Tab 1: run this in one query window
CHECKPOINT --clear the log so the fn_dblog output below stays manageable
GO

BEGIN TRAN

UPDATE dbo.AllInclusive
SET Num += 1
WHERE ID = 1

--rollback when you're done looking at the log

--Tab 2: while the transaction above is still open, run this in a second window
SELECT * --you can see the binary contents in RowLog Contents 0
FROM sys.fn_dblog(NULL,NULL)

And within the messy guts of undocumented output, you can see how uninvolved columns show up for the nonclustered index but not for the clustered.
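
To cut through the noise, you can filter fn_dblog down to just the modify records for our table. Here’s a rough sketch (fn_dblog is undocumented, so treat the operation and column names as version-dependent):

SELECT [Current LSN],
    Operation,
    AllocUnitName, --shows whether the record belongs to PK_AllInclusive or IX_LotsaJunk
    [Log Record Length], --noticeably larger for the nonclustered index
    [RowLog Contents 0] --the logged bytes themselves
FROM sys.fn_dblog(NULL,NULL)
WHERE Operation IN (N'LOP_MODIFY_ROW', N'LOP_MODIFY_COLUMNS')
AND AllocUnitName LIKE N'dbo.AllInclusive%'

You should get a record per index touched by the update, and the one for IX_LotsaJunk drags along all of the Junk columns even though only Num changed.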

But maybe divining details from an undocumented function isn’t enough for you. Empiricism, you declare, not haruspicy. Ok then – let’s make a million updates then measure log size, with and without included columns.

Run the setup again, but with larger columns to make the difference clear.

DROP TABLE IF EXISTS dbo.AllInclusive

CREATE TABLE dbo.AllInclusive
(
    ID INT IDENTITY,
    Num INT NOT NULL,
    JunkA BINARY(400) NOT NULL,
    JunkB BINARY(400) NOT NULL,
    JunkC BINARY(400) NOT NULL,
    JunkD BINARY(400) NOT NULL,
    JunkE BINARY(400) NOT NULL,
    JunkF BINARY(400) NOT NULL,
    INDEX PK_AllInclusive UNIQUE CLUSTERED (ID)
);

CREATE INDEX IX_LotsaJunk ON dbo.AllInclusive (ID)
INCLUDE(Num,JunkA,JunkB,JunkC,JunkD,JunkE,JunkF)

INSERT dbo.AllInclusive (Num,JunkA,JunkB,JunkC,JunkD,JunkE,JunkF)
VALUES (42,0xAAAAAAAA,0xBBBBBBBB,0xCCCCCCCC,0xDDDDDDDD,0xEEEEEEEE,0xFFFFFFFF)
CHECKPOINT --step 2: in simple recovery, this lets the log clear before the big transaction
GO

SET NOCOUNT ON

BEGIN TRAN

DECLARE @i INT = 0

WHILE @i < 1000000
BEGIN
	UPDATE dbo.AllInclusive
	SET Num += 1
	WHERE ID = 1

	SET @i += 1
END

--COMMIT --step 4, don't forget this

Make sure to check log size before and after, and only commit after measuring.

--step 1 and 3: log size before and after the updates
SELECT *
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Log File(s) Used Size (KB)'
AND instance_name = 'TestDB'
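
If you’d rather not eyeball the before/after numbers, you can snapshot the counter into a temp table and let SQL Server do the subtraction. This is just a convenience wrapper around the same counter (the temp table and column names are arbitrary):

--step 1: snapshot the starting value
DROP TABLE IF EXISTS #log_before
SELECT cntr_value AS used_kb
INTO #log_before
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Log File(s) Used Size (KB)'
AND instance_name = 'TestDB'
GO

--step 3: after the update loop (and before the commit), compute the delta in GB
SELECT (pc.cntr_value - b.used_kb) / 1024.0 / 1024.0 AS log_generated_gb
FROM sys.dm_os_performance_counters pc
CROSS JOIN #log_before b
WHERE pc.counter_name = 'Log File(s) Used Size (KB)'
AND pc.instance_name = 'TestDB'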

Now change the index to skip the unchanged included columns, and run the test again.

DROP INDEX IX_LotsaJunk ON dbo.AllInclusive

CREATE INDEX IX_NoJunk ON dbo.AllInclusive (ID)
INCLUDE(Num)

I get 7.8GB of logs for the index full of included columns, and 0.75GB of logs for IX_NoJunk. Note that this includes reserved space, so it isn’t a precise measurement of impact, just a proof of difference.

This Matters

Of all the potential bottlenecks for SQL Server, I would say that log generation is easily in the top 5. It matters because logging = disk writes, and it really matters with AGs. Did you know that Azure SQL DB will throttle your log throughput unless you pay (even more) money?
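
As an aside, on Azure SQL DB you can get a feel for how close you are to that log governor with sys.dm_db_resource_stats; avg_log_write_percent tells you how much of your allowed log rate each recent interval consumed:

--Azure SQL DB only: recent resource snapshots, including log write usage as a percent of the cap
SELECT TOP (10)
    end_time,
    avg_log_write_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC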

So here you go, a reason to choose key lookups over covering indexes, another reason to say “it depends” when designing a database, and yet another reason to yell at missing index suggestions.

5 thoughts on “Included Columns Cause Log Bloat”

  1. Well, if you think about it, the included columns are nothing but data hanging on the end of the NC index’s key and row locator. How is that data maintained? Is it structured like a normal page for a clustered index row? Likely not, since it is just extra data hanging off the end; it has no intrinsic value to the key value and row locator themselves. The only purpose it serves is to provide data in a query that would otherwise have to come from the leaf page of the clustered index for that key value. It should be semi-structured in some sense, since it has data types defined and needs some structure to delimit the various columns, but otherwise it is just extra baggage in the index.

    So make a change to a column in the included columns. How do you do that? You have to pull the entire chunk of data into memory, find the piece you want, and change it. In your case it’s a fixed data type, so a fixed size, but it could just as well be variable in size. Once you’ve made the change to the column of choice, you still have the whole chunk in memory, and you have to write it all back out to the NC index page it came from. That means the whole chunk gets written out, and that has to be logged.

    Trying to manage that chunk of data like a table row is not feasible. The index itself is not a table, just the key value and row locator, so the included data has to be handled as a chunk in its entirety, not as columns in a table.

  2. Wow! Thank you for this, @Forrest. I have an AG environment and excessive log generation is always a concern. Your conclusion is right on: “another reason to say ‘it depends’ when designing a database”

  3. A covering index has all of the columns in the keys. An index with included columns is not necessarily a covering index. What I mean is that a covering index with all of the columns in the key is still okay.

  4. I have more than just a casual interest in indexes so thank you VERY much for this post, Forrest.

    I see the note below says your “previous” post was also on indexes. I’ll have a look at that, as well.

    Thanks, again.

  5. Interesting. Did you take into account that every nonclustered index must also include the clustered index key? I didn’t see that in your proof. Cheers!

    -Kevin
