Forrest Shares Stuff

Are Bigint Splits a GUID Idea?

I like fast databases. Naturally, this puts me at odds with GUIDs. Alas, developers like to use them, sometimes for good reasons (“we need ids generated on different servers to be unique”) and often for bad (“what if we run out of bigints”).

The Columnstore Problem

You may have run into issues with GUIDs as clustering keys, but another major problem is in columnstore. Smart people at Microsoft wrote columnstore in a way to take advantage of modern CPU features, but those CPU features don’t play well with datatypes larger than 8 bytes. Which includes GUIDs.

Let me show you: here’s a select on a GUID

And here’s a select on a bigint

We Need to Split Up

Crazy idea time: a uniqueidentifier is 16 bytes…what if we just turned that into two 8 byte bigint columns?

create table #t (
UID uniqueidentifier not null
)

create table dbo.GUID_IDEA (
UID uniqueidentifier not null,
b1 bigint,
b2 bigint,
index CCX_GUID clustered columnstore
)

insert #t
select top (5000000) newid()
from sys.all_columns a, sys.all_columns b, sys.all_columns c

insert dbo.GUID_IDEA (UID, b1, b2)
select UID
,convert(bigint,RIGHT_SHIFT(convert(binary(16),UID),64)) --there are several ways to split - this one is 2022+
,convert(bigint,RIGHT_SHIFT(LEFT_SHIFT(convert(binary(16),UID),64),64))
from #t

The Demo

Split done. We have a 5m row table storing both the full GUID and its decomposed bigints. Time to test it. Load a temp table with some sample values.

create table #driver(
UID uniqueidentifier,
b1 bigint,
b2 bigint
)
go

insert #driver
select *
from dbo.GUID_IDEA tablesample(5 rows)
go 5 --this lets us grab rows from different locations

Now, how does a GUID-only join perform?

select *
from #driver d
join dbo.GUID_IDEA gi
on gi.UID = d.UID

And what if we include the bigints on the join?

select *
from #driver d
join dbo.GUID_IDEA gi
on gi.UID = d.UID
and gi.b1 = d.b1
and gi.b2 = d.b2

According to the plan details, that’s 268ms CPU down to 21ms, huge win! A 16 byte predicate can’t be pushed down here, but two separate 8 byte predicates (or rather, their bitmap filters) can.

This also works for where clauses.

Side Notes:

See this? Should be fast, right?

select *
from dbo.GUID_IDEA gi
where b1 = 3511787043328731458

Except…

Whereas this one is fast:

select *
from dbo.GUID_IDEA gi
where b1 = 42

This happens because the parser is just looking at the number of digits, and if there are as many digits as max bigint (and thus at risk of being larger than a bigint), it stops treating the number like a bigint.

Thankfully the fix is easy:

select *
from dbo.GUID_IDEA gi
where b1 = convert(bigint,3511787043328731458)

Another question you might have is why bigints? Why not just use two binary(8) columns.

Alas…

create table dbo.BETTER_IDEA (
UID uniqueidentifier not null
,p1 binary(8) not null
,p2 binary(8) not null
,index ccx_better clustered columnstore 
)

...blah blah load it and split it, you can figure it out

select *
from BETTER_IDEA
where p1 = 0x97E6A3BAC4E4A64F

This is one I don’t have an explanation for, and may need to send some wine to New Zealand for an answer. Edit: the Wizard from the Other Down Under provided an answer: Columnstore never stores binary data in a pushdown friendly format. Actual explanation here.

Conclusion

I have never done this in production, and I’m not sure I want to even with a 20X speedup – simple designs and getting data right the first time tend to outperform fancy tricks. Maybe, just maybe, it’s worth bolting on to a calcified schema where it’s the only way. If you do try this yourself, let me know how it goes!

Cardinality in a Shared Membership Query

Take a look at this. Pay attention to the estimated rows. See the problem?

Even at a full cartesian join, 49 by 49 rows does not produce 27670.

Usually, when the optimizer applies transformations it updates cardinality estimates, but I ran into a particular construction and transformation where it doesn’t (and I don’t know why).

Here’s the basic idea for my demo query. You have three tables, Students, Towns, and Fraternities. Each student has a hometown and a fraternity. For a given list of hometowns, what fraternities have members from those towns?

One possible way to write the query is like this:

--using a temp table to take the place of a TVP 
--as encountered in a real production scenario
SELECT TOP(2) TownId, TownName
INTO #t
FROM dbo.Towns
WHERE TownName LIKE '%Dallas%'

SELECT f.Fraternity, t.TownName
FROM #t t
CROSS JOIN Fraternities f
WHERE EXISTS (
	SELECT 1
	FROM dbo.Students s
	WHERE t.TownId = s.HometownId
	AND s.FraternityId = f.FraternityId
	)

If the cross join weirds you out, note that it was disguised when I first found it with a one-sided ON filter. E.g.

SELECT f.Fraternity, t.TownName
FROM #t t
JOIN Fraternities f --this is still a cross join
ON f.RandomColumn = @Param

This is the full plan produced, across two pictures. SQL Server doesn’t actually do a cross join, thankully, but starts with the small temp table to find existing Students with matching hometown.

It then uses the FraternityIds from these students to look up data in the Fraternity table, and finally aggregates the data to remove duplicates (to preserve the semi join requirements).

So What?

As encountered in my real production issue, the bad cardinality estimate caused the optimzer to shove inappropriately large tables up before the join, causing timeouts. Cartesian cardinalities can get pretty high, so what should have been a good, filtering join was getting delayed. And no, I didn’t bother with a demo for this part, sorry.

A Fix

There’s another way to write the query that’s extremely similar to the above plan SQL Server decided to use anyways:

SELECT f.Fraternity, t.TownName
FROM Fraternities f
join (
	SELECT DISTINCT t.TownName, s.FraternityId
	FROM dbo.Students s
	JOIN #t t
	ON t.TownId = s.HometownId
	) t
ON t.FraternityId = f.FraternityId

The difference here is that the aggregation is done before the join to Fraternity

What’s Going On?

The high estimate comes from the cartesian product of the temp table (2 rows) and Fraternities (13824 rows) because of the cross join. It just sticks around.

Looking into the used transformations, I identified LSJNtoJNonDist as key. Here’s the plan with OPTION(queryruleoff LSJNtoJNonDist)

This transformation failed to update the cardinality, yet others, like commuting joins, did. Here’s an example where adding a one row table to the query gets it moved earlier and updates cardinality.

SELECT 1 AS Id
INTO #onerow

SELECT f.Fraternity
FROM #t t
CROSS JOIN Fraternities f
JOIN #onerow p
ON f.FraternityId = p.Id
WHERE EXISTS (
	SELECT 1
	FROM dbo.Students s
	WHERE t.TownId = s.HometownId
	AND s.FraternityId = f.FraternityId
	)

And here is approximately where I’m stuck, barring a large further investment of time. I’d like to find another query that relies on LSJNtoJNonDist and see if it fails to update cardinality as well, but my first few attempts ran into different issues (like an early aggregate instead of after the join).

It’s unsatisfying to me – I don’t know whether this is truly a bug or just an expected artifact of the optimization process, but I have a workaround and I’d rather continue yelling at the AI I’m teaching to read query plans.

Setup Script:

USE TestDB
GO

CREATE TABLE dbo.Towns (
TownID INT IDENTITY PRIMARY KEY
,TownName VARCHAR(500) NOT NULL
)

CREATE TABLE dbo.Students (
StudentId INT IDENTITY PRIMARY KEY
,StudentName VARCHAR(100) NOT null
,FraternityId INT NULL
,HometownId INT NOT NULL
)

CREATE INDEX IX_Students_FraternityId
ON dbo.Students (FraternityId)

CREATE INDEX IX_Students_HometownId
ON dbo.Students (HometownId)

CREATE TABLE dbo.Fraternities (
FraternityId INT IDENTITY PRIMARY KEY
,Fraternity NCHAR(3) NOT null
,FraternityName VARCHAR(100) NOT null
)

INSERT dbo.Towns (TownName)
SELECT DISTINCT Location
FROM StackOverflow2010.dbo.Users 
WHERE TRY_CONVERT(VARCHAR(100),Location) IS NOT NULL

;WITH GreekLetters AS (
    SELECT 'Alpha' AS LetterName, N'Α' AS UpperCase, 'α' AS LowerCase, 1 AS SortOrder
    UNION ALL SELECT 'Beta', N'Β', 'β', 2
    UNION ALL SELECT 'Gamma', N'Γ', 'γ', 3
    UNION ALL SELECT 'Delta', N'Δ', 'δ', 4
    UNION ALL SELECT 'Epsilon', N'Ε', 'ε', 5
    UNION ALL SELECT 'Zeta', N'Ζ', 'ζ', 6
    UNION ALL SELECT 'Eta', N'Η', 'η', 7
    UNION ALL SELECT 'Theta', N'Θ', 'θ', 8
    UNION ALL SELECT 'Iota', N'Ι', 'ι', 9
    UNION ALL SELECT 'Kappa', N'Κ', 'κ', 10
    UNION ALL SELECT 'Lambda', N'Λ', 'λ', 11
    UNION ALL SELECT 'Mu', N'Μ', 'μ', 12
    UNION ALL SELECT 'Nu', N'Ν', 'ν', 13
    UNION ALL SELECT 'Xi', N'Ξ', 'ξ', 14
    UNION ALL SELECT 'Omicron', N'Ο', 'ο', 15
    UNION ALL SELECT 'Pi', N'Π', 'π', 16
    UNION ALL SELECT 'Rho', N'Ρ', 'ρ', 17
    UNION ALL SELECT 'Sigma', N'Σ', 'σ', 18
    UNION ALL SELECT 'Tau', N'Τ', 'τ', 19
    UNION ALL SELECT 'Upsilon', N'Υ', 'υ', 20
    UNION ALL SELECT 'Phi', N'Φ', 'φ', 21
    UNION ALL SELECT 'Chi', N'Χ', 'χ', 22
    UNION ALL SELECT 'Psi', N'Ψ', 'ψ', 23
    UNION ALL SELECT 'Omega', N'Ω', 'ω', 24
),

-- Generate all combinations using cross joins
AllCombinations AS (
    SELECT 
        L1.UpperCase + L2.UpperCase + L3.UpperCase AS GreekLetters,
        L1.LetterName + ' ' + L2.LetterName + ' ' + L3.LetterName AS FullName
    FROM GreekLetters L1
    CROSS JOIN GreekLetters L2
    CROSS JOIN GreekLetters L3
)

INSERT dbo.Fraternities(Fraternity,FraternityName)
SELECT GreekLetters,FullName
FROM AllCombinations

INSERT dbo.Students(StudentName,FraternityId,HometownId)
SELECT DisplayName
,FLOOR(RAND(CONVERT(VARBINARY(20),NEWID()))*13824+1)
,FLOOR(RAND(CONVERT(VARBINARY(20),NEWID()))*12265+1)
FROM StackOverflow2010.dbo.Users 
WHERE TRY_CONVERT(VARCHAR(100),DisplayName) IS NOT null

PSA: Adaptive Joins are Immoral

In case you haven’t heard, moral decline is everywhere. The latest generation of query operators is a prime example of this rot. “But this is programming” you say, “what do morals have to do with anything?” No. This is SQL. And for us, we have the term morally equivalent plans which allows me to finally have an ethical excuse for sprinkling a post with degenerate puns.

“Morally equivalent” describes plans that have the same shape but with differing details. For adaptive joins, Threshold Rows is one of these details. Alas, moral relativism fails again, because this is a detail that matters.

An adaptive join tipping to a scan after 2 rows is morally equivalent to an adaptive join scanning after a 10000 row threshold. You are not allowed to use Query Store to force your will on this plan – SQL Server only sells certified cage-free adaptive joins. The plan shape will be forced, but the threshold can and will change as the plan gets recompiled.

Moreover, Query Store often (but not always) treats different threshold values as separate plans, so you might force a single plan and still end up with skittles splattered across your GUI.

All of those are identically shaped plans. All of them. Oh, and that 2-row vs 10000-row threshold was not a hypothetical example and absolutely matters for performance.

So do not force a 1000-row threshold plan and expect stable performance. According the Copenhagen Interpretation of Ethics (aka normal DBAing), it’s your fault if the plan changes and starts timing out.

Why are Columnstore Key Lookups Bad

I’ve read it repeatedly, columnstore key lookups are extra slow. The question of course, is why?

In my mental model, it makes sense. A normal key lookup adds about 3 reads.

While a columnstore lookup should add at least a read per column, since each column lives in its own segments.

Logically the next step is to test it. I made a copy of the StackOverflow2013 Posts table and slapped clustered columnstore and a nonclustered index on it.

SELECT TOP (1) *
FROM dbo.PostsCS WITH(FORCESEEK)
WHERE ID > 0

There are 20 columns, so I should expect at least 20 extra reads on top of the inital seek, right?

That’s…a lot more than 20 extra reads. What’s going on?

Segment Lookups, not Row Lookups

First off, each key lookup per column doesn’t just read the page it’s on, but seems to read the entire segment. This can be tested by comparing lob reads to segment size – here are some sample scripts:

SELECT c.name, s.on_disk_size/1024.0/8 AS approx_pages
FROM sys.column_store_segments s
JOIN sys.partitions p
ON p.partition_id = s.partition_id
JOIN sys.columns c
ON c.object_id = p.object_id
AND c.column_id = s.column_id
WHERE p.object_id = OBJECT_ID('dbo.PostsCS')
AND s.segment_id = 11 --I used segment min/max values to confirm this is the correct one

SET STATISTICS IO ON
SELECT TOP (1) LastEditDate --lob logical reads 49, segment size 48
FROM dbo.PostsCS WITH(FORCESEEK)
WHERE ID > 0

So it’s more like this:

Dictionaries

It gets worse.

The following query has 7135 lob reads

SELECT TOP (1) Body
FROM dbo.PostsCS WITH(FORCESEEK)
WHERE ID > 0

Which should make some sense, given that body is nvarchar(max), but the segment is only 43 pages. 43 pages is rather small for lots of nvarchar(max) data isn’t it? That’s because SQL Server is actually storing the data in dictionaries.

So when the key lookup occurs for a column with a dictionary, not only does the whole segment get read, but the dictionary gets read too. And there might be more than one dictionary.

This is a lot of reads.

And a Mystery

Something seems to happen with dictionary-backed columns where there are more reads than necessary, and a lot of read-ahead reads. I haven’t figured out why, and plan to tackle this next.

Recommendations

Columnstore key lookups can have the disastrous performance that people mistakenly attribute to normal key lookups. When you do need to use them, try to 1) pull as few columns as possible, 2) as few rows as possible, and 3) avoid dictionary-backed columns when possible.

Solving Timeouts by Making Queries…Worse?

I love how there are lots of little things to SQL Server – mostly ignored details that only sometimes matter but make sense when you stare at them. Actually, I’m not sure I love them. I forget about them, and then stub my toe on them. Occasionally though, there’s a quirky combination that leads to bloggable shenanigans.

Let’s start with Detail Number One, which has most definitely tripped me up: queries that are returning rows won’t time out. You might be familiar with connections using a default 30s timeout, but as long the query is returning a row every 29s, the client won’t time it out. You can even test this in SSMS.

WHILE 1=1
BEGIN

SELECT 'tro'+REPLICATE('lo',4000)

WAITFOR DELAY '00:00:29'

END

Now for a puzzle which will lead us to little detail number two: will the below query time out? The scalar function takes just under 3s per row, and there are 15 rows.

SELECT TOP 15 ID, dbo.Take3s(1)
FROM StackOverflow2010.dbo.Users
OPTION(USE HINT('DISABLE_TSQL_SCALAR_UDF_INLINING'))

Here’s the function and plan if you’re curious:

CREATE OR ALTER FUNCTION dbo.Take3s
(@useless BIT)
RETURNS bigint
AS
BEGIN

DECLARE @c bigint
SELECT @c = COUNT(*)
FROM StackOverflow2013.dbo.Comments
OPTION(MAXDOP 1)

RETURN @c

END

And the answer:

It times out, despite being a (mostly) streaming plan. And the reason comes down to Detail #2: rows aren’t delivered to the client one-at-a-time, but in packets.

Even though the rows are produced every 3s, they aren’t delivered every 3s, and the client is left waiting up to the 30s mark where it cancels the query. But wait, we can fix this! All we have to do is fatten up the packet with junk.

SELECT TOP 15 ID, dbo.Take3s(1), REPLICATE('junk',2000)
FROM StackOverflow2010.dbo.Users
OPTION(USE HINT('DISABLE_TSQL_SCALAR_UDF_INLINING'))

And it works!

If you have a slow but streaming query that times out, you can return extra (junk) data in each row to fill up packets sooner and get those packets to the client before the deadline. So yes, you can fix timeouts by making queries worse.

Not that you should do that. This is a lolpost, not permission. I hope you enjoyed it.

Included Columns Cause Log Bloat

Did you know that indexes with included columns have less efficient logging? Let’s update a single column in a single row.

Here’s what the log record for the clustered index looks like:

And here’s the log record for a nonclustered index. Notice a difference?

I’ll present the conclusion now: included columns in an index always get included in the log record, even if they weren’t modified. The same is not true for the clustered index. This can really matter if log generation is important to your environment, e.g. using an AG.

Proof

There are two ways to prove this: by examining log details and by comparing size after changes. We’ll start with the fn_dblog method.

USE TestDB
GO

ALTER DATABASE TestDB SET RECOVERY SIMPLE
GO

CREATE TABLE dbo.AllInclusive
(
    ID INT IDENTITY,
    Num INT NOT NULL,
    JunkA BINARY(4) NOT NULL,
    JunkB BINARY(4) NOT NULL,
    JunkC BINARY(4) NOT NULL,
    JunkD BINARY(4) NOT NULL,
    JunkE BINARY(4) NOT NULL,
    JunkF BINARY(4) NOT NULL,
    INDEX PK_AllInclusive UNIQUE CLUSTERED (ID)
);

CREATE INDEX IX_LotsaJunk ON dbo.AllInclusive (ID)
INCLUDE(Num,JunkA,JunkB,JunkC,JunkD,JunkE,JunkF)

INSERT dbo.AllInclusive (Num,JunkA,JunkB,JunkC,JunkD,JunkE,JunkF)
VALUES (42,0xAAAAAAAA,0xBBBBBBBB,0xCCCCCCCC,0xDDDDDDDD,0xEEEEEEEE,0xFFFFFFFF)

--Tab 1
CHECKPOINT
GO

BEGIN TRAN

UPDATE dbo.AllInclusive
SET Num += 1
WHERE ID = 1

--rollback

--Tab 2
SELECT * --you can see the binary contents in RowLog Contents 0
FROM sys.fn_dblog(NULL,NULL)

And within the messy guts of undocumented output, you can see how uninvolved columns show up for the nonclustered index but not for the clustered.

But maybe divining details from an undocumented function isn’t enough for you. Empiricism, you declare, not haruspicy. Ok then – let’s make a million updates then measure log size, with and without included columns.

Run the setup again, but with larger columns to make the difference clear.

DROP TABLE IF EXISTS dbo.AllInclusive

CREATE TABLE dbo.AllInclusive
(
    ID INT IDENTITY,
    Num INT NOT NULL,
    JunkA BINARY(400) NOT NULL,
    JunkB BINARY(400) NOT NULL,
    JunkC BINARY(400) NOT NULL,
    JunkD BINARY(400) NOT NULL,
    JunkE BINARY(400) NOT NULL,
    JunkF BINARY(400) NOT NULL,
    INDEX PK_AllInclusive UNIQUE CLUSTERED (ID)
);

CREATE INDEX IX_LotsaJunk ON dbo.AllInclusive (ID)
INCLUDE(Num,JunkA,JunkB,JunkC,JunkD,JunkE,JunkF)

INSERT dbo.AllInclusive (Num,JunkA,JunkB,JunkC,JunkD,JunkE,JunkF)
VALUES (42,0xAAAAAAAA,0xBBBBBBBB,0xCCCCCCCC,0xDDDDDDDD,0xEEEEEEEE,0xFFFFFFFF)

CHECKPOINT --step 2
GO

SET NOCOUNT ON

BEGIN TRAN

DECLARE @i INT = 0

WHILE @i < 1000000
BEGIN
	UPDATE dbo.AllInclusive
	SET Num += 1
	WHERE ID = 1

	SET @i += 1
END

--COMMIT --step 4, don't forget this

Make sure to check log size before and after, and only commit after measuring.

--step 1 and 3
SELECT *
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Log File(s) Used Size (KB)'
AND instance_name = 'TestDB'

Now change the index to skip the unchanged included columns, and run the test again.

DROP INDEX IX_LotsaJunk ON dbo.AllInclusive

CREATE INDEX IX_NoJunk ON dbo.AllInclusive (ID)
INCLUDE(Num)

I get 7.8GB of logs for the index full of included columns, and 0.75GB of logs for IX_NoJunk. Note that this includes reserved space, so it isn’t a precise measurement of impact, just a proof of difference.

This Matters

Of all the potential bottlenecks for SQL Server, I would say that log generation is easily in the top 5. It matters because logging = disk writes, and it really matters with AGs. Did you know that Azure SQL DB will throttle your log throughput unless you pay (even more) money?

So here you go, a reason to choose key lookups over covering indexes, another reason to say “it depends” when designing a database, and yet another reason to yell at missing index suggestions.

Missing Index Hints and the Abominable Rebuild

“It’s a very short blocking phase” they said. “Metadata only” they said.

Meanwhile, here’s my online rebuild of a zero row table.

Yes, it’s been running for 30s, blocking the whole time. For extra fun- it’s unkillable.

Findings first: SQL Server stores details about missing index suggestions. The dmv pertinent to screwing up index operations is sys.dm_db_missing_index_group_stats_query. Should this DMV accumulate a very large number of entries, any index create/rebuild on any table with a missing index hint will take extra time to process those entries. Yes, that’s any table anywhere – doesn’t matter if it’s in a separate database with a single hinted query.

Troubleshooting

The prod version of this issue manifested as 1-3 minutes of SCH-M blocking for index ops, yet only on a single server; I couldn’t replicate it anywhere else. Of course, bad blocking meant nasty timeouts, and that in turn meant I was reluctant to run live diagnostics. Better to just stop index rebuilds. One does not take down production in order to test weird SQL behavior.

An important clue came from a tiny table with a rebuild, and is visible in the whoisactive screenshot above. Why would such a small rebuild need so much CPU? That worker was definitely doing something – the question was what.

My lucky break came with a bad proc that needed emergency index help – in this case the fix was worth the blocking, and the table was non-critical anyways. I readied the index creation script, and prepared an extended event to capture callstacks on waits and spinlock backoffs for my spid.

And…jackpot. The worker was hitting SOSY waits and for 3 minutes every callstack was topped with:

sqllang!CQSIndexStatsMgr::DropIndexStatsWithWildcards
sqllang!CQSIndexRecMgr::InvalidateMissingIndexes
sqllang!QPEvent::CreateIndex
sqllang!CIndexDef::CreateIndex

Missing indexes huh? I started poking around the DMVs, and discovered that sys.dm_db_missing_index_group_stats_query took several minutes to query – it was even unkillable, just like the index rebuilds.

Repro

I finally had enough clues to cause the issue on purpose. I’m going to show you how to do this at home, because breaking SQL Server is fun…in a test environment.

First, let’s create a table. No rows necessary, just tell SQL Server that there are.

drop table if exists dbo.RebuildTest

create table dbo.RebuildTest (
	ID int identity primary key,
	Num int,
	Filler char(100)
)

UPDATE STATISTICS dbo.RebuildTest WITH ROWCOUNT = 1000000, PAGECOUNT = 10000

Now we need to generate missing index hints….lots of them. After some experimenting, I combined two features known to play nicely with caching: temp tables and dynamic SQL. (Dear junior DBA reading this: the prior sentence is sarcasm. Those features are notorious for caching issues.)

SET NOCOUNT ON

DECLARE @sql NVARCHAR(4000)
DECLARE @stop DATETIME = DATEADD(MINUTE,60,GETDATE())

WHILE GETDATE() < @stop
begin
	
	SET @sql =
	'select *
	into #t'+CONVERT(CHAR(36),CONVERT(BINARY(16),NEWID()),1)+'
	from dbo.RebuildTest
	where Num = (select -1)'

	EXEC(@sql)

end

I ran that from ten sessions for a really long time – I think I accumulated about 15 core hours of spam. Rinse and repeat as needed to get the desired slowdown on your own system.

Consequences of above pollution:

Let’s create an unrelated table, also zero rows.

drop table if exists dbo.OtherTable

create table dbo.OtherTable (
	ID int identity primary key,
	Num int,
	Filler char(100)
)

UPDATE STATISTICS dbo.OtherTable WITH ROWCOUNT = 1000000, PAGECOUNT = 10000

If you rebuild now, it will be super fast.

Run a single query that generates a missing index hint however…

select *
from dbo.OtherTable
where Num = (select 42)

And your rebuild transforms into a lumbering unkillable monstrosity

ALTER INDEX ALL ON dbo.OtherTable REBUILD WITH(ONLINE=ON,MAXDOP=1)

--need to tell SQL fake stats again if you want to repeat the tests
UPDATE STATISTICS dbo.OtherTable WITH ROWCOUNT = 1000000, PAGECOUNT = 10000

Resolution

Ironically enough, this can be fixed by, ugh, restarting SQL Server, adding the appropriate index, or rebuilding the table behind the missing index spam. Due to the nature of the pollution, none of the SQL handles from the DMV resolved to text, but I was able to work up a way to see which objects were behind the growth.

SELECT TOP 10 mid.database_id, mid.object_id, COUNT(*) c
FROM sys.dm_db_missing_index_details mid
JOIN sys.dm_db_missing_index_groups mig
ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats_query migs
ON migs.group_handle = mig.index_group_handle
GROUP BY mid.database_id,
         mid.object_id
ORDER BY c desc

There’s another detail I haven’t figured out: for my production system, the DMV had 3 million entries and was adding minutes of extra time. For my laptop, 9 million was about 45 seconds. I’m kinda curious where the discepancy comes from, but not enough that I’m willing to spend hours melting my personal machine or digging through disassembly. Besides, disassembly problems belong to ~~Paul White~~ the magical SQL hermit on a remote island who solves weird code issues for booze. ^{_{and who I secretly aspire to be if I grow up}}

It’s probably a rare issue to hit as well, needing a server that hasn’t been restarted in a long enough time and unique queries that both generate a hint and don’t get combined with similar SQL. I mean, how ridiculous would it be for people to dynamically fill global temp tables named with guids? Nobody does that, right? Crazytown.

But this is the way of things – do weird stuff in SQL, get weird issues. At least weird issues lead to entertaining blog posts, which I sincerely hope does not incentivize you to keep doing the weird stuff. Especially Service Borker. Seriously, knock that out.

Thoughts on RLS for Data Migration

The common corporate amoeba floats around, grasping at revenue, customers, and other nutrients with ciliated pseudopods. You, dear gainfully-employed DBA, are an organelle in this subsentient blob. Sometimes your employer-host catches another amoeba, and it becomes your duty to ingest their delicious data.

This was the situation I found myself in earlier this year – our company had absorbed another, and it was time to slurp up their tables. There were a lot of decisions to make and tradeoffs to weigh, and we ended up choosing to trickle-insert their data, but make it invisible to normal use until the moment of cutover.

The way we implemented this was with Row Level Security. Using an appropriate predicate, we could make sure ETL processes only saw migrated data, apps saw unmigrated data, and admins saw everything. To give a spoiler: it worked, but there were issues.

Let’s consider a simplistic RLS filter, with the StackOverflow2010 database for examples.

CREATE FUNCTION dbo.RLSfunctionSimple(@Migrated AS bit)
    RETURNS TABLE
WITH SCHEMABINDING
AS
    RETURN 
	SELECT 1 AS Pass
    WHERE (@Migrated = 1 AND USER_NAME() IN ('ETLProcess','BubbaTheAdmin'))
	OR (@Migrated = 0 AND USER_NAME() NOT IN ('ETLProcess'))
GO

Great! This works. We add an IsMigrated column to the Users table, set the security policy, and bam, we’re done. Yeah no.

ALTER TABLE dbo.Users 
ADD IsMigrated BIT NOT NULL DEFAULT(0)

CREATE SECURITY POLICY RLS_Users
ADD FILTER PREDICATE dbo.RLSfunctionSimple(IsMigrated) 
ON dbo.Users
WITH (STATE = ON);

More people than just Bubba want to see migrated data. A migration is a big deal, and everyone, everyone, is going to ask to see migrated data for reasons that are honestly quite legitimate. You will constantly be modifying who gets access.

There’s another issue: adding RLS to a table takes a SCH-M lock on that table. Have you experienced the joys of frequent SCH-M locks on critical tables in a highly active database?

So out of a desire to easily make access changes and avoid SCH-M locks (and because you can’t alter the function while it’s used by the security policy), you decide to make the RLS function reference tables. Let’s add one table for the ETL processes that should only see migrated rows, and one table for the folks who want to read everything. Everyone else by default only views unmigrated data.

CREATE TABLE dbo.SeeEverything (
UserName NVARCHAR(256) PRIMARY KEY
)
CREATE TABLE dbo.SeeMigrated (
UserName NVARCHAR(256) PRIMARY KEY
)

And here’s what the RLS function might look like:

CREATE FUNCTION dbo.RLSfunctionComplex(@Migrated AS bit)
    RETURNS TABLE
WITH SCHEMABINDING
AS
    RETURN 
	SELECT 1 AS Pass
    WHERE (@Migrated = 1 
		AND EXISTS (SELECT 1 FROM dbo.SeeMigrated WHERE UserName = USER_NAME())
		OR EXISTS (SELECT 1 FROM dbo.SeeEverything WHERE UserName = USER_NAME())
		)
	OR (@Migrated = 0 
		AND NOT EXISTS (SELECT 1 FROM dbo.SeeMigrated WHERE UserName = USER_NAME())
		)

And now this

SELECT DisplayName
FROM dbo.Users
WHERE ID = 12345

Becomes this

RLS that references tables will break queries. Sometimes, the added complexity just made the optimizer fall over. Other times useful query transformations broke.

Note that the above query plan has nine extra nodes. Imagine a relatively complex query, where most of the involved tables have RLS. Let me tell you, things break. You might say, oh don’t worry, we’ll only add to core tables. No, that’s not how it will work. Bad data is expensive. The business, at least during a migration, cares about data quality. If you have RLS functioning fine on N tables, there will be pressure to make it N+1, just to be safe.

There’s a specific and common transformation that breaks when RLS reads other tables, the TOP-MAX transformation.

Here’s the normal version, at 0ms.

SELECT MAX(ID)
FROM dbo.Users

And here it is after RLS, scanning your tables and taking 116ms

Any place you rely on the TOP-MAX transformation, you will have to rewrite as an explicit TOP 1.

Recommendations

Try as hard as you reasonably can to avoid external references in your RLS function. We still ended up with one – we landed on a single look-up-everything table, and it didn’t break as much as other multi-table functions did.

Expect your coverage to grow beyond initial estimates, whether users who need access or tables that need protection.

You can hot-swap your RLS function with ALTER SECURITY POLICY. It still takes a SCH-M lock on the affected table, but it’s a lot better than dropping your security policy in order to alter the function.

All-in-all, I’m mostly ok with our decision. It meant a lot of work ahead of time (including adding the migration column to tables – don’t forget to include in the indexes as well). But when the migration day came, we turned RLS off, and everything just worked. Given typical cutover craziness, that’s not a bad tradeoff to make.

A Distinct (Query) Murder Mystery

A #sqlhelp member runs through the Slack channel, “Help! Help! My insert has been killed!” Bystanders gather around the still-warm body to stare at the soulless consolas note fixed to its chest.

Rough way to go, that.

Clamor erupts immediately. Accusations fly. Anyone remotely resembling a butler starts edging away. The professionals wait for evidence, but what they see next raises a line of eyebrows.

INSERT [Target] (Id)
SELECT DISTINCT Id
FROM [Source] s
WHERE SomeCondition = 1
AND NOT EXISTS (
	SELECT 1
	FROM [Target] t
	WHERE t.Id = s.Id
	)

“A duplicate despite a DISTINCT bodyguard…” one mutters. “This isn’t the normal level of foul play.”

Mods arrive to cordon off the scene. Twelve different butlers have already been accused, but each has a solid alibi, even MAXDOP.

Generic Protagonist paces back and forth. They’re waiting for the forensics report. Finally it arrives.

“Intriguing.” Protagonist mutters under their breath. Hapless Policeman #5 is forced to listen to an analysis. “There are two possible culprits here: a concurrent session inserting the duplicate value after this one has performed the NOT EXISTS check, or the query is pulling duplicate values somehow from the Source table.”

Generic leans closer and puts on their glasses. “And the DISTINCT bodyguard is missing! No operator here performs that duty, even though the query requested a protection detail.” A quick check with the witness confirms that the column has a unique constraint on the Source table. “Looks like Concurrent Inserts just became our number one suspect.”

A lone, sputtering lightbulb hangs over a table. “Did you do it?” Generic Protagonist asks from opposite Concurrent Insert Theory. “Did you insert a row into the target table while another query was trying to insert the same row?”

Being a barely-anthropomorphized query pattern, it says nothing. Protagonist scoffs, “I knew you’d say that. That’s what they all say. But what about…EVIDENCE!”

Generic Protagonist is unable to contain themselves and starts pacing the small room. “We both know that Serializable Isolation Level is required to prevent new rows entering the queried range before a transaction is finished. The range locks on the target table will last until the query is finished, preventing problematic inserts and stopping your duplicate PK crime spree.”

Protagonist waves a sheet of paper, “And guess what I just had the lab check? That’s right, Serializable stops the duplicate PK error.” Concurrent Insert Theory says nothing. Generic leans back smugly, “I thought so. Guards! Take it away.”

Generic is about to file away the notes when a witness walks in. “More evidence for you,” he says, “Repeatable Read also prevents the duplicate insert.” Protagonist freezes, processing. “But Repeatable Read only holds locks on existing rows until the end of a transaction – it doesn’t prevent new inserts…” A pause follows. “I may have made a terrible mistake.”

It takes a late night call and some favors, but a lab tech is found who can run another test. What about a test version of the query on a new target that is guaranteed to run by itself? It fails. It fails! They thank the lab tech, who responds, “Of course! Anything when a barely-anthropomorphized query pattern is at stake!” It is not Concurrent Inserts. It must be Duplicates from Source. But how?

Generic Protagonist turns to the witness. What isolation level was the query running under? “Read Committed” is the answer. “Ah,” Generic replies, “are you aware of the criminological paper, Read Committed is a Garbage Isolation Level? I think I know how the murder happened.”

Various suspects, witnesses, and even a butler or two are gathered into a room. “I’m sure you’re wondering why I called you here.” Generic proclaims. “Actually no,” says one of the butlers, “every generic detective has a scene where they reveal the culprit to a room of people. Just get to the point.” Protagonist sighs, and gestures to two attendees. “Fine. They did it. It was Duplicates from Source along with its accomplice, Migrating Data.”

They continue, “Read Committed is what allowed them to enact the crime. You see, there were concurrent updates in the Source table that would move rows around, causing some to be read twice. And no operator enforced distinct results from the query, because there already was a constraint and the engine didn’t think it was necessary.”

Duplicates from Source stares blankly as officers take it away, destined to spend the rest of its days under Repeatable Read.

Satisfied with a case well-solved, Generic Protagonist adds a final entry to the case file, titled “Evidence: Repro”

————————————————————-

Create a copy of the Users table from StackOverflow2013 in your test database, a target table, and an index.

USE TestDB

CREATE TABLE dbo.UsersCopy
(
[Id] [int] NOT NULL PRIMARY KEY CLUSTERED,
[AboutMe] [nvarchar] (max) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Age] [int] NULL,
[CreationDate] [datetime] NOT NULL,
[DisplayName] [nvarchar] (40) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[DownVotes] [int] NOT NULL,
[EmailHash] [nvarchar] (40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[LastAccessDate] [datetime] NOT NULL,
[Location] [nvarchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Reputation] [int] NOT NULL,
[UpVotes] [int] NOT NULL,
[Views] [int] NOT NULL,
[WebsiteUrl] [nvarchar] (200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[AccountId] [int] NULL
)
GO

CREATE INDEX IX_UsersCopy_SomeColumns
ON dbo.UsersCopy (Age, DownVotes)
INCLUDE (CreationDate)
GO

CREATE TABLE dbo.UsersTarget (
[Id] [int] NOT NULL PRIMARY KEY CLUSTERED
)
GO

INSERT dbo.UsersCopy
SELECT *
FROM StackOverflow2013.dbo.Users
GO

--is rcsi off?
USE [master]
GO
ALTER DATABASE [TestDB] SET READ_COMMITTED_SNAPSHOT OFF WITH NO_WAIT
GO

Scanning the index takes ~80ms. Let’s use WAITFOR TIME statements to run an update 10ms after the insert started reading from the source table.

WAITFOR TIME '06:16:25.462'

INSERT dbo.UsersTarget
SELECT DISTINCT ID
FROM dbo.UsersCopy u
WHERE CreationDate = '2008-07-31 21:57:06.240'
AND NOT EXISTS (
	SELECT 1
	FROM dbo.UsersTarget ut
	WHERE ut.Id = u.Id
	)

WAITFOR TIME '06:16:25.472'

UPDATE dbo.UsersCopy
SET DownVotes = 1000000
WHERE ID = 10

And a helper script for the WAITFORs

DECLARE @dt TIME(3) = DATEADD(SECOND,40,SYSDATETIME())

PRINT 'WAITFOR TIME '''+CONVERT(VARCHAR(40),@dt)+''''
PRINT 'WAITFOR TIME '''+CONVERT(VARCHAR(40),DATEADD(MILLISECOND,10,@dt))+''''

Cleanup/reset in case you missed something or want to test multiple times.

UPDATE dbo.UsersCopy
SET DownVotes = 0
WHERE ID = 10

truncate table dbo.UsersTarget

The prosecution rests.

Left and Right Deep Hash Joins

There’s a lot already written about left versus right deep hash joins. Unfortunately for us SQL Server nerds, “left” and “right” don’t make as much sense in SSMS query plans – those have a different orientation than the trees of database theory.

But if you rotate plans, you can see left and right that make sense (even if they still don’t match canonical shapes). Just follow the join operators.

The great thing about left deep plans is that only a couple of hash tables need to exist at a time, saving memory.

Whereas a right-deep hash join plan needs to keep all the hash tables around until it’s done.

So why would SQL Server ever choose a right-deep plan? There are two key ideas to this. First, each shape has just one privileged position, a single table that is only part of probes, never needing to fill up a hash table, never adding to the memory overhead. Review the two plans above – can you spot which in each never contributes to a build?

Second, SQL Server has to pick just two tables to start with (more than two is an n-ary join, which may theoretically exist). Certain join patterns let you choose any table to start with, while some severely limit choices. Star joins, a common data warehouse pattern, require the central table to be part of the initial join.

I had a vague intuition of this part, but drawing it out really helped me – here are three basic patterns I tend to see:

First, an N-ary join, where all tables join on an equivalent value.

SELECT *
FROM Users A
JOIN BannedUsers B
   ON A.UserID = B.UserID
JOIN Donors C
   ON A.UserID = C.UserID
JOIN Achievements D
   ON C.UserID = D.UserID

Then there are what I call chained joins (I assume there’s an official name, but I don’t know it).

SELECT *
FROM Users A
JOIN Vacations B
   ON A.UserID = B.UserID
JOIN Itineraries C
   ON B.VacationID = C.VacationID
JOIN Destinations D
   ON C.DestinationID = D.DestinationID

Finally, and most importantly for this topic, there are star joins, where everything relates back to a central table.

SELECT *
FROM UserFacts A
JOIN NameDim B
   ON A.NameID = B.NameID
JOIN AddressDim C
   ON A.AddressID = c.AddressID
JOIN DemographicsDim D
   ON A.DemographicsID = D.DemographicsID

In a star join, that central table is usually massive. Billions of rows into a hash table is typically a Bad Idea. And since all joins involve that table, SQL Server can’t delay it until the privileged position of a left deep plan. Instead, there will be a right deep hash join plan, and the massive table will be placed in the privileged position that never fills up a hash table.

So right-deep hash join plans exist because sticking gajillions of rows into a hash build tends to go poorly, and the optimizer picks a right deep plan to avoid blowing out your tempdb. Hurray for the optimizer!…this time.