Giant IN Lists of Doom

I like to make fun of both Entity Framework and junior developers for the terrible SQL they often produce. Sometimes, though, there’s the rampaging Godzilla of an ORM written by a junior developer baked into irreplaceable stage 4 legacy code. Guess where I got my inspiration?

Testing Limits

Today’s antipattern is the enormous IN list (or WHERE clause), and the first and obvious (and entertaining) question is, is there a limit to the number of IN values. Thankfully this is easy to test, and the answer is…

Aww dangit I broke SSMS.

paging @SSMSCrashedAgain

Ok let’s try this a different way – the limit answer is…

Kind of. There’s not a hard coded ceiling, but rather, you can run the parser/compiler out of memory. For me the limit is somewhere above a million values.

In the spirit of Hold My Beer demos, I created a proc to do the testing for me:

CREATE OR ALTER PROC dbo.Hardcoded_Doom(@i INT)
AS
BEGIN

DECLARE @sql NVARCHAR(MAX)
DECLARE @InList NVARCHAR(MAX)

;WITH CTE AS (
	SELECT TOP(@i) 
		1 Num --(ROW_NUMBER() OVER(ORDER BY 1/0)) AS Num
	FROM dbo.Votes a,
	dbo.Votes b
	)

SELECT @InList = STRING_AGG(CONVERT(VARCHAR(MAX),Num),',')
FROM CTE

SET @sql = '
SELECT COUNT(*)
FROM dbo.Votes
WHERE ID IN ('+@InList+')
OPTION(RECOMPILE)
'

EXEC sp_executesql @sql

END

To minimize query size, the IN list is nothing but the value 1 repeated, but there’s something interesting here: SQL Server takes the time to remove duplicates from the IN list. Also, at a certain number of distinct values (65 for me), SQL Server starts separating them into a Constant Scan.

Duplicate removal implies sorting the values, which in turn implies memory consumption.

Literal values aren’t the only way to make IN lists however – I once encountered an EF turd that ran into a system limit by making every single IN list value its own variable. Let’s do that too!

CREATE OR ALTER PROC dbo.Variable_Doom(@i INT)
AS
BEGIN

DECLARE @sql NVARCHAR(MAX)
DECLARE @VarList NVARCHAR(MAX)
DECLARE @InVarList NVARCHAR(MAX)


;WITH CTE AS (
	SELECT TOP(@i) 
		'@'+CONVERT(VARCHAR(10),(ROW_NUMBER() OVER(ORDER BY 1/0))) AS VarNum,
		1 AS Num
	FROM dbo.Votes a,
	dbo.Votes b
	)

SELECT @VarList = STRING_AGG(CONVERT(VARCHAR(MAX),VarNum),' INT = 1,'),
@InVarList = STRING_AGG(CONVERT(VARCHAR(MAX),VarNum),',')
FROM CTE

SET @VarList = 'DECLARE '+@VarList + ' INT = 1'

SET @sql = @VarList + N'
SELECT COUNT(*)
FROM dbo.Votes
WHERE ID IN ('+@InVarList+')
OPTION(RECOMPILE)
'

EXEC sp_executesql @sql

END
Wheeeeeeee!!!

Compilation Time

Good DBAs will remind you that just because you can do something doesn’t mean you should. Ignore them, this is science. Another fun side effect of these bad queries is high compilation cost. A simple query against Votes with a 100 value IN list (using distinct values) took 3ms to compile. 1000 distinct values took 39ms and 10000 values 640ms. Interestingly, 10000 identical IN list values took only 115ms to compile. It looks like the number of distinct values affects compilation time.

Large IN lists make already complicated views even worse to compile. sys.tables is actually a complicated view, so let’s demo with it. The below compiles in under 50ms for me:

SELECT *
FROM sys.tables t
JOIN sys.columns c
	ON c.object_id = t.object_id

But add a 1000 value IN list, and compilation is now over 200ms. I’ve seen cases where a really complicated view had an enormous IN list and compilation took 10 minutes. (And took locks that prevented AG redo the whole time)

What happens when half a dozen threads try to run Hardcoded_Doom at the same time? (Special thanks to Adam Machanic’s SQLQueryStress). Compilation contention!

Isn’t science fun?

WHERE Clause of Even More Doom

Giant OR lists? Yeah, we can do those too. Guess what super special failure mode they have? Combining a single OR with another condition.

Instead of a fast thousand seeks, we now have a Filter operator taking a full minute. And yes, I’ve seen this one in production too.

Conclusion

Giant IN lists are a giant wrecking ball of dumb. Let me recount, these are things I’ve broken or seen broken with large IN lists: memory limit, SSMS, variable limit, compilation limit, AG redo, Query Store, plan cache, and query performance. Please teach your developers to use table types and temp tables instead, and for the love of all the SQL Bobs, don’t let them roll their own ORM.

Leave a Reply

Your email address will not be published. Required fields are marked *