I like to make fun of both Entity Framework and junior developers for the terrible SQL they often produce. Sometimes, though, there’s the rampaging Godzilla of an ORM written by a junior developer baked into irreplaceable stage 4 legacy code. Guess where I got my inspiration?
Testing Limits
Today’s antipattern is the enormous IN list (or WHERE clause), and the first and obvious (and entertaining) question is, is there a limit to the number of IN values. Thankfully this is easy to test, and the answer is…
Aww dangit I broke SSMS.
Ok let’s try this a different way – the limit answer is…
Kind of. There’s not a hard coded ceiling, but rather, you can run the parser/compiler out of memory. For me the limit is somewhere above a million values.
In the spirit of Hold My Beer demos, I created a proc to do the testing for me:
CREATE OR ALTER PROC dbo.Hardcoded_Doom(@i INT) AS BEGIN DECLARE @sql NVARCHAR(MAX) DECLARE @InList NVARCHAR(MAX) ;WITH CTE AS ( SELECT TOP(@i) 1 Num --(ROW_NUMBER() OVER(ORDER BY 1/0)) AS Num FROM dbo.Votes a, dbo.Votes b ) SELECT @InList = STRING_AGG(CONVERT(VARCHAR(MAX),Num),',') FROM CTE SET @sql = ' SELECT COUNT(*) FROM dbo.Votes WHERE ID IN ('+@InList+') OPTION(RECOMPILE) ' EXEC sp_executesql @sql END
To minimize query size, the IN list is nothing but the value 1 repeated, but there’s something interesting here: SQL Server takes the time to remove duplicates from the IN list. Also, at a certain number of distinct values (65 for me), SQL Server starts separating them into a Constant Scan.
Duplicate removal implies sorting the values, which in turn implies memory consumption.
Literal values aren’t the only way to make IN lists however – I once encountered an EF turd that ran into a system limit by making every single IN list value its own variable. Let’s do that too!
CREATE OR ALTER PROC dbo.Variable_Doom(@i INT) AS BEGIN DECLARE @sql NVARCHAR(MAX) DECLARE @VarList NVARCHAR(MAX) DECLARE @InVarList NVARCHAR(MAX) ;WITH CTE AS ( SELECT TOP(@i) '@'+CONVERT(VARCHAR(10),(ROW_NUMBER() OVER(ORDER BY 1/0))) AS VarNum, 1 AS Num FROM dbo.Votes a, dbo.Votes b ) SELECT @VarList = STRING_AGG(CONVERT(VARCHAR(MAX),VarNum),' INT = 1,'), @InVarList = STRING_AGG(CONVERT(VARCHAR(MAX),VarNum),',') FROM CTE SET @VarList = 'DECLARE '+@VarList + ' INT = 1' SET @sql = @VarList + N' SELECT COUNT(*) FROM dbo.Votes WHERE ID IN ('+@InVarList+') OPTION(RECOMPILE) ' EXEC sp_executesql @sql END
Compilation Time
Good DBAs will remind you that just because you can do something doesn’t mean you should. Ignore them, this is science. Another fun side effect of these bad queries is high compilation cost. A simple query against Votes with a 100 value IN list (using distinct values) took 3ms to compile. 1000 distinct values took 39ms and 10000 values 640ms. Interestingly, 10000 identical IN list values took only 115ms to compile. It looks like the number of distinct values affects compilation time.
Large IN lists make already complicated views even worse to compile. sys.tables
is actually a complicated view, so let’s demo with it. The below compiles in under 50ms for me:
SELECT * FROM sys.tables t JOIN sys.columns c ON c.object_id = t.object_id
But add a 1000 value IN list, and compilation is now over 200ms. I’ve seen cases where a really complicated view had an enormous IN list and compilation took 10 minutes. (And took locks that prevented AG redo the whole time)
What happens when half a dozen threads try to run Hardcoded_Doom
at the same time? (Special thanks to Adam Machanic’s SQLQueryStress). Compilation contention!
WHERE Clause of Even More Doom
Giant OR lists? Yeah, we can do those too. Guess what super special failure mode they have? Combining a single OR with another condition.
Instead of a fast thousand seeks, we now have a Filter operator taking a full minute. And yes, I’ve seen this one in production too.
Conclusion
Giant IN lists are a giant wrecking ball of dumb. Let me recount, these are things I’ve broken or seen broken with large IN lists: memory limit, SSMS, variable limit, compilation limit, AG redo, Query Store, plan cache, and query performance. Please teach your developers to use table types and temp tables instead, and for the love of all the SQL Bobs, don’t let them roll their own ORM.