How to Measure Compilation Impact

In my experience it’s a rather rare scenario where sustained compiles/recompiles contribute substantially to CPU load. The path from clues to conviction was a long one, so buckle up if you want to follow along – this is a lengthier-than-average post.

First Signs of a Compilation Problem

I’ve run into the real-world issue twice, and each time a different source of information convinced me to look at compilations. The easiest, of course, is the compilations counter. Most monitoring tools will track it for you, but in case you want to check for yourself, here’s a sampling script:

DECLARE @comp TABLE (counter_name varchar(100), cntr_value bigint)

INSERT @comp
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE '%compila%'

WAITFOR DELAY '00:00:30'

SELECT pc.counter_name, (pc.cntr_value-c.cntr_value)/30 AS val_per_sec
FROM sys.dm_os_performance_counters pc
JOIN @comp c ON c.counter_name = pc.counter_name
WHERE pc.counter_name LIKE '%compila%'

Another type of clue actually comes from absence. My experience here is from an Azure SQL DB with high CPU but only moderate compilations, where the top CPU-consuming queries accounted for only a fraction of observed compute. Let me give an example – suppose you have a 4 vCore SQL DB running at around 80% CPU utilization. You’d expect that to be 80% * (available core seconds) = .8*4*3600 = 11520 seconds of CPU time in the last hour. You can check Query Store to get a sense of how much execution compute there is.

You can also query the query store tables directly to find total CPU, but this is left as an exercise for the reader.
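(If you’d rather skip the exercise, here’s one rough sketch, assuming Query Store is enabled and capturing – avg_cpu_time is in microseconds:)

```sql
-- Total execution CPU (seconds) recorded by Query Store in the last hour
SELECT SUM(rs.avg_cpu_time * rs.count_executions) / 1000000.0 AS total_cpu_seconds
FROM sys.query_store_runtime_stats rs
JOIN sys.query_store_runtime_stats_interval i
    ON i.runtime_stats_interval_id = rs.runtime_stats_interval_id
WHERE i.start_time >= DATEADD(HOUR, -1, SYSUTCDATETIME());
```

Compare that number against the expected core-seconds from the math above – a big shortfall is your clue.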

Now, the CPU gap could be coming from several sources missed in a cursory analysis, such as single-use plans that Query Store fails to aggregate. But if you’re only finding a fraction of the CPU you expect, there’s something worth investigating.

Of course, I would be remiss if I ignored the plan cache as corroborating evidence. Huge numbers of single-use plans (or duplicate query hashes) are the thing to look for, but it’s not definitive. Some joker may have decided the best approach to parameter sniffing is to plaster OPTION(RECOMPILE) on everything and keep it from being saved in the cache. Yes, I’m bitter.
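A quick way to size up the single-use problem (ad hoc and prepared plans that have only ever been executed once):

```sql
-- Single-use plans: how many there are, and how much cache they're eating
SELECT objtype,
       COUNT(*) AS single_use_plans,
       SUM(CAST(size_in_bytes AS bigint)) / 1024 / 1024 AS cache_mb
FROM sys.dm_exec_cached_plans
WHERE usecounts = 1
  AND objtype IN ('Adhoc', 'Prepared')
GROUP BY objtype;
```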

Proving It

Alright – there’s smoke, but is there fire? This is where it gets hard. Here were my real-world scenarios: a 2014 physical server with 5-10 thousand compilations/sec (!) and a 6 vCore SQL DB with a huge amount of missing CPU in Query Store but only(?) about 50 compiles/sec.

Dead End: Query Store

Query Store actually includes some info about compilation in sys.query_store_query. Of course, this wasn’t even an option for the archaic 2014 instance, but on Azure, I tried checking the most recently executed queries and adding up their compilation impact. Unfortunately, this didn’t work – it was orders of magnitude too low. Some of the gap might have been due to the AUTO vs ALL capture mode, but… it was still a little bit hacky, and the table clearly wasn’t designed for this.
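For the curious, here’s roughly what I tried – a sketch, not gospel. avg_compile_duration is in microseconds, and the table aggregates over each query’s whole lifetime, which is part of why the math comes up short:

```sql
-- Rough estimate of lifetime compile CPU per query, biggest offenders first
SELECT TOP (50)
    q.query_id,
    q.count_compiles,
    q.count_compiles * q.avg_compile_duration / 1000000.0 AS est_compile_seconds,
    q.last_execution_time
FROM sys.query_store_query q
ORDER BY q.count_compiles * q.avg_compile_duration DESC;
```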

Also, in the case of my Azure SQL DB, the heavy compilation was regularly knocking Query Store into READ ONLY mode, so when I went to find out the problem, there was no recent history. Oops.

XE Struggles

The answer will have to be found with Extended Events, but this is far from trivial. In 2014, there is a *single* event that can capture all compilation CPU, and it’s a chunky one – I mean, we’re talking morbidly obese, mounds of blubber draped over a WalMart scooter chunky: query_post_compilation_showplan.

It’s even one of a handful that MS stapled a warning to: “significant performance overhead.” And of course there’s no way to exclude the query plan from the output, so with each event you could be dealing with huge XML (especially for those complex queries taking a while to compile).

Oh, and the XE description is wrong as well – cpu_time is in milliseconds, not micro.

derp

Azure Efforts

Still, it was the only option available. I went ahead and tried it for the Azure DB since compilations were lower. I didn’t want to wait weeks for approval to set up a storage account, so I just dumped the output into the ring buffer. Oh look, there’s no way to easily view XE output on an Azure SQL DB…

I didn’t want to leave a high-impact XE running, so I just saved the XML into a file, then used PowerShell to parse and aggregate it. It…worked. Mostly. And was painful. Don’t do it this way. It also turned out that the plans were so large that the ring buffer could only hold a couple seconds worth at a time. The data was still enough to prove a very high compilation impact, and let me send likely culprits to a developer buddy.
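For reference, the XE DMVs are database-scoped on Azure SQL DB, so grabbing the ring buffer XML by hand looks roughly like this (swap in whatever you named your session):

```sql
-- Pull the raw ring buffer contents for a database-scoped XE session
SELECT CAST(t.target_data AS xml) AS ring_buffer_xml
FROM sys.dm_xe_database_session_targets t
JOIN sys.dm_xe_database_sessions s
    ON s.address = t.event_session_address
WHERE s.name = N'MySessionName'  -- hypothetical session name
  AND t.target_name = N'ring_buffer';
```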

XE On-Premise

Given my lukewarm success in Azure, I decided to try a similar approach for my on-premise 2014 server, except I would be able to view live data for the XE, so maybe I could push it out to 10+ seconds for a better sample?

I waited until a low activity time, with “just” a couple thousand compiles a second, fired up the XE, and…

SSMS crashed. I tried an even smaller dataset and other tricks, but SSMS just couldn’t handle the sample size needed. I had to find a new approach.

My nemesis was the enormous query plan…if only I could ignore it. My “Aha!” moment came when I realized I could.

The histogram target bucketizes output by a single attribute, so my cheat was to aggregate on cpu_time for a thirty second sample, then math the total CPU spent compiling. For example, if there were 2 queries that took 10s to compile, and 10000 that took 20ms, this would mean that 2*10+10000*.02 = 220s were spent compiling, or 7.3 cores on average in a 30s sample. Here’s the script I came up with. (Special thanks to Kendra Little for the TSQL to query a histogram)

--CREATE EVENT SESSION [Comp_hist] ON SERVER
--ADD EVENT sqlserver.query_post_compilation_showplan
--ADD TARGET package0.histogram(SET filtering_event_name=N'sqlserver.query_post_compilation_showplan',slots=(1024),source=N'cpu_time',source_type=(0))
--WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_MULTIPLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
--GO

SET XACT_ABORT ON

IF OBJECT_ID('tempdb..#hist') IS NOT NULL DROP TABLE #hist
--DROP TABLE IF EXISTS #hist

ALTER EVENT SESSION [Comp_hist] ON SERVER  
STATE = start;  
GO  

DECLARE @start DATETIME2(1) = GETDATE()

WAITFOR DELAY '00:00:30'

--thanks kendra!
SELECT
    xed.slot_data.value('(value)[1]', 'int') AS cpu_ms,
    xed.slot_data.value('(@count)[1]', 'int') AS slotcount
INTO #hist
FROM (
    SELECT
        CAST(xet.target_data AS xml)  as target_data
    FROM sys.dm_xe_session_targets AS xet  
    JOIN sys.dm_xe_sessions AS xe  
       ON (xe.address = xet.event_session_address)  
    WHERE xe.name = 'Comp_hist'
        and target_name='histogram'
    ) as t
CROSS APPLY t.target_data.nodes('//HistogramTarget/Slot') AS xed (slot_data);

DECLARE @end DATETIME2(1) = GETDATE()

ALTER EVENT SESSION [Comp_hist] ON SERVER  
STATE = stop;

SELECT SUM(cpu_ms*slotcount)*1.0/DATEDIFF(millisecond,@start,@end) AS [Avg cores compiling]
FROM #hist

The script was an utter success – not only was the overhead minimal, it captured the data I needed to prove that our largest, priciest server was spending 40% of its compute on plan generation.

Happy Ending

My developer contact really ran (sprinted?) with the info from the Azure DB, and fixed the issues behind the biggest compilation offenders (from Entity Framework). Average CPU dropped from 70% to 20%, enough that we can scale down and save money. For the on-premise server, the 40% number was enough to convince developers to don their hazmat suits and make changes to the problematic legacy code. I haven’t heard anything back yet, which might be because the legacy code ate them – it’s that bad.

Better Solution?

I’m still not content with my current approach. For one, Azure SQL DB doesn’t allow histogram targets, so it just doesn’t work there. Also, the XE I use is normally high impact, so I’m still wary of using it in a histogram. After some digging, I found a Debug XE (remember, Debug is the type you have to specifically check in SSMS) that looks promising: sql_statement_post_compile. I’m not sure when it was added, but I see it for Azure and 2019, but not 2014. Next time I’m checking Azure, I’ll test that one out instead.
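For a box product where histograms do work, the swap would look something like this. This is an untested sketch on my part – and verify cpu_time’s units yourself before doing the math, because after the showplan event I trust nothing:

```sql
-- Same histogram trick, but aggregating the lighter Debug event instead
CREATE EVENT SESSION [Comp_hist_v2] ON SERVER
ADD EVENT sqlserver.sql_statement_post_compile
ADD TARGET package0.histogram(
    SET filtering_event_name = N'sqlserver.sql_statement_post_compile',
        slots = (1024),
        source = N'cpu_time',
        source_type = (0))
WITH (MAX_MEMORY = 4096 KB,
      EVENT_RETENTION_MODE = ALLOW_MULTIPLE_EVENT_LOSS,
      MAX_DISPATCH_LATENCY = 30 SECONDS);
```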

So in summary, use sql_statement_post_compile if you have access to it, use a histogram if you’re not Azure SQL DB, and don’t let EF generate giant IN lists of hardcoded values that take 300ms to compile. Ick, why did I end with something about EF? Happy thoughts. Happy thoughts! WHY EF WHYYYYYYY????!!!!!!

Giant IN Lists of Doom

I like to make fun of both Entity Framework and junior developers for the terrible SQL they often produce. Sometimes, though, there’s the rampaging Godzilla of an ORM written by a junior developer baked into irreplaceable stage 4 legacy code. Guess where I got my inspiration?

Testing Limits

Today’s antipattern is the enormous IN list (or WHERE clause), and the first and obvious (and entertaining) question is: is there a limit to the number of IN values? Thankfully this is easy to test, and the answer is…

Aww dangit I broke SSMS.

paging @SSMSCrashedAgain

Ok let’s try this a different way – the limit answer is…

Kind of. There’s no hardcoded ceiling; rather, you can run the parser/compiler out of memory. For me the limit was somewhere above a million values.

In the spirit of Hold My Beer demos, I created a proc to do the testing for me:

CREATE OR ALTER PROC dbo.Hardcoded_Doom(@i INT)
AS
BEGIN

DECLARE @sql NVARCHAR(MAX)
DECLARE @InList NVARCHAR(MAX)

;WITH CTE AS (
	SELECT TOP(@i) 
		1 Num --(ROW_NUMBER() OVER(ORDER BY 1/0)) AS Num
	FROM dbo.Votes a,
	dbo.Votes b
	)

SELECT @InList = STRING_AGG(CONVERT(VARCHAR(MAX),Num),',')
FROM CTE

SET @sql = '
SELECT COUNT(*)
FROM dbo.Votes
WHERE ID IN ('+@InList+')
OPTION(RECOMPILE)
'

EXEC sp_executesql @sql

END

To minimize query size, the IN list is nothing but the value 1 repeated, but there’s something interesting here: SQL Server takes the time to remove duplicates from the IN list. Also, at a certain number of distinct values (65 for me), SQL Server starts separating them into a Constant Scan.

Duplicate removal implies sorting the values, which in turn implies memory consumption.

Literal values aren’t the only way to make IN lists however – I once encountered an EF turd that ran into a system limit by making every single IN list value its own variable. Let’s do that too!

CREATE OR ALTER PROC dbo.Variable_Doom(@i INT)
AS
BEGIN

DECLARE @sql NVARCHAR(MAX)
DECLARE @VarList NVARCHAR(MAX)
DECLARE @InVarList NVARCHAR(MAX)


;WITH CTE AS (
	SELECT TOP(@i) 
		'@'+CONVERT(VARCHAR(10),(ROW_NUMBER() OVER(ORDER BY 1/0))) AS VarNum,
		1 AS Num
	FROM dbo.Votes a,
	dbo.Votes b
	)

SELECT @VarList = STRING_AGG(CONVERT(VARCHAR(MAX),VarNum),' INT = 1,'),
@InVarList = STRING_AGG(CONVERT(VARCHAR(MAX),VarNum),',')
FROM CTE

SET @VarList = 'DECLARE '+@VarList + ' INT = 1'

SET @sql = @VarList + N'
SELECT COUNT(*)
FROM dbo.Votes
WHERE ID IN ('+@InVarList+')
OPTION(RECOMPILE)
'

EXEC sp_executesql @sql

END
Wheeeeeeee!!!

Compilation Time

Good DBAs will remind you that just because you can do something doesn’t mean you should. Ignore them, this is science. Another fun side effect of these bad queries is high compilation cost. A simple query against Votes with a 100-value IN list (using distinct values) took 3ms to compile. 1,000 distinct values took 39ms, and 10,000 took 640ms. Interestingly, 10,000 identical IN-list values took only 115ms to compile. It looks like the number of distinct values is what drives compilation time.

Large IN lists make already complicated views even worse to compile. sys.tables is actually a complicated view, so let’s demo with it. The below compiles in under 50ms for me:

SELECT *
FROM sys.tables t
JOIN sys.columns c
	ON c.object_id = t.object_id

But add a 1000 value IN list, and compilation is now over 200ms. I’ve seen cases where a really complicated view had an enormous IN list and compilation took 10 minutes. (And took locks that prevented AG redo the whole time)
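If you want to reproduce that at home, here’s a quick sketch that builds the list dynamically (the 1000 distinct values are arbitrary, and sys.all_columns is just a convenient row source):

```sql
-- Build a 1000-value IN list and compile it against the sys.tables join.
-- SET STATISTICS TIME shows the parse/compile cost in the Messages tab.
DECLARE @sql NVARCHAR(MAX), @list NVARCHAR(MAX);

SELECT @list = STRING_AGG(CONVERT(VARCHAR(MAX), n), ',')
FROM (SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_columns) x;

SET @sql = N'
SELECT *
FROM sys.tables t
JOIN sys.columns c
    ON c.object_id = t.object_id
WHERE t.object_id IN (' + @list + N')
OPTION (RECOMPILE)';

SET STATISTICS TIME ON;
EXEC sp_executesql @sql;
SET STATISTICS TIME OFF;
```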

What happens when half a dozen threads try to run Hardcoded_Doom at the same time? (Special thanks to Adam Machanic’s SQLQueryStress). Compilation contention!

Isn’t science fun?

WHERE Clause of Even More Doom

Giant OR lists? Yeah, we can do those too. Guess what super special failure mode they have? Combining the OR list with another ANDed condition.

Instead of a fast thousand seeks, we now have a Filter operator taking a full minute. And yes, I’ve seen this one in production too.
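The shape that bit me looks roughly like this (the column names are just illustrative – any ANDed predicate alongside the OR list can trigger it):

```sql
-- Fast: each OR branch alone can be satisfied with index seeks
SELECT COUNT(*)
FROM dbo.Votes
WHERE (UserId = 1 OR UserId = 2 /* ...hundreds more... */);

-- Slow: AND in one more condition and the optimizer may abandon seeks,
-- evaluating the entire OR list row by row in a Filter operator instead
SELECT COUNT(*)
FROM dbo.Votes
WHERE (UserId = 1 OR UserId = 2 /* ...hundreds more... */)
  AND BountyAmount > 0;
```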

Conclusion

Giant IN lists are a giant wrecking ball of dumb. Let me recount the things I’ve broken or seen broken with large IN lists: memory limits, SSMS, the variable limit, compilation, AG redo, Query Store, the plan cache, and query performance. Please teach your developers to use table types and temp tables instead, and for the love of all the SQL Bobs, don’t let them roll their own ORM.
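And for completeness, here’s the shape of the fix – a table-valued parameter sketch (the type and proc names are mine):

```sql
-- One table type, reused everywhere an ID list is needed
CREATE TYPE dbo.IdList AS TABLE (Id INT PRIMARY KEY);
GO
CREATE OR ALTER PROC dbo.Count_Votes_By_Id (@Ids dbo.IdList READONLY)
AS
BEGIN
    -- A join instead of a giant IN list: one cached plan, no recompiles
    SELECT COUNT(*)
    FROM dbo.Votes v
    JOIN @Ids i
        ON i.Id = v.Id;
END
```

Client libraries can pass these directly (SqlDbType.Structured in .NET, for example), so there’s really no excuse.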