How to Measure Compilation Impact

Across the many instances I’ve seen, it’s a rather rare scenario where sustained compiles/recompiles contribute substantially to CPU load. The path from clues to conviction was a long one, so buckle up if you want to follow along – this is a lengthier-than-average post.

First Signs of a Compilation Problem

I’ve hit the real-world issue twice, and each time it was a different source of information that convinced me to look at compilations. The easiest, of course, is the compilations counter. Most monitoring tools will track it for you, but in case you want to check for yourself, here’s a sampling script:

--Snapshot the cumulative compilation counters, wait 30 seconds, then diff to get a per-second rate
DECLARE @comp TABLE (counter_name varchar(100), cntr_value bigint)

INSERT @comp
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE '%compila%'

WAITFOR DELAY '00:00:30'

SELECT pc.counter_name, (pc.cntr_value-c.cntr_value)/30 AS val_per_sec
FROM sys.dm_os_performance_counters pc
JOIN @comp c ON c.counter_name = pc.counter_name
WHERE pc.counter_name LIKE '%compila%'

Another type of clue actually comes from absence. My experience here is from an Azure SQL DB with high CPU but only moderate compilations, where the top CPU-consuming queries accounted for only a fraction of the observed compute. Let me give an example – suppose you have a 4 vCore SQL DB running at around 80% CPU utilization. You’d expect that to be 80% * (available core seconds) = 0.8*4*3600 = 11,520 seconds of CPU time in the last hour. You can check Query Store to get a sense of how much execution compute there actually is.

You can also query the Query Store tables directly to find total CPU, but this is left as an exercise for the reader.
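If you don’t feel like doing the exercise, here’s a minimal sketch of what I mean – total execution CPU recorded by Query Store over the last hour, remembering that avg_cpu_time is stored in microseconds:

--rough total of execution CPU (in seconds) recorded by Query Store in the last hour
SELECT SUM(rs.avg_cpu_time * rs.count_executions) / 1000000.0 AS total_cpu_seconds
FROM sys.query_store_runtime_stats rs
JOIN sys.query_store_runtime_stats_interval rsi
    ON rsi.runtime_stats_interval_id = rs.runtime_stats_interval_id
WHERE rsi.start_time >= DATEADD(HOUR, -1, SYSUTCDATETIME())

Compare that number to the expected core-seconds from the utilization math above.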

Now, the CPU gap could be coming from several sources missed in a cursory analysis, such as single-use plans that Query Store fails to aggregate. But if you’re only finding a fraction of the CPU you expect, there’s something worth investigating.

Of course, I would be remiss to ignore the plan cache as corroborating evidence. Huge numbers of single-use plans (or duplicate query hashes) are the thing to look for, but it’s not definitive. Some joker may have decided the best approach to parameter sniffing is to plaster OPTION(RECOMPILE) on everything and keep it from being saved in the cache. Yes, I’m bitter.
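A quick way to eyeball that (a simple sketch, nothing fancy) is to count the single-use adhoc plans sitting in the cache and see how much memory they’re hogging:

--single-use adhoc plans and the memory they occupy
SELECT COUNT(*) AS single_use_plans,
    SUM(CAST(cp.size_in_bytes AS bigint))/1024/1024 AS total_mb
FROM sys.dm_exec_cached_plans cp
WHERE cp.usecounts = 1
    AND cp.objtype = 'Adhoc'

If you suspect duplicates instead, grouping sys.dm_exec_query_stats by query_hash tells a similar story.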

Proving It

Alright – there’s smoke, but is there fire? This is where it gets hard. Here were my real-world scenarios: a 2014 physical server with 5-10 thousand compilations/sec (!) and a 6 vCore SQL DB with a huge amount of missing CPU in Query Store but only(?) about 50 compiles/sec.

Dead End: Query Store

Query Store actually includes some info about compilation in sys.query_store_query. Of course, this wasn’t even an option for the archaic 2014 instance, but on Azure, I tried checking the most recently executed queries and adding up their compilation impact. Unfortunately, this didn’t work – the result was orders of magnitude too low. Some of that might have been due to an AUTO vs ALL capture mode, but it was still a little bit hacky, and the table clearly wasn’t designed for this.
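For reference, what I tried looked roughly like this (a sketch from memory, not my exact script – avg_compile_duration is in microseconds):

--estimate recent compilation time from Query Store's per-query compile stats
SELECT TOP (50)
    q.query_id,
    q.count_compiles,
    q.avg_compile_duration/1000.0 AS avg_compile_ms,
    (q.count_compiles * q.avg_compile_duration)/1000000.0 AS est_compile_seconds
FROM sys.query_store_query q
WHERE q.last_execution_time >= DATEADD(HOUR, -1, SYSUTCDATETIME())
ORDER BY est_compile_seconds DESC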

Also, in the case of my Azure SQL DB, the heavy compilation was regularly knocking Query Store into READ ONLY mode, so when I went looking for the problem, there was no recent history. Oops.

XE Struggles

The answer would have to be found with Extended Events, but this is far from trivial. In 2014, there is a *single* event that can capture all compilation CPU, and it’s a chunky one – I mean, we’re talking morbidly obese, mounds-of-blubber-draped-over-a-WalMart-scooter chunky: query_post_compilation_showplan.

It’s even one of a handful that MS stapled a warning to: “significant performance overhead.” And of course there’s no way to exclude the query plan from the output, so with each event you could be dealing with huge XML (especially for those complex queries taking a while to compile).

Oh, and the XE description is wrong as well – cpu_time is in milliseconds, not microseconds.
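If you want to see the offending description for yourself, the XE metadata DMVs will show it (a trivial sketch):

--the event's cpu_time column description, straight from the XE metadata
SELECT c.name, c.description
FROM sys.dm_xe_object_columns c
WHERE c.object_name = 'query_post_compilation_showplan'
    AND c.name = 'cpu_time'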


Azure Efforts

Still, it was the only option available. I went ahead and tried it on the Azure DB first, since compilations were lower there. I didn’t want to wait weeks for approval to set up a storage account, so I just dumped the output into the ring buffer. Oh look, there’s no easy way to view XE output on an Azure SQL DB…

I didn’t want to leave a high-impact XE running, so I saved the XML off into a file, then used PowerShell to parse and aggregate it. It…worked. Mostly. And it was painful. Don’t do it this way. It also turned out that the plans were so large that the ring buffer could only hold a couple seconds’ worth at a time. Still, the data was enough to prove a very high compilation impact, and let me send likely culprits to a developer buddy.
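For what it’s worth, grabbing the raw XML on an Azure SQL DB looks something like this (the session name here is made up – yours will differ):

--pull the ring_buffer XML for a database-scoped session so it can be saved off and parsed elsewhere
SELECT CAST(t.target_data AS xml) AS ring_buffer_xml
FROM sys.dm_xe_database_session_targets t
JOIN sys.dm_xe_database_sessions s
    ON s.address = t.event_session_address
WHERE s.name = 'Comp_Azure'   --hypothetical session name
    AND t.target_name = 'ring_buffer'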

XE On-Premise

Given my lukewarm success in Azure, I decided to try a similar approach for my on-premise 2014 server, except I would be able to view live data for the XE, so maybe I could push it out to 10+ seconds for a better sample?

I waited until a low activity time, with “just” a couple thousand compiles a second, fired up the XE, and…

SSMS crashed. I tried an even smaller dataset and other tricks, but SSMS just couldn’t handle the sample size needed. I had to find a new approach.

My nemesis was the enormous query plan…if only I could ignore it. My “Aha!” moment came when I realized I could.

The histogram target bucketizes output by a single attribute, so my cheat was to aggregate on cpu_time for a thirty-second sample, then math out the total CPU spent compiling. For example, if there were 2 queries that took 10s to compile and 10,000 that took 20ms, that would mean 2*10 + 10000*0.02 = 220s were spent compiling, or 7.3 cores on average over a 30s sample. Here’s the script I came up with. (Special thanks to Kendra Little for the T-SQL to query a histogram.)

--Run this once to create the session (left commented out so the sampling script below can be rerun):
--CREATE EVENT SESSION [Comp_hist] ON SERVER
--ADD EVENT sqlserver.query_post_compilation_showplan
--ADD TARGET package0.histogram(SET filtering_event_name=N'sqlserver.query_post_compilation_showplan',slots=(1024),source=N'cpu_time',source_type=(0))
--WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_MULTIPLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
--GO

SET XACT_ABORT ON

IF OBJECT_ID('tempdb..#hist') IS NOT NULL DROP TABLE #hist
--DROP TABLE IF EXISTS #hist

ALTER EVENT SESSION [Comp_hist] ON SERVER  
STATE = start;  
GO  

DECLARE @start DATETIME2(1) = GETDATE()

WAITFOR DELAY '00:00:30'

--thanks kendra!
SELECT
    xed.slot_data.value('(value)[1]', 'int') AS cpu_ms,
    xed.slot_data.value('(@count)[1]', 'int') AS slotcount
INTO #hist
FROM (
    SELECT
        CAST(xet.target_data AS xml)  as target_data
    FROM sys.dm_xe_session_targets AS xet  
    JOIN sys.dm_xe_sessions AS xe  
       ON (xe.address = xet.event_session_address)  
    WHERE xe.name = 'Comp_hist'
        and target_name='histogram'
    ) as t
CROSS APPLY t.target_data.nodes('//HistogramTarget/Slot') AS xed (slot_data);

DECLARE @end DATETIME2(1) = GETDATE()

ALTER EVENT SESSION [Comp_hist] ON SERVER  
STATE = stop;

--total compile milliseconds divided by elapsed milliseconds = average cores spent compiling
SELECT SUM(cpu_ms*slotcount)*1.0/DATEDIFF(millisecond,@start,@end) AS [Avg cores compiling]
FROM #hist

The script was an utter success – not only was the overhead minimal, but it captured the data I needed to prove that our largest, priciest server was spending 40% of its compute on plan generation.

Happy Ending

My developer contact really ran (sprinted?) with the info from the Azure DB and fixed the issues behind the biggest compilation offenders (courtesy of Entity Framework). Average CPU dropped from 70% to 20% – enough that we can scale down and save money. For the on-premise server, the 40% number was enough to convince developers to don their hazmat suits and make changes to the problematic legacy code. I haven’t heard anything back yet, which might be because the legacy code ate them – it’s that bad.

Better Solution?

I’m still not content with my current approach. For one, Azure SQL DB doesn’t allow histogram targets, so it just doesn’t work there. Also, the XE I use is normally high impact, so I’m still wary of it even with a histogram target. After some digging, I found a Debug XE (remember, Debug is the category you have to specifically check in SSMS) that looks promising: sql_statement_post_compile. I’m not sure when it was added, but I see it on Azure and 2019, though not 2014. Next time I’m checking Azure, I’ll test that one out instead.
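If you want to experiment before I do, the same histogram trick with the lighter event would look something like this on a 2019+ box (untested on my part, and I’m assuming cpu_time is a valid bucketing source for that event):

--untested sketch: histogram of compile cpu_time using the lighter Debug event (2019+)
CREATE EVENT SESSION [Comp_hist_light] ON SERVER
ADD EVENT sqlserver.sql_statement_post_compile
ADD TARGET package0.histogram(SET filtering_event_name=N'sqlserver.sql_statement_post_compile',slots=(1024),source=N'cpu_time',source_type=(0))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_MULTIPLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS)

The rest of the sampling script above should work unchanged, apart from the session name.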

So in summary: use sql_statement_post_compile if you have access to it, use a histogram if you’re not on Azure SQL DB, and don’t let EF generate giant IN lists of hardcoded values that take 300ms to compile. Ick, why did I end with something about EF? Happy thoughts. Happy thoughts! WHY EF WHYYYYYYY????!!!!!!

One thought on “How to Measure Compilation Impact”

  1. Hello Forrest — In Azure you should have access to query_parameterization_data (not the most discoverable name, for historical reasons), with compile duration and CPU time. This doesn’t collect showplan, so it’s fairly lightweight. It’s also available for the on-prem versions, starting with SQL Server 2019 I believe.
