Performance Impact of Small Queries

All performance statistics are lies, but some are useful. Yes, just some, and opinions differ about which ones. If you want to start a nerd fight, find a cluster of Senior DBAs at SQL Saturday and tell them how much you love Page Life Expectancy.

As a tuning aficionado, I happen to adore CPU metrics, such as those in sys.dm_exec_query_stats and Query Store. They tend to be extremely helpful in finding a server's worst offending queries. But I've seen cases where a tiny SELECT hitting a server a million times had more impact than its Query Store CPU stats would suggest.

This makes sense – every query is going to have overhead, such as sending rows to the client, that probably isn’t captured in measurements. But can we prove it, and maybe get a sense of how much is missed?

Let's create a small query, spam it against a server, and compare the CPU the machine actually burns to what SQL Server's per-query metrics report. Here's the demo table, and a proc that seeks a single row from it (written so it doesn't actually return any rows to the client).

DROP TABLE IF EXISTS dbo.seektest
GO

CREATE TABLE dbo.seektest (
ID INT IDENTITY PRIMARY KEY,
junk CHAR(1)
)

INSERT dbo.seektest
SELECT TOP (5000000) 'a'
FROM master.dbo.spt_values a
CROSS JOIN master.dbo.spt_values b
GO

CREATE PROC dbo.SeekTesting1
(@ID int)
AS
BEGIN

DECLARE @b int

SELECT @b = ID
FROM dbo.SeekTest
WHERE ID = @ID

END
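
A quick sanity check that the proc compiles and runs before unleashing the load (the ID value here is arbitrary):

EXEC dbo.SeekTesting1 @ID = 42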


Now we use a script to loop this pointless busywork (I'm channeling my inner junior developer here). We don't want to run the load generator on the same server, since that would skew the CPU numbers. I'm lucky enough to have another large server available on the same network, where I run this PowerShell script adapted from Merrill Aldrich. (The original SQL Blog link is dead, but Google's cached version turns up if you search for "Quick and Dirty PowerShell SQL Server Load Test".)

# Number of parallel worker jobs hammering the test proc
$numWorkers = 32

# Workers idle until $startTime, then pound the server until $endTime (a 10 second window)
$startTime = ( Get-Date ).AddSeconds(20)
$endTime = ( Get-Date ).AddSeconds(30)

$jobscript = {
    param( $startTime, $endTime )

    while ( ( Get-Date ) -lt $startTime ) { Start-Sleep -Milliseconds 100 }

    while ( ( Get-Date ) -lt $endTime ) {
        $n = Get-Random -Maximum 5000001;
        Invoke-Sqlcmd -Query "EXEC dbo.SeekTesting1 $n" -ServerInstance "YourTestServer" -Database "TestDB";
        }
}

# Invoke-Sqlcmd needs to be available inside the job sessions (SqlServer or SQLPS module)
(1..$numWorkers) | foreach {
    Start-Job $jobscript -ArgumentList $startTime,$endTime
}

#Run below line for cleanup
#Get-Job | Remove-Job

Launching the script gives me a 10 second window where I see CPU climb from 1% to about 20%. Since my test rig has 48 cores, that works out to roughly 91 seconds of added CPU time (about 19% × 48 cores × 10 seconds). Then I compare that to what the DMVs and Query Store report.

SELECT st.text, qs.*
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE qs.query_hash = 0x0946FE740B52871B --the hash from my test; grab your own from a sample run if it differs
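
That covers the DMV side; for the Query Store side, something along these lines works (a rough sketch: it assumes Query Store is enabled on the test database, and note that its CPU columns are stored in microseconds):

SELECT qt.query_sql_text,
SUM(rs.count_executions) AS executions,
SUM(rs.avg_cpu_time * rs.count_executions) / 1000000.0 AS total_cpu_s
FROM sys.query_store_query_text qt
JOIN sys.query_store_query q ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats rs ON rs.plan_id = p.plan_id
WHERE qt.query_sql_text LIKE '%SeekTest%'
GROUP BY qt.query_sql_text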

Both of these show about 5 seconds of CPU for the query – way less than 91. By the way, you can reset the plan cache stats and Query Store between runs with the script below (you are using a test server, right?).

DBCC FREEPROCCACHE
GO

ALTER DATABASE TestDB SET QUERY_STORE CLEAR;
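
And if Query Store isn't on for the test database yet, enabling it with the defaults is enough for a quick test like this:

ALTER DATABASE TestDB SET QUERY_STORE = ON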

I pulled up PerfView to see where time was spent, and grabbed a flame graph. The boxed section is the actual query execution, and probably the only part that gets measured in SQL Server. The rest goes to things like parsing, checking security, and sending data. You know, fluff.

I ran multiple tests and variations, such as single-threaded execution and adjusting the proc to return data. Queries consistently took an additional ~0.2 ms of CPU per execution beyond what the DMVs and Query Store reported. Additional details for the curious: SQL Server 2016 SP2 Developer Edition, 3.18 GHz clock.

I imagine this overhead number is highly dependent on other factors, but it’s good enough for me to start estimating the actual costs of those tiny queries that get repeated a bajillion times. You might know them, the ones where developers loop through a couple hundred thousand values because of EF?

When I do the math for a 32-core server, a 10% increase in CPU means 3.2 CPU-seconds of hidden overhead every second, which at ~0.2 ms apiece is 16,000 executions/sec, or 57.6 million an hour. Generalized, 10% of CPU overhead costs about 500 executions/sec/core. Despite a lot of assumptions, I'm happy to have a rule of thumb to start from. I hope you find the estimate useful too, but remember: this counts as a performance stat, and all performance stats are lies.
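
If you want to sanity-check that arithmetic, here's a small sketch; the 0.2 ms of hidden CPU, 32 cores, and 10% target are just the assumptions from above:

DECLARE @overhead_ms decimal(9,3) = 0.2 --hidden CPU per execution
DECLARE @cores int = 32
DECLARE @target_pct decimal(5,2) = 10   --how much extra CPU we're willing to burn

SELECT (@cores * 1000 * @target_pct / 100) / @overhead_ms AS execs_per_sec,  --16,000
(@cores * 1000 * @target_pct / 100) / @overhead_ms * 3600 AS execs_per_hour, --57,600,000
(1000 * @target_pct / 100) / @overhead_ms AS execs_per_sec_per_core          --500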

Round Robin Hunting

Simple does not equal boring. Should not. But I have a really bad habit of ignoring “simple” things I think I understand, only to find out later that some subtle interaction turns “boring” into “aggggghhhh!” This was the case with Round Robin Distribute Streams. Here’s its mugshot:

All it does is take rows from a single producer thread and distribute them in turn, or round robin, to multiple consumer threads. It looks something like this:

Simple, right? Nothing else to see, time to move on, etc.? That's what I thought, until I started seeing Round Robin pop up in a bizarre class of not-so-simple plans: plans involving intra-query parallel deadlocks.

Parallel deadlocks are a fascinating topic I hope to cover later, but let’s start just by reproducing a Round Robin. It turns out that hunting for this operator is an instructive journey in itself.

Usually parallelism shows up on large tables, where the scan of the base table is itself parallel, with no need to distribute streams, like below:

Also, certain conditions need to be true for parallelism to occur. For example, no matter how large your table is, a simple SELECT * is going to be serial. Stuffing rows into network packets is done serially, and if there’s no other work to distribute, the whole plan avoids parallelism. Add a WHERE clause or change to a SELECT COUNT(*), and you can start to see it.
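
For instance, against a big enough table (the names below are just placeholders), something like this can get a parallel plan, while a bare SELECT * of the same table stays serial:

SELECT COUNT(*)
FROM dbo.SomeBigTable
WHERE SomeColumn > 0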

Maybe if the table is really small, SQL Server will distribute streams instead of parallel scanning. Let’s build a small table:

DROP TABLE IF EXISTS dbo.boringtable
GO

CREATE TABLE dbo.boringtable (
ID INT IDENTITY PRIMARY KEY,
yawn CHAR(100)
)
GO

--1 row into the boring table (can change number of rows through TOP(N))
INSERT dbo.boringtable
SELECT TOP (1) REPLICATE('z',100)
FROM master..spt_values a
CROSS JOIN master..spt_values b
CROSS JOIN master..spt_values c

Now, let's construct a parallel query, and since this is just a test, I'm going to cheat a little with undocumented trace flags. I use a WHERE clause so SQL Server sees there's work to distribute, then trace flag 8649 to bully the optimizer into parallelism, and 8757 to skip the trivial plan (so the optimizer even gets the chance to consider parallelism).

SELECT *
FROM dbo.boringtable
WHERE yawn LIKE 'z%'
OPTION(
QUERYTRACEON 8649
,QUERYTRACEON 8757
)

Well that’s interesting. Even though the table has one row – and SQL Server knows it – the table is accessed in a parallel way, with just a single lucky thread getting to do anything.

So small tables don’t cause a Distribute Streams. Maybe we can use something that prevents the scan from being parallel. There happen to be numerous options, but my favorite one is the backwards scan. SQL Server can do an ascending scan (assuming that’s the natural index order) in parallel, but not descending. Weird, right? Also useful for demos. Let’s add an ORDER BY DESC.

SELECT *
FROM dbo.boringtable
WHERE yawn LIKE 'z%'
ORDER BY ID DESC
OPTION(
QUERYTRACEON 8649
,QUERYTRACEON 8757
)

That didn’t quite work. Instead of a descending scan, we get a parallel ascending scan followed by a descending sort, because the cost is so low on a single-row table.

The next goal is to prevent the Sort. We can discourage it through costing, or use a TOP to carve the descending scan into its own region of the plan.

;WITH CTE AS (
SELECT TOP 1 *
FROM dbo.boringtable
ORDER BY ID DESC
)

SELECT *
FROM CTE
WHERE yawn LIKE 'z%'
OPTION(
QUERYTRACEON 8649
,QUERYTRACEON 8757 --actually unnecessary now
)

Success! Highly contrived, but still a success, and it’s actually similar to cases where I see Round Robin in the wild: a descending scan feeding a parallel section.

Say, what happens if we tell SQL Server this is a large table and then do a descending scan?

UPDATE STATISTICS dbo.BoringTable WITH PAGECOUNT = 10000, ROWCOUNT = 5000000
GO

SELECT TOP 10 *
FROM dbo.boringtable
WHERE yawn LIKE '%a%'
ORDER BY ID DESC
OPTION(
QUERYTRACEON 8649
,QUERYTRACEON 8757
)

Huh…