Performance Impact of Small Queries

All performance statistics are lies, but some are useful. Yes, just some, and opinions differ about which ones. If you want to start a nerd fight, find a cluster of Senior DBAs at SQL Saturday and tell them how much you love Page Life Expectancy.

As a tuning aficionado, I happen to adore CPU metrics, such as those in sys.dm_exec_query_stats and Query Store. They tend to be extremely helpful in finding a server's worst offending queries. But I've seen cases where a tiny SELECT hitting a server a million times had more impact than its Query Store CPU stats would suggest.

This makes sense – every query is going to have overhead, such as sending rows to the client, that probably isn’t captured in measurements. But can we prove it, and maybe get a sense of how much is missed?

Let's create a small query, spam it against a server, and compare the CPU the machine actually burns to what SQL Server's per-query metrics report. Here's the demo table, and a proc that seeks a single row from it (written so it doesn't actually return any rows to the client).

DROP TABLE IF EXISTS dbo.seektest
GO

CREATE TABLE dbo.seektest (
ID INT IDENTITY PRIMARY KEY,
junk CHAR(1)
)

INSERT dbo.seektest
SELECT TOP (5000000) 'a'
FROM master.dbo.spt_values a
CROSS JOIN master.dbo.spt_values b
GO

CREATE PROC dbo.SeekTesting1
(@ID int)
AS
BEGIN

DECLARE @b int

SELECT @b = ID
FROM dbo.SeekTest
WHERE ID = @ID

END
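
A quick sanity check that the proc compiles and runs before unleashing the load (the ID value here is arbitrary):

EXEC dbo.SeekTesting1 @ID = 42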


Now we use a script to loop this pointless busywork (I'm channeling my inner junior developer here). We don't want to run the load generator on the same server, since that would skew the CPU numbers. I'm lucky enough to have another large server available on the same network, where I run this PowerShell script adapted from Merrill Aldrich. (The original SQL Blog link is dead, but Google's cached version turns up if you search for "Quick and Dirty PowerShell SQL Server Load Test".)

# Number of parallel worker jobs hammering the test proc
$numWorkers = 32

# Workers idle until $startTime, then pound the server until $endTime (a 10 second window)
$startTime = ( Get-Date ).AddSeconds(20)
$endTime = ( Get-Date ).AddSeconds(30)

$jobscript = {
    param( $startTime, $endTime )

    while ( ( Get-Date ) -lt $startTime ) { Start-Sleep -Milliseconds 100 }

    while ( ( Get-Date ) -lt $endTime ) {
        $n = Get-Random -Maximum 5000001;
        Invoke-Sqlcmd -Query "EXEC dbo.SeekTesting1 $n" -ServerInstance "YourTestServer" -Database "TestDB";
        }
}

# Invoke-Sqlcmd needs to be available inside the job sessions (SqlServer or SQLPS module)
(1..$numWorkers) | foreach {
    Start-Job $jobscript -ArgumentList $startTime,$endTime
}

#Run below line for cleanup
#Get-Job | Remove-Job

Launching the script gives me a 10 second window where I see CPU climb from 1% to about 20%. Since my test rig has 48 cores, that works out to roughly 91 seconds of added CPU time (about 19% × 48 cores × 10 seconds). Then I compare that to what the DMVs and Query Store report.

SELECT st.text, qs.*
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE qs.query_hash = 0x0946FE740B52871B --the hash from my test; grab your own from a sample run if it differs
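
That covers the DMV side; for the Query Store side, something along these lines works (a rough sketch: it assumes Query Store is enabled on the test database, and note that its CPU columns are stored in microseconds):

SELECT qt.query_sql_text,
SUM(rs.count_executions) AS executions,
SUM(rs.avg_cpu_time * rs.count_executions) / 1000000.0 AS total_cpu_s
FROM sys.query_store_query_text qt
JOIN sys.query_store_query q ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats rs ON rs.plan_id = p.plan_id
WHERE qt.query_sql_text LIKE '%SeekTest%'
GROUP BY qt.query_sql_text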

Both of these show about 5 seconds of CPU for the query – way less than 91. By the way, you can reset the plan cache stats and Query Store between runs with the script below (you are using a test server, right?).

DBCC FREEPROCCACHE
GO

ALTER DATABASE TestDB SET QUERY_STORE CLEAR;
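
And if Query Store isn't on for the test database yet, enabling it with the defaults is enough for a quick test like this:

ALTER DATABASE TestDB SET QUERY_STORE = ON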

I pulled up PerfView to see where time was spent, and grabbed a flame graph. The boxed section is the actual query execution, and probably the only part that gets measured in SQL Server. The rest goes to things like parsing, checking security, and sending data. You know, fluff.

I ran multiple tests and variations, such as single-threaded execution and adjusting the proc to return data. Queries consistently took an additional ~0.2 ms of CPU per execution beyond what the DMVs and Query Store reported. Additional details for the curious: SQL Server 2016 SP2 Developer Edition, 3.18 GHz clock.

I imagine this overhead number is highly dependent on other factors, but it’s good enough for me to start estimating the actual costs of those tiny queries that get repeated a bajillion times. You might know them, the ones where developers loop through a couple hundred thousand values because of EF?

When I do the math for a 32-core server, a 10% increase in CPU means 3.2 CPU-seconds of hidden overhead every second, which at ~0.2 ms apiece is 16,000 executions/sec, or 57.6 million an hour. Generalized, 10% of CPU overhead costs about 500 executions/sec/core. Despite a lot of assumptions, I'm happy to have a rule of thumb to start from. I hope you find the estimate useful too, but remember: this counts as a performance stat, and all performance stats are lies.
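
If you want to sanity-check that arithmetic, here's a small sketch; the 0.2 ms of hidden CPU, 32 cores, and 10% target are just the assumptions from above:

DECLARE @overhead_ms decimal(9,3) = 0.2 --hidden CPU per execution
DECLARE @cores int = 32
DECLARE @target_pct decimal(5,2) = 10   --how much extra CPU we're willing to burn

SELECT (@cores * 1000 * @target_pct / 100) / @overhead_ms AS execs_per_sec,  --16,000
(@cores * 1000 * @target_pct / 100) / @overhead_ms * 3600 AS execs_per_hour, --57,600,000
(1000 * @target_pct / 100) / @overhead_ms AS execs_per_sec_per_core          --500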

Round Robin Hunting

Simple does not equal boring. Should not. But I have a really bad habit of ignoring “simple” things I think I understand, only to find out later that some subtle interaction turns “boring” into “aggggghhhh!” This was the case with Round Robin Distribute Streams. Here’s its mugshot:

All it does is take rows from a single producer thread and distribute them in turn, or round robin, to multiple consumer threads. It looks something like this:

Simple, right? Nothing else to see, time to move on, etc.? That's what I thought, until I started seeing Round Robin pop up in a bizarre class of not-so-simple plans: plans involving intra-query parallel deadlocks.

Parallel deadlocks are a fascinating topic I hope to cover later, but let’s start just by reproducing a Round Robin. It turns out that hunting for this operator is an instructive journey in itself.

Usually parallelism shows up on large tables, where the scan of the base table is itself parallel, with no need to distribute streams, like below:

Also, certain conditions need to be true for parallelism to occur. For example, no matter how large your table is, a simple SELECT * is going to be serial. Stuffing rows into network packets is done serially, and if there’s no other work to distribute, the whole plan avoids parallelism. Add a WHERE clause or change to a SELECT COUNT(*), and you can start to see it.
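
For instance, against a big enough table (the names below are just placeholders), something like this can get a parallel plan, while a bare SELECT * of the same table stays serial:

SELECT COUNT(*)
FROM dbo.SomeBigTable
WHERE SomeColumn > 0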

Maybe if the table is really small, SQL Server will distribute streams instead of parallel scanning. Let’s build a small table:

DROP TABLE IF EXISTS dbo.boringtable
GO

CREATE TABLE dbo.boringtable (
ID INT IDENTITY PRIMARY KEY,
yawn CHAR(100)
)
GO

--1 row into the boring table (can change number of rows through TOP(N))
INSERT dbo.boringtable
SELECT TOP (1) REPLICATE('z',100)
FROM master..spt_values a
CROSS JOIN master..spt_values b
CROSS JOIN master..spt_values c

Now, let's construct a parallel query, and since this is just a test, I'm going to cheat a little with undocumented trace flags. I use a WHERE clause so SQL Server sees there's work to distribute, then trace flag 8649 to bully the optimizer into parallelism, and 8757 to skip the trivial plan (so the optimizer even gets the chance to consider parallelism).

SELECT *
FROM dbo.boringtable
WHERE yawn LIKE 'z%'
OPTION(
QUERYTRACEON 8649
,QUERYTRACEON 8757
)

Well that’s interesting. Even though the table has one row – and SQL Server knows it – the table is accessed in a parallel way, with just a single lucky thread getting to do anything.

So small tables don’t cause a Distribute Streams. Maybe we can use something that prevents the scan from being parallel. There happen to be numerous options, but my favorite one is the backwards scan. SQL Server can do an ascending scan (assuming that’s the natural index order) in parallel, but not descending. Weird, right? Also useful for demos. Let’s add an ORDER BY DESC.

SELECT *
FROM dbo.boringtable
WHERE yawn LIKE 'z%'
ORDER BY ID DESC
OPTION(
QUERYTRACEON 8649
,QUERYTRACEON 8757
)

That didn’t quite work. Instead of a descending scan, we get a parallel ascending scan followed by a descending sort, because the cost is so low on a single-row table.

The next goal is to prevent the Sort. We can discourage it through costing, or use a TOP to carve the descending scan into its own region of the plan.

;WITH CTE AS (
SELECT TOP 1 *
FROM dbo.boringtable
ORDER BY ID DESC
)

SELECT *
FROM CTE
WHERE yawn LIKE 'z%'
OPTION(
QUERYTRACEON 8649
,QUERYTRACEON 8757 --actually unnecessary now
)

Success! Highly contrived, but still a success, and it’s actually similar to cases where I see Round Robin in the wild: a descending scan feeding a parallel section.

Say, what happens if we tell SQL Server this is a large table and then do a descending scan?

UPDATE STATISTICS dbo.BoringTable WITH PAGECOUNT = 10000, ROWCOUNT = 5000000
GO

SELECT TOP 10 *
FROM dbo.boringtable
WHERE yawn LIKE '%a%'
ORDER BY ID DESC
OPTION(
QUERYTRACEON 8649
,QUERYTRACEON 8757
)

Huh…