If wait stats were celebrities, then CXPACKET would be a superstar, showing up in a gauche limo and getting swarmed by nerdy fans asking for an autograph. It would also suffer from multiple personalities, because what it represents, parallelism, is complex with many moving parts.
So Microsoft decided to split this wait into two: CXPACKET and CXCONSUMER. CXPACKET is the evil twin with a little goatee, and CXCONSUMER is your friendly neighborhood benign wait (supposedly). Parallelism remained complicated, and MS’s explanation remained simple – annoyingly so. I had to do a lot of reading and testing to develop some intuition – here’s my attempt to share.
The starting point is that exchange operators have one or more consumer threads that receive packets of data from one or more producer threads. (You can read more in Paul White’s article here.) A DOP 8 Repartition Streams will actually coordinate 16 workers – 8 on the consumer side and 8 on the producer.
In general, CXCONSUMER waits represent consumer threads waiting, and CXPACKET the producer side. I visualize it like this:
This is, of course, a drastic simplification. For one, the above only represents a single worker on each side. Reality will be more complex. Also, CXPACKET waits aren’t limited to the producer side – it’s common to see them on the consumer side as well, especially in Gather Streams.
Maybe it’s a little easier now to understand why Microsoft guidance is to ignore CXCONSUMER waits as harmless. Normally, the heavy lifting of a query is done on the producer side – of course the consumer threads will be waiting for data, and they will normally finish their own work before the next packet from the producer side is ready. But certain adverse situations (blocking, skew) still explode CXCONSUMER waits, so the guidance is merely a guideline.
Just because I say something doesn’t make it so, whether or not I claim authority. So here, have a simple demo:
I start with a 10 million row table, using a script modified from the wonderful dbfiddle.uk
DROP TABLE IF EXISTS dbo.A CREATE TABLE dbo.A (ID int primary key) ;with p0(i) as (select 1 union all select 1 union all select 1 union all select 1) , p1(i) as (select 1 from p0 as a, p0 as b, p0 as c, p0 as d, p0 as e)--1K rows , p2(i) as (select 1 from p1 as a, p1 as b, p1 as c) INSERT dbo.A SELECT TOP(10000000) ROW_NUMBER() OVER(ORDER BY 1/0) FROM p2
And here’s a wrapper you can use to grab waits for the query
DROP TABLE IF EXISTS #waits SELECT * INTO #waits FROM sys.dm_exec_session_wait_stats WHERE session_id = @@SPID --Query goes here SELECT sws.wait_type, sws.waiting_tasks_count - ISNULL(w.waiting_tasks_count,0) AS wait_count, sws.wait_time_ms - ISNULL(w.wait_time_ms,0) AS wait_ms FROM sys.dm_exec_session_wait_stats sws LEFT JOIN #waits w ON w.wait_type = sws.wait_type WHERE sws.session_id = @@SPID AND sws.waiting_tasks_count > ISNULL(w.waiting_tasks_count,0)
Ok, now let’s create a query that’s deliberately slow on the consumer side using expensive nested HASHBYTES calls
DECLARE @trash1 VARBINARY(64) SELECT @trash1 = HASHBYTES('SHA2_256',HASHBYTES('SHA2_256',HASHBYTES('SHA2_256',HASHBYTES('SHA2_256', CONVERT(VARCHAR(10),A.ID) )))) FROM dbo.A WHERE A.ID%10 < 100
Here’s the plan:
Slow down the producer side with the below query (and remember that actual execution plans don’t include CXCONSUMER waits, so you’ll have to use the wrapper)
DECLARE @trash2 INT SELECT @trash2 = A.ID FROM dbo.A WHERE 0x800000 < HASHBYTES('SHA2_256',HASHBYTES('SHA2_256',HASHBYTES('SHA2_256',HASHBYTES('SHA2_256', CONVERT(VARCHAR(10),A.ID)))))
And the plan for that…
Which has high CXCONSUMER waits, but not as high as the CXPACKETs were, because there’s only one consumer thread for the Gather Streams exchange.
So there you go: an animation and demo for the superstar twins of wait stats. Excuse me while I try to get another autograph demo…