What’s the Difference between CXPACKET and CXCONSUMER

If wait stats were celebrities, then CXPACKET would be a superstar, showing up in a gauche limo and getting swarmed by nerdy fans asking for an autograph. It would also suffer from multiple personalities, because what it represents, parallelism, is complex with many moving parts.

So Microsoft decided to split this wait into two: CXPACKET and CXCONSUMER. CXPACKET is the evil twin with a little goatee, and CXCONSUMER is your friendly neighborhood benign wait (supposedly). Parallelism remained complicated, and MS’s explanation remained simple – annoyingly so. I had to do a lot of reading and testing to develop some intuition – here’s my attempt to share.

The starting point is that exchange operators have one or more consumer threads that receive packets of data from one or more producer threads. (You can read more in Paul White’s article here.) A DOP 8 Repartition Streams will actually coordinate 16 workers – 8 on the consumer side and 8 on the producer.

In general, CXCONSUMER waits represent consumer threads waiting, and CXPACKET the producer side. I visualize it like this:

This is, of course, a drastic simplification. For one, the above only represents a single worker on each side. Reality will be more complex. Also, CXPACKET waits aren’t limited to the producer side – it’s common to see them on the consumer side as well, especially in Gather Streams.

Maybe it’s a little easier now to understand why Microsoft guidance is to ignore CXCONSUMER waits as harmless. Normally, the heavy lifting of a query is done on the producer side – of course the consumer threads will be waiting for data, and they will normally finish their own work before the next packet from the producer side is ready. But certain adverse situations (blocking, skew) still explode CXCONSUMER waits, so the guidance is merely a guideline.

Just because I say something doesn’t make it so, whether or not I claim authority. So here, have a simple demo:

I start with a 10 million row table, using a script modified from the wonderful dbfiddle.uk

DROP TABLE IF EXISTS dbo.A
CREATE TABLE dbo.A (ID int primary key)

;with
  p0(i) as (select 1 union all select 1 union all select 1 union all select 1)
, p1(i) as (select 1 from p0 as a, p0 as b, p0 as c, p0 as d, p0 as e)--1K rows
, p2(i) as (select 1 from p1 as a, p1 as b, p1 as c)

INSERT dbo.A
SELECT TOP(10000000) ROW_NUMBER() OVER(ORDER BY 1/0)
FROM p2

And here’s a wrapper you can use to grab waits for the query

DROP TABLE IF EXISTS #waits

SELECT *
INTO #waits
FROM sys.dm_exec_session_wait_stats
WHERE session_id = @@SPID

--Query goes here

SELECT sws.wait_type, sws.waiting_tasks_count - ISNULL(w.waiting_tasks_count,0) AS wait_count,
sws.wait_time_ms - ISNULL(w.wait_time_ms,0) AS wait_ms
FROM sys.dm_exec_session_wait_stats sws
LEFT JOIN #waits w ON w.wait_type = sws.wait_type
WHERE sws.session_id = @@SPID
AND sws.waiting_tasks_count > ISNULL(w.waiting_tasks_count,0)

Ok, now let’s create a query that’s deliberately slow on the consumer side using expensive nested HASHBYTES calls

DECLARE @trash1 VARBINARY(64)

SELECT @trash1 = HASHBYTES('SHA2_256',HASHBYTES('SHA2_256',HASHBYTES('SHA2_256',HASHBYTES('SHA2_256',
			CONVERT(VARCHAR(10),A.ID)
			))))
FROM dbo.A
WHERE A.ID%10 < 100

Here’s the plan:

High CXPACKET query
Ouch!

Slow down the producer side with the below query (and remember that actual execution plans don’t include CXCONSUMER waits, so you’ll have to use the wrapper)

DECLARE @trash2 INT

SELECT @trash2 = A.ID
FROM dbo.A
WHERE 0x800000 < HASHBYTES('SHA2_256',HASHBYTES('SHA2_256',HASHBYTES('SHA2_256',HASHBYTES('SHA2_256',
			CONVERT(VARCHAR(10),A.ID)))))

And the plan for that…

High CXCONSUMER query

Which has high CXCONSUMER waits, but not as high as the CXPACKETs were, because there’s only one consumer thread for the Gather Streams exchange.

So there you go: an animation and demo for the superstar twins of wait stats. Excuse me while I try to get another autograph demo…