The Top/Max Transformation

Behold, a simple MAX query:

But wait, where did that Top operator come from? Oh, SQL Server, you clever rascal…

Simple and extremely effective – a solid query transformation. But can you handle more complex versions, SQL Server?

Let’s try the bane of ordered scans: partitioned tables.

Please note that all repro scripts are shared at the bottom

Partitioning

The transformation works if the column is both the leading index key and the partitioning key.

But fails if the leading key is not the partitioning key.

I guess that makes sense to me, because it would have to grab the top value from each partition then sort those, a much more complicated transformation.

Second Column

Can SQL Server use the Top/Max transformation when looking at the second column of an index? In theory it should work.

Time for our favorite DBA sing-along song, It Depends.

Here’s the table, clustered on c1, c2, loaded with some filler data as seen in the repro script at the end.

CREATE TABLE dbo.A(
c1 INT NOT NULL, 
c2 INT NOT NULL,
INDEX CX_somecreativename CLUSTERED (c1,c2)
)

The simplest form works

SELECT MAX(c2)
FROM dbo.A
WHERE c1 = 3

What if we make it more complicated? Let’s add a basic temp table.

CREATE TABLE #3 (three INT NOT NULL)
INSERT #3 VALUES (3)

This works…

SELECT MAX(c2)
FROM A
WHERE c1 = (SELECT TOP(1) 3 FROM #3)

But this doesn’t

SELECT MAX(c2)
FROM A
WHERE c1 = (SELECT TOP(1) three FROM #3)

Spot the difference? Pulling a column value from the second table breaks the transformation, but pulling a constant keeps it. Weird. Anyways, let’s try a rewrite (though inexact) of the query.

SELECT the_max
FROM (SELECT TOP(1) three AS c3 FROM #3) s
CROSS APPLY (
	SELECT MAX(c2) AS the_max
	FROM dbo.A
	WHERE A.c1 = s.c3
	) x

Victory again! Wait, what’s that? A change to the table definition making the second column nullable?

ALTER TABLE dbo.A
ALTER COLUMN c2 INT NULL

Ok fine, I’m sure the last query won’t…oh.

So the TOP/MAX transformation is finicky on a second column, depending on nullability [is that a word?] and how the query is written.

Going Backwards

There’s even a situation where SQL Server should (should!) be able to do a reverse transformation, a TOP into a MAX. Columnstore indexes store maximum values per segment, so a TOP(1) query could simply grab that!

Conclusion

I can’t tell you how tempted I was to call this MAXTOP. Thinking about it still causes stray giggles to bubble up, as I gleefully envision sharing this info with eager young DBAs at a SQL Saturday, then ushering them off to a session on parallelism. Thankfully I’m not that evil.

The real conclusion is that SQL Server is programmed to be very clever. Despite the cleverness though, details matter, and the cleverness often falls short. What’s that warm glow I feel inside? Ah yes, job security.

Repro scripts

USE TestDB
GO

CREATE PARTITION FUNCTION pf_hundreds(int)
    AS RANGE RIGHT FOR VALUES (100,200,300,400,500,600,700,800,900);

CREATE PARTITION SCHEME ps_hundreds
AS PARTITION pf_hundreds
ALL TO ( [PRIMARY] );
GO

DROP TABLE IF EXISTS PartitionedNumbers
GO

CREATE TABLE PartitionedNumbers (
Number1 INT NOT null, 
Number2 INT NOT NULL, 
CONSTRAINT PK_numnum PRIMARY KEY CLUSTERED(Number1,Number2))
ON ps_hundreds(Number1) --drop/create table partitioned on Number2 for second query

INSERT PartitionedNumbers
SELECT TOP(999) ROW_NUMBER() OVER(ORDER BY 1/0), ROW_NUMBER() OVER(ORDER BY 1/0)
FROM master.dbo.spt_values

SELECT MAX(Number1)
FROM PartitionedNumbers
USE TestDB
GO

DROP TABLE IF EXISTS dbo.A
DROP TABLE IF EXISTS #3
go

CREATE TABLE dbo.A(
c1 INT NOT NULL, 
c2 INT NOT NULL,
INDEX CX_somecreativename CLUSTERED (c1,c2)
)

INSERT dbo.A
SELECT TOP(100000) 
(ROW_NUMBER() OVER(ORDER BY 1/0))%100,
(ROW_NUMBER() OVER(ORDER BY 1/0))%500 + 250
FROM sys.columns a, sys.columns b, sys.columns c

CREATE TABLE #3 (three INT NOT NULL)
INSERT #3 VALUES (3)

--works
SELECT MAX(c2)
FROM dbo.A
WHERE c1 = 3

--works
SELECT MAX(c2)
FROM dbo.A 
WHERE c1 = (SELECT 3)

--works
SELECT MAX(c2)
FROM A
WHERE c1 = (SELECT TOP(1) 3 FROM #3)

--does not work
SELECT MAX(c2)
FROM A
WHERE c1 = (SELECT TOP(1) three FROM #3)

--works
SELECT the_max
FROM (SELECT TOP(1) three AS c3 FROM #3) s
CROSS APPLY (
	SELECT MAX(c2) AS the_max
	FROM dbo.A
	WHERE A.c1 = s.c3
	) x


--change things around
ALTER TABLE dbo.A
ALTER COLUMN c2 INT NULL

--does not work
SELECT the_max
FROM (SELECT TOP(1) three AS c3 FROM #3) s
CROSS APPLY (
	SELECT MAX(c2) AS the_max
	FROM dbo.A
	WHERE A.c1 = s.c3
	) x

No Running at the Resource Pool

Ok, I get it, scheduling queries can be complicated. See this and this and maybe this and this too but only if you have insomnia. I still thought I kinda understood it. Then I started seeing hundreds of query timeouts on a quiet server, where the queries were waiting on…what? CPU? The server’s at 20%, why would a query time out after a minute, while only getting 5s of CPU, and not being blocked?

It was Resource Governor messing up one of my servers recently in a way I never expected. Apparently RG will try to balance *total* CPU usage for pools, to the point that a single query sharing a scheduler will be completely starved.

Uh, this might be easier to explain with a picture:

And a demo of course:

Let’s set up some Resource Pools with no constraints. (Thanks to this blog for making some easy-to-copy scripts). Again, these pools don’t limit anything – the only thing they do is exist. Behold, pools Apple and Banana:

USE master
GO
 
CREATE RESOURCE POOL [Apples] WITH(
		min_cpu_percent=0, 
		max_cpu_percent=100, 
		min_memory_percent=0, 
		max_memory_percent=100, 
		AFFINITY SCHEDULER = AUTO
)
GO
 
CREATE RESOURCE POOL [Bananas] WITH(
		min_cpu_percent=0, 
		max_cpu_percent=100, 
		min_memory_percent=0, 
		max_memory_percent=100, 
		AFFINITY SCHEDULER = AUTO
)
GO

CREATE WORKLOAD GROUP [Apple] 
USING [Apples]
GO
CREATE WORKLOAD GROUP [Banana] 
USING [Bananas]
GO

CREATE FUNCTION dbo.Fruit_Inspector() 
RETURNS SYSNAME WITH SCHEMABINDING
AS
BEGIN
  DECLARE @workload_group sysname;
 
  IF (program_name() LIKE 'A%')
      SET @workload_group = 'Apple';
  IF (program_name() LIKE 'B%')
      SET @workload_group = 'Banana';
     
  RETURN @workload_group;
END;
GO

ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.Fruit_Inspector);
GO

ALTER RESOURCE GOVERNOR RECONFIGURE
GO

Now for a little bit of SSMS trickery: you can use connection properties to set an application name of your choice, assigning your connection to a resource pool.

Check which scheduler you’re on (and your session id, workload group, and app name) with this query:

SELECT er.scheduler_id, wg.name AS wg_name, er.session_id, es.program_name
FROM sys.dm_exec_requests er
JOIN sys.resource_governor_workload_groups wg
ON wg.group_id = er.group_id
JOIN sys.dm_exec_sessions es
ON er.session_id = es.session_id
WHERE er.session_id = @@SPID

Get a session ready for Resource Pool Apple – it doesn’t matter what scheduler it’s on. Place an infinite-loop CPU-sucking query in it, ready to go. I don’t start it until later.

DECLARE @trash bigint

WHILE 1=1 
BEGIN
	SELECT @trash = COUNT(*) FROM dbo.Votes
end

Now get sessions for Pools Apple and Banana that happen to land on an identical scheduler (I did this by opening a bunch of Apple sessions, then reconnecting the Banana session until I got a match).

Start the other infinite-loop query, then run a high-CPU query on each of colliding sessions – here’s what I used:

SELECT COUNT_BIG(*) FROM dbo.Votes a
UNION ALL SELECT COUNT_BIG(*) FROM dbo.Votes a
UNION ALL SELECT COUNT_BIG(*) FROM dbo.Votes a
UNION ALL SELECT COUNT_BIG(*) FROM dbo.Votes a
UNION ALL SELECT COUNT_BIG(*) FROM dbo.Votes a
UNION ALL SELECT COUNT_BIG(*) FROM dbo.Votes a
UNION ALL SELECT COUNT_BIG(*) FROM dbo.Votes a
UNION ALL SELECT COUNT_BIG(*) FROM dbo.Votes a
UNION ALL SELECT COUNT_BIG(*) FROM dbo.Votes a
UNION ALL SELECT COUNT_BIG(*) FROM dbo.Votes a
UNION ALL SELECT COUNT_BIG(*) FROM dbo.Votes a
UNION ALL SELECT COUNT_BIG(*) FROM dbo.Votes a
UNION ALL SELECT COUNT_BIG(*) FROM dbo.Votes a
UNION ALL SELECT COUNT_BIG(*) FROM dbo.Votes a
UNION ALL SELECT COUNT_BIG(*) FROM dbo.Votes a
UNION ALL SELECT COUNT_BIG(*) FROM dbo.Votes a
OPTION(MAXDOP 1,USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'))

Remember we have two queries running from Pool Apple, and one from Pool Banana, where the Banana query is on the same scheduler as an Apple query. Here’s the Banana runtime – 15s of pure CPU:

But here’s the Apple runtime:

Apple took 30s because it wasn’t allowed *any* CPU time while Banana was running. Because SQLOS was trying to balance the resource pools with the infinite query running, the Apple query was waiting on SOS_SCHEDULER_YIELD for the whole Banana query, which I do think is absolutely bananas.

Remember the Banana CPU time was 15180

You’ll even see super-long scheduler yield waits with this pattern.

If the totals look off, that’s because I tested multiple times in these sessions

Yes, seriously, waiting 5 seconds to get time on the CPU.

Please note this doesn’t even need to be a heavy query from Pool A:

“Working as intended” – Top Support Man

In summary, have the doodle again:

In other summary, Resource Governor is scary, I blame it for hundreds of avoidable timeouts, and I hope MS improves their documentation.

P.S. If you followed my demo, that infinite loop is still running – might wanna kill it. Even worse, RG is still around, which you can fix with the below:

ALTER RESOURCE GOVERNOR DISABLE