Behold, a simple MAX query:
But wait, where did that Top operator come from? Oh, SQL Server, you clever rascal…
Simple and extremely effective – a solid query transformation. But can you handle more complex versions, SQL Server?
Let’s try the bane of ordered scans: partitioned tables.
Please note that all repro scripts are shared at the bottom
Partitioning
The transformation works if the column is both the leading index key and the partitioning key.
But fails if the leading key is not the partitioning key.
I guess that makes sense to me, because it would have to grab the top value from each partition then sort those, a much more complicated transformation.
Second Column
Can SQL Server use the Top/Max transformation when looking at the second column of an index? In theory it should work.
Time for our favorite DBA sing-along song, It Depends.
Here’s the table, clustered on c1, c2, loaded with some filler data as seen in the repro script at the end.
CREATE TABLE dbo.A( c1 INT NOT NULL, c2 INT NOT NULL, INDEX CX_somecreativename CLUSTERED (c1,c2) )
The simplest form works
SELECT MAX(c2) FROM dbo.A WHERE c1 = 3
What if we make it more complicated? Let’s add a basic temp table.
CREATE TABLE #3 (three INT NOT NULL) INSERT #3 VALUES (3)
This works…
SELECT MAX(c2) FROM A WHERE c1 = (SELECT TOP(1) 3 FROM #3)
But this doesn’t
SELECT MAX(c2) FROM A WHERE c1 = (SELECT TOP(1) three FROM #3)
Spot the difference? Pulling a column value from the second table breaks the transformation, but pulling a constant keeps it. Weird. Anyways, let’s try a rewrite (though inexact) of the query.
SELECT the_max FROM (SELECT TOP(1) three AS c3 FROM #3) s CROSS APPLY ( SELECT MAX(c2) AS the_max FROM dbo.A WHERE A.c1 = s.c3 ) x
Victory again! Wait, what’s that? A change to the table definition making the second column nullable?
ALTER TABLE dbo.A ALTER COLUMN c2 INT NULL
Ok fine, I’m sure the last query won’t…oh.
So the TOP/MAX transformation is finicky on a second column, depending on nullability [is that a word?] and how the query is written.
Going Backwards
There’s even a situation where SQL Server should (should!) be able to do a reverse transformation, a TOP into a MAX. Columnstore indexes store maximum values per segment, so a TOP(1) query could simply grab that!
Conclusion
I can’t tell you how tempted I was to call this MAXTOP. Thinking about it still causes stray giggles to bubble up, as I gleefully envision sharing this info with eager young DBAs at a SQL Saturday, then ushering them off to a session on parallelism. Thankfully I’m not that evil.
The real conclusion is that SQL Server is programmed to be very clever. Despite the cleverness though, details matter, and the cleverness often falls short. What’s that warm glow I feel inside? Ah yes, job security.
Repro scripts
USE TestDB GO CREATE PARTITION FUNCTION pf_hundreds(int) AS RANGE RIGHT FOR VALUES (100,200,300,400,500,600,700,800,900); CREATE PARTITION SCHEME ps_hundreds AS PARTITION pf_hundreds ALL TO ( [PRIMARY] ); GO DROP TABLE IF EXISTS PartitionedNumbers GO CREATE TABLE PartitionedNumbers ( Number1 INT NOT null, Number2 INT NOT NULL, CONSTRAINT PK_numnum PRIMARY KEY CLUSTERED(Number1,Number2)) ON ps_hundreds(Number1) --drop/create table partitioned on Number2 for second query INSERT PartitionedNumbers SELECT TOP(999) ROW_NUMBER() OVER(ORDER BY 1/0), ROW_NUMBER() OVER(ORDER BY 1/0) FROM master.dbo.spt_values SELECT MAX(Number1) FROM PartitionedNumbers
USE TestDB GO DROP TABLE IF EXISTS dbo.A DROP TABLE IF EXISTS #3 go CREATE TABLE dbo.A( c1 INT NOT NULL, c2 INT NOT NULL, INDEX CX_somecreativename CLUSTERED (c1,c2) ) INSERT dbo.A SELECT TOP(100000) (ROW_NUMBER() OVER(ORDER BY 1/0))%100, (ROW_NUMBER() OVER(ORDER BY 1/0))%500 + 250 FROM sys.columns a, sys.columns b, sys.columns c CREATE TABLE #3 (three INT NOT NULL) INSERT #3 VALUES (3) --works SELECT MAX(c2) FROM dbo.A WHERE c1 = 3 --works SELECT MAX(c2) FROM dbo.A WHERE c1 = (SELECT 3) --works SELECT MAX(c2) FROM A WHERE c1 = (SELECT TOP(1) 3 FROM #3) --does not work SELECT MAX(c2) FROM A WHERE c1 = (SELECT TOP(1) three FROM #3) --works SELECT the_max FROM (SELECT TOP(1) three AS c3 FROM #3) s CROSS APPLY ( SELECT MAX(c2) AS the_max FROM dbo.A WHERE A.c1 = s.c3 ) x --change things around ALTER TABLE dbo.A ALTER COLUMN c2 INT NULL --does not work SELECT the_max FROM (SELECT TOP(1) three AS c3 FROM #3) s CROSS APPLY ( SELECT MAX(c2) AS the_max FROM dbo.A WHERE A.c1 = s.c3 ) x