Avoiding Sorts with the $PARTITION Function

Backstory

I imagine everybody has them – those annoying, bloated, misshapen tables that don’t really belong in the database. I mean, this is an expensive enterprise RDMS supporting a high volume OLTP environment, and a depressingly large portion of the storage (also expensive btw) is dedicated to decade’s worth of nvarchar(max) email attachments.

I know how hard it can be to undo someone else’s decision to treat SQL Server like a landfill, as this usually involves getting others to do something, i.e. politics. But sometimes we DBAs can make a helpful change to these archive tables on our own. And that’s where I found myself, with billions of rows in a neglected heap that couldn’t be moved, but could be turned into a clustered columnstore table for 5X compression.

Setup Script

Here’s the setup script in case you want to follow along for the repro. No billions of rows here, just 10 million.

CREATE PARTITION FUNCTION pf_years( DATETIME )
    AS RANGE RIGHT FOR VALUES ('2010','2011','2012','2013','2014','2015',
	'2016','2017','2018','2019','2020');

CREATE PARTITION SCHEME ps_years
AS PARTITION pf_years
ALL TO ( [PRIMARY] );

--Here's a table we're loading
CREATE TABLE BunchaDates
(somedate datetime,
junk char(10)
) ON ps_years(somedate)

CREATE CLUSTERED COLUMNSTORE INDEX CCX_BunchaDates ON dbo.BunchaDates ON ps_years(somedate)

--Original table, full of junk
CREATE TABLE BunchaDatesSource (
somedate datetime,
junk char(10)
)

INSERT BunchaDatesSource
SELECT TOP (10000000)
DATEADD(MINUTE,24*60*365*6*RAND(CHECKSUM(NEWID())),'1/1/2014')
,'demodemo'
FROM master..spt_values a
,master..spt_values b
,master..spt_values c

The Problem

My plan was to load the data from the heap into the columnstore table, later doing a drop and rename, but I wanted to transfer a limited amount at a time. My environment has an Availability Group set up, so my constraint was minimizing log generation. (If you’re looking for a technique to load a columnstore table as quickly as possible, check out Joe Obbish.) But, when I started loading a limited time range of data, I quickly ran into a problem.

INSERT BunchaDates
SELECT somedate,junk
FROM BunchaDatesSource bs
WHERE YEAR(bs.somedate) = '2014'

A Sort operator shows up! To explain my distaste, the columnstore table doesn’t need it – it is inherently unordered (at least in the way we think about it). Moreover, I am selecting rows guaranteed to reside within a single partition. Yet SQL Server insists on ordering rows, because it very helpfully wants us to avoid the performance penalty of switching between partitions insert-by-insert. Instead I get a slow query, ginormous memory grant, and tempdb spills. Yay!

Ok. You might notice that I used YEAR() in my predicate. What if I use a proper date range instead?

Crud. Ok. Was it the boundary dates? Maybe a narrow date range entirely within the year will work?

Fine. What about an exact time? Pretty please, oh kind and generous Query Optimizer?

The Solution

Looping through all possible times of a datetime time field in order to avoid a sort is very bad way to do things. But thankfully, there’s a better way: $PARTITION.

The $PARTITION function uses your partition function to return which partition a value belongs to, and importantly, it can be used to provide a guarantee to SQL Server that it doesn’t have to insert into multiple partitions, and will therefore prevent a superfluous sort.

INSERT BunchaDates
SELECT somedate,junk
FROM BunchaDatesSource bs
WHERE $PARTITION.pf_years(somedate) = $PARTITION.pf_years('2014')

No Sort! But there’s still room for improvement:

You see, $PARTITION doesn’t qualify for predicate pushdown, and on a query involving billions of rows, that adds up.

But guess what, we can still add the date range, and that does get predicate pushdown.

INSERT BunchaDates
SELECT somedate,junk
FROM BunchaDatesSource bs
WHERE $PARTITION.pf_years(somedate) = $PARTITION.pf_years('2014')
AND bs.somedate >= '2014/1/1' AND bs.somedate < '2015/1/1'

Conclusion

Sometimes, the optimizer misses something, and you have to babysit it. In my case, using $PARTITION allowed me to finish the table load in my time window, eventually freeing up significant space to be used for other questionable data.

The takeaway? $PARTITION lets you guarantee for the optimizer that only a single partition is involved, but you might want to retain your other predicates for predicate pushdown or perhaps index qualification.

Now excuse me while I compose yet another carefully worded email about that email attachment table…

Leave a Reply

Your email address will not be published. Required fields are marked *