I like fast databases. Naturally, this puts me at odds with GUIDs. Alas, developers like to use them, sometimes for good reasons (“we need ids generated on different servers to be unique”) and often for bad (“what if we run out of bigints”).
The Columnstore Problem
You may have run into issues with GUIDs as clustering keys, but another major problem is in columnstore. Smart people at Microsoft wrote columnstore in a way to take advantage of modern CPU features, but those CPU features don’t play well with datatypes larger than 8 bytes. Which includes GUIDs.
Let me show you: here’s a select on a GUID

And here’s a select on a bigint

We Need to Split Up
Crazy idea time: a uniqueidentifier is 16 bytes…what if we just turned that into two 8 byte bigint columns?

create table #t ( UID uniqueidentifier not null ) create table dbo.GUID_IDEA ( UID uniqueidentifier not null, b1 bigint, b2 bigint, index CCX_GUID clustered columnstore ) insert #t select top (5000000) newid() from sys.all_columns a, sys.all_columns b, sys.all_columns c insert dbo.GUID_IDEA (UID, b1, b2) select UID ,convert(bigint,RIGHT_SHIFT(convert(binary(16),UID),64)) --there are several ways to split - this one is 2022+ ,convert(bigint,RIGHT_SHIFT(LEFT_SHIFT(convert(binary(16),UID),64),64)) from #t
The Demo
Split done. We have a 5m row table storing both the full GUID and its decomposed bigints. Time to test it. Load a temp table with some sample values.
create table #driver( UID uniqueidentifier, b1 bigint, b2 bigint ) go insert #driver select * from dbo.GUID_IDEA tablesample(5 rows) go 5 --this lets us grab rows from different locations
Now, how does a GUID-only join perform?
select * from #driver d join dbo.GUID_IDEA gi on gi.UID = d.UID

And what if we include the bigints on the join?
select * from #driver d join dbo.GUID_IDEA gi on gi.UID = d.UID and gi.b1 = d.b1 and gi.b2 = d.b2

According to the plan details, that’s 268ms CPU down to 21ms, huge win! A 16 byte predicate can’t be pushed down here, but two separate 8 byte predicates (or rather, their bitmap filters) can.
This also works for where clauses.


Side Notes:
See this? Should be fast, right?
select * from dbo.GUID_IDEA gi where b1 = 3511787043328731458
Except…

Whereas this one is fast:
select * from dbo.GUID_IDEA gi where b1 = 42

This happens because the parser is just looking at the number of digits, and if there are as many digits as max bigint (and thus at risk of being larger than a bigint), it stops treating the number like a bigint.
Thankfully the fix is easy:
select * from dbo.GUID_IDEA gi where b1 = convert(bigint,3511787043328731458)
Another question you might have is why bigints? Why not just use two binary(8) columns.
Alas…
create table dbo.BETTER_IDEA ( UID uniqueidentifier not null ,p1 binary(8) not null ,p2 binary(8) not null ,index ccx_better clustered columnstore ) ...blah blah load it and split it, you can figure it out select * from BETTER_IDEA where p1 = 0x97E6A3BAC4E4A64F

This is one I don’t have an explanation for, and may need to send some wine to New Zealand for an answer. Edit: the Wizard from the Other Down Under provided an answer: Columnstore never stores binary data in a pushdown friendly format. Actual explanation here.
Conclusion
I have never done this in production, and I’m not sure I want to even with a 20X speedup – simple designs and getting data right the first time tend to outperform fancy tricks. Maybe, just maybe, it’s worth bolting on to a calcified schema where it’s the only way. If you do try this yourself, let me know how it goes!



































