Left and Right Deep Hash Joins

There’s a lot already written about left versus right deep hash joins. Unfortunately for us SQL Server nerds, “left” and “right” don’t make as much sense in SSMS query plans – those have a different orientation than the trees of database theory.

But if you rotate plans, you can see left and right that make sense (even if they still don’t match canonical shapes). Just follow the join operators.

The great thing about left deep plans is that only a couple of hash tables need to exist at a time, saving memory.

Whereas a right-deep hash join plan needs to keep all the hash tables around until it’s done.

So why would SQL Server ever choose a right-deep plan? There are two key ideas to this. First, each shape has just one privileged position, a single table that is only part of probes, never needing to fill up a hash table, never adding to the memory overhead. Review the two plans above – can you spot which in each never contributes to a build?

Second, SQL Server has to pick just two tables to start with (more than two is an n-ary join, which may theoretically exist). Certain join patterns let you choose any table to start with, while some severely limit choices. Star joins, a common data warehouse pattern, require the central table to be part of the initial join.

I had a vague intuition of this part, but drawing it out really helped me – here are three basic patterns I tend to see:

First, an N-ary join, where all tables join on an equivalent value.

SELECT *
FROM Users A
JOIN BannedUsers B
   ON A.UserID = B.UserID
JOIN Donors C
   ON A.UserID = C.UserID
JOIN Achievements D
   ON C.UserID = D.UserID

Then there are what I call chained joins (I assume there’s an official name, but I don’t know it).

SELECT *
FROM Users A
JOIN Vacations B
   ON A.UserID = B.UserID
JOIN Itineraries C
   ON B.VacationID = C.VacationID
JOIN Destinations D
   ON C.DestinationID = D.DestinationID

Finally, and most importantly for this topic, there are star joins, where everything relates back to a central table.

SELECT *
FROM UserFacts A
JOIN NameDim B
   ON A.NameID = B.NameID
JOIN AddressDim C
   ON A.AddressID = c.AddressID
JOIN DemographicsDim D
   ON A.DemographicsID = D.DemographicsID

In a star join, that central table is usually massive. Billions of rows into a hash table is typically a Bad Idea. And since all joins involve that table, SQL Server can’t delay it until the privileged position of a left deep plan. Instead, there will be a right deep hash join plan, and the massive table will be placed in the privileged position that never fills up a hash table.

So right-deep hash join plans exist because sticking gajillions of rows into a hash build tends to go poorly, and the optimizer picks a right deep plan to avoid blowing out your tempdb. Hurray for the optimizer!…this time.

Leave a Reply

Your email address will not be published. Required fields are marked *