Thoughts on RLS for Data Migration

The common corporate amoeba floats around, grasping at revenue, customers, and other nutrients with ciliated pseudopods. You, dear gainfully-employed DBA, are an organelle in this subsentient blob. Sometimes your employer-host catches another amoeba, and it becomes your duty to ingest their delicious data.

This was the situation I found myself in earlier this year – our company had absorbed another, and it was time to slurp up their tables. There were a lot of decisions to make and tradeoffs to weigh, and we ended up choosing to trickle-insert their data, but make it invisible to normal use until the moment of cutover.

The way we implemented this was with Row Level Security. Using an appropriate predicate, we could make sure ETL processes only saw migrated data, apps saw unmigrated data, and admins saw everything. To give a spoiler: it worked, but there were issues.

Let’s consider a simplistic RLS filter, with the StackOverflow2010 database for examples.

CREATE FUNCTION dbo.RLSfunctionSimple(@Migrated AS bit)
    RETURNS TABLE
WITH SCHEMABINDING
AS
    RETURN 
	SELECT 1 AS Pass
    WHERE (@Migrated = 1 AND USER_NAME() IN ('ETLProcess','BubbaTheAdmin'))
	OR (@Migrated = 0 AND USER_NAME() NOT IN ('ETLProcess'))
GO

Great! This works. We add an IsMigrated column to the Users table, set the security policy, and bam, we’re done. Yeah no.

ALTER TABLE dbo.Users 
ADD IsMigrated BIT NOT NULL DEFAULT(0)

CREATE SECURITY POLICY RLS_Users
ADD FILTER PREDICATE dbo.RLSfunctionSimple(IsMigrated) 
ON dbo.Users
WITH (STATE = ON);

More people than just Bubba want to see migrated data. A migration is a big deal, and everyone, everyone, is going to ask to see migrated data for reasons that are honestly quite legitimate. You will constantly be modifying who gets access.

There’s another issue: adding RLS to a table takes a SCH-M lock on that table. Have you experienced the joys of frequent SCH-M locks on critical tables in a highly active database?

So out of a desire to easily make access changes and avoid SCH-M locks (and because you can’t alter the function while it’s used by the security policy), you decide to make the RLS function reference tables. Let’s add one table for the ETL processes that should only see migrated rows, and one table for the folks who want to read everything. Everyone else by default only views unmigrated data.

CREATE TABLE dbo.SeeEverything (
UserName NVARCHAR(256) PRIMARY KEY
)
CREATE TABLE dbo.SeeMigrated (
UserName NVARCHAR(256) PRIMARY KEY
)

And here’s what the RLS function might look like:

CREATE FUNCTION dbo.RLSfunctionComplex(@Migrated AS bit)
    RETURNS TABLE
WITH SCHEMABINDING
AS
    RETURN 
	SELECT 1 AS Pass
    WHERE (@Migrated = 1 
		AND EXISTS (SELECT 1 FROM dbo.SeeMigrated WHERE UserName = USER_NAME())
		OR EXISTS (SELECT 1 FROM dbo.SeeEverything WHERE UserName = USER_NAME())
		)
	OR (@Migrated = 0 
		AND NOT EXISTS (SELECT 1 FROM dbo.SeeMigrated WHERE UserName = USER_NAME())
		)

And now this

SELECT DisplayName
FROM dbo.Users
WHERE ID = 12345

Becomes this

RLS that references tables will break queries. Sometimes, the added complexity just made the optimizer fall over. Other times useful query transformations broke.

Note that the above query plan has nine extra nodes. Imagine a relatively complex query, where most of the involved tables have RLS. Let me tell you, things break. You might say, oh don’t worry, we’ll only add to core tables. No, that’s not how it will work. Bad data is expensive. The business, at least during a migration, cares about data quality. If you have RLS functioning fine on N tables, there will be pressure to make it N+1, just to be safe.

There’s a specific and common transformation that breaks when RLS reads other tables, the TOP-MAX transformation.

Here’s the normal version, at 0ms.

SELECT MAX(ID)
FROM dbo.Users

And here it is after RLS, scanning your tables and taking 116ms

And this is only part of the plan

Any place you rely on the TOP-MAX transformation, you will have to rewrite as an explicit TOP 1.

Recommendations

Try as hard as you reasonably can to avoid external references in your RLS function. We still ended up with one – we landed on a single look-up-everything table, and it didn’t break as much as other multi-table functions did.

Expect your coverage to grow beyond initial estimates, whether users who need access or tables that need protection.

You can hot-swap your RLS function with ALTER SECURITY POLICY. It still takes a SCH-M lock on the affected table, but it’s a lot better than dropping your security policy in order to alter the function.

All-in-all, I’m mostly ok with our decision. It meant a lot of work ahead of time (including adding the migration column to tables – don’t forget to include in the indexes as well). But when the migration day came, we turned RLS off, and everything just worked. Given typical cutover craziness, that’s not a bad tradeoff to make.