Fixing Queues with Watermarks

Queues are pretty common, but it’s not too often I encounter them in SQL Server. There’s a reason: SQL Server is bad at them. I’m not just talking about Service Borker – even native table implementations crumble under heavy load.

The standard pattern looks something like below – allowing multiple threads to pull a work item from the single queue table.

WITH CTE AS 
    (SELECT TOP (1) *
    FROM dbo.testq WITH (ROWLOCK, READPAST) --locking hints allow concurrency
    ORDER BY ID)
DELETE CTE
OUTPUT Deleted.filler
INTO @d --table variable to store the stuff to process

I recently had the pleasure of investigating a procedure that pulled from a queue. Normally it was fast, but occasionally runtime would spike. The spooky thing was the query was using an ordered index scan that should only read one row, but during the spikes it was reading thousands.

Surely there’s a rational explanation…

…what to blame when your queue table has a bad hair day

Seriously. In the standard implementation of a queue table, dequeued records are deleted. But SQL Server typically doesn’t remove deleted rows immediately. Instead it marks them as ghost records (aka soft deletes), and exorcises them later, which is why a TOP(1) can easily read a lot more than one row.

So when the dequeue finds the next record to work on, it scans from the start of the index through ghosted rows to reach it. This isn’t a problem if there are only a couple, but sometimes servers can be so busy that Ghost Record Cleanup can’t keep up, or the ghost records are needed for versioning (which can easily happen if you use AGs). In those cases, the number of ghosts grows and grows, and you end up haunted by bad performance.

It was an intriguing problem, and I worked like a man possessed. I tested some bizarre approaches to shift rows around or force cleanup without blocking parallel dequeues, but what ended up succeeding was a watermark pattern. In a separate table, I store the minimum identity processed. Then, every dequeue can range scan starting from the watermark.

In the original implementation, doubling the number of rows to dequeue would quadruple the runtime. 30k rows took 2:50, while 60k rows took 10:01. For the watermark pattern however, 30k rows dequeued in :23, and 60k in :48. A clear win!

Here’s roughly what my implementation looked like. If you want to test for yourself, you can force ghost records (specifically version ghosts) by leaving a select transaction open with snapshot isolation enabled. I then spammed dequeues in while loops with the help of SQLQueryStresser.

First, the tables:

DROP TABLE IF EXISTS dbo.testq;
DROP TABLE IF EXISTS dbo.qtrack;

CREATE TABLE dbo.testq
(
    ID INT IDENTITY PRIMARY KEY,
    inserttime DATETIME2(2) CONSTRAINT DF_qq2 DEFAULT GETUTCDATE(),
    filler CHAR(400)
)

CREATE TABLE dbo.qtrack --watermark table
(
    ID INT PRIMARY KEY
)

INSERT dbo.qtrack
VALUES (0)

INSERT dbo.testq (filler)
SELECT TOP (30000) 
    REPLICATE(CONVERT(VARCHAR(36), NEWID()), 11) --filler
FROM sys.all_columns a,
     sys.all_columns b;
GO

Here’s the normal style of dequeue:

CREATE OR ALTER PROC dbo.dequeue_regular
AS
BEGIN
    DECLARE @d TABLE
    (
        junk VARCHAR(500)
    );
    WITH CTE
    AS (SELECT TOP (1)
               *
        FROM dbo.testq WITH (ROWLOCK, READPAST)
        ORDER BY ID)
    DELETE CTE
    OUTPUT Deleted.filler
    INTO @d;
END;

And the dequeue using the watermark table:

CREATE OR ALTER PROC dbo.dequeue_watermark
AS
BEGIN

    DECLARE @d TABLE
    (
        ID INT,
        junk VARCHAR(500)
    );

    DECLARE @w INT = (SELECT TOP 1 ID FROM dbo.qtrack)

    WITH CTE
    AS (SELECT TOP (1)
               *
        FROM dbo.testq WITH (ROWLOCK, READPAST)
        WHERE ID >= @w
        ORDER BY ID)
    DELETE CTE
    OUTPUT Deleted.ID,
           Deleted.filler
    INTO @d;

    DECLARE @t INT = (SELECT TOP 1 ID FROM @d)

    IF @t % 100 = 0
    BEGIN
        UPDATE dbo.qtrack
        SET ID = @t - 50;
    END;
END;

There are several things to note with the way I designed this. Updating the watermark in every concurrent dequeue would be a Bad Idea, so I added the %100 such that roughly every hundredth would move the watermark. Also, I didn’t simply adjust the watermark to the processed value, but rather some buffer amount lower. To be super safe, we also set up an every 5min job to rebuild the table, correct the watermark, and log any issues.

And that’s it, a method to protect yourself from ghosts using SQL. It’s been working like a charm for us, and I hope it works for those of you who need it. Oh, and please let me know if you encounter data Bigfoot – I’d like to build a proc for him too.