There’s something about being a DBA that gives us special insight into the dysfunctions of both code and organizations. When we’re the ones keeping databases running, we get a front-row seat to all the dumb that floats up.
Yet sometimes it feels hard to get developers to care. Five thousand deadlocks an hour? Don’t worry, we have retry logic. Entity Framework uses the wrong datatype in a query, causing it to scan and consume half a server’s CPU? Oh no worries, everything is still working. Remember…servers are cattle, not pets.
Bitterness aside, I found something that helps communicate impact: cost. No, not query bucks – dollars. One of the wonderful benefits of Azure is that organizations can see the cost of garbage code. Well, those of us working with on-premises servers can use the same approach. Send an email informing developers (and their managers) that one of their bad queries would cost $80k a year to run in Azure, and you’ll get traction. It’s also fun to show your boss how much your indexing was worth, right around review time.
Here’s the core of my approach:
- Find an approximately equivalent resource in Azure (e.g. if you have an AG, look at a Business Critical Managed Instance).
- Use your company’s pricing info (or the handy online estimator) to look at cost, and then calculate dollars per core-hour.
- Add in scaling assumptions. For example, if your organization treats any server running above 75% CPU as too small and scales it up, then only 75% of the compute you pay for is actually usable. In that case, use 75% instead of 100% as the available compute pool in the calculation.
- Grab CPU usage metrics from your favorite source (yay Query Store!) for the offending query. Gather them during peak hours if possible, because that’s what you size your physical server around.
- Scale the cost to per-year (yes, I admit it’s to get a larger number, but it’s a timeframe the business usually budgets around too).
Step 2: Math. Step 3: Profit! I figured you might not want to math it out yourself, so here’s a Google Sheet with my formulas.
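If you'd rather script it than open a spreadsheet, here's a minimal Python sketch of the same steps. To be clear about what's assumed: these are not necessarily the exact formulas in my sheet, the function names are made up for illustration, and every number in the example is a placeholder rather than real Azure pricing. It also assumes the peak-hour CPU rate sets the size you pay for all year, which is the sizing logic from the list above.

```python
HOURS_PER_YEAR = 24 * 365  # 8760; ignores leap years


def dollars_per_core_hour(monthly_cost, cores, hours_per_month=730.0):
    """Price of one core for one hour, from a monthly quote.

    730 hours per month is an assumed average; swap in whatever
    your pricing source uses.
    """
    return monthly_cost / (cores * hours_per_month)


def annual_query_cost(cpu_seconds_per_hour, core_hour_price, usable_fraction=0.75):
    """Estimated yearly cost of a query's CPU usage.

    cpu_seconds_per_hour: CPU time the query burns in one (peak) hour,
        e.g. summed from Query Store.
    usable_fraction: share of purchased CPU you can use before the
        server gets scaled up (0.75 for a 75% threshold).
    """
    # 3600 CPU-seconds in an hour keeps exactly one core busy.
    cores_used = cpu_seconds_per_hour / 3600.0
    # Only part of each purchased core is usable, so you pay for more
    # cores than the query actually consumes.
    cores_billed = cores_used / usable_fraction
    return cores_billed * core_hour_price * HOURS_PER_YEAR


if __name__ == "__main__":
    # Placeholder numbers only: an 8-core resource quoted at $7,300/month,
    # and a query burning 5,400 CPU-seconds per hour at peak.
    price = dollars_per_core_hour(monthly_cost=7300, cores=8)
    cost = annual_query_cost(cpu_seconds_per_hour=5400, core_hour_price=price)
    print(f"${price:.2f} per core-hour, about ${cost:,.0f} per year")
```

With those placeholder inputs, the query keeps 1.5 cores busy, which becomes 2 billed cores at a 75% threshold. Plug in your own quote and Query Store numbers before you send that email.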
I’ve had a lot of success communicating this way, and I’m especially happy when I have actual Azure resources to work with. There are still some imperfect assumptions in my approach (scale and comparable Azure resources I think are the weakest), so I’d love feedback and suggestions. Now go use math to beat up developers!