Did We Learn Nothing From SRE?

It seems fairly common for organisations adopting AI tooling to measure adoption rates, token usage, generated code commits, etc, and I must admit my mind boggles. John Willis recently wrote an excellent post on the topic, exploring what W Edwards Deming might have thought about it, and it got me thinking about the same problem from a different angle. I have to ask the question - did we learn nothing from SRE?

SRE taught us that CPU usage, memory consumption, network saturation, etc, in and of themselves, are not useful measures of system health. A web service consuming 80% CPU that reliably responds in under 150ms simply represents effective use of resources rather than a system under strain. SRE showed us that end user experience was the thing that mattered. What we needed to quantify was “Is the system behaving in a manner that is acceptable to the end user?” Generally this meant latency, throughput, error rates, etc - things the end user would notice. Degradation may well be due to resource contention, but that was the lever to pull rather than the measure of success. Scaling up the infra in this example would have no net effect on system health but would certainly have an effect on the cloud bill, watering down our ROI.
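To make that concrete, here is a minimal sketch of the SRE mindset: an SLI computed from what users actually experience. The function name, threshold, and request durations are all hypothetical; the point is that CPU never appears in the calculation.

```python
def latency_sli(request_durations_ms, threshold_ms=150.0):
    """Fraction of requests answered within the latency threshold.

    This measures what the end user feels; resource usage is
    deliberately absent from the calculation.
    """
    if not request_durations_ms:
        return 1.0  # no traffic means no bad events
    good = sum(1 for d in request_durations_ms if d <= threshold_ms)
    return good / len(request_durations_ms)


# A busy service (80% CPU!) that still answers quickly is healthy:
durations = [42, 97, 120, 88, 145, 60]
print(latency_sli(durations))  # 1.0 - every request under 150ms
```

Whether 80% CPU is "too hot" simply doesn't enter into it; if the SLI dips below the SLO, resource contention becomes a lever to investigate, not the score itself.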

If we apply SRE thinking to the AI productivity boost problem, it seems obvious that measuring tool usage, generated code volume, agentic pull requests, token usage, etc will not give meaningful feedback on productivity. As John so aptly described, this is akin to assuming higher electricity usage in a factory is directly attributable to increased productivity. The night shift must surely be more productive?

The current frenzy of adoption driven by metrics such as token usage really misses the point. It assumes productivity increases are correlated with tool usage, but this is a hypothesis at best. Deming was highly critical of what he termed “management by numbers”, and Goodhart’s law agrees: when a measure becomes a target, it ceases to be a good measure.


Metrics

So here’s an outlandish thought - the metrics we should be looking at in this age of AI haven’t changed. The value that generates income and keeps the lights on is not the code, and it certainly isn’t the way the code was written. It’s the end user experience we should continue to focus on, and we already have a range of tools to measure it - DORA, SPACE, Flow Framework, DX, et al. AI shouldn’t really change how you measure “productivity”, even if it comes with the ability to see a whole bunch of new numbers.

If you’re serious about productivity, run experiments with different ways of building - with AI, without AI, all AI, etc - and measure the impact on the end user value delivered. Are we shipping features that engage our users? Is our application more stable than it was? Is our NPS climbing? Are calls to our call centre decreasing? Tool adoption and token usage may well be a consequence, but they’re just numbers, not outcomes.
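One way such an experiment might be summarised - a sketch only, with hypothetical cohort names, metric names, and numbers - is to compare outcome metrics (here, DORA-style ones) across teams building with and without AI, rather than comparing their token usage:

```python
from statistics import mean

# Hypothetical experiment: two cohorts of teams, one building with AI
# assistance and one without, compared on outcomes users notice.
# All field names and values below are illustrative, not real data.
cohorts = {
    "with_ai": {
        "change_failure_rate": [0.12, 0.09, 0.15],
        "lead_time_days": [2.1, 1.8, 2.4],
    },
    "without_ai": {
        "change_failure_rate": [0.11, 0.14, 0.10],
        "lead_time_days": [3.0, 2.7, 3.2],
    },
}

for name, outcomes in cohorts.items():
    # Average each outcome metric per cohort for comparison.
    summary = {metric: round(mean(values), 2) for metric, values in outcomes.items()}
    print(name, summary)
```

Note that nothing in the comparison mentions tokens, commits, or adoption rates; if AI genuinely helps, it should show up in the outcome columns.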