Human-in-the-Loop or Human-in-the-Way?

The promise of acceleration from AI developer tooling is highly alluring, and yet there’s this annoying risk of such tooling generating logic that would result in … uh … negative outcomes. The go-to mitigation seems to be an insistence on keeping human code review stage gates. Someone must be “accountable” (what does this even mean?) for the quality of the output they didn’t output. The so-called human-(silver-bullet)-in-the-loop.

I give this the side-eye.

Reviewers, now overwhelmed by a flood of pull requests, face increasing pressure to merge faster whilst simultaneously inhaling, validating and being “accountable” for all the logic they didn’t write. Inevitably, the effectiveness of review tanks, and code quality, security and system resiliency suffer. The DORA reports from 2024 and 2025 suggest this is already happening, showing a drop in delivery stability since the widespread adoption of AI, although, promisingly, as of 2025 we are at least getting more productive at reducing it.

There is also an implicit assumption that human review is superior to AI review (is it because we’re “accountable”?). I beg to differ. The AI-powered reviews I’ve seen have been far more thorough (and useful) than the average human review. More importantly, AI review is more repeatable and more consistent. This should not be surprising: for decades we have relied on software to be better and more consistent than we are, so why would this be an exception? It seems absurd to stage-gate a merge with a checkpoint that is less reliable than the thing it checks. Oh, what’s that you say? This is different because LLMs are non-deterministic? True, but so are humans.

The human-in-the-loop has become the human-in-the-way, and yet we can’t just trust the AI to do the right thing infallibly. Given enough time, an LLM will inevitably generate less-than-ideal logic, and then we’re rolling the dice on those negative outcomes mentioned earlier.

So what to do?

If we invoke the flux capacitor and go back to the advent of devops, we might remember the NOC teams that applied four-eyes policies to release processes as the gods of SSH were summoned, environment variables were fettled, binaries patched, servers restarted and fingers crossed. Devops pipelines took all those eyes out of the process, and that was what unlocked many small, frequent releases in place of infrequent, giant, destabilising ones. No one checks variables on servers by hand anymore; if we get it wrong, we fix the pipeline and redeploy. Human review and control moved up a level to the pipelines, to the system that deploys the system. Those pipelines were subjected to the same rigorous build, test and review processes that we apply to software development, which allowed us to maintain quality as throughput scaled.

Catch a lightning bolt back to the present day and our predicament looks quite similar. Human review can’t stay effective whilst scaling to keep up with the firehose of changes from engineers accelerated by AI coding tools. If we can offload the authoring of logic to AI tools, it makes sense to offload the review of that logic too. Human focus moves up a level to the system that builds the system. We fettle, test and polish that system until it can handle the firehose while maintaining quality as throughput scales. Just like we did with devops. Perhaps this is better described as human-over-the-loop rather than human-in-the-loop?

Finally, and most importantly, there’s a happy ending for your friend and mine, “accountability”: it remains firmly in human hands.