Swim Between the Flags

If you release software with any reasonable cadence, it’s almost inevitable you will end up using a feature flag system of one sort or another. There always comes a time when engineering wants to decouple from the release cycle of the Go To Market (GTM) team, and with good reason. The GTM team is slower and more traditional. They want to announce big “drops” and send out the marching band to showcase all the cool new stuff you can do, a process that can take months to orchestrate. We prefer to deploy frequently in small increments.

Feature flags swoop in from stage left and save the day! GTM have clarity on when the marching band can be unleashed, and we can deploy and test in small, manageable increments. All is well until, inevitably, we release a big bang GTM thing and it goes south. It’s 3am, monitoring is lit up like a Christmas tree and we need to fix it fast. Fortunately, we haven’t yet cleaned up that flag we were definitely gonna, so we just disable the feature and the problem is relegated to tomorrow. In the post mortem that definitely ensues, the team espouses the virtue of the feature flag that allowed them to return to slumber in a timely manner. HURRAH!

The team quickly learns to use safety flags around every change they make “just in case” and before you know it you have hundreds of feature flags in play. Toggling feature flags becomes perilous as there are now so many logic paths it’s impossible to properly test all the combinations. The people who created the flags in the first place have left and we’re not quite sure what they toggle anymore. With the lack of tests around the flags we’d best leave them alone and not risk breaking things. BOOOOOOOO!

It is honestly breathtaking how quickly this can get out of hand even in a small team!

We must accept that each flag adds a significant burden to the SDLC in the form of backward compatibility: both code paths have to keep working, which doubles our testing effort because we need to verify both sides of the flag. Feature flags can be a powerful tool, but they should be used sparingly and, as obvious as it may seem, add value rather than erode it.
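To put rough numbers on that burden, here is a minimal sketch in Python. The flag names are made up for illustration, and it assumes every flag is a simple on/off toggle that can be set independently of the others. Under those assumptions each new flag doubles the number of configurations you would need to test to cover them all.

```python
from itertools import product


def flag_combinations(flag_names):
    """Return every possible on/off assignment for the given flags."""
    return [
        dict(zip(flag_names, states))
        for states in product([False, True], repeat=len(flag_names))
    ]


# Hypothetical flag names, purely for illustration.
one_flag = ["new_checkout"]
three_flags = ["new_checkout", "dark_mode", "bulk_export"]
ten_flags = [f"flag_{i}" for i in range(10)]

print(len(flag_combinations(one_flag)))     # 2
print(len(flag_combinations(three_flags)))  # 8
print(len(flag_combinations(ten_flags)))    # 1024
```

Real systems rarely need every combination tested, but nobody can say which ones are safe to skip once the people who created the flags have moved on.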

But what about our safety net? By using flags to manage production issues we’re solving the problem at hand rather than the problem at cause. The issue still made it out to production, still bothered customers enough to cause an incident, still caused the feature to be completely rolled back for everyone. The only thing optimised for is speed to rollback.

Safety flags solve the wrong problem. If you have bugs making it out to production significant enough to warrant rollback of an entire feature, you have a quality problem and that is the problem that needs solving. Throwing a bunch of flags and logic branches over a system that already has a quality problem is a sure-fire way to make it worse. I’d suggest time spent shifting left and ensuring defects are caught earlier in the SDLC will result in better outcomes than the ability to quickly roll back a defect after it’s out in production.

All this being said, feature flags are still a powerful tool that can be used effectively if we apply a handful of guardrails to keep things sane:

Do you really need that flag? If the feature is small and can be released with a code deployment without fanfare then just do that. The easiest flag to deal with is the one you didn’t create.
Feature flags should be aligned to product releases or drops, which cover wide sets of functionality released to the end user as one unit.
Automate the testing of feature toggle logic as much as possible (you’re shifting left anyway, right?). Manual testing is complex and time consuming.
Clean up the feature flag and all the associated logic branches as soon as possible after the release. It’s done its job and it’s time to shed the burden that comes with it.
Avoid nesting flags or making them dependent on each other, which results in complex toggle on/off sequences with sometimes unpredictable results.
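As a rough illustration of the automated testing guardrail, here is a minimal sketch in Python. Everything in it is hypothetical: the checkout_total function, the new_discount flag and the idea that flags arrive as a plain dictionary are assumptions made for the example, not how any particular flag system works. The point is only that both sides of the flag, plus the "flag missing" default, get exercised by the same automated test.

```python
def checkout_total(amount_cents, flags):
    """Hypothetical feature: the 'new_discount' flag takes 10% off the total."""
    if flags.get("new_discount", False):
        return amount_cents - amount_cents // 10
    return amount_cents


def test_checkout_total_both_flag_states():
    """Exercise the flag on, off, and absent in one automated test."""
    cases = [
        ({"new_discount": False}, 1000),  # old behaviour
        ({"new_discount": True}, 900),    # new behaviour
        ({}, 1000),                       # missing flag falls back to old behaviour
    ]
    for flags, expected in cases:
        actual = checkout_total(1000, flags)
        assert actual == expected, f"flags={flags}: got {actual}, want {expected}"


# A test runner such as pytest would discover the test by name;
# calling it directly works too.
test_checkout_total_both_flag_states()
```

When the flag is cleaned up after the release, the test shrinks to a single case along with the code, which is a handy reminder that the clean-up is done.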

solveTheRightProblem