Andy Weir
Fractional CTO

From delivery friction to sustainable fast flow – restoring pace and predictability under pressure.

I help scale-ups make delivery predictable again when growth starts to outpace their systems – stabilising delivery, unblocking decision flow, and embedding sustainable capability between product and engineering.

Three Effective Strategies to Tackle Bugs

Illustration of an engineer examining a green cartoon bug through a magnifying glass, safely contained inside a clear box labelled "Feature Flag" – symbolising testing safely in production.

Part 3: Controlled Delivery

Find bugs before your customers do

TL;DR – Controlled Delivery

“Just try harder” isn’t a strategy – it’s a symptom.

In Part 3 of this three-part series, we dig into how we made production safe to learn from – catching the bugs that slip through, verifying behaviour in production with live traffic, and shipping safely without slowing down.

You’ll learn how we:

– Built safety in from the start – tracing, structured failure events, and metrics from day one
– Learned before we launched – dark launches and parallel runs against live production traffic
– Rolled out by behaviour, not by date – feature flags, staged exposure, and pause-and-verify checks

We didn’t fix feedback by trying harder. We fixed a system that made it hard to succeed.

If you’re scaling fast or working with legacy systems under real-world risk, this is how you ship safely – and sleep well at night.

This series began as a conference talk I gave at LDX3 and Agile on the Beach – a story of how I helped a team tackle delivery friction head-on in their fast-growing HealthTech platform. What started as a few slides and field notes turned into a set of strategies any team can learn from, whether you’re scaling fast or wrestling legacy.

“Just Try Harder” Isn’t A Strategy!

We’ve all heard it after a release goes wrong: just try harder next time. But effort wasn’t the problem – feedback was. We’d already fixed that in Part 1 and Part 2. Now it was time to make production safe enough that we didn’t have to rely on luck or blame to stay fast.

This is the final part in a three-part series about how I helped a HealthTech platform shift from firefighting to fast flow – and made software delivery faster, safer, and more predictable under pressure.

I’ll share what worked – no frameworks, no heroics – just practical, hard-earned patterns from the coalface.

We’ll start with what it took to ship safely once we were fast.

Lie of the Land

Part 2 got us to a confident release every two days. Part 3 is about keeping that confidence under real traffic – shipping without fear.

Before we could make production safe to learn from, we needed to understand why it felt dangerous.

Fast Wasn’t Safe

Shipping more often helped us move faster, but it didn’t make us safer.

Our next major feature was a new comms service sending appointment reminders, lab results, and other sensitive updates. In a HealthTech context, even small mistakes carried serious consequences – regulatory risk and the possibility of real-world harm. We couldn’t rely on habit or hope. We needed systems that made safety visible.

Logic We Couldn’t Trust

Each edge case added another layer of complexity – a new condition, another exception – and few people could describe how a message flowed end-to-end. We weren’t starting from a blank slate – we were unpicking history. To change safely, we had to understand what we already had.

Production Had the Full Picture

No test environment could reproduce production data or behaviour. Only live traffic showed how messages actually moved – and where they sometimes didn’t. Production wasn’t just where risk lived. It was where truth lived. That made it both valuable and uncomfortable as a place to learn.

Build Better Safety

The answer wasn’t more gates or longer checklists. We needed stronger feedback where it mattered most: monitoring to tell us when something was wrong, observability to help us understand why, and feature flags to keep risk contained. That became the next goal.

Start With Safety Nets

We didn’t just build alarms – we built safety nets.

We’d learned that safety is hard to bolt on later – it has to be built in from the start. Before launching anything new, we made sure we could see how it behaved and recover quickly if it failed.

Map the Event Flow

Traceability starts before a single line of code.

Before we touched code, we mapped the comms workflow using event storming. Every booking created, result received, and reminder sent went on the whiteboard. It was quick, visual, and collaborative, and it surfaced gaps we hadn’t seen before. For the first time, everyone shared a clear picture of how data moved through the system.
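
To make that shared picture concrete once it left the whiteboard, here’s a minimal sketch of the kind of event catalogue the mapping produced. The event names come straight from the examples above; the fields and structure are illustrative, not the team’s actual schema.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical domain events lifted from the event-storming wall.
# Each names a fact that happened, not a command to do something.

@dataclass(frozen=True)
class BookingCreated:
    booking_id: str
    patient_id: str
    occurred_at: datetime

@dataclass(frozen=True)
class ResultReceived:
    booking_id: str
    lab_reference: str
    occurred_at: datetime

@dataclass(frozen=True)
class ReminderSent:
    booking_id: str
    channel: str  # e.g. "sms" or "email"
    occurred_at: datetime

# The mapped flow, in order – the same sequence we walked on the whiteboard.
COMMS_FLOW = (BookingCreated, ResultReceived, ReminderSent)
```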

Track Requests End-to-End

Every message should tell its own story.

We added a unique identifier to every message chain and let it flow through each service, queue, and delivery step. When a message failed or was duplicated, we could trace it directly instead of guessing. It replaced speculation with evidence, making problems easier to trace and fix.
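
Here’s a minimal sketch of that idea in Python – one correlation ID minted at the start of a chain and carried, unchanged, by every downstream hop. The Message envelope and helper names are hypothetical; in practice the ID usually travels in the message or HTTP headers you already have.

```python
import uuid
from dataclasses import dataclass

@dataclass
class Message:
    """An envelope every service passes along; only the payload changes."""
    correlation_id: str
    payload: dict

def new_chain(payload: dict) -> Message:
    # Mint one ID at the start of the chain; every downstream hop reuses it.
    return Message(correlation_id=str(uuid.uuid4()), payload=payload)

def forward(msg: Message, new_payload: dict) -> Message:
    # Downstream steps create new payloads but never a new correlation ID,
    # so filtering logs on the ID reconstructs the whole journey.
    return Message(correlation_id=msg.correlation_id, payload=new_payload)

# Usage: the booking service starts the chain, the comms service forwards it.
booking = new_chain({"event": "BookingCreated", "booking_id": "b-123"})
reminder = forward(booking, {"event": "ReminderQueued", "channel": "sms"})
assert reminder.correlation_id == booking.correlation_id
```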

Log Failures as Events

Failures count as feedback only if you can see them.

Previously, exceptions disappeared into logs. We treated them as structured events instead, tagged with correlation IDs that linked each failure back to the original request. Those events appeared alongside normal activity, giving us a fuller view of how the system behaved. Failures became visible (and fixable) before users noticed.
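
A sketch of the shape those failure events can take – structured JSON, tagged with the correlation ID, and emitted through the same logging pipeline as normal activity. The field names and the emit_failure helper are illustrative, not our exact schema.

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("comms")

def emit_failure(correlation_id: str, step: str, error: Exception) -> None:
    """Emit a failure as a structured event rather than a buried stack trace."""
    event = {
        "type": "comms.failure",           # lands in the same stream as success events
        "correlation_id": correlation_id,  # links back to the original request
        "step": step,                      # e.g. "render_template", "dispatch_sms"
        "error": type(error).__name__,
        "detail": str(error),
        "at": datetime.now(timezone.utc).isoformat(),
    }
    # One JSON object per line keeps failures queryable alongside normal activity.
    logger.error(json.dumps(event))

# Usage: wrap the risky step and keep the correlation ID flowing.
try:
    raise TimeoutError("SMS gateway did not respond")
except Exception as exc:
    emit_failure("b-123-chain", "dispatch_sms", exc)
```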

Metrics From Day One

You can’t retrofit visibility.

We made observability part of the work from the start – tracing, metrics, error tracking, and feature flags. The tools change, but the principle doesn’t: you need to see what’s happening before it matters.

From the first commit, every workflow emitted measurable signals, so drift showed up early – long before any user ever saw it.

That early visibility turned opinion into evidence. It gave us confidence that the new system behaved the same as the old one – before a single user ever saw it.
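
As a rough illustration of “measurable signals from the first commit”, here’s a tiny sketch that counts successes and failures and times each workflow step. The in-memory Counter and the metric names are stand-ins for whatever metrics client you already run (Prometheus, StatsD, or similar).

```python
import time
from collections import Counter

# Stand-in for a real metrics client (Prometheus, StatsD, etc.).
metrics = Counter()

def record_step(workflow: str, step: str, fn):
    """Run one workflow step and emit a count plus a duration for it."""
    started = time.perf_counter()
    try:
        result = fn()
        metrics[f"{workflow}.{step}.ok"] += 1
        return result
    except Exception:
        metrics[f"{workflow}.{step}.error"] += 1
        raise
    finally:
        elapsed_ms = (time.perf_counter() - started) * 1000
        # In a real system this would feed a histogram; printing keeps the sketch runnable.
        print(f"{workflow}.{step}.duration_ms={elapsed_ms:.1f}")

# Usage: every workflow emits signals whether or not anyone is watching yet.
record_step("appointment_reminder", "render_template", lambda: "Hello Pat, ...")
print(dict(metrics))
```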


Learn Before You Launch

We started with a dark launch – the new service ran in production behind feature flags. It processed the same real inputs as the live path, but with delivery disabled: full visibility, zero user exposure. We treated the rollout as a controlled experiment: learn first, launch second. Confidence came from production parity, not hope.
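
A sketch of the dark-launch shape, assuming a simple boolean flag: the new path processes real events and computes its decisions, but delivery only happens once the flag flips. The flag name and helper functions are hypothetical.

```python
# Hypothetical flag store; in practice this is your feature-flag service.
FLAGS = {"comms.new_path.delivery_enabled": False}

def send_sms(decision: dict) -> None:
    print(f"SENDING to {decision['recipient']}")

def record_shadow_decision(decision: dict) -> None:
    print(f"SHADOW (not sent): {decision}")

def handle_event(event: dict) -> dict:
    """New comms path: compute everything, send nothing until the flag flips."""
    decision = {
        "recipient": event["patient_id"],
        "template": "appointment_reminder_v2",
        "would_send": True,
    }
    if FLAGS["comms.new_path.delivery_enabled"]:
        send_sms(decision)                # only runs once we flip the flag
    else:
        record_shadow_decision(decision)  # full visibility, zero user exposure
    return decision

handle_event({"patient_id": "p-42", "event": "BookingCreated"})
```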

Run Systems in Parallel

Confidence comes from comparison, not assumption.

We ran both versions in parallel, triggered by the same live events. The old path sent real messages, while the new one remained silent – both ran through the same logic, so we could compare like for like. Every input and outcome was logged side by side, so if behaviour drifted, we saw it straight away.
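
In code, the parallel run looks roughly like this: one live event fanned out to both implementations, with only the old path allowed to deliver, and both outcomes logged side by side. The function names and payloads are illustrative.

```python
import json

def old_path(event: dict) -> dict:
    # The existing implementation: really delivers.
    return {"sent": True, "recipient": event["patient_id"], "template": "reminder_v1"}

def new_path(event: dict) -> dict:
    # The replacement: computes the same decision but stays silent.
    return {"sent": False, "recipient": event["patient_id"], "template": "reminder_v2"}

def handle(event: dict) -> None:
    """Fan one live event out to both paths and log the outcomes side by side."""
    old_result = old_path(event)  # the only path users can observe
    new_result = new_path(event)  # shadow run against the same input
    print(json.dumps({"event_id": event["id"], "old": old_result, "new": new_result}))

handle({"id": "evt-9001", "type": "BookingCreated", "patient_id": "p-42"})
```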

Capture Outputs

You can’t test what you don’t record.

We captured the template and parameters, the recipient, the send/no-send decision, and the observed timing from both paths. Delivery on the new path was disabled behind a flag – compute everything, send nothing. Differences were traced through the event flow to the exact cause, keeping quality discussions factual, not emotional.
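
Roughly the record we kept per path, sketched as a dataclass – template and parameters, recipient, the send/no-send decision, and timing. The field names are illustrative.

```python
from dataclasses import dataclass, asdict

@dataclass
class ShadowOutcome:
    """One row per path per event – the facts we compared, nothing more."""
    path: str            # "old" or "new"
    correlation_id: str
    template: str
    parameters: dict
    recipient: str
    decision: str        # "send" or "suppress"
    duration_ms: float

outcome = ShadowOutcome(
    path="new",
    correlation_id="b-123-chain",
    template="appointment_reminder",
    parameters={"first_name": "Pat", "time": "09:30"},
    recipient="p-42",
    decision="send",     # what it *would* have done – delivery stays flagged off
    duration_ms=12.4,
)
print(asdict(outcome))
```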

Compare Results

Small differences matter most before launch.

Once both systems were running in parallel, we started comparing results – data, not opinion. We tracked message counts, timings, and delivery outcomes across the old and new paths. Any mismatch showed up within minutes, and we could trace it straight back through the event flow.

Fixing those discrepancies before launch saved rework, reduced support calls, and gave everyone confidence that the new system behaved as expected.
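
A sketch of the comparison itself: pair the captured outcomes for one event, then surface any mismatch in recipient, template, parameters, or decision, plus timing drift beyond a tolerance. The thresholds and field names are placeholders.

```python
def compare(old: dict, new: dict, timing_tolerance_ms: float = 500.0) -> list[str]:
    """Return human-readable mismatches between the two paths for one event."""
    mismatches = []
    for field in ("recipient", "template", "parameters", "decision"):
        if old[field] != new[field]:
            mismatches.append(f"{field}: old={old[field]!r} new={new[field]!r}")
    if abs(old["duration_ms"] - new["duration_ms"]) > timing_tolerance_ms:
        mismatches.append("timing drifted beyond tolerance")
    return mismatches

old = {"recipient": "p-42", "template": "appointment_reminder",
       "parameters": {"time": "09:30"}, "decision": "send", "duration_ms": 40.0}
new = {"recipient": "p-42", "template": "appointment_reminder",
       "parameters": {"time": "09:30"}, "decision": "send", "duration_ms": 35.0}

# Any non-empty result gets traced back through the event flow by correlation ID.
print(compare(old, new) or "paths agree")
```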

Use Production Traffic

Real traffic is the only real test.

No matter how good your tests are, production is still your final test – real users, real data, real load, and edge cases you’ll never see in test. Bugs don’t always stand out – they hide just below the surface. That’s why we started with internal users, then pilot groups, before opening it up to everyone.


Safe Rollout

We didn’t ship everything at once. Each release was a small, deliberate step. Feature flags gave us the control to decide who saw what and when. Instead of fearing production, we used it to build confidence in small, measured moves.

Plan Rollouts by Behaviour

Release by behaviour, not by date.

Rather than pushing code on a fixed schedule, we rolled out features based on how the system behaved. If metrics stayed steady and errors stayed low, we widened the rollout. If something drifted, we paused. That shift in focus – from deadlines to signals – made releases calm, predictable, and evidence-based.
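
A sketch of that “widen only while the signals stay healthy” rule. The stages, metric names, and thresholds are placeholders for whatever your dashboards actually track.

```python
ROLLOUT_STAGES = [1, 5, 25, 100]  # percentage of users exposed

def next_stage(current_pct: int, error_rate: float, p95_latency_ms: float) -> int:
    """Widen the rollout only while the live signals stay healthy; otherwise hold."""
    healthy = error_rate < 0.01 and p95_latency_ms < 800  # placeholder thresholds
    if not healthy:
        return current_pct  # pause – drift means we stop and look before widening
    later = [stage for stage in ROLLOUT_STAGES if stage > current_pct]
    return later[0] if later else current_pct

# Metrics steady -> widen; metrics drifting -> hold where we are.
print(next_stage(5, error_rate=0.002, p95_latency_ms=310))  # -> 25
print(next_stage(5, error_rate=0.030, p95_latency_ms=310))  # -> 5 (paused)
```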

Flags Control the Switch

Feature flags gave us control without ceremony.

Each new feature was hidden behind a flag. In many cases, it was a single-switch cutover – one flag turned the old path off and the new one on. We could turn functionality on or off instantly, for a single user or for everyone. If something broke, we didn’t need a patch or rollback – just that switch. That control meant we could deploy confidently, even before we were ready to release.
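
A sketch of how that control can look in code, assuming a simple flag client with a global switch plus a per-user allow list – a stand-in for whichever feature-flag tool you use. The flag and function names are hypothetical.

```python
class FlagClient:
    """Stand-in for a feature-flag service: a global switch plus per-user allow lists."""
    def __init__(self):
        self._global: dict[str, bool] = {}
        self._users: dict[str, set[str]] = {}

    def enable_for_user(self, flag: str, user_id: str) -> None:
        self._users.setdefault(flag, set()).add(user_id)

    def enable_globally(self, flag: str) -> None:
        self._global[flag] = True

    def is_on(self, flag: str, user_id: str) -> bool:
        return self._global.get(flag, False) or user_id in self._users.get(flag, set())

flags = FlagClient()

def send_reminder(user_id: str) -> str:
    # One switch decides which path runs – no patch, no rollback needed.
    if flags.is_on("comms.new_path", user_id):
        return f"new path for {user_id}"
    return f"old path for {user_id}"

flags.enable_for_user("comms.new_path", "internal-tester-1")  # pilot one user first
print(send_reminder("internal-tester-1"))                     # new path
print(send_reminder("p-42"))                                  # old path
flags.enable_globally("comms.new_path")                       # the cutover – and the kill switch
print(send_reminder("p-42"))                                  # now the new path for everyone
```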

Pause and Verify

A short pause costs less than a fast fix.

After each stage, we stopped to check behaviour. Did message timings still align? Did success rates hold steady? That pause gave us space to look at the evidence before moving on. It turned release validation from an afterthought into part of the flow.

Build Shared Confidence

Confidence grows fastest when it’s shared.

We made the rollout visible – not just to engineers, but to QA, product, and support. Dashboards showed live metrics, and flags indicated who was seeing what. When something went wrong, everyone saw the same data and could act together. The result was quiet confidence – not the absence of risk, but a shared understanding of it.

By the time we flipped the flag, hotfixes were rare, deploys were predictable, and multiple safe releases a day were normal.

Reflection

Controlled delivery made production safe to learn from – but that safety took deliberate design.

Can you deliver fast, and still sleep well at night?
What would you need to change to make that true where you work?


Takeaways

– Build safety in from the start – tracing, structured failure events, and metrics from day one.
– Use production to learn before you launch – dark launches and parallel runs turn risk into evidence.
– Release by behaviour, not by date – feature flags, staged rollouts, and pause-and-verify checks keep launches calm.


Want to go deeper?

If you want to explore the thinking behind Controlled Delivery, observability, and building systems safe enough to learn from, here are four excellent reads:

Better software isn’t about trying harder.

We didn’t fix delivery by pushing people harder – we changed the system we worked in. Because when the system improves, everything else starts to shift too:

– Delivery becomes faster and more predictable
– Failures become visible – and fixable – before users notice
– Teams ship with confidence, and sleep well at night

That’s what these three strategies are about.

So let’s go beyond “just try harder” and build better systems – and better software.

What would have to be true for you to test with production traffic safely?