Of all the tactics I have advocated as part of the lean startup, none has provoked as many extreme reactions as continuous deployment, a process that allows companies to release software in minutes instead of days, weeks, or months. My previous startup, IMVU, has used this process to deploy new code an average of fifty times a day.
This has stirred up some controversy, with some claiming that this
rapid release process contributes to low-quality software or prevents
the company from innovating. If we accept the verdict of customers
instead of pundits, I think these claims are easy to dismiss. Far more
common, and far more difficult, is the range of questions from people
who simply wonder if it’s possible to apply continuous deployment to
their business, industry, or team.

The particulars of IMVU’s history give rise to a lot of these concerns. IMVU is a consumer internet company with millions of customers, so its example may seem to have little relevance for an enterprise software company with only a handful of potential customers, or for a computer security company whose customers demand a rigorous audit before accepting a new release. I think these objections miss the point of continuous deployment, because they focus on specific implementations instead of general principles.
So, while most of the writing on continuous deployment so far focuses on the how of it, I want to focus today on the why. (If you’re looking for resources on getting started, see “Continuous deployment in 5 easy steps.”)

The goal of continuous deployment is to help development teams drive waste out of their process by simultaneously reducing the batch size
and increasing the tempo of their work. This makes it possible for
teams to get – and stay – in a condition of flow for sustained periods.
This condition makes it much easier for teams to innovate, experiment,
and achieve sustained productivity. And it nicely complements other
continuous improvement systems, such as Five Whys.

One
large source of waste in development is “double-checking.” For example,
imagine a team operating in a traditional waterfall development system,
without continuous deployment, test-driven development, or continuous
integration. When a developer wants to check in code, that is a scary moment. They have a choice: check in now, or double-check to make sure everything still works and looks good. Both options have some attraction. If they check in now, they can claim the rewards of being
done sooner. On the other hand, if they cause a problem, their previous
speed will be counted against them. Why didn’t they spend just another
five minutes making sure they didn’t cause that problem? In practice,
how developers respond to this dilemma is determined by their
incentives, which are driven by the culture of their team. How severely
is failure punished? Who will ultimately bear the cost of their
mistakes? How important are schedules? Does the team value finishing
early?

But the thing to notice in this situation is that there
is really no right answer. People who agonize over the choice reap the
worst of both worlds. As a result, developers will tend towards two
extremes: those who believe in getting things done as fast as possible,
and those who believe that work should be carefully checked. Any
intermediate position is untenable over the long-term. When things go
wrong, any nuanced explanation of the trade-offs involved is going to
sound unsatisfying. After all, you could have acted a little sooner or a little more carefully – if only you’d known what the problem was going
to be in advance. Viewed through the lens of hindsight, most of those
judgments look bad. On the other hand, an extreme position is much
easier to defend. Both have built-in excuses: “sure there were a few
bugs, but I consistently over-deliver on an intense schedule, and it’s
well worth it” or “I know you wanted this done sooner, but you know I
only ever deliver when it’s absolutely ready, and it’s well worth it.”

These
two extreme positions lead to factional strife in development teams,
which is extremely unpleasant. Managers start to make a note of who’s
on which faction, and then assign projects accordingly. Got a crazy last-minute feature? Get the Cowboys to take care of it – and then let
the Quality Defenders clean it up in the next release. Both sides start
to think of their point of view in moralistic terms: “those guys don’t
see the economic value of fast action, they only care about their
precious architecture diagrams” or “those guys are sloppy and have no
professional pride.” Having been called upon to mediate these
disagreements many times in my career, I can attest to just how
wasteful they are.

However, they are completely logical
outgrowths of a large-batch-size development process that forces
developers to make trade-offs between time and quality, using the old “time-quality-money, pick two” fallacy.
Because feedback is slow in coming, the damage caused by a mistake is
felt long after the decisions that caused the mistake were made, making
learning difficult. Because everyone gets ready to integrate with the
release batch around the same time (there being no incentive to
integrate early), conflicts are resolved under extreme time pressure.
Features are chronically on the bubble, about to get deferred to the
next release. But when they do get deferred, they tend to have their
scope increased (“after all, we have a whole release cycle, and it’s
almost done…”), which leads to yet another time crunch, and so on. And,
of course, the code rarely performs in production the way it does in
the testing or staging environment, which leads to a series of
hot-fixes immediately following each release. These come at the expense
of the next release batch, meaning that each release cycle starts off
behind.

Many times when I interview a development team caught in
the pincers of this situation, they want my help “fixing people.”
Thanks to a phenomenon called the Fundamental Attribution Error
in psychology, humans tend to become convinced that other people’s
behavior is due to their fundamental attributes, like their character,
ethics, or morality – even while we excuse our own actions as being
influenced by circumstances. So developers stuck in this world tend to
think the other developers on their team are either, deep in their
souls, plodding pedants or sloppy coders. Neither is true – they just
have their incentives all messed up.

You can’t change the
underlying incentives of this situation by getting better at any one
activity. Better release planning, estimating, architecting, or
integrating will only mitigate the symptoms. The only traditional
technique for solving this problem is to add massive queues in the form of schedule padding, extra time for integration, code freezes, and
the like. In fact, most organizations don’t realize just how much of
this padding is already going on in the estimates that individual
developers learn to generate. But padding doesn’t help, because it
serves to slow down the whole process. And as all development teams
will tell you – time is always short. In fact, excess time pressure is
exactly why they think they have these problems in the first place.

So
we need to find solutions that operate at the systems level to break
teams out of this pincer action. The agile software movement has made
numerous contributions: continuous integration, which helps accelerate
feedback about defects; story cards and kanban that reduce batch size;
a daily stand-up that increases tempo. Continuous deployment is another
such technique, one with a unique power to change development team
dynamics for the better.

Why does it work?

First,
continuous deployment separates two different definitions of the term “release.” One is used by engineers to refer to the process of getting code fully integrated into production. Another is used by
marketing to refer to what customers see. In traditional
batch-and-queue development, these two concepts are linked. All
customers will see the new software as soon as it’s deployed. This
requires that all of the testing of the release happen before it is
deployed to production, in special staging or testing environments. And
this leaves the release vulnerable to unanticipated problems during
this window of time: after the code is written but before it’s running
in production. On top of that overhead, by conflating the marketing
release with the technical release, the amount of coordination overhead
required to ship something is also dramatically increased.
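
To make this separation concrete, here is a minimal sketch, in Python, of the feature-flag approach many teams use to decouple the two (the flag names and in-memory store are hypothetical, not a description of IMVU’s actual system): code is deployed to production continuously, but customers only see the new behavior once someone flips the flag.

    # Minimal feature-flag sketch (hypothetical names, not IMVU's system).
    # New code ships to production continuously, but customers only see it
    # once the flag is switched on, decoupling the engineering release from
    # the marketing release.

    FEATURE_FLAGS = {
        "new_checkout_flow": False,  # deployed, not yet visible to customers
    }

    def is_enabled(flag: str) -> bool:
        """Return True if a feature should be shown to customers."""
        return FEATURE_FLAGS.get(flag, False)

    def render_checkout(user: str) -> str:
        # The new code path is live in production long before it is
        # "released" in the marketing sense.
        if is_enabled("new_checkout_flow"):
            return f"new checkout for {user}"
        return f"old checkout for {user}"

    if __name__ == "__main__":
        print(render_checkout("alice"))            # old checkout: flag is off
        FEATURE_FLAGS["new_checkout_flow"] = True  # the marketing "release" is a flag flip
        print(render_checkout("alice"))            # new checkout: same deployed code

Under a scheme like this, the marketing release becomes a configuration change rather than a deployment, and the coordination overhead shrinks accordingly.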

Under
continuous deployment, as soon as code is written, it’s on its way to
production. That means we are often deploying just 1% of a feature –
long before customers would want to see it. In fact, most of the work
involved with a new feature is not the user-visible parts of the
feature itself. Instead, it’s the millions of tiny touch points that
integrate the feature with all the other features that were built
before. Think of the dozens of little API changes that are required
when we want to pass new values through the system. These changes are
generally supposed to be “side effect free,” meaning they don’t affect
the behavior of the system at the point of insertion – emphasis on supposed.
In fact, many bugs are caused by unusual or unnoticed side effects of
these deep changes. The same is true of small changes that only
conflict with configuration parameters in the production environment.
It’s much better to get this feedback as soon as possible, which
continuous deployment offers.
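
As a toy illustration (hypothetical code, not IMVU’s), a “side effect free” change often amounts to threading a new optional value through an existing function, with a default chosen so that every existing caller behaves exactly as before; deploying it immediately is what surfaces any unintended side effects while the change is still small.

    # Hypothetical illustration of a "side effect free" API change: a new
    # optional parameter is threaded through the system with a default that
    # leaves every existing caller's behavior unchanged.
    from typing import List, Optional

    def price_quote(items: List[float], discount_code: Optional[str] = None) -> float:
        """Return the total price. `discount_code` is the new value being
        passed through; with the default of None, old callers are unaffected."""
        total = sum(items)
        if discount_code == "LAUNCH5":  # new behavior, dormant until a caller opts in
            total -= 5.0
        return total

    # Existing call sites keep working exactly as before...
    assert price_quote([10.0, 5.0]) == 15.0
    # ...while the new value can already flow through production, where any
    # unexpected side effect shows up right away instead of at release time.
    assert price_quote([10.0, 5.0], discount_code="LAUNCH5") == 10.0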

Continuous deployment also acts as
a speed regulator. Every time the deployment process encounters a
problem, a human being needs to get involved to diagnose it. During
this time, it’s intentionally impossible for anyone else to deploy.
When teams are ready to deploy, but the process is locked, they become
immediately available to help diagnose and fix the deployment problem
(the alternative, continuing to generate but not deploy new code, just serves to increase batch sizes to everyone’s detriment). This
speed regulation is a tricky adjustment for teams that are accustomed
to measuring their progress via individual efficiency. In such a
system, the primary goal of each engineer is to stay busy, using as
close to 100% of his or her time for coding as possible. Unfortunately,
this view ignores the overall throughput of the team. Even if you don’t
adopt a radical definition of progress, like the “validated learning about customers” that I advocate, it’s still sub-optimal to keep everyone busy. When
you’re in the midst of integration problems, any code that someone is
writing is likely to have to be revised as a result of conflicts. Same
with configuration mismatches or multiple teams stepping on each
others’ toes. In such circumstances, it’s much better for overall
productivity for people to stop coding and start talking. Once they
figure out how to coordinate their actions so that the work they are
doing doesn’t have to be reworked, it’s productive to start coding
again.
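
A minimal sketch of that speed regulator might look something like the following (the lock file and function names are hypothetical, not a description of IMVU’s pipeline): every deploy first checks a shared lock, a failed deploy sets it, and nobody else can ship until a human diagnoses the problem and clears it.

    # Hypothetical sketch of a deployment "speed regulator": a shared lock
    # that blocks further deploys whenever the pipeline is broken, so the
    # team's attention shifts to fixing the problem rather than piling up
    # ever-larger batches of undeployed code.
    import json
    import os

    LOCK_FILE = "deploy_lock.json"  # hypothetical shared location

    def pipeline_status() -> dict:
        if not os.path.exists(LOCK_FILE):
            return {"locked": False, "reason": None}
        with open(LOCK_FILE) as f:
            return json.load(f)

    def lock_pipeline(reason: str) -> None:
        """Called automatically when any deploy step fails."""
        with open(LOCK_FILE, "w") as f:
            json.dump({"locked": True, "reason": reason}, f)

    def unlock_pipeline() -> None:
        """Called by a human once the failure is diagnosed and fixed."""
        if os.path.exists(LOCK_FILE):
            os.remove(LOCK_FILE)

    def deploy(changeset: str) -> bool:
        status = pipeline_status()
        if status["locked"]:
            # Intentionally impossible to deploy: go help fix the problem instead.
            print(f"Deploys are locked ({status['reason']}); {changeset} must wait.")
            return False
        print(f"Deploying {changeset}...")
        return True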

Returning to our development team divided into Cowboy and
Quality factions, let’s take a look at how continuous deployment can
change the calculus of their situation. For one, continuous deployment
fosters learning and professional development – on both sides of the
divide. Instead of having to argue with each other about the right way
to code, each individual has an opportunity to learn directly from the
production environment. This is the meaning of the axiom to “let your
defects be your teacher.”

If an engineer has a tendency to ship too soon, they will tend to find themselves grappling with the cluster immune system, continuous integration server, and Five Whys master more often. These encounters, far from being the high-stakes arguments inherent in traditional teams, are actually low-risk, mostly private or
small-group affairs. Because the feedback is rapid, Cowboys will start
to learn what kinds of testing, preparation and checking really do let
them work faster. They’ll be learning the key truth that there is such
a thing as “too fast” – many quality problems actually slow you down.
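
For the curious, the cluster immune system amounts to automated monitoring that rejects a change when key metrics regress after it is deployed. A toy sketch (the metrics, thresholds, and sample numbers are hypothetical, not IMVU’s actual implementation):

    # Toy sketch of a "cluster immune system": after each incremental deploy,
    # compare key health metrics against a baseline and revert automatically
    # if they regress. Metric names, thresholds, and numbers are made up.
    BASELINE = {"signup_rate": 100.0, "error_rate": 0.5}  # per-minute baselines
    MAX_REGRESSION = 0.10  # tolerate at most a 10% change for the worse

    def metrics_healthy(current: dict) -> bool:
        if current["signup_rate"] < BASELINE["signup_rate"] * (1 - MAX_REGRESSION):
            return False  # signups dropped too far
        if current["error_rate"] > BASELINE["error_rate"] * (1 + MAX_REGRESSION):
            return False  # errors climbed too far
        return True

    def immune_check(changeset: str, current_metrics: dict) -> None:
        if metrics_healthy(current_metrics):
            print(f"{changeset}: metrics look normal, the deploy stands.")
        else:
            # The bad change is rejected quickly and quietly, long before it
            # becomes a high-stakes argument between factions.
            print(f"{changeset}: metrics regressed, reverting automatically.")

    immune_check("rev-1234", {"signup_rate": 99.0, "error_rate": 0.51})
    immune_check("rev-1235", {"signup_rate": 70.0, "error_rate": 0.90})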

But engineers who tend to wait too long before shipping have lessons to learn, too. For one, the larger the batch size of
their work, the harder it will be to get it integrated. At IMVU, we
would occasionally hire someone from a more traditional organization
who had a hard time letting go of their “best practices” and habits.
Sometimes they’d advocate for doing their work on a separate branch,
and only integrating at the end. Although I’d always do my best to
convince them otherwise, if they were insistent I would encourage them
to give it a try. Inevitably, a week or two later, I’d enjoy the
spectacle of watching them engage in something I called “code
bouncing.” It’s like throwing a rubber ball against the wall. In a code
bounce, someone tries to check in a huge batch. First they have
integration conflicts, which require talking to various people on the
team to know how to resolve them properly. Of course, while they are
resolving, new changes are being checked in. So new conflicts appear.
This cycle repeats for a while, until the developer either catches up to all the conflicts or just asks the rest of the team for a general check-in freeze. Then the fun part begins. Getting a large batch through the
continuous integration server, incremental deploy system, and real-time
monitoring system almost never works on the first try. Thus the large
batch gets reverted. While the problems are being fixed, more changes
are being checked in. Unless we freeze the work of the whole team, this
can go on for days. But if we do engage in a general check-in freeze,
then we’re driving up the batch size of everyone else – which will lead
to future episodes of code bouncing. In my experience, just one or two
episodes are enough to cure anyone of their desire to work in large
batches.

Because continuous deployment encourages learning,
teams that practice it are able to get faster over time. That’s because
each individual’s incentives are aligned with the goals of the whole
team. Each person works to drive down waste in their own work, and this
true efficiency gain more than offsets the incremental overhead of
having to build and maintain the infrastructure required to do
continuous deployment. In fact, if you practice Five Whys too, you can build all of this infrastructure in a completely incremental fashion. It’s really a lot of fun.

One
last benefit: morale. At a recent talk, an audience member asked me
about the impact of continuous deployment on morale. This manager was
worried that moving their engineers to a more-rapid release cycle would
stress them out, making them feel like they were always fire fighting
and releasing, and never had time for “real work.” As luck would have
it, one of IMVU’s engineers happened to be in the audience at the time.
They provided a better answer than I ever could. They explained that by
reducing the overhead of doing a release, each engineer gets to work to
their own release schedule. That means, as soon as they are ready to
deploy, they can. So even if it’s midnight, if your feature is ready to
go, you can check in, deploy, and start talking to customers about it
right away. No extra approvals, meetings, or coordination required.
Just you, your code, and your customers. It’s pretty satisfying.

