makes us over-invest in prevention. It makes us less willing to trust,
to communicate openly, and – most painfully – to take risks. It is the
dominant reason I see teams fall back on “best practices” which may not
be effective, but are at least reassuring. Unfortunately, these actions
generally work to increase the batch size of our work, which magnifies
the consequences of failure and therefore leads to more fear. Reducing
fear is a heuristic we can use to judge process improvements. Anything
that reduces fear is likely to speed up the fundamental feedback loop.
The
interesting thing about fear is that reducing it requires two
contradictory impulses. First, we can reduce fear by mitigating the
consequences of failure. If we construct areas where experimentation is
less costly, we can feel safer and therefore try new things. On the
other hand, the second main way to reduce fear is to engage in the
feared activity more often. By pushing the envelope, we can challenge
our assumptions about consequences and get better at what we fear at
the same time. Thus, it is sometimes a good idea to reduce fear by
slowing down, and sometimes a good idea to reduce fear by speeding up.
To illustrate this point, I want to excerpt a large part of a recent blog post by Owen Rogers, who organized my recent trip to Vancouver. I spent some time with his company before the conference and discussed ways to get started with continuous deployment, including my experience introducing it at IMVU. He summarized that conversation well, so rather than re-tread that material, I’ll quote it here:
One thing that I was surprised to learn was that IMVU started out with continuous deployment.
They were deploying to production with every commit before they had an
automated build server or extensive automated test coverage in place.
Intuitively this seemed completely backwards to me – surely it would be
better to start with CI,
build up the test coverage until it reached an acceptable level and
then work on deploying continuously. In retrospect and with a better
understanding of their context, their approach makes perfect sense.
Moreover, approaching the problem from the direction that I had
intuitively chosen is a recipe for never reaching a point where continuous
deployment is feasible.
Initially, IMVU sought to quickly build a
product that would prove out the soundness of their ideas and test the
validity of their business model. Their initial users were super early
adopters who were willing to trade quality for access to new features.
Getting features and fixes into the hands of users was the greatest
priority – a test environment would just get in the way and slow down
the validation coming from having code running in production. As the
product matured, they were able to ratchet up the quality to prevent regression on features that had been truly embraced by their customers.
Second,
leveraging a dynamic scripting language (like PHP) for building web
applications made it easy to quickly set up a simple, non-disruptive
deployment process. There are no compilation or packaging steps that
would generally be performed by an automated build server – just copy
and change the symlink.
Third, they evolved ways to
selectively expose functionality to sets of users. As Eric said, “at
IMVU, ‘release’ is a marketing term”. New functionality could be living
in production for days or weeks before being released to the majority
of users. They could test, get feedback and refine a new feature with a
subset of users until it was ready for wider consumption. Users were
not just an extension of the testing team – they were an extension of
the product design team.
Understanding these three factors makes
it clear as to why continuous deployment was a starting point for IMVU.
In contrast, at most organizations – especially those with mature
products – high quality is the starting point. It is assumed that users
will not tolerate any decrease in quality. Users should only see new
functionality once it is ready, fully implemented and thoroughly
tested, lest they get a bad impression of the product that could
adversely affect the company’s brand. They would rather build the wrong
product well than risk this kind of exposure. In this context, the
automated test coverage would need to be so good as to render
continuous deployment infeasible for most systems. Starting instead
from a position where feedback cycle time is the priority and allowing
quality to ratchet up as the product matures provides a more natural
lead-in to continuous deployment.
The rest of the post, which you can read here, discusses the application of these principles to other contexts. I recommend you take a look.
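As an aside, the “just copy and change the symlink” deploy Owen describes takes remarkably little machinery. Here is a rough sketch in Python of that style of deploy, assuming timestamped release directories and a “current” symlink that the web server serves from; the paths and helper names are illustrative, not the script IMVU actually used.

```python
import os
import shutil
import time

# Illustrative paths -- not IMVU's actual layout.
RELEASES_DIR = "/var/www/releases"
CURRENT_LINK = "/var/www/current"   # the web server serves whatever this points at
SOURCE_DIR = "/home/build/app"      # checked-out working copy to deploy

def deploy():
    # Copy the new code into a fresh, timestamped release directory.
    release = os.path.join(RELEASES_DIR, time.strftime("%Y%m%d-%H%M%S"))
    shutil.copytree(SOURCE_DIR, release)

    # Swap the symlink atomically: build a temporary link, then rename it
    # over the old one so requests never see a half-copied tree.
    tmp_link = CURRENT_LINK + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release, tmp_link)
    os.replace(tmp_link, CURRENT_LINK)

if __name__ == "__main__":
    deploy()
```

The reason to rename a temporary link over the old one, rather than delete and recreate it, is that the swap is a single atomic filesystem operation, so no request ever sees a half-copied tree.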
Returning
to the topic at hand, I think this example illustrates the tension
required to reduce fear. In order to do continuous deployment at IMVU,
we had to handle fear two ways:
- Reduce consequences – by
emphasizing the small number of customers we had, we were able to
convince ourselves that exposing them to a half-baked product was not
very risky. Although it was painful, we focused our attention on the
even bigger risks we were mitigating: the risk that nobody would use
our product, the risk that customers wouldn’t pay for virtual goods,
and the risk that we’d spend years of our lives building something that
didn’t matter – again.
- Fear early, fear often – by
actually doing continuous deployment before we were really “ready” for
it, we got used to the real benefits and consequences of acting at that
pace. On the negative side, we got a visceral feel for the kinds of
changes that could really harm customers, like commits that take the
whole site down. But on the plus side, we got to see just how powerful
it is to be able to ship changes to the product at any hour of the day,
to get rapid feedback on new ideas, and to not have to wait for the
next “release train” to put your ideas in action. On the whole, it made
it easier for us to decide to invest in preventive maintenance (i.e., the Cluster Immune System, sketched below) rather than just slow down and accept a larger batch size.
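To make that “preventive maintenance” idea a bit more concrete, here is a minimal sketch of the kind of check a cluster immune system runs: after each deploy, compare a few key production metrics against their pre-deploy baselines and revert automatically if they regress. The metric names, thresholds, and placeholder helpers are assumptions for illustration, not the actual IMVU implementation.

```python
import sys

# Illustrative baselines and thresholds -- not IMVU's actual metrics or numbers.
BASELINE = {"error_rate": 0.010, "signups_per_minute": 4.0}

def fetch_current_metrics():
    # Placeholder: in practice this would query the monitoring system.
    return {"error_rate": 0.012, "signups_per_minute": 3.9}

def rollback():
    # Placeholder: re-point production at the previous known-good release.
    print("regression detected: reverting the deploy and alerting the team")

def deploy_is_healthy():
    current = fetch_current_metrics()
    # A bad deploy tends to show up as errors spiking or business metrics sagging.
    if current["error_rate"] > BASELINE["error_rate"] * 2.0:
        return False
    if current["signups_per_minute"] < BASELINE["signups_per_minute"] * 0.8:
        return False
    return True

if __name__ == "__main__":
    if deploy_is_healthy():
        print("deploy looks healthy")
    else:
        rollback()
        sys.exit(1)
```

The hard part in practice is picking metrics sensitive enough to catch a bad deploy within minutes without constantly crying wolf.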
Making
this fear-reduction strategy work required more than just the core team
getting used to continuous deployment. We eventually discovered (via five whys)
that we also had to get each new employee acculturated to a fearless
way of thinking. For people we hired from larger companies especially,
this was challenging. To get them over that hurdle, we once again
turned to the “reduce consequences” and “face your fears” duality.
When
a new engineer started at IMVU, I had a simple rule: they had to ship
code to production on their first day. It wasn’t an absolute rule; if
it had to be the second day, that was OK. But if it slipped to the
third day, I started to worry. Generally, we’d let them pick their own
bug to fix, or, if necessary, assign them something small. As we got
better at this, we realized the smaller the better. Either way, it had
to be a real bug and it had to be fixed live, in production. For some,
this was an absolutely terrifying experience. “What if I take the site
down?!” was a common refrain. I tried to make sure we always gave the
same answer: “if you manage to take the site down, that’s our fault for
making it too easy. Either way, we’ll learn something interesting.”
Because
this was such a big cultural change for most new employees, we didn’t
leave them to sink or swim on their own. We always assigned them a
“code mentor” from the ranks of the more established engineers. The
idea was that these two people would operate as a unit, with the
mentor’s job performance during this period evaluated by the
performance of the new person. As we continued to find bugs in
production caused by new engineers who weren’t properly trained, we’d
do root cause analysis,
and keep making proportional investments in improving the process. As a
result, we had a pretty decent curriculum for each mentor to follow to
ensure the new employee got up to speed on the most important topics
quickly.
These two practices worked together well. For one, it
required us to keep our developer sandbox setup procedure simple and
automated. Anyone who had served as a code mentor would instinctively
be bothered if someone else made a change to the sandbox environment
that required special manual setup. Such changes inevitably waste a lot
of time, since we generally build a lot more developer sandboxes than
we realize. Most importantly, we immediately thrust our new employees
into a mindset of reduced fear. We had them imagine the most risky
thing they could possibly do – pushing code to production too soon – and then do it.
Here’s
the key point. I won’t pretend that this worked smoothly every time.
Some engineers, especially in the early days, did indeed take the site
down on their first day. And that was not a lot of fun. But it still
turned out OK. We didn’t have that many customers, after all. And
continuous deployment meant we could react fast and fix the problem
quickly. Most importantly, new employees realized that they weren’t
going to be fired for making a mistake. We’d immediately involve them
in the postmortem analysis, and in a lot of cases it was the newcomer
themselves (with the help of their mentor) who would build the
prophylactic systems required to prevent the next new person from
tripping over that same issue.
Fear slows teams of all sizes
down. Even if you have a large team, could you create a sandboxed
environment where anyone can make changes that affect a small number of
customers? Even as we grew the team at IMVU, we always maintained a
rule that anyone could run a split-test without excess approvals as
long as the total number of customers affected was below a critical
threshold. Could you create a separate release process for small or
low-risk commits, so that work that happens in small batches is
released faster? My prediction in such a situation is that, over time,
an increasing proportion of your commits will become eligible for the
fast-track procedure.
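If you want to experiment with that kind of threshold rule, here is a toy sketch of what the gate might look like: a split-test skips extra sign-off only when the fraction of customers it touches stays under a cap, and users are assigned to the test with a stable hash so a small, deterministic slice sees the new behavior. The 5% cap, the hashing scheme, and the function names are illustrative assumptions, not a description of IMVU’s actual system.

```python
import hashlib

# Illustrative cap: tests touching under 5% of customers skip extra sign-off.
APPROVAL_FREE_CAP = 0.05

def needs_approval(exposure_fraction):
    # Anything above the cap goes through the normal review process.
    return exposure_fraction > APPROVAL_FREE_CAP

def in_test(user_id, test_name, exposure_fraction):
    # Hash the user and test name together so each user lands in a stable
    # bucket; only the lowest buckets see the new behavior.
    digest = hashlib.md5(f"{test_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000
    return bucket < exposure_fraction * 10_000

if __name__ == "__main__":
    print(needs_approval(0.02))                      # False: eligible for the fast track
    print(in_test("user-42", "new_checkout", 0.02))  # stable per-user assignment
```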
Whatever fear-reducing tactics you try,
share your results in the comments. Or, if fear’s got you paralyzed,
share that too. We’ll do our best to help.