I was recently privy to a product prioritization meeting in a
relatively large company. It was fascinating. The team spent an hour
trying to decide on a new pricing strategy for their main product line.
One of the divisions, responsible for the company’s large accounts, was
requesting data about a recent experiment that had been conducted by
another division. They were upset because this other team had changed
the prices for small accounts to make the product more affordable. The
larger-account division wanted to move the pricing in exactly the opposite direction – making the low-end products more expensive, so their large customers would have a stronger incentive to upgrade.
Almost the entire meeting was taken up
with interpreting data. The problem was that nobody could quite agree
what the data meant. Many custom reports had been created for this
meeting, and the data warehouse team was in the meeting, too. The more
they were asked to explain the details of each row on the spreadsheet,
the more evident it became that nobody understood how those numbers had
been derived.
Worse, nobody was quite sure exactly which
customers had been exposed to the experiment. Different teams had been
responsible for implementing different parts of it, and so different
parts of the product had been updated at different times. The whole
process had taken months. And by now, the people who had originally conceived
the experiment were in a separate division from the people who had
executed it.
Listening in, I assumed this would be the
end of the meeting. With no agreed-upon facts to inform the decision, nobody would have any basis for making the case for any particular action. Boy, was I wrong. The meeting was just getting
started. Each team simply took whatever interpretation of the data
supported their position best, and started advocating. Other teams
would chime in with alternate interpretations that supported their
position, and so on. In the end, decisions were made – but not based on
any actual data. Instead, the executive running the meeting was forced
to make decisions based on the best arguments.
The funny thing to me was how much of the
meeting had been spent debating the data, when in the end, the
arguments that carried the day could have been made right at the start
of the meeting. It was as if each advocate sensed that they were about to be ambushed: any clarity another team brought to the situation might benefit that team at the expense of their own – so the rational response was to obfuscate as much as possible. What a waste.
Ironically, meetings like this had given
data and experimentation a bad name inside this company. And who can
blame them? The data warehousing team was producing classic waste –
reports that nobody read (or understood). The project teams felt these
experiments were a waste of time, since they involved building features
halfway, which meant they were never quite any good. And since nobody
could agree on what each outcome meant, it seemed like “running an experiment” was
just code for postponing a hard decision. Worst of all, the executive
team was getting chronic headaches. Their old product prioritization
meetings may have been a battle of opinions, but at least they
understood what was going on. Now they first had to go through a ritual
that involved complex math and reached no definite outcome, and then have the same battle of opinions anyway!
When a company gets wedged like this, the
solution is often surprisingly simple. In fact, I call this class of
solutions “too simple to possibly work” because the people inside the
situation can’t conceive that their complex problem could have a simple
solution. When I’m asked to work with companies like this as a
consultant, 99% of my job is finding a way to get the team started with a simple – but correct – solution.
Here was my prescription for this
situation. I asked the team to consider creating what I call a sandbox
for experimentation. The sandbox is an area of the product where the
following rules are strictly enforced:
- Any team can create a true split-test experiment that affects only the sandboxed parts of the product, however:
- One team must see the whole experiment through end-to-end.
- No experiment can run longer than a specified amount of time (usually a few weeks).
- No experiment can affect more than a specified number of customers (usually expressed as a % of total).
- Every experiment has to be evaluated based on a single standard report of 5-10 (no more) key metrics.
- Any team that creates an experiment must monitor the metrics and customer reactions (support calls, forum threads, etc.) while the experiment is in progress, and abort if something catastrophic happens.
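To make these rules concrete, here is a minimal sketch of what enforcing them in code might look like. To be clear, this is my own illustration, not anything from the company in question: the `SandboxExperiment` class, the field names, and the specific limits (four weeks, 5% of customers, these particular metrics) are hypothetical placeholders for whatever a real team would choose.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical limits standing in for "a specified amount of time"
# and "a specified number of customers" from the rules above.
MAX_DURATION = timedelta(weeks=4)
MAX_AUDIENCE_PCT = 5.0

# The single standard report: the same 5-10 key metrics, every time.
STANDARD_METRICS = [
    "registration_rate",
    "activation_rate",
    "retention_rate",
    "referral_rate",
    "revenue_per_customer",
]

@dataclass
class SandboxExperiment:
    name: str
    owning_team: str      # one team sees the experiment through end-to-end
    start: date
    end: date
    audience_pct: float   # share of total customers exposed, in percent

    def validate(self) -> None:
        """Reject any experiment that violates the sandbox rules."""
        if self.end - self.start > MAX_DURATION:
            raise ValueError(f"{self.name}: runs longer than {MAX_DURATION.days} days")
        if self.audience_pct > MAX_AUDIENCE_PCT:
            raise ValueError(f"{self.name}: exposes more than {MAX_AUDIENCE_PCT}% of customers")

    def report(self, raw_metrics: dict[str, float]) -> dict[str, float]:
        """Evaluate against the standard report only, never ad hoc metrics."""
        return {m: raw_metrics.get(m, 0.0) for m in STANDARD_METRICS}
```

Notice that every rule is mechanical: duration, audience size, and the report format can all be checked automatically, which is what keeps the sandbox cheap to police and the results comparable across teams.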
Putting a system like this in place is relatively easy, especially for any kind of online service. I advocate
starting small; usually, the parts of the product that start inside the
sandbox are low-effort, high-impact aspects like pricing, initial
landing pages, or registration flows. These may not sound very
exciting, but because they control the product’s positioning for new
customers, they often allow minor changes to have a big impact.
Over time, additional parts of the product
can be added to the sandbox, until eventually it becomes routine for
the company to conduct these rigorous split-tests for even very large
new features. But that’s getting ahead of ourselves. The benefits of
this approach are manifest immediately. Right from the beginning, the
sandbox achieves three key goals simultaneously:
- It forces teams to work cross-functionally. The first few
changes, like a price change, may not require a lot of engineering
effort. But they require coordination across departments – engineering,
marketing, customer service. Teams that work this way are more
productive, as long as productivity is measured by their ability to
create customer value (and not just stay busy).
- Everyone understands the results. True split-test experiments are
easy to classify as successes or failures, because top-level metrics
either move or they don’t. Either way, the team learns immediately
whether their assumptions about how customers would behave were
correct. By using the same metrics each time, the team builds literacy
across the whole company about those key metrics.
- It promotes rapid iteration. When people have a chance to see a project through end-to-end, the work is done in small batches, and a clear verdict is delivered quickly, they benefit from the power of feedback.
feedback. Each time they fail to move the numbers, they have a real
opportunity for introspection. And, even more importantly, to act on
their findings immediately. Thus, these teams tend to converge on
optimal solutions rapidly, even if they start out with really bad ideas.
Putting it all together, let me illustrate
with an example from another company. This team had been working for
many months in a standard agile configuration: a disciplined
engineering team taking direction from a product owner who would
prioritize the features they should work on. The team was adept at
responding to changes in direction from the product owner, and always
delivered quality code.
But there was a problem. The team rarely
received any feedback about whether the features they were building
actually mattered to customers. Whatever learning took place was done by the product owner; the rest of the team was just
heads-down implementing features.
This led to a tremendous amount of waste,
of the worst kind: building features nobody wants. We discovered this
reality when the team started working inside a sandbox like the one I
described above.
When new customers would try this product,
they weren’t required to register at first. They could simply come to
the website and start using it. Only after they started to have some
success would the system prompt them to register – and after that,
start to offer them premium features to pay for. It was a slick
example of lazy registration and a freemium model. The underlying
assumption was that making it seamless for customers to ease into the
product was optimal. In service of that assumption, the team had
written a lot of very clever code to create this “tri-mode” experience
(every part of the product had to treat guests, registered users and
paying users somewhat differently).
One day, the team decided to put that
assumption to the test. The experiment was easy to build (although hard
to decide to do): simply remove the “guest” experience, and make
everyone register right at the start. To their surprise, the metrics
didn’t move at all. Customers who were given the guest experience were
not any more likely to register, and they were actually less likely to
pay. In other words, all that tri-mode code was complete waste.
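For readers who want to picture the mechanics, here is a hedged sketch of how a split-test like this one is typically wired up; the bucketing scheme, experiment name, and functions below are my own invention, not the team's actual code. Each visitor is assigned deterministically to a cohort, the guest experience is gated on that cohort, and the same standard metrics are compared across the two groups.

```python
import hashlib

EXPERIMENT = "remove-guest-mode"  # hypothetical experiment name

def cohort(user_id: str) -> str:
    """Deterministic 50/50 split: the same user always lands in the same cohort."""
    digest = hashlib.sha256(f"{EXPERIMENT}:{user_id}".encode()).digest()
    return "control" if digest[0] % 2 == 0 else "treatment"

def allow_guest_access(user_id: str) -> bool:
    # Control keeps the lazy-registration guest experience;
    # treatment forces everyone to register right at the start.
    return cohort(user_id) == "control"

def compare(control: dict[str, float], treatment: dict[str, float]) -> dict[str, float]:
    """Per-metric deltas between cohorts; deltas near zero mean the
    feature under test is not earning its complexity."""
    return {k: treatment[k] - control[k] for k in control}
```

In this story the deltas came back flat for registration and actually favored the treatment for payment – exactly the kind of unambiguous verdict the standard report is designed to deliver.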
By discovering this unpleasant fact, the
team had an opportunity to learn. They discovered, as is true of many
freemium and lazy registration systems, that easy is not always
optimal. When registration is too easy, customers can get
confused about what they are registering for. (This is similar to the
problem that viral loop companies have with the
engagement loop:
by making it too easy to join, they actually give away the positioning
that allows for longer-term engagement.) More importantly, the
experience led to some soul-searching. Why was a team this smart, this
disciplined, and this committed to waste-free product development
creating so much waste?
That’s the power of the sandbox approach.