Ronny Kohavi

Episode #250

Author, Instructor, Former VP at Airbnb/Microsoft/Amazon

Independent / Maven Course Instructor

📈 Growth & Metrics · Execution · 👥 Team & Culture

📝 Full Transcript

14,481 words
Ronny Kohavi (00:00:00): I'm very clear that I'm a big fan of test everything, which is: any code change that you make, any feature that you introduce, has to be in some experiment. Because again, I've observed this sort of surprising result that even small bug fixes, even small changes, can sometimes have surprising, unexpected impact.

Ronny Kohavi (00:00:22): And so I don't think it's possible to experiment too much. You have to allocate sometimes to these high-risk, high-reward ideas. We're going to try something that's most likely to fail, but if it does win, it's going to be a home run.

Ronny Kohavi (00:00:38): And you have to be ready to understand and agree that most will fail. And it's amazing how many times I've seen people come up with new designs or a radical new idea. And they believe in it, and that's okay. I'm just cautioning them all the time to say, "If you go for something big, try it out, but be ready to fail 80% of the time."

Lenny (00:01:05): Welcome to Lenny's Podcast, where I interview world-class product leaders and growth experts to learn from their hard-won experiences building and growing today's most successful products.

Lenny (00:01:14): Today my guest is Ronny Kohavi. Ronny is seen by many as the world expert on A/B testing and experimentation. Most recently, he was VP and technical fellow of relevance at Airbnb, where he led their search experience team. Prior to that, he was corporate vice president at Microsoft, where he led the Microsoft Experimentation Platform team. Before that, he was director of data mining and personalization at Amazon.

Lenny (00:01:38): He's currently a full-time advisor and instructor. He's also the author of the go-to book on experimentation, Trustworthy Online Controlled Experiments. And in our show notes, you'll find a code to get a discount on taking his live cohort-based course on Maven.

Lenny (00:01:53): In our conversation, we get super tactical about A/B testing. Ronny shares his advice for when yo...

💡 Key Takeaways

  1. Most experiments fail (60-90%), so you must optimize for experiment velocity and low marginal cost.
  2. Do not optimize solely for revenue; use an OEC (Overall Evaluation Criterion) that balances long-term user value.
  3. If a result looks too good to be true, it is likely a data error (Twyman's Law).
  4. You need tens of thousands of users to detect large effects, but ~200k+ for robust continuous optimization.
  5. Trust is the most important currency in an experimentation platform; use guardrail metrics like Sample Ratio Mismatch (SRM).
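The sample-size takeaway can be made concrete with a common rule of thumb from the experimentation literature: roughly n ≈ 16σ²/δ² users per variant for 80% power at α = 0.05. The sketch below applies it to a binary conversion metric; the baseline rate and lift are illustrative numbers, not figures from the episode.

```python
import math

def users_per_variant(baseline_rate: float, relative_lift: float) -> int:
    """Approximate users needed per variant to detect a relative lift in a
    conversion rate at ~80% power and alpha = 0.05, using the rule of thumb
    n ~= 16 * sigma^2 / delta^2 (sigma^2 = p(1-p) for a binary metric).
    """
    delta = baseline_rate * relative_lift            # absolute effect size
    variance = baseline_rate * (1 - baseline_rate)   # Bernoulli variance
    return math.ceil(16 * variance / delta ** 2)

# Illustrative: a 5% baseline conversion rate with a 5% relative lift
# needs roughly 120k users per variant; a much larger 50% lift needs
# only a few thousand -- which is why small optimizations demand scale.
print(users_per_variant(0.05, 0.05))
print(users_per_variant(0.05, 0.50))
```

Note how the required sample grows with the inverse square of the effect size: halving the detectable lift quadruples the users needed, which is the arithmetic behind the "~200k+ for continuous optimization" point above.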

📚 Methodologies (3)

📈 Growth & Metrics

The OEC is a quantitative measure of the experiment's objective. It is a single metric (or a function of metrics) that aligns with the company's strategic goals and is causally predictive of long-term customer lifetime value.

Core Principles

  1. Align with Lifetime Value (LTV): Identify short-term metrics that predict long-term success (e.g., successful sessions vs. just clicks).
  2. Constraint Optimization: Maximize the goal (e.g., revenue) subject to constraints (e.g., max ad pixels per page).
  3. Countervailing Metrics: Always pair a success metric with a 'drag' metric (e.g., email revenue vs. unsubscribe rate).

"It's very easy to increase revenue by doing theatrics... but it hurts the user experience. You have to define the OEC such that it is causally predictive of the lifetime value of the user."

#overall-evaluation-criterion
Execution

Twyman's Law states that 'Any figure that looks interesting or different is usually wrong.' This framework requires rigorous validation of outliers before accepting them as true results.

Core Principles

  1. Sample Ratio Mismatch (SRM) Test: Check if the ratio of users in Control vs. Treatment matches the design (e.g., 50/50). If not, the experiment is invalid.
  2. Hold the Celebration: If a result exceeds normal variance (e.g., +10% lift when +1% is normal), assume it's a bug first.
  3. Segment Analysis: Break down the 'win' to see if it's driven by bots, specific browsers, or redirects.
  • +1 more...
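The SRM check in principle 1 is a chi-squared goodness-of-fit test on the observed control/treatment split. A minimal stdlib-only sketch (the p < 0.001 alarm threshold is the commonly cited cutoff for SRM detection; the user counts are made up for illustration):

```python
import math

def srm_p_value(control_users: int, treatment_users: int,
                expected_ratio: float = 0.5) -> float:
    """Chi-squared test (1 degree of freedom) that the observed
    control/treatment split matches the designed ratio."""
    total = control_users + treatment_users
    expected_control = total * expected_ratio
    expected_treatment = total * (1 - expected_ratio)
    chi2 = ((control_users - expected_control) ** 2 / expected_control
            + (treatment_users - expected_treatment) ** 2 / expected_treatment)
    # For 1 df the chi-squared survival function reduces to erfc:
    return math.erfc(math.sqrt(chi2 / 2))

# A 50120/49880 split of 100k users is within normal variance,
# but 50600/49400 is wildly improbable under a true 50/50 design.
print(srm_p_value(50120, 49880))   # well above 0.001: no SRM
print(srm_p_value(50600, 49400))   # below 0.001: invalidate the experiment
```

An SRM means the randomization or logging is broken, so any metric movement in that experiment is untrustworthy regardless of how good it looks.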
#twyman's #trust #validation
👥 Team & Culture

A cultural framework where teams accept a high failure rate (80%+) for bold ideas, document 'surprising' failures to learn, and allocate resources between optimization and moonshots.

Core Principles

  1. Accept High Failure Rates: In mature products, 80-90% of experiments will fail to move the metric. This is a feature, not a bug.
  2. Document Surprises: Create a searchable library of experiments. Focus on 'surprising' results (where actual differs from predicted), not just wins.
  3. Portfolio Allocation: Allocate ~70-80% of resources to incremental wins (low risk) and 20% to high-risk/high-reward bets.
  • +1 more...
#institutional-memory #80/20