Lenny Rachitsky (00:00:00):
To build great AI products, you need to be really good at building evals. It's the highest ROI activity you can engage in.
Hamel Husain (00:00:05):
This process is a lot of fun. Everyone who does this immediately gets addicted to it. When you're building an AI application, you just learn a lot.
Lenny Rachitsky (00:00:12):
What's cool about this is you don't need to do this many, many times. For most products, you do this process once and then you build on it.
Shreya Shankar (00:00:18):
The goal is not to do evals perfectly, it's to actionably improve your product.
Lenny Rachitsky (00:00:23):
I did not realize how much controversy and drama there is around evals. There are a lot of people with very strong opinions.
Shreya Shankar (00:00:28):
People have been burned by evals in the past. People have done evals badly, so then they didn't trust them anymore, and then they're like, "Oh, I'm anti-evals."
Lenny Rachitsky (00:00:36):
What are a couple of the most common misconceptions people have with evals?
Hamel Husain (00:00:39):
The top one is, "We live in the age of AI. Can't the AI just eval it?" But it doesn't work.
Lenny Rachitsky (00:00:45):
A term that you used in your posts that I love is this idea of a benevolent dictator.
Hamel Husain (00:00:49):
When you're doing this open coding, a lot of teams get bogged down in having a committee do this. For a lot of situations, that's wholly unnecessary. You don't want to make this process so expensive that you can't do it. You can appoint one person whose taste you trust. It should be the person with domain expertise. Oftentimes, it is the product manager.
Lenny Rachitsky (00:01:09):
Today, my guests are Hamel Husain and Shreya Shankar. One of the top trending topics on this podcast over the past year has been the rise of evals. The chief product officers of both Anthropic and OpenAI shared that evals are becoming the most important new skill for product builders. And since then, ...