InsightHunt

Edwin Chen

Episode #84

Founder and CEO

Surge AI

📈 Growth & Metrics · 🎯 Product Strategy

📝 Full Transcript

14,629 words
Lenny Rachitsky (00:00:00): You guys hit a billion in revenue in less than four years with around 60 to 70 people. You're completely bootstrapped, haven't raised any VC money. I don't believe anyone has ever done this before.

Edwin Chen (00:00:10): We basically never wanted to play the Silicon Valley game. I always thought it was ridiculous. I used to work at a bunch of the big tech companies and I always felt that we could fire 90% of the people and we would move faster, because the best people wouldn't have all these distractions. So when we started Surge, we wanted to build it completely differently, with a super small, super elite team.

Lenny Rachitsky (00:00:26): You guys are by far the most successful data company out there.

Edwin Chen (00:00:29): We essentially teach AI models what's good and what's bad. People don't understand what quality even means in this space. They think you can just throw bodies at a problem and get good data. That's completely wrong.

Lenny Rachitsky (00:00:40): To a regular person, it doesn't feel like these models are getting that much smarter constantly.

Edwin Chen (00:00:43): Over the past year, I've realized that the values that the companies have will shape the model. I was asking Claude to help me draft an email the other day. And after 30 minutes, yeah, I think it really crafted me the perfect email and I sent it. But then I realized that I'd spent 30 minutes doing something that didn't matter at all. If you could choose the perfect model behavior, which model would you want? Do you want a model that says, "You're absolutely right. There are definitely 20 more ways to improve this email," and continues for 50 more iterations, or do you want a model that's optimizing for your time and productivity and just says, "No. You need to stop. Your email's great. Just send it and move on"?

Lenny Rachitsky (00:01:14): You have this hot take that a lot of these labs are pushing AGI in the wrong direction.

Edwin Chen (00:01:18): I'm wor...

💡 Key Takeaways

  1. Quality in AI data cannot be reduced to checklists; it requires defining 'taste' and subjective excellence (e.g., distinguishing 'Nobel Prize poetry' from 'technically correct poetry').
  2. Stop optimizing for engagement metrics (time spent, clicks) in AI products; optimize for the user's ultimate goal (e.g., sending the email quickly vs. iterating for 30 minutes).
  3. Move beyond static evaluations to 'RL Environments'—simulated worlds where agents must solve dynamic problems (like a server outage) rather than answer multiple-choice questions.
  4. Ignore public leaderboards like LMSYS/Chatbot Arena; they optimize for 'vibes' and formatting (markdown, emojis) rather than accuracy and reasoning.
  5. Build 'artifacts' within chat interfaces—mini-apps or UIs that allow users to take action on the AI's output immediately.
  6. Hyper-efficiency is possible: Surge achieved massive scale by hiring a small, elite team of 'researchers who code' rather than building layers of management.
  7. Don't pivot for market fit; build the specific product that only your unique intersection of skills (e.g., math + linguistics + CS) allows you to build.

📚 Methodologies (3)

📈 Growth & Metrics

A methodology for defining and measuring quality that moves beyond binary correctness to subjective excellence. It treats data evaluation as a search for the 'best of the best' rather than just filtering out the 'worst of the worst'.

Core Principles

  1. Reject Binary Checklists: Don't just ask 'Does it have 8 lines?'. Ask 'Does it move the reader? Is the imagery novel?'
  2. Signal Triangulation: Use implicit metadata (keystrokes, time-on-task, edit history) alongside explicit output to judge worker quality.
  3. Expert-Tier Annotation: Use domain experts (Nobel physicists, teachers) who can evaluate the *reasoning* path, not just the final answer.

"We basically never wanted to play the Silicon Valley game... We essentially teach AI models what's good and what's bad. People don't understand what quality even means in this space."

#deep-quality #evaluation
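The 'Signal Triangulation' principle — discounting explicit scores with implicit behavioral metadata — could be sketched roughly as follows. This is an illustrative toy, not Surge's actual system: the field names, thresholds, and weighting factors are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Submission:
    """One annotation task: explicit output plus implicit metadata."""
    rubric_score: float      # explicit grade from an expert reviewer, 0..1
    seconds_on_task: float   # implicit: time spent on the task
    edit_count: int          # implicit: revisions made before submitting

def quality_signal(sub: Submission,
                   min_plausible_seconds: float = 60.0) -> float:
    """Triangulate explicit and implicit signals into one quality estimate.

    A suspiciously fast submission is discounted even if its rubric score
    is high -- speed without engagement suggests guessing or copy-paste.
    (Weights and the 60-second floor are invented for illustration.)
    """
    # Engagement factor: ramps from 0 to 1 as time approaches a plausible minimum.
    engagement = min(1.0, sub.seconds_on_task / min_plausible_seconds)
    # Care factor: some editing suggests deliberation; cap the bonus at 1.0.
    care = min(1.0, 0.5 + 0.1 * sub.edit_count)
    return sub.rubric_score * engagement * care

# A rushed submission is penalized despite an identical rubric score.
rushed = Submission(rubric_score=0.9, seconds_on_task=10, edit_count=0)
careful = Submission(rubric_score=0.9, seconds_on_task=300, edit_count=6)
assert quality_signal(careful) > quality_signal(rushed)
```

The point of the sketch is the multiplication: no single signal (not even an expert's grade) is trusted on its own; the implicit signals gate how much the explicit one counts.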
🎯 Product Strategy

A shift from static Q&A training to dynamic simulations where agents must navigate a 'world' to achieve a goal. Success is measured not just by the outcome, but by the efficiency and logic of the path taken.

Core Principles

  1. Simulate the Full Stack: Create environments with tools (Slack, Jira, Terminal) rather than just text boxes.
  2. Reward the Trajectory, Not Just the End State: Penalize models that 'reward hack' (e.g., guess correctly by luck) or take inefficient paths.
  3. Inject Chaos: Introduce dynamic failures (e.g., 'AWS goes down mid-task') to test resilience and recovery.

"It's almost like building a video game with a fully fleshed out universe... models need to perform right actions and modify the environment and interact over longer time horizons."

#trajectory-based #environment #strategy
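'Reward the Trajectory, Not Just the End State' can be made concrete with a minimal reward function. This is a hypothetical sketch: a real RL environment would derive `steps`, `reasoning_valid`, and `optimal_steps` from logged tool calls and checker programs, not receive them as arguments.

```python
def trajectory_reward(steps: int,
                      goal_reached: bool,
                      optimal_steps: int,
                      reasoning_valid: bool) -> float:
    """Score the whole trajectory, not just the final outcome (toy sketch)."""
    if not goal_reached:
        return 0.0
    if not reasoning_valid:
        # A correct outcome reached by luck or by gaming the checker
        # ("reward hacking") earns nothing -- the path itself must hold up.
        return 0.0
    # Efficiency: full credit at the optimal path length, decaying with
    # every wasted step beyond it.
    return min(1.0, optimal_steps / max(steps, 1))

# An efficient, valid run beats a meandering one; a lucky guess scores zero.
assert trajectory_reward(5, True, 5, True) == 1.0    # optimal path
assert trajectory_reward(10, True, 5, True) == 0.5   # twice as many steps
assert trajectory_reward(1, True, 5, False) == 0.0   # reward-hacked guess
```

The zero for an invalid path is the key design choice: partial credit for lucky outcomes is exactly what trains models to reward hack.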
🎯 Product Strategy

A strategic framework for defining what the AI should actually optimize for, ensuring alignment with human advancement rather than dopamine loops.

Core Principles

  1. Identify the 'Lazy' Proxy: Recognize metrics like 'time spent' or 'number of turns' as potential negative signals in an AI context.
  2. Define the User's End State: Does the user want a 30-minute conversation about an email, or do they want the email sent?
  3. Inject Personality/Values: Explicitly decide on the model's stance (e.g., Sycophantic vs. Direct, Concise vs. Verbose).

"Do you want a model that says, 'You're absolutely right... and continues for 50 more iterations' or do you want a model that's optimizing for your time... and just says, 'No. You need to stop. Your email's great. Just send it.'"

#true-north #objective
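The tension between the 'lazy' engagement proxy and the user's true end state can be shown with two competing objectives scored against the same sessions. The session fields and formulas here are invented for illustration; they only demonstrate how the two metrics rank the same behavior in opposite orders.

```python
def engagement_proxy(session: dict) -> float:
    """The 'lazy' proxy: more minutes and more turns look 'better'."""
    return session["minutes"] * session["turns"]

def goal_reward(session: dict) -> float:
    """The true-north objective: did the email go out, and how cheaply?"""
    if not session["email_sent"]:
        return 0.0
    # Completion is worth everything; time spent only discounts it.
    return 1.0 / (1.0 + session["minutes"])

quick = {"minutes": 2, "turns": 1, "email_sent": True}
marathon = {"minutes": 30, "turns": 50, "email_sent": True}

# The two objectives disagree: engagement prefers the 30-minute,
# 50-turn marathon, while the goal metric prefers the quick send.
assert engagement_proxy(marathon) > engagement_proxy(quick)
assert goal_reward(quick) > goal_reward(marathon)
```

Which model behavior gets reinforced depends entirely on which of these two numbers the lab puts into the training objective — this is the "which model would you want?" question from the transcript, stated as code.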