InsightHunt

Edwin Chen

Episode #84

Founder and CEO

Surge AI

📈 Growth & Metrics · 🎯 Product Strategy

📝 Full Transcript

14,629 words
Lenny Rachitsky (00:00:00): You guys hit a billion in revenue in less than four years with around 60 to 70 people. You're completely bootstrapped, haven't raised any VC money. I don't believe anyone has ever done this before.

Edwin Chen (00:00:10): We basically never wanted to play the Silicon Valley game. I always thought it was ridiculous. I used to work at a bunch of the big tech companies and I always felt that we could fire 90% of the people and we would move faster, because the best people wouldn't have all these distractions. So when we started Surge, we wanted to build it completely differently, with a super small, super elite team.

Lenny Rachitsky (00:00:26): You guys are by far the most successful data company out there.

Edwin Chen (00:00:29): We essentially teach AI models what's good and what's bad. People don't understand what quality even means in this space. They think you can just throw bodies at a problem and get good data. That's completely wrong.

Lenny Rachitsky (00:00:40): To a regular person, it doesn't feel like these models are getting that much smarter constantly.

Edwin Chen (00:00:43): Over the past year, I've realized that the values that the companies have will shape the model. I was asking Claude to help me draft an email the other day. And after 30 minutes, yeah, I think it really crafted me the perfect email and I sent it. But then I realized that I'd spent 30 minutes doing something that didn't matter at all. If you could choose the perfect model behavior, which model would you want? Do you want a model that says, "You're absolutely right. There are definitely 20 more ways to improve this email," and continues for 50 more iterations, or do you want a model that's optimizing for your time and productivity and just says, "No. You need to stop. Your email's great. Just send it and move on"?

Lenny Rachitsky (00:01:14): You have this hot take that a lot of these labs are pushing AGI in the wrong direction.

Edwin Chen (00:01:18): I'm wor...

💡 Key Takeaways

  1. Quality in AI data cannot be reduced to checklists; it requires defining 'taste' and subjective excellence (e.g., distinguishing 'Nobel Prize poetry' from 'technically correct poetry').
  2. Stop optimizing for engagement metrics (time spent, clicks) in AI products; optimize for the user's ultimate goal (e.g., sending the email quickly vs. iterating for 30 minutes).
  3. Move beyond static evaluations to 'RL Environments'—simulated worlds where agents must solve dynamic problems (like a server outage) rather than answer multiple-choice questions.
  4. Ignore public leaderboards like LMSYS/Chatbot Arena; they optimize for 'vibes' and formatting (markdown, emojis) rather than accuracy and reasoning.
  5. Build 'artifacts' within chat interfaces—mini-apps or UIs that allow users to take action on the AI's output immediately.
  6. Hyper-efficiency is possible: Surge achieved massive scale by hiring a small, elite team of 'researchers who code' rather than building layers of management.
  7. Don't pivot for market fit; build the specific product that only your unique intersection of skills (e.g., math + linguistics + CS) allows you to build.

📚 Methodologies (3)

📈 Growth & Metrics

A methodology for defining and measuring quality that moves beyond binary correctness to subjective excellence. It treats data evaluation as a search for the 'best of the best' rather than just filtering out the 'worst of the worst'.

Core Principles

  1. Reject Binary Checklists: Don't just ask 'Does it have 8 lines?'. Ask 'Does it move the reader? Is the imagery novel?'
  2. Signal Triangulation: Use implicit metadata (keystrokes, time-on-task, edit history) alongside explicit output to judge worker quality.
  3. Expert-Tier Annotation: Use domain experts (Nobel physicists, teachers) who can evaluate the *reasoning* path, not just the final answer.

"We basically never wanted to play the Silicon Valley game... We essentially teach AI models what's good and what's bad. People don't understand what quality even means in this space."

#deep-quality #evaluation
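The 'Signal Triangulation' principle — discounting explicit scores with implicit behavioral metadata — could be sketched roughly as follows. This is an illustrative toy, not Surge's actual system: the field names, thresholds, and weighting factors are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Submission:
    """One annotation task: explicit output plus implicit metadata."""
    rubric_score: float      # explicit grade from an expert reviewer, 0..1
    seconds_on_task: float   # implicit: time spent on the task
    edit_count: int          # implicit: revisions made before submitting

def quality_signal(sub: Submission,
                   min_plausible_seconds: float = 60.0) -> float:
    """Triangulate explicit and implicit signals into one quality estimate.

    A suspiciously fast submission is discounted even if its rubric score
    is high -- speed without engagement suggests guessing or copy-paste.
    (Weights and the 60-second floor are invented for illustration.)
    """
    # Engagement factor: ramps from 0 to 1 as time approaches a plausible minimum.
    engagement = min(1.0, sub.seconds_on_task / min_plausible_seconds)
    # Care factor: some editing suggests deliberation; cap the bonus at 1.0.
    care = min(1.0, 0.5 + 0.1 * sub.edit_count)
    return sub.rubric_score * engagement * care

# A rushed submission is penalized despite an identical rubric score.
rushed = Submission(rubric_score=0.9, seconds_on_task=10, edit_count=0)
careful = Submission(rubric_score=0.9, seconds_on_task=300, edit_count=6)
assert quality_signal(careful) > quality_signal(rushed)
```

The point of the sketch is the multiplication: no single signal (not even an expert's grade) is trusted on its own; the implicit signals gate how much the explicit one counts.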
🎯 Product Strategy

A shift from static Q&A training to dynamic simulations where agents must navigate a 'world' to achieve a goal. Success is measured not just by the outcome, but by the efficiency and logic of the path taken.

Core Principles

  1. Simulate the Full Stack: Create environments with tools (Slack, Jira, Terminal) rather than just text boxes.
  2. Reward the Trajectory, Not Just the End State: Penalize models that 'reward hack' (e.g., guess correctly by luck) or take inefficient paths.
  3. Inject Chaos: Introduce dynamic failures (e.g., 'AWS goes down mid-task') to test resilience and recovery.

"It's almost like building a video game with a fully fleshed out universe... models need to perform right actions and modify the environment and interact over longer time horizons."

#trajectory-based #environment #strategy
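'Reward the Trajectory, Not Just the End State' can be made concrete with a minimal reward function. This is a hypothetical sketch: a real RL environment would derive `steps`, `reasoning_valid`, and `optimal_steps` from logged tool calls and checker programs, not receive them as arguments.

```python
def trajectory_reward(steps: int,
                      goal_reached: bool,
                      optimal_steps: int,
                      reasoning_valid: bool) -> float:
    """Score the whole trajectory, not just the final outcome (toy sketch)."""
    if not goal_reached:
        return 0.0
    if not reasoning_valid:
        # A correct outcome reached by luck or by gaming the checker
        # ("reward hacking") earns nothing -- the path itself must hold up.
        return 0.0
    # Efficiency: full credit at the optimal path length, decaying with
    # every wasted step beyond it.
    return min(1.0, optimal_steps / max(steps, 1))

# An efficient, valid run beats a meandering one; a lucky guess scores zero.
assert trajectory_reward(5, True, 5, True) == 1.0    # optimal path
assert trajectory_reward(10, True, 5, True) == 0.5   # twice as many steps
assert trajectory_reward(1, True, 5, False) == 0.0   # reward-hacked guess
```

The zero for an invalid path is the key design choice: partial credit for lucky outcomes is exactly what trains models to reward hack.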
🎯 Product Strategy

A strategic framework for defining what the AI should actually optimize for, ensuring alignment with human advancement rather than dopamine loops.

Core Principles

  1. Identify the 'Lazy' Proxy: Recognize metrics like 'time spent' or 'number of turns' as potential negative signals in an AI context.
  2. Define the User's End State: Does the user want a 30-minute conversation about an email, or do they want the email sent?
  3. Inject Personality/Values: Explicitly decide on the model's stance (e.g., Sycophantic vs. Direct, Concise vs. Verbose).

"Do you want a model that says, 'You're absolutely right... and continues for 50 more iterations' or do you want a model that's optimizing for your time... and just says, 'No. You need to stop. Your email's great. Just send it.'"

#true-north #objective
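The tension between the 'lazy' engagement proxy and the user's true end state can be shown with two competing objectives scored against the same sessions. The session fields and formulas here are invented for illustration; they only demonstrate how the two metrics rank the same behavior in opposite orders.

```python
def engagement_proxy(session: dict) -> float:
    """The 'lazy' proxy: more minutes and more turns look 'better'."""
    return session["minutes"] * session["turns"]

def goal_reward(session: dict) -> float:
    """The true-north objective: did the email go out, and how cheaply?"""
    if not session["email_sent"]:
        return 0.0
    # Completion is worth everything; time spent only discounts it.
    return 1.0 / (1.0 + session["minutes"])

quick = {"minutes": 2, "turns": 1, "email_sent": True}
marathon = {"minutes": 30, "turns": 50, "email_sent": True}

# The two objectives disagree: engagement prefers the 30-minute,
# 50-turn marathon, while the goal metric prefers the quick send.
assert engagement_proxy(marathon) > engagement_proxy(quick)
assert goal_reward(quick) > goal_reward(marathon)
```

Which model behavior gets reinforced depends entirely on which of these two numbers the lab puts into the training objective — this is the "which model would you want?" question from the transcript, stated as code.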