InsightHunt
Edwin Chen

Founder and CEO

Surge AI

📈 Growth & Metrics (1) · 🎯 Product Strategy (2)

Key Takeaways

  1. Quality in AI data cannot be reduced to checklists; it requires defining 'taste' and subjective excellence (e.g., distinguishing 'Nobel Prize poetry' from 'technically correct poetry').
  2. Stop optimizing for engagement metrics (time spent, clicks) in AI products; optimize for the user's ultimate goal (e.g., sending the email quickly vs. iterating for 30 minutes).
  3. Move beyond static evaluations to 'RL Environments'—simulated worlds where agents must solve dynamic problems (like a server outage) rather than answer multiple-choice questions.
  4. Ignore public leaderboards like LMSYS/Chatbot Arena; they optimize for 'vibes' and formatting (markdown, emojis) rather than accuracy and reasoning.
  5. Build 'artifacts' within chat interfaces—mini-apps or UIs that allow users to take action on the AI's output immediately.
  6. Hyper-efficiency is possible: Surge achieved massive scale by hiring a small, elite team of 'researchers who code' rather than building layers of management.
  7. Don't pivot for market fit; build the specific product that only your unique intersection of skills (e.g., math + linguistics + CS) allows you to build.

Methodologies (3)

📈 Growth & Metrics

A methodology for defining and measuring quality that moves beyond binary correctness to subjective excellence. It treats data evaluation as a search for the 'best of the best' rather than just filtering out the 'worst of the worst'.

Core Principles

  1. Reject Binary Checklists: Don't just ask 'Does it have 8 lines?'. Ask 'Does it move the reader? Is the imagery novel?'
  2. Signal Triangulation: Use implicit metadata (keystrokes, time-on-task, edit history) alongside explicit output to judge worker quality.
  3. Expert-Tier Annotation: Use domain experts (Nobel physicists, teachers) who can evaluate the *reasoning* path, not just the final answer.
  • +1 more...
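The 'signal triangulation' principle above can be sketched as a simple scoring heuristic. This is a minimal illustration, not Surge AI's actual method: all field names, thresholds, and weights are invented assumptions. The idea it demonstrates is that implicit behavioral metadata can discount work whose final output looks fine but whose process suggests low effort.

```python
# Hypothetical sketch of 'signal triangulation': blend an explicit
# rubric grade with implicit behavioral signals. All fields, weights,
# and thresholds below are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class AnnotationSignals:
    rubric_score: float      # explicit grade of the output, 0..1
    seconds_on_task: float   # implicit: time spent on the task
    edit_count: int          # implicit: revisions in the edit history

def quality_estimate(s: AnnotationSignals,
                     min_seconds: float = 60.0,
                     min_edits: int = 2) -> float:
    """Combine explicit and implicit signals into one 0..1 score.

    Suspiciously fast work with no edit history is discounted even
    when the final output scores well -- the path matters too.
    """
    effort = min(s.seconds_on_task / min_seconds, 1.0)
    care = min(s.edit_count / min_edits, 1.0)
    implicit = 0.5 * effort + 0.5 * care
    return 0.7 * s.rubric_score + 0.3 * implicit

careful = AnnotationSignals(rubric_score=0.9, seconds_on_task=300, edit_count=8)
rushed = AnnotationSignals(rubric_score=0.9, seconds_on_task=10, edit_count=0)
assert quality_estimate(careful) > quality_estimate(rushed)
```

Both annotations have the same explicit score (0.9), but the rushed one is discounted by its implicit signals.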

"We basically never wanted to play the Silicon Valley game... We essentially teach AI models what's good and what's bad. People don't understand what quality even means in this space."

#'deep quality' #evaluation
🎯 Product Strategy

A shift from static Q&A training to dynamic simulations where agents must navigate a 'world' to achieve a goal. Success is measured not just by the outcome, but by the efficiency and logic of the path taken.

Core Principles

  1. Simulate the Full Stack: Create environments with tools (Slack, Jira, Terminal) rather than just text boxes.
  2. Reward the Trajectory, Not Just the End State: Penalize models that 'reward hack' (guess correctly by luck) or take inefficient paths.
  3. Inject Chaos: Introduce dynamic failures (e.g., 'AWS goes down mid-task') to test resilience and recovery.
  • +1 more...
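A trajectory-based reward along the lines of the principles above might look like the sketch below. The step names, weights, and required-step rule are hypothetical, not any real benchmark's specification; the point is that the agent is scored on the path, not only the end state.

```python
# Illustrative trajectory scoring for an agent in a simulated outage:
# reaching the goal is necessary but not sufficient. Step names and
# weights are invented for illustration.

REQUIRED_STEPS = {"check_logs", "identify_root_cause"}  # skipping these = reward hacking

def trajectory_reward(steps: list[str], goal_reached: bool,
                      optimal_len: int = 4) -> float:
    if not goal_reached:
        return 0.0
    # Penalize 'reward hacking': the goal was reached without the
    # diagnostic steps a competent engineer would take.
    if not REQUIRED_STEPS.issubset(steps):
        return 0.1
    # Penalize inefficient paths: reward decays as the trajectory
    # grows past the optimal length.
    efficiency = min(optimal_len / len(steps), 1.0)
    return 0.5 + 0.5 * efficiency

good = ["check_logs", "identify_root_cause", "restart_service", "verify_fix"]
lucky = ["restart_service"]           # guessed the fix by luck
rambling = good + ["retry"] * 8       # right steps, wasteful path

assert trajectory_reward(good, True) == 1.0
assert trajectory_reward(lucky, True) == 0.1
assert 0.5 < trajectory_reward(rambling, True) < 1.0
```

The lucky trajectory reaches the same end state as the good one but earns almost nothing, which is the core of rewarding the trajectory rather than the outcome.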

"It's almost like building a video game with a fully fleshed out universe... models need to perform right actions and modify the environment and interact over longer time horizons."

#trajectory-based #environment #strategy
🎯 Product Strategy

A strategic framework for defining what the AI should actually optimize for, ensuring alignment with human advancement rather than dopamine loops.

Core Principles

  1. Identify the 'Lazy' Proxy: Recognize metrics like 'time spent' or 'number of turns' as potential negative signals in an AI context.
  2. Define the User's End State: Does the user want a 30-minute conversation about an email, or do they want the email sent?
  3. Inject Personality/Values: Explicitly decide on the model's stance (e.g., Sycophantic vs. Direct, Concise vs. Verbose).
  • +1 more...
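The contrast between the 'lazy' proxy and the user's end state can be made concrete with two toy metrics for an email-drafting assistant. The session fields and formulas are invented for illustration; they show how the same sessions rank in opposite orders under the two objectives.

```python
# Hedged sketch: engagement metric vs. goal-completion ('true north')
# metric for an email-drafting assistant. All fields are illustrative.
from dataclasses import dataclass

@dataclass
class Session:
    turns: int           # conversation length
    minutes: float       # time spent in the session
    email_sent: bool     # did the user achieve their goal?

def engagement_score(s: Session) -> float:
    # The 'lazy' proxy: more turns and more time look like success.
    return s.turns * s.minutes

def true_north_score(s: Session) -> float:
    # Optimize for the user's end state: goal achieved, and quickly.
    return 1.0 / s.minutes if s.email_sent else 0.0

concise = Session(turns=2, minutes=3.0, email_sent=True)
sycophantic = Session(turns=50, minutes=30.0, email_sent=True)

assert engagement_score(sycophantic) > engagement_score(concise)
assert true_north_score(concise) > true_north_score(sycophantic)
```

The sycophantic session wins on engagement and loses on the true-north metric: the model that says 'Your email's great. Just send it.' scores highest only under the second objective.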

"Do you want a model that says, 'You're absolutely right... and continues for 50 more iterations' or do you want a model that's optimizing for your time... and just says, 'No. You need to stop. Your email's great. Just send it.'"

#'true north' #objective