💡

InsightHunt

Hunt the Insights

Alexander Embiricos

Episode #7

Product Lead for Codex

OpenAI

🚀 Career & Leadership · Execution

📝 Full Transcript

17,512 words

Lenny Rachitsky (00:00:00): You lead work on Codex.

Alexander Embiricos (00:00:01): Codex is OpenAI's coding agent. We think of Codex as just the beginning of a software engineering teammate. It's a bit like this really smart intern that refuses to read Slack, doesn't check Datadog unless you ask it to.

Lenny Rachitsky (00:00:12): I remember Karpathy tweeted the gnarliest bugs that he runs into that he just spends hours trying to figure out nothing else has solved, he gives it to Codex, lets it run for an hour and it solves it.

Alexander Embiricos (00:00:21): Starting to see glimpses of the future where we're actually starting to have Codex be on call for its own training. Codex writes a lot of the code that helps manage its training run, the key infrastructure. So we have a Codex code review that's catching a lot of mistakes. It's actually caught some pretty interesting configuration mistakes. One of the most mind-blowing examples of acceleration, the Sora Android app, like a fully new app, we built it in 18 days and then 10 days later, so 28 days total, we went to the public.

Lenny Rachitsky (00:00:45): How do you think you win in this space?

Alexander Embiricos (00:00:47): One of our major goals with Codex is to get to proactivity. If we're going to build a super system, has to be able to do things. One of the learnings over the past year is that for models to do stuff, they're much more effective when they can use a computer. It turns out the best way for models to use computers is simply to write code. And so we're kind of getting to this idea where if you want to build any agent, maybe you should be building a coding agent.

Lenny Rachitsky (00:01:04): When you think about progress on Codex, I imagine you have a bunch of evals and there's all these public benchmarks.

Alexander Embiricos (00:01:10): A few of us are constantly on Reddit. There's praise up there and there's a lot of complaints. What we can do is as a product team just try to always think ab...

📚 Methodologies (3)

The Software Engineering Teammate

by Alexander Embiricos

🚀 Career & Leadership

A conceptual model for the evolution of AI agents, moving from reactive tools to proactive partners. It frames the AI not as a static utility but as a colleague that gains context, trust, and autonomy over time.

Core Principles

  • 1. Treat the agent like a new intern: verify work initially, then build trust.
  • 2. Proactivity: The agent should eventually monitor signals and act without prompts.
  • 3. Contextual Integration: The agent must access tools (Datadog, Slack) to be effective.

"We think of Codex as just the beginning of a software engineering teammate. It's a bit like this really smart intern that refuses to read Slack, doesn't check Datadog unless you ask it to."

#software #engineering #teammate

The Three-Layer Agent Stack

by Alexander Embiricos

Execution

A framework for building effective AI agents by synchronizing innovation across three distinct layers: the model intelligence, the API interface, and the product harness. Success requires tight integration rather than treating the model as a black box.

Core Principles

  • 1. Full-Stack Iteration: Features like 'compaction' require changes in Model, API, and Harness simultaneously.
  • 2. Harness Specificity: Agents perform best when the model is trained for the specific environment (e.g., Shell/Terminal vs. bespoke tools).
  • 3. Feedback Loops: Product usage (Harness) must inform model training.
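
To make the 'compaction' principle more concrete, here is a minimal sketch of what the harness side of such a feature might look like. It assumes that compaction means condensing older conversation turns into a summary once the history exceeds a token budget; the source does not describe how Codex actually implements it. Every name here (`Message`, `estimate_tokens`, `compact`, the `summarize` callback) is hypothetical, and the word-count token estimate is a stand-in for a real tokenizer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str      # e.g. "user", "assistant", or "summary" (hypothetical labels)
    content: str


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def compact(
    history: List[Message],
    budget: int,
    keep_recent: int,
    summarize: Callable[[List[Message]], str],
) -> List[Message]:
    """Return the history unchanged if it fits the budget; otherwise
    replace everything except the last `keep_recent` messages with one
    summary message produced by `summarize`."""
    total = sum(estimate_tokens(m.content) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history

    split = len(history) - keep_recent
    old, recent = history[:split], history[split:]
    # In a real system this call would go through the API layer to a model
    # trained to write good summaries; here it is just a callback.
    summary = Message(role="summary", content=summarize(old))
    return [summary] + recent


def stub_summarize(messages: List[Message]) -> str:
    # Placeholder summarizer, an assumption for illustration only.
    return f"{len(messages)} earlier messages omitted"


if __name__ == "__main__":
    history = [Message("user", "word " * 10) for _ in range(5)]
    compacted = compact(history, budget=30, keep_recent=2, summarize=stub_summarize)
    for message in compacted:
        print(message.role, "|", message.content.strip())
```

Even in this toy version the three layers show up: the harness decides when to compact, the `summarize` callback stands in for an API call, and the quality of the summary depends on the model behind it.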

"It turns out lets you just do a lot more and try many more experiments as to how these things will work together... shipping this compaction feature... actually meant working across all three things."

#three-layer #agent #stack

Execution

A new software development lifecycle where code is generated rapidly from informal signals ('vibes', chat messages, social media) rather than formal specs. It prioritizes speed and iteration, using agents to prototype and refine code.

Core Principles

  • 1. Signals over Specs: Use existing communication (Slack, Tickets) as prompts.
  • 2. Vibe Code First: Generate rapid, throwaway prototypes to test ideas.
  • 3. Vibe Engineer Second: Refine the prototype into production-quality code (PRs).

"Designers... vibe coded a prototype... if we like it, they'll vibe engineer that prototype into an actual PR."

#coding #chatter-driven #development