by Hamel Husain & Shreya Shankar
A qualitative research method adapted for AI logs. Instead of guessing failure modes, builders manually review production traces ('Open Coding') to tag issues freely, then cluster these tags into broader categories ('Axial Coding') to identify high-leverage areas for improvement.
Core Principles
1. Read traces manually until 'Theoretical Saturation' (when you stop finding new types of errors).
2. Write 'Open Codes' (free-form notes) on the first upstream error you see per trace.
3. Use an LLM to synthesize Open Codes into 'Axial Codes' (categories) to find the top failure modes.
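The bookkeeping behind these three steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the `Annotation` record, the function names, and the `patience` threshold are all hypothetical, and the source does not prescribe a number of traces for saturation. The sketch assumes each reviewed trace already carries its free-form open code and an axial code assigned afterward (by an LLM or by hand), and that saturation is checked on the category labels, since free-form notes are rarely identical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Annotation:
    """One reviewed trace (hypothetical record shape)."""
    trace_id: str
    open_code: str   # free-form note on the first upstream error; "" if none
    axial_code: str  # category assigned later; "" if the trace had no error


def traces_until_saturation(labels, patience=20):
    """Return how many traces had been read when `patience` consecutive
    traces introduced no new label, or None if that never happened.

    `patience` is an assumed stopping rule, not a value from the source.
    """
    seen = set()
    since_new = 0
    for count, label in enumerate(labels, start=1):
        if label and label not in seen:
            seen.add(label)
            since_new = 0  # a new error type resets the streak
        else:
            since_new += 1
        if since_new >= patience:
            return count
    return None


def rank_failure_modes(annotations):
    """Count axial codes and return them from most to least frequent."""
    counts = Counter(a.axial_code for a in annotations if a.axial_code)
    return counts.most_common()


if __name__ == "__main__":
    reviewed = [
        Annotation("t1", "cited a doc that was never retrieved", "retrieval"),
        Annotation("t2", "answer was not valid JSON", "formatting"),
        Annotation("t3", "pulled the wrong policy page", "retrieval"),
        Annotation("t4", "", ""),
    ]
    print(rank_failure_modes(reviewed))
    print(traces_until_saturation([a.axial_code for a in reviewed], patience=2))
```

The ranked list is what points to the high-leverage areas: the categories at the top account for the most traces, so they are the natural first candidates for fixes or targeted evals.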
"To build great AI products, you need to be really good at building evals. It's the highest ROI activity you can engage in."