I’m Josh McGrath, a post-training researcher at OpenAI. Lately, my world has been consumed by thinking models and search-related architectures. It’s a bit surreal to be back here—the last time we sat down, we were diving into the guts of GPT-4.1. Since then, it feels like we've lived through an entire generation of AI evolution.
Back during the 4.1 era, we were largely focused on what I’d call "non-thinking" models, with an emphasis on API performance. But the focus has fundamentally shifted. We still release those classic models, of course, but the center of gravity in research has moved toward something more complex, more deliberate.
> "Do I want to make compute efficiency wins of 3%, or do I want to change the behavior by 40%?"
People often ask how I landed in post-training. Before OpenAI, my focus was pre-training data curation. But I started reading the papers and watching the news cycle, and I felt a shift in the air. Pre-training isn’t "dead," but it’s maturing into a game of marginal gains. For me, the excitement wasn't in squeezing out a tiny bit of compute efficiency; it was in the behavioral frontier.
Post-training is where the model actually learns how to *be*. It’s where the raw intelligence of the pre-trained weights is channeled into something useful, conversational, or capable of reasoning. That work has meant many late nights, but when you see a 40% jump in capability because of how you structured the post-training data, those nights feel justified.