An audio version of my blog post, Thoughts on AI progress (Dec 2025)
The Scaling Paradox
If we’re actually close to a human-like learner, then the current path of "pre-baking" skills into models is a massive, expensive dead end. Either AGI is imminent and these manual training loops are pointless, or it isn’t, and we’re just building better Excel templates.
The PhD Supply Chain
Billions are being paid to experts to manually write reasoning paths. This isn't just "scaling"—it's a massive, brute-force injection of human intelligence into a system that can't yet learn it for itself.
The Learning Gap: Human vs. Frontier Models
It gives me the vibes of that old joke: "We're losing money on every sale, but we'll make it up in volume." We’re scaling an automated researcher that lacks the basic learning capabilities of a child.
Schleppy Training Loops
I was at dinner recently with an AI researcher and a biologist. The biologist was skeptical of short timelines. She described her day: looking at slides and deciding if a dot is a macrophage or just a bit of debris. The AI researcher immediately shot back, "Image classification is a textbook deep learning problem!"
But he missed the entire point. Human labor is valuable precisely because we don't need to build custom training loops for every micro-task.
It’s not productive to build a custom pipeline for how this specific lab prepares slides, and another for the next lab's specific mess, and another for the one after that. What you actually need is an AI that can learn from semantic feedback—"No, that's just a smudge, ignore it"—the way a human does.
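To make that contrast concrete, here is a purely illustrative Python toy. Every class and method below is invented for this sketch; nothing refers to any real lab's stack or API. Workflow A stands in for the per-lab training pipeline, workflow B for the kind of learner the essay is asking for.

```python
"""Purely illustrative toy; the classes are invented stand-ins, not any real system."""

from dataclasses import dataclass, field


@dataclass
class PerLabClassifier:
    """Workflow A: a bespoke model per lab. It only improves via another
    round of expert labels and a retraining run for *this* lab's prep quirks."""
    labels: dict = field(default_factory=dict)

    def retrain(self, labeled_slides: dict) -> None:
        self.labels.update(labeled_slides)  # stand-in for an expensive fine-tuning job

    def classify(self, slide: str) -> str:
        return self.labels.get(slide, "macrophage")  # confidently wrong on an unfamiliar smudge


@dataclass
class OnTheJobLearner:
    """Workflow B: what the essay argues for. One plain-language correction
    changes behavior on every slide that follows; no custom pipeline."""
    knows_about_smudges: bool = False

    def classify(self, slide: str) -> str:
        if self.knows_about_smudges and "smudge" in slide:
            return "debris"
        return "macrophage"

    def incorporate_feedback(self, note: str) -> None:
        # A real system would have to generalize from the sentence; the toy just flips a flag.
        if "smudge" in note:
            self.knows_about_smudges = True


if __name__ == "__main__":
    bespoke = PerLabClassifier()
    print(bespoke.classify("smudge_from_new_stain"))   # wrong until someone relabels and retrains

    learner = OnTheJobLearner()
    learner.incorporate_feedback("No, that's just a smudge, ignore it.")
    print(learner.classify("smudge_from_new_stain"))   # "debris"
```

The toy obviously cheats (it pattern-matches on the word "smudge"), but that's the shape of the gap: workflow A needs a fresh dataset and training run for every lab, workflow B needs one sentence.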
The labs' current actions hint at a worldview where models will continue to fare poorly at generalization. They are pre-baking "consultant skills" and "Excel fluency" because they don't trust the model to pick it up on the job.
Diffusion Lag Is Cope
People say AIs aren't everywhere because technology takes time to diffuse. I think that's a gloss. If these models were actually "humans on a server," they’d diffuse instantly. We’re using the "diffusion" excuse to hide the fact that the models simply lack the capabilities for broad economic value.
But wait: if the goalposts are moving, is that justified? Next, we look at why RL scaling might still be the move, even if the "diffusion lag" argument is a fantasy...
The Revenue Reality Check
If we really have AGI, why aren't the labs making trillions? The "diffusion lag" argument is starting to look like cope.
People love to talk about how hard it is for big companies to adopt new tech. But let’s be real: an AGI would be the easiest hire in history. It doesn't need three months of onboarding; it reads your entire Slack in minutes and distills every skill from your existing AI fleet instantly.
"The hiring market for humans is a 'lemons market'—it’s risky and expensive. Spinning up a vetted AGI instance? That's zero-risk scaling."
The Trillion-Dollar Disparity
The reason the labs' revenues are orders of magnitude short of that trillion-dollar figure? The models aren't human-level yet. Period.
"Goalpost shifting is justified when you realize the goal was smaller than you thought."
The RL Laundering Scheme
There’s this trend of trying to launder the prestige of pre-training scaling laws—which are basically physical laws at this point—to justify bullishness on Reinforcement Learning (RL). But the math doesn't look as clean.
The Broad Metric
"We need something like a million x scale up in total RL compute to give a boost similar to a single GPT level."
— Toby Broad, Researcher
We talk about software singularities where AI writes its own smarter successor, but we’re ignoring the most likely path: the messy, domain-specific grind of experience.
The "Hive Mind" Path
GPT-3 demonstrated Few-Shot learning. We thought it was solved; it wasn't.
Labs release "Continual Learning" features. It’s progress, but not the endgame.
Human-level on-the-job learning finally gets ironed out.
Experience is the Bottleneck
How do humans get better? Experience. Imagine agents deployed in specialized jobs, generating value, and then bringing those learnings back to a "Hive Mind" model for batch distillation.
This isn't a "one and done" achievement. It’s a slow bleed of intelligence into the economy. Satya might call it "game set match," but I suspect it's going to be a much fiercer, more competitive slog than the singularity theorists want to admit.
The Podium Rotates
Every month, the Big Three rotate. Talent poaching, the SF rumor mill, and reverse engineering have neutralized any "runaway" advantage. No one is escaping the competition.
"Models keep getting more impressive at the rate that the short timelines people predict, but more useful at the rate that the long timelines predict."
Closing thoughts from the essay at dwarkesh.com

