Scaling Laws, Emergence and the Age of AGI
The miracle of neural scaling laws and emergence ensures that Artificial General Intelligence is destined to happen.
Sam Altman, CEO of OpenAI, just posted a manifesto brimming with optimism about The Intelligence Age. While it might sound ultra-optimistic, I wanted to provide some context.
First, we've never seen software whose intelligence and capability increase simply by throwing more compute and data at it, while keeping the algorithm largely the same. In AI, we call this phenomenon "Neural Scaling Laws."
Two scaling laws are in full swing:
1. Training-Time Scaling: Model performance improves with more training compute, as seen from GPT-2 to GPT-4.
2. Inference-Time Scaling: Performance improves with more inference compute, as seen with models like OpenAI's o1 and multi-agent systems. Training models to think longer and more deeply at inference time enhances their intelligence.
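The first law can be sketched numerically. Below is a minimal illustration of a Kaplan-style power law, where loss falls as a power of training compute; the constants `l_inf`, `c0`, and `alpha` are made-up illustrative values, not fitted ones:

```python
# Hypothetical power-law scaling curve: loss falls smoothly and predictably
# as training compute grows. All constants here are illustrative.

def scaling_loss(compute: float, l_inf: float = 1.7,
                 c0: float = 1e7, alpha: float = 0.05) -> float:
    """Predicted loss at a given training-compute budget (arbitrary units)."""
    return l_inf + (c0 / compute) ** alpha

# Each 10x increase in compute buys a further, predictable drop in loss.
for c in [1e9, 1e10, 1e11, 1e12]:
    print(f"compute={c:.0e}  loss={scaling_loss(c):.3f}")
```

The striking part is the smoothness: the same simple curve keeps holding across many orders of magnitude of compute, which is what makes extrapolating to larger runs feel like a law rather than a guess.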
These are incredible discoveries. As Bezos noted on a Lex Fridman podcast, LLMs are a "discovery, not an invention". So are scaling laws.
Scaling laws let us build models with trillions of parameters, trained on massive GPU clusters, and so far they have kept holding as we scale. The key factors are:
1. GPUs: how many, and how much compute per GPU.
2. Memory: enough to hold the parameters (and training state).
3. Computation time: how long the training run lasts.
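To make the memory factor concrete, here is a back-of-the-envelope estimate for a trillion-parameter model. The 2 bytes/parameter (fp16) and the roughly 16 bytes/parameter rule of thumb for mixed-precision Adam training (weights, gradients, and optimizer state) are common approximations, not exact figures for any particular system:

```python
# Rough memory footprint of a 1-trillion-parameter model (illustrative).
params = 1_000_000_000_000        # 1T parameters
bytes_per_param_fp16 = 2          # fp16/bf16 storage per parameter

weights_gb = params * bytes_per_param_fp16 / 1e9
print(f"weights alone: {weights_gb:,.0f} GB")      # ~2,000 GB

# Training needs far more: a common rule of thumb for mixed-precision Adam
# (weights + gradients + optimizer state) is ~16 bytes per parameter.
training_gb = params * 16 / 1e9
print(f"training state: {training_gb:,.0f} GB")    # ~16,000 GB
```

Numbers like these are why even holding the model in memory demands a large cluster, before any compute is spent.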
Researchers believe we can keep scaling these to train ever smarter models. But now we are starting to suspect that we can scale at inference time as well. If spending more compute on longer inference iterations yields more intelligent outcomes, then magical possibilities are in front of us. What if you could run a single inference computation with the same compute intensity as today's training runs?
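One simple mechanism behind inference-time scaling is sampling many answers and taking a majority vote (self-consistency). A toy model shows why more samples help: if each independent sample is correct with probability p > 0.5, the majority becomes correct with probability approaching 1 as the sample count grows. The 0.6 per-sample accuracy below is an arbitrary illustrative choice:

```python
from math import comb

def majority_vote_accuracy(p: float, k: int) -> float:
    """Probability that the majority of k independent samples is correct,
    given each sample is correct with probability p (k odd)."""
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(k // 2 + 1, k + 1))

# Accuracy climbs steadily as we spend more inference compute on sampling.
for k in [1, 5, 25, 101]:
    print(f"{k:>4} samples -> accuracy {majority_vote_accuracy(0.6, k):.3f}")
```

Real systems like o1 go further than voting, spending compute on longer chains of reasoning, but the underlying bet is the same: more inference compute buys better answers.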
Richard Sutton, in his 2019 essay "The Bitter Lesson", argued that general methods that scale with computation ultimately outperform approaches built on hand-crafted human knowledge. Recent AI research shows we've found a truly scalable method with Transformers.
One of the amazing things that scale leads to is "emergence"—new skills suddenly appear. For instance, GPT-3, when scaled up, unexpectedly developed coding abilities without being explicitly trained for them. That powerful skills can emerge just by scaling is astonishing. What new capabilities might emerge as we keep scaling inference compute?
Ray Kurzweil's Law of Accelerating Returns posits that technological change is exponential, not linear. He projected that, measured at today's rate, the 21st century will deliver roughly 20,000 years of progress, as exponential growth compounds on itself.
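One illustrative way to recover a figure of that order (a sketch, not Kurzweil's exact derivation) is to assume the rate of progress doubles every decade and credit each decade at the rate it reaches by its end:

```python
# Toy model of accelerating returns: progress rate doubles each decade.
# Decade k (k = 1..10) contributes 10 years of wall-clock time at 2**k
# times the year-2000 rate. Purely illustrative arithmetic.
total_equivalent_years = sum(10 * 2**decade for decade in range(1, 11))
print(total_equivalent_years)  # 20460
```

Ten doublings over a century turn 100 calendar years into roughly 20,000 year-2000-equivalent years of progress, which is the flavor of compounding the Law of Accelerating Returns describes.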
Emergent AI capabilities in coding and reasoning accelerate technological returns. Our ability to engineer chips, networks, and data centers, and to optimize systems, is now aided by AI. As humans and models collaborate, everything compounds faster.
If scaling laws and emergence hold, we're in for a whale of a time as a species. Hence, Sam Altman's article.
---
What are your thoughts on the impact of scaling laws and the future of AGI? I'd love to hear your perspectives.

