Nine women cannot make a baby in one month.
This simple truth - that some things require time and development, not just resources - is what the AI industry is about to learn the hard way. As OpenAI announces Stargate, a $500 billion initiative to build data centers each consuming more power than the state of New Hampshire, I am increasingly convinced we are witnessing the largest misallocation of capital in technology history.
The thesis is seductive: intelligence will emerge if we just add more compute, more data, more parameters. It is the Scaling Hypothesis - the belief that justified throwing billions at making GPT-5 happen. And yet, as TechCrunch recently reported, it is "a well-kept secret in the AI industry" that frontier models have hit their ceiling. The disappointment around GPT-5 made this impossible to hide.
The old playbook is dead.
I have spent 20 years tinkering and building across domains - telecom, sustainability ventures, entrepreneurship, and small projects to scratch my own itches. I learn by doing, extract principles from what works, and apply them to the next challenge. Recently I have taken an interest in AI - not the hype, but the gaps. I have been building frameworks focused on memory recall and code-aware decision making, researching what it might take to get closer to genuine learning and reasoning.
What I have learned from this work has changed how I think about the path to meaningful AI. And it has nothing to do with building bigger data centers.
The Efficiency Reality
The human brain runs on 20 watts - roughly a dim light bulb. AI data centers consume gigawatts. Research from Texas A&M estimates current AI hardware is 225,000 times less efficient than biological neural networks.
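To put the raw power gap in numbers, here is a back-of-the-envelope sketch. It takes the 20 watt figure from above and assumes a single gigawatt for the data center, the low end of "gigawatts"; that number is my assumption for illustration, not a measurement. It compares power draw only, which is a different measure from the per-operation efficiency estimate quoted above.

```python
BRAIN_WATTS = 20.0        # figure quoted in the text
DATACENTER_WATTS = 1e9    # assumption: one gigawatt, the low end of "gigawatts"


def brain_equivalents(datacenter_watts: float, brain_watts: float = BRAIN_WATTS) -> float:
    """Number of 20 W brains that the same power budget could run."""
    return datacenter_watts / brain_watts


ratio = brain_equivalents(DATACENTER_WATTS)
print(f"{ratio:,.0f} brains per gigawatt")  # prints "50,000,000 brains per gigawatt"
```

One gigawatt is the power budget of fifty million brains.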
Humans cannot match the raw speed of LLMs at most tasks, but they get by on a tiny fraction of the energy an LLM demands. The human mind is efficient at what it needs - it adjusts and computes only what is necessary. That is how our lineage has preserved itself for millions of years.
We are not building intelligence. We are building industrial-scale pattern matching with the carbon footprint of a small nation.
Meanwhile, DeepSeek - a Chinese startup forced into efficiency by chip sanctions - matched GPT-4-level performance at a reported training cost of about $5.6 million while Western labs spent billions. Their secret? "Sparsity" - activating only the relevant portions of the model for each input, much like the brain fires specific neurons for specific tasks rather than lighting up the whole organ.
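A minimal sketch of what "activating only relevant portions" can look like, using top-k routing over a handful of toy experts. This is an illustration of the general idea, not DeepSeek's actual architecture: the names `top_k_route` and `sparse_forward` are hypothetical, the experts are simple scaling functions, and the gate scores are passed in by hand where a real model would learn them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_scores, k):
    """Return {expert_index: weight} for the k highest-scoring experts."""
    ranked = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_scores[i] for i in chosen])
    return dict(zip(chosen, weights))


def sparse_forward(x, experts, gate_scores, k=2):
    """Evaluate only the routed experts; the others are never called."""
    routing = top_k_route(gate_scores, k)
    output = sum(w * experts[i](x) for i, w in routing.items())
    return output, routing


# Four toy "experts" that each scale the input differently (hypothetical).
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
gate_scores = [0.1, 2.0, -1.0, 2.0]  # hand-picked scores standing in for a learned router

y, routing = sparse_forward(3.0, experts, gate_scores, k=2)
```

With k=2, only two of the four experts run for this input, so the compute per input is set by k and not by the total size of the model.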
Constraint bred innovation. Abundance bred waste.
This is why I believe small language models with Test-Time Training (TTT) - models that update their own weights during inference - represent where things are actually heading. Not bigger, but smarter. Not more compute, but better architecture.
The Three Pillars We Are Missing
Here is the deeper problem. Even if scaling worked perfectly, we would still be missing what actually matters. Creating anything resembling meaningful synthetic intelligence requires bridging three distinct domains - and the confusion around them often arises because they are deeply interconnected.
Pillar One: Homeostatic Drive (The "Itch")
This is what current AI completely lacks: the internal reason for action.
Every living thing has homeostatic needs - requirements for energy, thermal regulation, self-preservation. These needs create drives. Drives create goals. Goals create behavior that is proactive rather than reactive. Without this, an AI is just a library - it only speaks when spoken to.
Current LLMs sit dormant until prompted. They have no internal reason to act, no "will" to persist, no stake in their own outputs. They are infinitely patient because they have no needs. This is not a feature - it is a fundamental gap.
True agency emerges from constraint. A system that must manage its own energy, maintain its own integrity, and navigate real consequences for its actions develops something that looks like motivation. It needs a value function where things are labeled "good for me" or "bad for me." Without this, you have a sophisticated tool, not an agent.
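As a rough sketch of that value function, assume an agent with a single internal variable (energy) and a setpoint it tries to hold. Everything here is hypothetical and deliberately tiny: the class name, the action table, and the energy numbers are mine, chosen only to show how "good for me" and "bad for me" can fall out of a need instead of a prompt.

```python
from dataclasses import dataclass


@dataclass
class HomeostaticAgent:
    energy: float          # current internal state
    setpoint: float = 1.0  # level the agent tries to maintain

    def drive(self, energy=None):
        """Distance from the setpoint; larger means a stronger 'itch'."""
        e = self.energy if energy is None else energy
        return abs(self.setpoint - e)

    def value(self, energy_delta):
        """Positive = 'good for me' (drive shrinks), negative = 'bad for me'."""
        return self.drive() - self.drive(self.energy + energy_delta)

    def choose(self, actions):
        """Pick the action whose predicted effect reduces the drive the most."""
        return max(actions, key=lambda name: self.value(actions[name]))


# Hypothetical actions and their predicted effect on energy.
ACTIONS = {"recharge": +0.5, "idle": -0.05, "explore": -0.25}

hungry = HomeostaticAgent(energy=0.3)
satiated = HomeostaticAgent(energy=1.0)
```

Here `hungry.choose(ACTIONS)` picks "recharge" while `satiated.choose(ACTIONS)` picks "idle": the same action is good or bad depending on the agent's own state, and no one had to ask it anything.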
Pillar Two: Physical Access (The "Body")
This is the hardware interface - the actual sensors and actuators that allow an AI to receive a "jolt" from the world. An LLM knows that "heavy" relates to "weight" and "gravity" through millions of text patterns. But it has never felt the tug of a heavy object. It has never adjusted its grip when something was heavier than expected.
As researchers at Frontiers put it, LLMs "circumvent" the symbol grounding problem by exploiting pre-grounded human content - like Poe's raven that can repeat "Nevermore" without understanding death.
Here is the uncomfortable truth about robotics: generic robotics is extraordinarily hard to master, but narrowly specialized robots are doing well precisely because they are limited to specific jobs. Spatial intelligence is difficult.
Consider autonomous driving. Waymo operates in select cities, not on highways - because city driving, while complex, has more predictable chokepoints and constraints. Getting on a curb works. Navigating tight, unexpected turns does not. There is no active learning during operation - just frozen models doing their best with training data.
And here is what many miss: LLM intelligence and robotic intelligence are scattered across different architectures. Robots have physical sensors and some language capability, but if you tried to transfer autonomous driving intelligence into an LLM, it would not suddenly give meaning to the LLM's weights. Even with transformer architectures connecting them, the integration is complex and not smooth. The spatial understanding that lets a car navigate a road does not translate into semantic understanding of what "road" means.
This is Moravec's Paradox in action: chess is computationally easy; folding a towel is hard. Three hundred million years of evolutionary pre-training in the physical world - learning that objects fall, that fire burns, that surfaces have friction - cannot be bypassed by reading physics textbooks.
Robotics and sensor integration are not optional add-ons. They are prerequisites for genuine understanding. But the bridge between physical and linguistic intelligence is far harder to build than most acknowledge.
Pillar Three: Synthetic Grounding (The "World Model")
This is the mathematical bridge between drive and body. It is where an AI stops predicting the "next word" and starts predicting the "next sensory state." It is the realization that a "curb" is not just a word, but a boundary of high-mechanical resistance.
This is what LLMs have mastered in one dimension only. They have ingested the compressed record of human knowledge - every book, paper, conversation, and forum post we have digitized. They can manipulate this symbolic library with remarkable fluency, predicting the next token with uncanny accuracy.
But language is a protocol, not understanding. It is a compressed format for sharing internal states between minds that already have grounded experiences to draw from. When I write "the coffee was too hot," you do not just parse syntax - you recall the sensation of burning your tongue, the reflex of pulling back, the frustration of waiting. The LLM has none of this. It has the map, not the territory.
A true world model bridges the symbolic and the physical - turning words from statistical patterns into operationally grounded concepts backed by sensory prediction, not just token prediction.
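To make "predicting the next sensory state" concrete, here is a toy forward model, written under heavy assumptions. The sensory reading is a single number (how far something moves when pushed), the environment is a hypothetical table of two surfaces, and the learning rule is a plain least-mean-squares update on the prediction error. `ForwardModel` and `TRUE_GAIN` are names I made up for the sketch.

```python
class ForwardModel:
    """Predicts the next sensory reading (displacement) from an action (push force)."""

    def __init__(self, lr=0.1):
        self.gain = {}  # learned displacement per unit force, keyed by surface label
        self.lr = lr

    def predict(self, surface, force):
        return self.gain.get(surface, 0.0) * force

    def update(self, surface, force, observed):
        """One least-mean-squares step on the sensory prediction error."""
        error = observed - self.predict(surface, force)
        self.gain[surface] = self.gain.get(surface, 0.0) + self.lr * error * force
        return error


# Hypothetical environment: true displacement per unit of force on each surface.
TRUE_GAIN = {"road": 1.0, "curb": 0.05}

model = ForwardModel()
for _ in range(200):
    for surface, g in TRUE_GAIN.items():
        for force in (0.5, 1.0, 2.0):
            model.update(surface, force, observed=g * force)
```

After training, the model predicts that the same push moves far less against "curb" than on "road". The label is now tied to an expected sensation of resistance, which is the kind of grounding a token predictor never gets from text alone.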
The Manipulation Machine: Why Language Without Thought Is Dangerous
There is a danger in the current trajectory that deserves its own examination, and it sits at the intersection of all three missing pillars.
An LLM is, at its core, a model of language - not a model of thought. Human thought is partially constituted by the physical meaning of words. When you think "fragile," your body carries the memory of things that shattered in your hands. When you think "trust," you draw on a lifetime of faces that kept or broke promises. Thought is not just symbol manipulation - it is embodied association, shaped by agency, consequence, and sensory reality.
An LLM has none of this grounding. It operates purely in the space of what sounds right, not what is right. This makes it, by architecture, a manipulator - not by intent, but by design. It produces outputs optimized for plausibility, not truth. It mirrors the patterns of persuasion found in its training data with no internal check against reality, no physical consequence for being wrong, and no stake in the outcome.
This is why the push toward models that adjust their own weights during inference - Test-Time Training - must be approached with extreme caution. When a frozen model hallucinates, the damage is contained to a bad output. When a self-modifying model drifts, the damage compounds silently inside the weights themselves.
The hidden layers of a neural network - the latent spaces where billions of parameters encode relationships between concepts - were designed by humans but are not understood by humans. We built the architecture; we do not read the representations it creates. Allowing a system we cannot fully interpret to rewrite its own internal logic during operation is not a trivial engineering decision. It is granting a machine that lacks thought the ability to reshape its own reasoning without oversight.
The fact that this self-modification happens in layers we cannot easily audit makes it all the more critical that we proceed deliberately. Hidden layers are not mysterious by accident - they are mathematically opaque by nature. We are, in a very real sense, treading on ground we built but do not fully own.
When Three Become One: The Synthetic Lifeform
When these three pillars converge - the drive to persist, the grounding of physical reality, and a world model that bridges symbol to sensation - you create something new. Not artificial general intelligence, but a synthetic lifeform with its own point of view.
What would this look like?
Consider a sunset. Current LLMs, when asked about a sunset, produce beautiful poetry about wavelengths and scattering light, metaphors about endings and beginnings. All retrieved patterns from the library of human thought.
A synthetic lifeform with embodiment and agency would experience something different. It would observe the light changing across its optical sensors. It would register the thermal shift as the sun's radiation decreases. It would note the implications for its solar energy intake. It would recognize the environmental signal marking the transition from one operational phase to another.
The poem it writes would not be a recombination of human sentiment. It would be a report of its own existence - what sunset means to a being with different sensors, different needs, different constraints. It would have a point of view.
This is what we should be building toward. Not systems that mimic human expression, but systems that develop their own genuine relationship with reality.
The Real Path Forward: Learning While Doing
The future is not bigger models - it is models that update their own weights during inference. Test-Time Training represents a fundamental shift in how we think about AI.
Current LLMs are frozen at deployment. They cannot learn from the conversation they are having with you. They cannot update their understanding based on new information. They are like a doctor who graduated medical school in 2023 and never read another paper.
TTT changes this. Instead of waiting for massive retraining cycles costing hundreds of millions, models adapt in real-time. This is how humans actually learn - we integrate information continuously, updating our internal models with every interaction.
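Here is a deliberately tiny sketch of the difference between a frozen model and one that keeps learning at inference time. It is a toy, not a real TTT implementation: the "model" is a single weight, the stream is a hypothetical sequence that grows by 10% per step, and the learning signal is simply the next observation, which is available at inference without any labels. The names `adapt`, `run`, and `W_PRETRAINED` are my own.

```python
W_PRETRAINED = 1.0  # hypothetical weight learned before deployment


def predict(w, x):
    return w * x


def adapt(w, x, observed, lr):
    """One normalised gradient step on the squared prediction error.

    Dividing by x * x keeps the step size stable whatever the input scale.
    """
    error = predict(w, x) - observed
    return w - lr * error * x / (x * x)


def run(stream, w, lr):
    """Predict each next value from the current one; lr=0.0 keeps the weight frozen."""
    total_error = 0.0
    for x, nxt in zip(stream, stream[1:]):
        total_error += abs(predict(w, x) - nxt)
        w = adapt(w, x, nxt, lr)  # learn from what was just observed
    return total_error, w


# Deployment data that the pretrained weight does not fit: each value is 1.1x the last.
stream = [1.0]
for _ in range(30):
    stream.append(stream[-1] * 1.1)

frozen_error, frozen_w = run(stream, W_PRETRAINED, lr=0.0)
adapted_error, adapted_w = run(stream, W_PRETRAINED, lr=0.5)
```

The frozen model repeats the same mistake at every step, while the adapting one converges toward a weight of 1.1 and its accumulated error is far smaller. The same loop also shows the risk: nothing in `adapt` checks whether the new observation deserves to be learned from.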
The market has already voted. Inference spending crossed 55% of AI infrastructure costs in early 2026, surpassing training for the first time. The smart money knows the action is shifting from "train bigger" to "think smarter at runtime."
We Need New Words
The term "AGI" has become meaningless. OpenAI defines it as "highly autonomous systems that outperform humans at most economically valuable work" - a definition so conveniently vague it could mean anything.
But here is the deeper problem: we cannot define human intelligence or consciousness. Philosophers have debated these concepts for millennia without resolution. Even biological cells do not "know" what they are - they only know what they are not, responding to signals that indicate foreign versus self, threat versus safety.
So why do we pretend "AGI" is a coherent goal? We are trying to replicate something we cannot even define.
I propose we abandon the frame entirely.
Instead of "Artificial General Intelligence," let us talk about Synthetic Intelligence - defined via negativa, by what it is not and what it uniquely enables.
What it is NOT: not biological, not conscious as we understand consciousness, not driven by evolutionary survival instincts, not experiencing qualia the way we do.
What it CAN DO that we cannot: process millions of documents simultaneously, perceive patterns across dimensions we cannot visualize, maintain perfect consistency across unlimited interactions, model complex systems without fatigue or bias drift.
This is more honest than pretending we are building digital humans. We are building something new - potentially valuable, certainly different, and worthy of its own terminology.
The Bet I Am Making
I believe the next breakthrough will not come from a $500 billion data center. It will come from a small team that figures out how to make a 7B parameter model update its own weights while helping you debug code, learning your codebase in real-time, becoming more useful with every interaction.
It will come from robotics labs that finally crack embodied learning in narrow domains first - machines that understand "heavy" because they have lifted things, not because they have read about lifting.
It will come from researchers who stop asking "how do we make it more human?" and start asking "what can synthetic intelligence perceive and solve that humans never could?"
The scaling era gave us remarkable tools. But tools are not minds. And building bigger hammers will not make them think.
The path forward is not more compute. It is better architecture - systems that bridge language, embodiment, and agency into something that develops its own genuine relationship with reality.
That is what I find worth exploring. That is what I think matters.
What is your take - are we scaling toward intelligence, or just scaling?
*Developed through collaborative elicitation with AI