Is the AI Industry's Transformer Obsession Blocking True AGI?

Last updated: 2026-05-01

Big AI labs are pouring billions into transformer models, betting they can scale their way to human-level general intelligence. But skeptics like Ben Goertzel, who coined the term AGI, argue this narrow focus may be a dead end. In this Q&A, we explore the risks, limitations, and alternative paths that could lead to true artificial general intelligence.

1. What are transformer models and why has the AI industry bet so heavily on them?

Transformer models are a type of neural network architecture that has become the foundation of virtually every major AI system today—from ChatGPT to Google Gemini. They use a mechanism called self-attention to process sequential data, excelling at understanding context in language, images, and even code. The industry's bet stems from an empirical observation: scale works. By feeding these models enormous datasets and using backpropagation to adjust billions of parameters, performance has consistently improved. This success has created a feedback loop—companies like OpenAI, Google DeepMind, and Microsoft have allocated virtually all their R&D and capital expenditure toward refining and scaling transformers. The logic is simple: if more data and compute yield smarter AI, why stop? But this single-minded focus comes with risks, as we'll explore next.
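The self-attention mechanism mentioned above can be sketched in a few lines of NumPy. This is a toy single-head version with random weights, meant only to show the query/key/value pattern, not any production implementation:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Minimal single-head self-attention over a sequence of token vectors."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v               # project tokens to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])           # scaled dot-product similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax: each token attends over all tokens
    return weights @ v                                # context-weighted mixture of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))               # 4 toy token embeddings
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)                                      # one contextualized vector per token
```

Each output row mixes information from every position in the sequence, which is why the mechanism captures long-range context so well.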

Source: www.fastcompany.com

2. Who is Ben Goertzel and why is he skeptical of the transformer-only approach?

Dr. Ben Goertzel is a leading AI researcher who literally wrote the book on AGI—his 2005 edited volume Artificial General Intelligence helped establish the term, which DeepMind co-founder Shane Legg is credited with suggesting. He argues that the commercial AI industry is making a massive strategic error. “They are betting everything on copying GPT in various permutations,” he says, “which is a waste of resources because all these LLMs are doing about the same thing.” In his view, transformers lack key attributes needed for true general intelligence, such as the ability to learn continuously from new experiences (see Question 4). Goertzel believes AGI could emerge within a few years, but only if the field diversifies beyond scaling current models. He points to the homogeneity of research as a warning sign: when one approach works, everyone doubles down, starving alternative paradigms of attention and funding.

3. What are the economic risks of concentrating resources on transformer models?

Training a state-of-the-art transformer now costs hundreds of millions of dollars in compute alone (GPT-4's training budget reportedly exceeded $100 million), and operating these models at scale requires enormous ongoing energy and hardware investments. The problem is one of diminishing returns. As models grow larger, each incremental gain in capability becomes more expensive—a phenomenon often described as a “scaling law plateau.” If the returns eventually fail to justify the cost, the entire industry could face a crisis. Because the financial stakes are so high, labs have little room to invest in fundamentally different approaches (like neuromorphic computing or symbolic AI). This creates a dangerous monoculture in which failure of the transformer paradigm would mean a catastrophic waste of resources. Goertzel warns that this concentration is “putting all eggs in one basket.”
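The diminishing-returns dynamic can be illustrated with a toy power-law loss curve. The constants below are invented for illustration, not fitted to any real model:

```python
# Illustrative power-law scaling: loss falls as a power of parameter count,
# so each 10x increase in model size buys a smaller absolute improvement.
# All constants here are made up for the sketch.
def loss(n_params, irreducible=1.7, a=400.0, alpha=0.34):
    """Toy scaling curve: irreducible loss plus a power-law reducible term."""
    return irreducible + a / (n_params ** alpha)

for n in [1e9, 1e10, 1e11, 1e12]:
    print(f"{n:.0e} params -> loss {loss(n):.3f}")
```

Running this shows the gap between successive rows shrinking at every step, while the compute bill for each 10x jump grows enormously: the economic squeeze the question describes.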

4. What is the main technical limitation of transformer models for achieving AGI?

The most significant limitation, according to Goertzel, is the absence of continual learning. Humans don’t need to be retrained from scratch every time they encounter new information—we update our knowledge and adapt in real time. Transformers, by contrast, have weights that are frozen once training ends. A chat with an LLM today changes nothing about its internal model for tomorrow; it cannot integrate new experiences into its permanent knowledge base without an expensive retraining cycle. This means that even the largest transformer lacks the ability to truly “learn” from ongoing interactions in a human-like way. Researchers at DeepMind, Microsoft, and Ilya Sutskever’s Safe Superintelligence are exploring alternative architectures (like hybrid neuro-symbolic systems or liquid neural networks) that might enable real-time parameter updates. But for now, the mainstream industry remains focused on scaling transformers, which may be a dead end for AGI.
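The frozen-weights point can be made concrete with a toy model: standard inference leaves parameters untouched, while a naive online-update variant (a hypothetical sketch for contrast, not any lab's actual method) changes with every interaction:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))            # stand-in for a trained model's weights

def infer(W, x):
    """LLM-style inference: a pure function of frozen weights."""
    return np.tanh(W @ x)

x = rng.normal(size=4)
before = W.copy()
_ = infer(W, x)                         # "chatting" with the model...
assert np.array_equal(W, before)        # ...leaves its parameters untouched

def online_update(W, x, target, lr=0.01):
    """Naive continual-learning step: nudge weights after each interaction.
    A real system would need safeguards against catastrophic forgetting."""
    y = np.tanh(W @ x)
    grad = np.outer((y - target) * (1 - y**2), x)   # gradient of squared error through tanh
    return W - lr * grad

W2 = online_update(W, x, target=np.zeros(4))
assert not np.array_equal(W2, W)        # this variant does change with experience
```

The contrast is the whole argument: today's deployed transformers live entirely in the first regime, and moving to the second without destroying previously learned knowledge is the open research problem.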

5. Are there existing research efforts exploring alternatives to transformers?

Yes. While the mainstream media focuses on the GPT arms race, several prominent institutions are investing in alternative paradigms. Ben Goertzel notes that Google DeepMind has “incredible diversity within their AI team,” including a “deep bench” of experience with architectures like differentiable neural computers, memory-augmented networks, and neurorobotics. Ilya Sutskever’s new venture, Safe Superintelligence, is reportedly exploring continual learning without forgetting. Microsoft Research is investigating neuro-symbolic approaches that combine neural networks with logical reasoning. Sakana AI, a Tokyo-based startup, recently released agents that combine multiple frontier models to achieve emergent behaviors. The challenge is that these efforts receive a fraction of the funding directed at scaling transformers. The industry’s financial gravity pulls all resources toward the proven path, making it harder for revolutionary ideas to mature quickly.

6. Could AGI arrive sooner than expected if the industry changes course?

Goertzel remains optimistic that AGI could emerge within the next few years—but not if we stick solely to transformers. He believes that moving beyond simple scaling and embracing architectural diversity could unlock the human-level generalization required for true AGI. For example, adding mechanisms for continual learning and integrating symbolic reasoning could solve problems transformers currently struggle with, like causal understanding and abstraction. Some research suggests that combining transformer-like attention with recurrent loops or spiking neural networks might produce systems that learn from each interaction. If a fraction of the $100+ billion currently invested in compute were redirected toward these novel approaches, breakthroughs could come much sooner. However, this requires an industry-wide willingness to admit that the current path may be insufficient—a cognitive shift that is hard in a field driven by competitive benchmarks and quarterly results.

7. What practical steps could the AI industry take to diversify its research?

The industry could diversify along several concrete lines:

1. Funding agencies and corporate labs should allocate a fixed percentage of their budgets (e.g., 20%) to non-transformer architectures.
2. Academia and startups focused on alternative AI (neuro-symbolic, evolutionary, probabilistic) need more access to compute, which is currently dominated by big tech’s transformer-centric clusters.
3. The community should develop benchmarks for AGI-relevant capabilities like continual learning, causal reasoning, and generalization across domains; most current benchmarks measure performance on static datasets, which favors transformers.
4. Companies like Google DeepMind, which Goertzel credits with unusual internal research diversity, should openly share findings from those projects to seed new lines of inquiry.
5. Investors must accept longer timelines—true AGI may require a decade of exploration rather than a year of scaling.

Without these changes, the AI industry risks investing heavily in a paradigm that may never achieve human-level intelligence.