To set the stage for potential threats from AI systems, Hinton countered common misconceptions about how AI models work, including the views that large language models like GPT-4 don’t truly understand language and that they are merely glorified autocomplete. “The idea that these language models just store a whole bunch of text, that they train on them and pastiche them together — that idea is nonsense,” he said.
The little language model from 1985 that could
Instead, Hinton underscored that today’s large language models are descendants of what he calls a “little language model,” which he created nearly four decades ago. The fundamental mechanisms of that 1985 model, which predicted the next word in a three-word string, were broadly the same as those of modern large language models.
Hinton’s early model, despite its simplicity, laid the groundwork for today’s state-of-the-art multimodal models. It learned to assign features to words, starting with random assignments and refining them through context and interaction. This process, he maintains, is essentially how modern large language models operate, albeit on a grander scale. Back in 1985, Hinton’s model had only around 1,000 weights and was trained on just 100 examples. Fast forward to today, and “machines now go about a million times faster,” Hinton said. Modern large language models are also vastly larger, with billions or trillions of parameters.
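The shape of such a model is easy to sketch. The Python snippet below is a rough, illustrative reconstruction, not Hinton’s original code: the vocabulary, the toy word triples, and the layer sizes are all assumptions, chosen only to keep the weight count in the same small ballpark he describes. Each word gets a feature vector that starts out random, the feature vectors of the first two words interact in a hidden layer, and the network is trained to predict the third word.

```python
# Rough, illustrative reconstruction of a 1985-style "little language model":
# learn a feature vector per word and predict the third word of a triple from
# the first two. Vocabulary, triples, and layer sizes are invented for this sketch.
import torch
import torch.nn as nn

vocab = ["lion", "tiger", "eats", "meat", "fish", "and", "chips"]
word_to_id = {w: i for i, w in enumerate(vocab)}

# Toy training triples: (word1, word2) -> word3
triples = [("lion", "eats", "meat"),
           ("tiger", "eats", "meat"),
           ("tiger", "eats", "fish"),
           ("fish", "and", "chips")]

class LittleLM(nn.Module):
    def __init__(self, vocab_size, feat_dim=6, hidden=16):
        super().__init__()
        # Feature vectors start out random and are refined by training.
        self.features = nn.Embedding(vocab_size, feat_dim)
        self.hidden = nn.Linear(2 * feat_dim, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, w1, w2):
        x = torch.cat([self.features(w1), self.features(w2)], dim=-1)
        return self.out(torch.tanh(self.hidden(x)))  # logits over the next word

model = LittleLM(len(vocab))
print(sum(p.numel() for p in model.parameters()), "weights")  # a few hundred

opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
w1 = torch.tensor([word_to_id[a] for a, _, _ in triples])
w2 = torch.tensor([word_to_id[b] for _, b, _ in triples])
y = torch.tensor([word_to_id[c] for _, _, c in triples])

for _ in range(200):  # tiny dataset, tiny model
    opt.zero_grad()
    loss_fn(model(w1, w2), y).backward()
    opt.step()
```

After training on even this invented data, the feature vectors for words that appear in similar contexts, such as “lion” and “tiger,” tend to drift closer to each other than to unrelated words, which is the behavior Hinton describes.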
How LLMs mirror our understanding of language
Hinton’s work, along with that of other AI innovators such as Yann LeCun, Yoshua Bengio, and Andrew Ng, laid the groundwork for modern deep learning. A more recent development, the publication of the “Attention Is All You Need” paper in 2017, has profoundly transformed natural language processing (NLP).
The transformer’s self-attention mechanism enables direct modeling of relationships between all words in a sentence, regardless of their position, leading to a significant gain in computers’ ability to understand and replicate human text. Yet the conclusion that such models truly understand language has faced resistance, especially from prominent figures in linguistics such as Noam Chomsky, who maintains that human language acquisition is rooted in innate structures and capacities that AI systems fundamentally lack.
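In rough terms, self-attention lets every word compare itself against every other word in the sequence and take a weighted mix of their feature vectors, with no built-in notion of distance. The single-head NumPy sketch below illustrates only that core computation; it is not a faithful slice of any production transformer, and the matrix names and toy sizes are assumptions.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) word feature vectors.
    Wq, Wk, Wv: (d_model, d_head) learned projection matrices.
    Every position attends to every other position, regardless of distance.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # pairwise word-word affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over positions
    return weights @ V                                 # each word becomes a weighted mix of all words

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 8, 4                     # toy sizes
X = rng.normal(size=(seq_len, d_model))
out = self_attention(X,
                     rng.normal(size=(d_model, d_head)),
                     rng.normal(size=(d_model, d_head)),
                     rng.normal(size=(d_model, d_head)))
print(out.shape)  # (5, 4): one updated feature vector per word
```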
Beyond autocomplete: Debunking the skeptics
Despite the capabilities of generative AI models, widespread skepticism persists. Critics often dismiss these models as merely sophisticated versions of “autocomplete.” Hinton, however, strongly disputes this notion, tracing the fundamental ideas behind today’s models back to his early work on language understanding.
Hinton’s influential efforts involved merging two seemingly incompatible theories of meaning: the semantic feature theory and the structuralist theory. In the semantic feature theory, words are understood through a collection of inherent characteristics. For example, words like “lion” and “tiger” share similar semantic features such as being large, carnivorous, and feline. This theory, rooted in psychology of the 1930s, holds that words with similar meanings share similar semantic features.
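A toy illustration of that view: with hand-picked feature vectors, invented purely for this example rather than learned from data, near-synonyms like “lion” and “tiger” come out close together while an unrelated word does not.

```python
import numpy as np

# Hand-crafted (illustrative, not learned) semantic features:
# [large, carnivorous, feline, striped, fruit]
features = {
    "lion":   np.array([1.0, 1.0, 1.0, 0.0, 0.0]),
    "tiger":  np.array([1.0, 1.0, 1.0, 1.0, 0.0]),
    "banana": np.array([0.0, 0.0, 0.0, 0.0, 1.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(features["lion"], features["tiger"]))   # ~0.87: near-synonyms share features
print(cosine(features["lion"], features["banana"]))  # 0.0: unrelated words do not
```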
On the other hand, there is what Hinton describes as “the structuralist theory, the symbolic AI theory, which is that the meaning of a word is how it relates to other words in your head.” In essence, humans have “a whole bunch of propositions in some logical language, and this word co-occurs with other words in these propositions in various roles, and that’s what defines its meaning.”
Hinton’s breakthrough was in recognizing that “those two theories appear incompatible, but you can actually make them work together.” By combining these perspectives, Hinton laid the groundwork for a more comprehensive understanding of language, which would later drive substantial performance gains in modern AI language models trained on massive troves of texts and other inputs.
The AI language mirror
The implications were profound, both for building machines that can understand human language and for explaining how humans acquire language in the first place. “The little language model I introduced in 1985 wasn’t introduced as a technology for doing language,” Hinton explained. “It was introduced as a theory of how people learn what words mean. So actually, the best model we have of how people understand language is these large language models.”
“The symbolic AI people will tell you they’re nothing like us, that we understand language in quite a different way, by using symbolic rules. But they could never make it work, and it’s very clear that we understand language in much the same way as these large language models,” Hinton said.
Evolving from surface-level pattern-matching
How large language models actually work is often misunderstood, Hinton said. “Another objection comes from the symbolic AI camp: it’s just autocomplete,” he added. “We understand autocomplete; it was easy to understand.”
Hinton, a British-Canadian, uses “fish and chips” as an example of how autocomplete once worked. “If you saw ‘fish and,’ you’d look through and say, ‘Hey, “fish and chips” occurs many times, so “chips” is a good thing to predict.’” This kind of pattern matching, based on the frequency and proximity of words, is how traditional autocomplete systems functioned.
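That older style of autocomplete fits in a few lines of code: count how often each word follows a given two-word prefix in some corpus, then suggest the most frequent continuation. The tiny corpus below is invented for illustration.

```python
from collections import Counter, defaultdict

# Toy corpus, purely illustrative.
corpus = ("fish and chips . fish and chips . fish and rice . "
          "salt and vinegar . fish and chips").split()

# Count which word follows each two-word prefix
# (frequency- and proximity-based pattern matching).
counts = defaultdict(Counter)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    counts[(w1, w2)][w3] += 1

def old_style_autocomplete(w1, w2):
    """Suggest the continuation seen most often after this prefix."""
    return counts[(w1, w2)].most_common(1)[0][0]

print(old_style_autocomplete("fish", "and"))  # 'chips': it occurs most often
```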
“But that’s not how autocomplete works anymore,” Hinton explained. He emphasized that modern large language models have moved far beyond such primitive mechanisms. “The big language models predict the next word using the features of the words in context through a multi-layer neural net.”
In other words, large language models “understand text by taking words, converting them to features, having features interact, and then having those derived features predict the features of the next word — that is understanding,” Hinton said.
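Schematically, and leaving the transformer machinery aside, that pipeline looks quite different from the count table above. The sketch below uses random, untrained placeholder weights, so its actual prediction is meaningless; the point is only the data flow Hinton describes: words become feature vectors, the features interact, and the model reads out whichever word best matches the predicted features of the next word.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = ["fish", "and", "chips", "rice", "lion"]
d = 4

# In a real model these feature vectors and weight matrices are learned;
# here they are random placeholders, because only the data flow matters.
word_features = {w: rng.normal(size=d) for w in vocab}
W1 = rng.normal(size=(2 * d, 8))
W2 = rng.normal(size=(8, d))

def predict_next_features(w1, w2):
    """Words -> feature vectors -> interacting features -> predicted features of the next word."""
    x = np.concatenate([word_features[w1], word_features[w2]])
    h = np.tanh(x @ W1)   # derived features: the word features interact here
    return h @ W2         # predicted feature vector for whatever word comes next

def nearest_word(predicted):
    """Read out the word whose feature vector best matches the prediction."""
    return max(vocab, key=lambda w: float(word_features[w] @ predicted))

# With untrained weights the answer is arbitrary; a trained model would be
# expected to favor "chips" here.
print(nearest_word(predict_next_features("fish", "and")))
```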
On confabulation — in humans and AI
Even the flaws of AI, like the tendency to generate incorrect information (to “hallucinate,” or confabulate), parallel human cognition. “Well, people hallucinate all the time… That’s what memory is like in us,” Hinton observed. “It’s also what memory is like in large language models, and that’s why they can confabulate.”
To be sure, AI companies and developers are employing various strategies to reduce hallucinations in large language models. But such confabulations remain a real weakness in how both humans and large language models deal with information. Hinton points out that just as humans often reconstruct memories rather than retrieve exact details, AI models generate responses based on patterns rather than recalling specific facts.
To illustrate this, Hinton recalls the case of John Dean’s testimony during the Watergate scandal. “There’s a wonderful case that’s been highly analyzed with John Dean’s testimony. He testified in Watergate, and he didn’t know there were tapes,” Hinton said. “He testified about all these meetings in the Oval Office. And these meetings never happened. They were confabulated meetings, but he was telling the truth as he remembered it.”
Hinton uses this example to underscore the point that both human memory and AI can produce plausible but inaccurate reconstructions of events. “We don’t store memories; we generate them,” Hinton quipped.