As impressive as they may be, the latest AI systems are still no match for humans. Benjamin Grewe pushes for tomorrow’s intelligent machines to learn the way young children do.
Throughout history, people have dreamt of creating human-like intelligent machines. Recently we’ve been hearing a lot about GPT3, a new AI language system from San Francisco. Its developers claim that it can answer general questions, correct and complete texts, and even write them itself, without any task-specific training. GPT3 is so good that the texts it generates can scarcely be distinguished from those written by a human. So what should we make of this?
Learning (from) the whole Internet
GPT3 is an artificial neural network trained on a text data set of 500 billion character strings drawn from a filtered copy of the Internet, Wikipedia and several digitised book collections. That’s a wealth of knowledge that no human can match. But what exactly does GPT3 do with this mass of data? In what’s known as self-supervised learning, the language network simply learns to generate the next word, based on a given section of text. The algorithm predicts which word is most likely to come next, appends it to the text and then repeats the step on the extended text. In this way it iteratively writes complete sentences or whole texts.
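The predict-and-append loop described above can be illustrated with a deliberately tiny sketch. This is not how GPT3 is implemented: instead of a neural network it uses simple counts of which word follows which, and the corpus, function names and parameters are all made up for illustration. Only the iterative principle is the same.

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers

def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = model.get(word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

def generate(model, prompt, max_new_words=5):
    """Repeatedly predict the next word and append it to the text."""
    words = prompt.lower().split()
    for _ in range(max_new_words):
        nxt = predict_next(model, words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)

# Hypothetical toy corpus; GPT3's training data is many orders of magnitude larger.
corpus = "the dog chased the cat . the dog barked at the cat . the dog barked at the moon ."
model = train_bigram_model(corpus)
print(generate(model, "the", max_new_words=3))  # the dog barked at
```

The toy model only ever looks at the last word, whereas a large language network conditions on a long stretch of preceding text. That difference is what makes real systems fluent, but the loop of predicting, appending and repeating is unchanged.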
Generally speaking, the following holds for modern AI language systems: the larger the network and the more connections between the artificial neurons, the better they learn. GPT3 has a remarkable 175 billion such connection parameters. In comparison, Google’s famous BERT network is made up of only 255 million. Yet the human brain has around 10^14 synaptic connections - which means it still outstrips GPT3 by a factor of several hundred!
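The size comparison can be checked with a few lines of arithmetic. The sketch below simply takes the figures quoted in the text at face value; the variable names are illustrative, and the synapse count in particular is a rough order-of-magnitude estimate, so the resulting ratio should be read as a ballpark only.

```python
# Figures as quoted in the text (assumed, not independently verified here).
gpt3_parameters = 175e9   # connection parameters in GPT3
bert_parameters = 255e6   # connection parameters quoted for BERT
brain_synapses = 1e14     # rough estimate of synaptic connections in the human brain

gpt3_vs_bert = gpt3_parameters / bert_parameters
brain_vs_gpt3 = brain_synapses / gpt3_parameters

print(f"GPT3 is about {gpt3_vs_bert:,.0f} times larger than BERT")
print(f"The brain has about {brain_vs_gpt3:,.0f} times more connections than GPT3")
```

Because estimates of the brain's synapse count differ by about an order of magnitude, the second ratio shifts accordingly when a higher estimate is used.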
For me, the many shortcomings of GPT3 demonstrate the problem of modern high-performance artificial neural networks. Grammatically, almost every generated text is perfect; even the content is logically consistent over several sentences. Longer texts, however, often make little sense in terms of content. It’s not enough to just predict the next word. To be truly intelligent, a machine would have to conceptually understand the tasks and goals of a text. The GPT3 language system is thus by no means capable of answering all general questions; it just doesn’t come close to human-like intelligence.
Humans learn more than just statistical patterns
In my opinion, GPT3 also highlights another problem of today’s AI research. Current intelligent systems and algorithms are incredibly good at processing big datasets and at recognising or reproducing statistical patterns. The drawback lies in the extreme specialisation of the learning algorithms. Learning the meaning of a word only from text and using it in a grammatically correct way is not enough. Let’s take "dog", for example: even if we teach a machine that this term relates to other words such as dachshund, St. Bernard and pug, for humans the word dog resonates with a lot more meaning. Its numerous connotations are derived from a variety of real, physical experiences and memories. This is why the human language system can read between the lines, deduce the writer’s intention and interpret a text.
How humans learn - and what we can learn from it
The Swiss psychologist Jean Piaget described how children develop intellectually throughout the course of childhood. Children learn by reacting to their environment, interacting with it and observing it. In doing so, they pass through various stages of cognitive development that build upon one another. What’s significant here is that sensorimotor intelligence, from the reflex mechanism to targeted action, is the first to develop. Only much later does a child acquire the ability to speak, to relate facts logically or even to formulate abstract, hypothetical thoughts, such as when replaying experiences.
I’m convinced that to make decisive progress in machine learning, we have to orient ourselves towards the way humans learn and develop. Here physical interaction with the environment plays a key role. One possible approach would be to design or simulate interactive, human-inspired robots that integrate a variety of sensory inputs and learn autonomously in a real or virtual environment. Information from the musculoskeletal system and from visual, auditory and haptic sensors would then be integrated so that consistent schemata can be learned. Once simple schemata have been learned, the algorithm gradually supplements these with an abstract language system. In this way, the initially learned schemata can be further abstracted, adapted and linked to other abstract concepts.
In summary, children learn in a fundamentally different way from today’s AI systems and, although they process far less data, they still achieve more than any AI. According to its developers, GPT3 is probably reaching the limits of what’s possible with the amount of training data. This also suggests that highly specialised learning algorithms fed with even more data won’t significantly improve machine learning. And by the way, this blog post was written by a human, and it’ll be a long while before a machine can do that.