The requirements of real biological neural networks are modest compared to the complex deep neural networks used in machine learning, which come with substantial memory and energy demands. The Zenke group have developed a new method in machine learning called Neural Tangent Transfer to make a sparse neural network which performs almost as well as densely connected deep neural network on various learning tasks, but at a heavily reduced computing cost.
The brains of newborn babies are already highly structured, but the connectivity between neurons is very sparse. Out of some 80 billion neurons most of them only talk to about 1,000-10,000 of their peers. Yet, the architecture supports rapid learning as the child grows. To understand the principles that underlie this information processing and rapid learning in neural networks, the Zenke group use computational and theoretical approaches from deep learning - a field of machine learning that uses artificial neural networks, which learn automatically through experience - to investigate what makes biological neural networks so good at learning, despite their sparse connections. This kind of work is important for both neuromorphic engineering - trying to build electronics that are similarly power-efficient to the brain - and to fathom the fundamental principles by which brain networks are connected at a developmental stage, which is crucial to further our understanding of computation in the brain.
How machines learn
The researchers started by building artificial neural network models and trained them to do a specific task. In this case, it was a classification task - in which the network must take input data (like images), process it (recognize patterns in the photos) and give an output (tell us what is in each image). To that end, the real-world input - the images - are translated into a numerical representation, which propagates through the multiple layers of the artificial neural network until it reaches the output layer, where the answer of the classification is represented by whichever output neuron is most active, and signals the answer is ‘dog’ for example.
Fig. 1. A deep neural network carrying out a classification task on a sample of images. Links carry the sig-nal from one node to another, boosting or dampening them according to each connection’s weight. The stronger the weight, the more important the connection is to learning. ( Learning Multiple Layers of Fea-tures from Tiny Images , Alex Krizhevsky, 2009.)
Deep neural networks are great at learning when they have many more connections to begin with than necessary to solve the problem. In many cases, most of the neurons can be pruned after learning, with only a minor effect on their computational performance. These connectivity patterns can be thought of as "winning lottery tickets’. Identifying these winning connectivity patterns in a network that has already been trained is easy - that would be like writing down the lottery numbers after the draw has been shown on TV. But the computational costs are already spent doing this training, so in the end you don’t “win? anything.
The team wanted to see if they could make a sparse neural network with one of these “winning? connectivity patterns that is going to be good at learning, without ever training it - a virtual version of the real biological brains we are born with.
Fig. 2. A representation of the difference between a deep and sparse neural network.
Making a sparse network that works as well as a deep network
To that purpose, they created an algorithm called the Neural Tangent Transfer and used it to determine those connectivity structures - out of all possible configurations with a limited budget of neurons and connections - which could learn best on subsequent classification tasks. They applied the neural tangent transfer algorithm to large deep neural networks and instructed it to decide which 1% of the critical connections to keep and which 99% to prune in order to make a sparse neural network. The algorithm made these decisions using something called the Neural Tangent Kernel - a mathematical function that characterizes the change of the network’s response when it learns from a new piece of information.
Only after this pruning, did the researchers test the newly ‘sparse’ neural networks on various classification tasks and assess how well they did. As hoped, the team found that the sparse neural networks that had undergone Neural Tangent Transfer were capable of carrying out these tasks just as well as initial deep neural networks with 100 times more connections. Zenke said: “We have known for some time now that the initial wiring diagram of a neural network has a profound impact on its ability to learn. Yet, we did not know how to establish effective wiring diagrams from first principles. This work now overcomes this hurdle by proposing a practical algorithm based on a rigorous mathematical framework.’
The work highlights the importance of intricate pre-wiring of neural circuits before learning, especially when these circuits are resource-constrained, i.e., by a limited number of connections. This knowledge also has applications for building biologically inspired neural networks in electronics. Such electronics often operate under resource constraints and having principled ways of setting up the initial connectivity in neuromorphic chips will make these devices more robust and energy efficient.
Additionally, this study is a crucial step in making machine learning a more useful tool for understanding the brain. Deep learning models are among the best models of information processing in the brain, but their architectures differ in important respects. Removing such architectural differences allows the building of better models which can be studied in-silico to give us insights into complex brain function.
These findings were presented by PhD candidate Tianlin Liu at this summer’s International Conference on Machine Learning 2020.
’ View the paper
’ View the talk