Deep learning has created a resurgence of interest in neural networks and their application to everything from Internet search to self-driving cars. Results published in the scientific and technical literature show better-than-human accuracy on real-world tasks that include speech and facial recognition. Fueled by modern massively parallel computing technology, it is now possible to train very complex multi-layer neural network architectures on large data sets to an acceptable degree of accuracy. This is referred to as deep learning, as these multi-layer networks interpose many neuronal processing layers between the input data and the output results calculated by the network; hence the word "deep" in the deep-learning catchphrase. The resulting trained networks can be extremely valuable, as they can perform complex, real-world pattern recognition tasks very quickly on a variety of low-power devices including sensors, mobile phones, and FPGAs, as well as quickly and economically in the data center. Generic applicability, high accuracy (sometimes better than human), and the ability to be deployed nearly everywhere explain why scientists, technologists, entrepreneurs and companies are all scrambling to take advantage of deep-learning technology.
Machine learning went through a similar bandwagon stage in the 1980s, when superlatives were heaped on the technology and futurists discussed how machine learning was going to change the world. The genesis of the 1980s machine-learning revolution was a seminal paper by Hopfield and Tank, "Neural Computation of Decisions in Optimization Problems," which showed that good solutions to a wide class of combinatorial optimization problems could be found using networks of biology-inspired neurons. In particular, Hopfield and Tank demonstrated they could find good solutions to intractably large versions of the NP-complete traveling salesman problem. The advent of backpropagation by Rumelhart, Hinton and Williams allowed the adjustment of weights in a 'neuron-like' network so the network could be trained to solve a computational problem from example data. In particular, the ability of neural networks to adjust their weights to learn all the logic functions required to build a general-purpose computer, including the non-linear XOR truth table, showed that artificial neural networks (ANNs) are computationally universal devices that can, in theory, be trained to solve any computable problem. I like to joke that machine learning made me one of the hardest working lazy men you would ever meet, as I was willing to work very hard to make the computer teach itself to solve a complex problem.
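For readers who want to see the idea in action, the sketch below is my own illustration in plain NumPy, not the original Rumelhart, Hinton and Williams code. It trains a tiny two-layer network on the XOR truth table with backpropagation; the layer sizes, learning rate and iteration count are arbitrary choices made for clarity.

```python
# Minimal sketch: training a tiny two-layer network to learn XOR with
# backpropagation. Illustration only; network size, learning rate and
# iteration count are arbitrary choices, not values from the original work.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # XOR inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1 = rng.normal(size=(2, 4))   # input -> hidden weights
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))   # hidden -> output weights
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)        # hidden-layer activations
    out = sigmoid(h @ W2 + b2)      # network output

    # Backward pass: gradients of the squared error w.r.t. each layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent weight updates
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(np.round(out, 2))  # should approach [0, 1, 1, 0] as training converges
```

After a few thousand gradient-descent updates the outputs settle near 0, 1, 1, 0, which is exactly the non-linear truth table a single-layer perceptron cannot learn.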
NetTalk, a beautiful example by Terry Sejnowski and Charles Rosenberg, showed that it was possible to teach a neural network to perform tasks at a human level of complexity, specifically to read English text aloud. Even grade-school children immediately grasp the implications of machine learning through the NetTalk example, as people can literally hear the ANN learn to read aloud. Further, it was easy to show that the ANN had 'learned' a general solution to the problem of reading aloud, as it could correctly pronounce words that it had never seen before. I use NetTalk as a stellar example of how scientists can create simple and intuitively obvious examples to communicate their research to anyone.
The bandwagon faded for ANNs during the mid-1990s as overblown claims exhausted the patience of funding agencies and a lull in the development of parallel computing limited the size and complexity of the problems that could be addressed. Neural networks faded from the scientific limelight, while research continued to both expand and mature the technology. Still, examples such as Star Trek's Commander Data preserved the popular perception of the potential of neural network technology.
The development of low-cost massively parallel devices like GPUs sparked a resurgence in the popularity of neural network research. Instead of spending $30M to purchase a 60 GF/s (billion flop/s) Connection Machine, modern researchers can now purchase a TF/s (trillion flop/s) capable GPU for around a hundred dollars. The parallel mapping pioneered by Farber on the Connection Machine at Los Alamos allows the computationally expensive training step to map very efficiently onto any SIMD architecture, be it a GPU or the vector architecture of an Intel Xeon or Intel Xeon Phi processor. Near-linear scalability in a distributed computing environment means that most computational clusters can achieve very efficient, near-peak performance during the training phase. For example, the 1980s mapping used on a Connection Machine was able to achieve over 13 PF/s (10^15 flop/s) average sustained performance on the Oak Ridge National Laboratory Titan supercomputer. The ability to run efficiently on large numbers of either vector or GPU devices means that researchers can work with complex neural networks and large data sets to solve problems, sometimes as well as or better than humans.
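A minimal sketch of that data-parallel mapping appears below. It assumes mpi4py and NumPy, substitutes a simple least-squares objective for a real neural network, and uses made-up data sizes; the point is the communication pattern, not the model. Each rank evaluates the objective on its own shard of the training data and a single reduction sums the partial results, which is why the scaling is nearly linear.

```python
# Rough sketch of the data-parallel training mapping described above.
# Each rank evaluates the error of the current parameters on its own
# shard of the training data; one allreduce sums the partial results,
# so communication stays small and scaling stays near-linear. The model
# is a stand-in least-squares objective, not an actual neural network.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nranks = comm.Get_rank(), comm.Get_size()

rng = np.random.default_rng(rank)           # each rank holds its own shard
X_local = rng.normal(size=(100_000, 8))     # hypothetical local inputs
y_local = X_local @ np.arange(1.0, 9.0)     # hypothetical local targets

def local_error(params):
    """Sum of squared errors over this rank's shard only."""
    residual = X_local @ params - y_local
    return float(residual @ residual)

params = np.zeros(8)                        # identical parameters everywhere
params = comm.bcast(params, root=0)

# The expensive part runs in parallel; only a scalar crosses the network.
total_error = comm.allreduce(local_error(params), op=MPI.SUM)

if rank == 0:
    print("global training error:", total_error)
```

The same pattern applies whether each rank drives a GPU, a vector unit or a plain CPU core; launched with something like a hypothetical mpirun -n 4 python train_sketch.py, only the broadcast parameters and a handful of reduced numbers ever cross the network per evaluation.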
Convolutional neural networks (CNNs) are a form of ‘deep’ neural network architecture popularized by Yann LeCun and others in 1998. CNNs are behind many of the deep-learning successes that have been reported recently in image and speech recognition. Again inspired by biology, these neural networks find features in the data that permit correct classification or recognition of the training images without the help of a human. The lack of dependence on prior knowledge and human effort is considered a major advantage of CNNs over other approaches.
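To make that concrete, here is a deliberately tiny sketch of the three building blocks of a convolutional layer: a filter sliding over the image, a nonlinearity, and pooling. It is plain NumPy, runs on a random stand-in image, and uses a hand-picked edge filter purely for illustration; in a real CNN the filter weights are exactly what training learns from the data.

```python
# Minimal sketch of a convolutional layer's building blocks: a small
# filter slides over the image, a nonlinearity keeps the strong
# responses, and pooling summarizes them. The filter here is a
# hand-picked edge detector; real CNNs learn many filters from data.
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D correlation of a single-channel image with one filter."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.random.default_rng(0).random((28, 28))   # stand-in input image
edge_filter = np.array([[1.0, 0.0, -1.0],
                        [2.0, 0.0, -2.0],
                        [1.0, 0.0, -1.0]])           # responds to vertical edges

feature_map = max_pool(relu(conv2d(image, edge_filter)))
print(feature_map.shape)                             # (13, 13)
```

Stacking many such layers, each with many learned filters, is what gives the network the 'deep' hierarchy of features that replaces hand-engineered image descriptors.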
It’s difficult to argue with the success of CNNs and deep-learning technology in general. For example, deep-learning-based methods are now better at recognizing faces than humans. Baidu demonstrated accurate speech recognition in noisy environments at the NVIDIA GPU Technology Conference. Companies like Google are operating self-driving test vehicles at a number of locations around the world, as are several other technology and automotive companies.
Validation is always a key issue in research, especially when dealing with machine-learning and image-recognition algorithms, as I pointed out in my 2009 Scientific Computing article, "Validation: Assessing the Legitimacy of Computational Results": "A surprising challenge with computer vision research is that people quite easily perform visual association and recognition tasks themselves and, hence, can easily fool themselves into mistakenly believing an algorithm or method has general efficacy." Just like NetTalk, image recognition clearly demonstrates the potential of the technology in an intuitively obvious fashion. It also means that we make assumptions about what the CNN is doing that may not be correct.
For example, an early image recognition project from the 1980s tried to distinguish cars from tanks. The accuracy was very high across a number of data sets, yet the real-world performance was abysmal. It turned out that the pictures of tanks were — on average — taken during a cloudy day while the car pictures were — again on average — taken during a sunny day. Distinguishing cloudy from sunny days turned out to be the main ‘feature’ that the neural net used to distinguish between cars and tanks.
The concern over being fooled is real, especially with deep-learning and image-recognition algorithms being used in self-driving cars that can veer out of control or make mistakes that harm people and damage property. Google currently validates its self-driving car software by having the software drive over three million simulated miles a day in the data center, in combination with roughly 10,000 to 15,000 miles of actual on-the-road autonomous driving per week. Apparently the data collected during each real-world driving session is added to the data set used for simulated driving. While this sounds impressive, the challenge lies in deciding whether that is sufficient validation to allow autonomous cars to operate safely in the real world.
Fail-safes are important. The Netherlands recently started test-driving its first driverless minibuses as a prelude to a planned full-time autonomous passenger service between the towns of Wageningen and Ede in the Gelderland province. The system has a built-in fail-safe, as all vehicle movements will be monitored from a central control room, and control of the vehicle can be handed over to a human controller if difficulties arise. Officials acknowledge that further research is also needed into insurance and liability issues, as well as human behavior, traffic control and legislation.
However, CNNs and deep learning represent a fantastic new opportunity that, combined with inexpensive computing hardware, means people from all around the world can work with this technology.
Numerous tutorials can be found on the Internet. For example, Google is providing a free Udacity deep-learning course. New hardware designs with spectacular performance are also being developed. The IBM SyNAPSE chip, for example, uses 5.4 billion transistors to implement one million programmable neurons that can run in biological real time, yet a single chip requires only 70 mW (0.07 watts) of power to operate. The SyNAPSE chip has been called 'a bee brain on a chip.' IBM has even implemented a cluster of these chips that roughly approximates the number of neurons contained in the rat brain. For those who are interested, IBM offers a training class in their 'SyNAPSE University.'
Rob Farber is a global technology consultant and author with an extensive background in machine-learning and a long history of working with national labs and corporations engaged in both HPC and enterprise computing. He may be reached at editor@ScientificComputing.com.