No Multilayer Perceptrons!
By Daniel J. Dick
What are “Perceptrons”?
Around 1957, a psychologist named Frank Rosenblatt at Cornell came up with a machine called "the Perceptron". It was an actual, physical machine. You can read more about it on Wikipedia.
This was initially a machine with actual physical motors and potentiometers (volume controls, that is, variable resistors: the things that usually have knobs for raising and lowering the volume on radios and amplifiers).
In this device, if I am not mistaken, there was an array of 20x20 photocells, or 400 inputs. Each input was given a volume control, so to speak. So, 400 inputs and 400 volume controls. And these had little servo motors to turn the volume controls up or down.
Now, you take the outputs of all these volume controls and add them up. And then you put the output through a threshold function. If the output is, say, 1 volt or higher, you call it a win or a “yes”. If it’s less, you call it a loss or a “no”.
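To make that arithmetic concrete, here is a minimal sketch of the weighted-sum-and-threshold step in Python. The function and variable names are mine, not Rosenblatt's, and the 1-volt threshold is just the example from above:

```python
import numpy as np

def perceptron_output(pixels, weights, threshold=1.0):
    """Pass each photocell's signal through its 'volume control' (weight),
    add them all up, then threshold the sum: at or above means 'yes',
    below means 'no'."""
    total = np.dot(pixels, weights)  # sum of all 400 weighted inputs
    return total >= threshold        # True = "yes", False = "no"

# A 20x20 image flattened to 400 inputs, one weight (volume control) each.
pixels = np.random.rand(400)
weights = np.random.rand(400)
print(perceptron_output(pixels, weights))
```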
Training a Perceptron
Now let’s train this network of volume controls and motors to recognize a G.
Let’s say you have a few training samples:
- An uppercase G, a lowercase g, a cursive uppercase G, and a cursive lowercase g.
- Same for a W.
- All ten numerals.
And now you start feeding each of these to the Perceptron's photocells. You have four situations:
1. You show it a G. And it says "yes".
2. You show it a G. It says "no".
3. You show it something other than a G. It says "yes".
4. You show it something other than a G. It says "no".
For 1 and 4, everything is great. It answered correctly.
For 2 and 3, you have to adjust the volume controls.
- For 2, you need to turn up the volume controls since the Perceptron is saying “no”.
- For 3, you need to turn down the volume controls since the Perceptron is saying “yes”.
But how much?
You want to turn the volume controls up or down the most for the photocells that are producing the biggest output. But you do not necessarily want to change things too quickly, since you would like the Perceptron to stabilize. So you might only turn the volume controls 1/2 as much or 1/10 as much, depending on what *learning rate* you want.
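Here is a rough sketch of that training step in Python, continuing the toy setup from above. Again, the names are mine, and the learning rate plays the role of the "1/2 as much or 1/10 as much" knob:

```python
import numpy as np

def train_step(pixels, weights, target_yes, threshold=1.0, learning_rate=0.1):
    """One training step: adjust the 'volume controls' only on a wrong answer."""
    said_yes = np.dot(pixels, weights) >= threshold
    if said_yes and not target_yes:
        # Situation 3: it said "yes" to a non-G, so turn the knobs down,
        # proportionally more for the photocells producing the biggest output.
        weights = weights - learning_rate * pixels
    elif not said_yes and target_yes:
        # Situation 2: it said "no" to a G, so turn the knobs up.
        weights = weights + learning_rate * pixels
    # Situations 1 and 4: correct answer, leave the knobs alone.
    return weights
```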
Multi-Layer Perceptrons
One term that Geoffrey Hinton apologized for and regretted coining is "multi-layer perceptron". The term is such a misnomer that we should probably never use it, and yet it is easy to understand why somebody would.
But, technically, there are no such things as “multi-layer perceptrons”.
What’s wrong with that term?
Perceptrons adjust their weights by multiplying a learning rate by the input values and adding or subtracting the result from the weights.
Normal feed-forward networks look similar but are completely different in how they update their weights. And as Minsky and Papert's analysis showed, stacking purely linear feed-forward layers gives you no additional abilities unless a non-linear activation separates the layers: a composition of linear maps is itself linear, so mathematically the stack collapses down to the equivalent of a single layer.
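Here is a quick check of that collapse in Python: applying two linear layers in a row gives exactly the same result as one linear layer whose weight matrix is the product of the two. The matrix sizes are arbitrary, just for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(400)          # one flattened 20x20 input
W1 = rng.standard_normal((100, 400))  # first purely linear layer
W2 = rng.standard_normal((10, 100))   # second purely linear layer

two_layers = W2 @ (W1 @ x)   # stack the two linear layers
one_layer = (W2 @ W1) @ x    # a single layer with weights W2 @ W1
print(np.allclose(two_layers, one_layer))  # True: the stack collapses
```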
Then backpropagation, developed and popularized by researchers including Geoffrey Hinton and Yann LeCun, changed everything. It works by taking derivatives: you set up an error or loss function, and the derivative of that loss is propagated backward to adjust the weights throughout all the layers. The layers themselves can be nearly anything: standard feed-forward layers with non-linear activations, convolution layers, or recurrent layers with or without gates for remembering and forgetting. Training can then proceed in a batch, mini-batch, or stochastic/online gradient descent manner, with various optimizations, regularization, and so on.
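To show the flavor of that, here is a bare-bones sketch of backpropagation: a tiny two-layer network with sigmoid non-linearities, trained by plain full-batch gradient descent on XOR, the classic problem a single-layer perceptron cannot solve. All names and hyperparameters here are my own illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

W1 = rng.standard_normal((2, 4))  # input -> hidden weights
b1 = np.zeros((1, 4))
W2 = rng.standard_normal((4, 1))  # hidden -> output weights
b2 = np.zeros((1, 1))
lr = 1.0  # learning rate

for step in range(5000):
    # Forward pass: two layers separated by a non-linearity.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: the chain rule carries the derivative of the
    # squared error backward through each layer.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Full-batch gradient descent update for every weight and bias.
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round(2))  # should end up close to [[0], [1], [1], [0]]
```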
But simply put, the so-called “multi-layer perceptron” is not a perceptron.
A Great Video on the Perceptron
I searched YouTube far and wide for a video on the Perceptron that would be fun, easy, and attention-holding, and frankly, I didn't find any except this one. The others seemed more like graduate-level university courses, and this is the very beginning, the most elementary of topics.
This video below by Paolo Ricciuti from The Coding Train is the simplest and most enjoyable one I have seen, and I hope you will find it fun and easy to understand too. If you like it, please visit his YouTube channel, like, and subscribe to show your appreciation for his work in making it available!
For More on Neural Networks
- Stanford's Neural Networks material gives a short, interesting, easy-to-understand introduction to the perceptron.
- Coursera hosts rigorous machine learning courses by Andrew Ng.
- Geoffrey Hinton's course from the University of Toronto is my favorite, but it may be hard to find now.
- Towards Data Science gives a great introduction to, and history of, the perceptron.
- Fast.ai offers a free, easy, practical way to get started in machine learning.
I love my professors from Stanford and the University of Toronto, and they presented the subject excellently and with great rigor, but their courses are probably better suited to someone pursuing a graduate degree in artificial intelligence.