Prerequisites: Introduction to Supervised Learning.
Inspired by the structure of the human brain, Neural Networks are among the most commonly used models for supervised learning tasks. The building blocks of these networks are simple ‘cells’, or ‘neurons’, which can be interconnected in various ways.
A single neuron has multiple inputs and a single output, and can be represented by the following model:
To calculate the output of the neuron, each input value is multiplied by its matching weight; the products are summed, and a ‘bias’ (or ‘threshold’) value is subtracted from the sum. The result is then mapped to the output via a non-linear ‘activation’ function that ‘pushes’ the output value towards one end of the scale, either ‘high’ or ‘low’, for any given input. This mechanism emulates the operation of a biological neuron: when the weighted sum of the inputs is above a certain threshold, the neuron ‘fires’ (sends a series of pulses from its output).
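The calculation described above can be sketched in a few lines of Python. This is only an illustration: the sigmoid is one common choice of activation function (the text does not name a specific one), and the function names, weights and bias value are hypothetical.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs, minus the bias (threshold), passed through the activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(weighted_sum - bias)

# Hypothetical weights and bias, chosen only for illustration.
weights = [4.0, 4.0]
bias = 2.0

high = neuron_output([1.0, 1.0], weights, bias)  # weighted sum 8.0 is well above the threshold
low = neuron_output([0.0, 0.0], weights, bias)   # weighted sum 0.0 is below the threshold
print(high, low)
```

With these values the first output lands close to 1 and the second much closer to 0, which is the ‘pushing’ effect of the activation function.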
As mentioned in Introduction to Supervised Learning, each supervised learning model consists of a set of tunable parameters and an algorithm that tunes these parameters to achieve the required result. In this simple model, also called ‘Perceptron’, the parameters are the weights of the inputs and the bias value; the Perceptron algorithm adjusts these parameters in an attempt to reduce the error between the actual output value and the required one. Similarly, in the actual brain, changes in the weights and threshold values are associated with learning and memorizing activities.
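As a rough sketch of how such an algorithm can adjust the parameters, the example below uses the classic Perceptron learning rule with a step activation to learn the logical AND function. The training data, learning rate, epoch count and function names are assumptions made for illustration only.

```python
def predict(inputs, weights, bias):
    """Step activation: the neuron 'fires' (1) only if the weighted sum exceeds the bias."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum - bias > 0 else 0

def train_perceptron(samples, learning_rate=1.0, epochs=20):
    """Nudge the weights and the bias after every misclassified sample."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(inputs, weights, bias)
            # Move each weight in the direction that reduces the error.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            # The bias is subtracted from the sum, so it moves the opposite way.
            bias -= learning_rate * error
    return weights, bias

# Logical AND: the output is 1 only when both inputs are 1.
and_samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(and_samples)
predictions = [predict(inputs, weights, bias) for inputs, _ in and_samples]
print(predictions)
```

AND is linearly separable, so this rule settles on parameters that classify all four samples correctly; a single Perceptron cannot do the same for a function such as XOR.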
A more capable model can be constructed using several layers of interconnected neurons, as shown in the following neural classifier:
The nodes in the input layer only serve to receive the feature values and connect each of them to every neuron in the next layer. Therefore, the number of nodes in the input layer is determined by the number of features, but no actual operation takes place in these nodes.
In a neural classifier, each node in the output layer represents one of the possible labels; thus, the number of nodes in this layer is determined by the number of labels. A Softmax function can be applied to the nodes of the output layer to guarantee that the value of each output is between 0 and 1, and that the output values sum to 1; this enables us to interpret the value of each output as the probability that the matching label is the correct one for the current input.
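A minimal sketch of the Softmax function follows. The raw output scores are made up for illustration, and subtracting the maximum score is a common numerical-stability trick rather than something the text above requires.

```python
import math

def softmax(scores):
    """Turn raw output-layer scores into values in (0, 1) that sum to 1."""
    largest = max(scores)
    # Subtracting the largest score avoids overflow and does not change the result.
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three labels.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

The label with the largest raw score also receives the largest probability, so the predicted label can be read off as the index of the maximum.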
The true power and complexity of this classifier come from the (one or more) ‘hidden’ layers. Each of these layers can be of an arbitrary size. As the number of hidden layers grows, the network gets ‘deeper’ and is capable of performing an increasingly complex, non-linear mapping between the network’s inputs and outputs.
The family of learning algorithms typically used to adjust the parameters of multilayer perceptron models is called Backpropagation. The idea behind these algorithms is to minimize the error between the actual outputs and the desired ones by ‘propagating’ the output error back through the layers of the model, from the output layer towards the input layer. The weights are then adjusted so that the weights that are more ‘responsible’ for the error get adjusted the most.
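To make the idea concrete, here is a small sketch of one Backpropagation step for a network with two inputs, two hidden neurons and a single output. It assumes sigmoid activations, a squared-error loss and plain gradient descent; all names, initial parameter values and the learning rate are hypothetical, and practical implementations differ in many details.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, p):
    """Return the hidden activations and the network output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) - b)
              for ws, b in zip(p["w_hidden"], p["b_hidden"])]
    output = sigmoid(sum(w * h for w, h in zip(p["w_out"], hidden)) - p["b_out"])
    return hidden, output

def backprop_step(x, target, p, lr=0.5):
    """One gradient-descent update for the loss 0.5 * (output - target) ** 2."""
    hidden, output = forward(x, p)
    # Error signal at the output neuron.
    delta_out = (output - target) * output * (1.0 - output)
    # Propagate the error back: each hidden neuron's share scales with its outgoing weight.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(p["w_out"], hidden)]
    # Adjust every parameter against its gradient. Biases are subtracted in the
    # forward pass, so their updates carry the opposite sign.
    p["w_out"] = [w - lr * delta_out * h for w, h in zip(p["w_out"], hidden)]
    p["b_out"] += lr * delta_out
    p["w_hidden"] = [[w - lr * d * xi for w, xi in zip(ws, x)]
                     for ws, d in zip(p["w_hidden"], delta_hidden)]
    p["b_hidden"] = [b + lr * d for b, d in zip(p["b_hidden"], delta_hidden)]

def loss(x, target, p):
    return 0.5 * (forward(x, p)[1] - target) ** 2

# Hypothetical starting parameters and a single training example.
params = {"w_hidden": [[0.5, -0.3], [0.2, 0.8]], "b_hidden": [0.1, -0.1],
          "w_out": [0.7, -0.4], "b_out": 0.2}
x, target = [1.0, 0.0], 1.0

loss_before = loss(x, target, params)
backprop_step(x, target, params)
loss_after = loss(x, target, params)
print(loss_before, loss_after)
```

Weights attached to inputs or hidden activations that contributed more to the output error receive proportionally larger updates, and the error on this example shrinks after the step.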
Advanced Topics in Neural Networks
Advances made in recent years have made the Backpropagation algorithms extremely powerful; this has enabled the use of a large number of hidden layers, creating particularly ‘deep’ neural networks, hence the name ‘Deep Learning’.
In a ‘deep’ neural network, each layer combines several simpler abstract concepts that were learned by the nodes of the previous layer, and produces higher level concepts. For example, in an image processing task of a person’s face, the first layer may learn to detect edges in different orientations. The next layer will combine these into lines and corners. The next layer will be able to detect more complex shapes, and so on, up to the point of detecting facial features such as nose and lips, and then combining these into the complete concept of a face.
Convolutional Neural Networks
The Multilayer Perceptron discussed earlier is sometimes called a “Fully Connected Neural Network”, as each node is connected to every node in the following layer. When the input values represent image pixels, the count of nodes and connections can become excessive. Inspired by biological processes, Convolutional Neural Network (CNN or ConvNet) models are able to reduce this count by making the assumption that the inputs represent two-dimensional data (e.g. images), which allows them, for example, to treat nearby pixels differently than pixels that are far apart. As a result, these models have proved especially successful when it comes to image processing tasks.
The CNN architecture utilizes three main types of layers: fully connected layers, similar to the hidden layers in the Multilayer Perceptron; pooling (down-sampling) layers, which aggregate several outputs of neurons from the preceding layer into a single neuron in the next layer; and, most importantly, convolutional layers. A convolutional layer has neurons arranged in 3 dimensions (width, height and depth), and consists of several ‘filters’, where each filter is responsible for detecting a single feature (such as an edge in a particular orientation). The filter covers a small area of the input (called the ‘receptive field’), and ‘slides’ that field over the entire two-dimensional input, effectively scanning the input for that particular feature. This way, the filter applies the same small set of weights to the entire input.
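The sliding-filter and pooling operations can be sketched as follows, for a single filter on a single-channel input. The tiny image, the hand-picked filter weights and the function names are illustrative assumptions; in a real CNN the filter weights are learned during training. As in most deep learning libraries, the ‘convolution’ here does not flip the filter.

```python
def convolve2d(image, kernel):
    """Slide the filter over the image, applying the same weights at every position."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    output = []
    for r in range(len(image) - k_rows + 1):
        row = []
        for c in range(len(image[0]) - k_cols + 1):
            # Weighted sum over the filter's current receptive field.
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(k_rows) for j in range(k_cols)))
        output.append(row)
    return output

def max_pool(feature_map, size=2):
    """Down-sample by keeping the largest value in each non-overlapping size x size block."""
    return [[max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(feature_map[0]) - size + 1, size)]
            for r in range(0, len(feature_map) - size + 1, size)]

# A 5x5 image that is dark (0) on the left and bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# A hand-picked 2x2 filter that responds to a dark-to-bright vertical edge.
vertical_edge = [[-1, 1],
                 [-1, 1]]

feature_map = convolve2d(image, vertical_edge)
pooled = max_pool(feature_map)
print(feature_map)  # [[0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0]]
print(pooled)       # [[2, 0], [2, 0]]
```

The feature map is non-zero only where the receptive field straddles the edge, and the filter needs just four weights regardless of the image size.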
Capsule Networks (or ‘CapsNet’) are a new, experimental neural network architecture that promises to improve upon the performance of Convolutional Networks in image processing tasks. Some of the anticipated advantages are a reduction in the amount of training data required, better handling of crowded scenes, and the preservation of image information that CNNs tend to lose, such as the precise position and size of objects and the relationships between objects.
The building blocks of these networks are small groups of neurons called ‘capsules’; each capsule learns to detect a particular object (such as a triangle) and emits information regarding the presence of this object and its orientation. Capsules in subsequent layers attempt to assemble more complex objects, using the simpler objects as building blocks. This is done by a ‘routing by agreement’ algorithm that matches the orientation information of the building blocks to decide whether they belong together.
Recurrent Neural Networks and LSTM
The neural networks we have seen so far are all considered ‘feed-forward’ networks, as the information flows in one direction, from the inputs to the outputs. In contrast, recurrent neural networks (RNNs) can have outputs connected back into the inputs. This way, previous output values can be taken into account, in addition to the current input values. This enables complex behaviors, such as memory and maintaining a state, which makes these networks suitable for processing a sequence of events and predicting the next events.
RNNs typically consist of a chain of simple, repeating neural network modules, where the output of each module is connected to the input of the next, along with an external input:
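A minimal sketch of such a chain is shown below, using a single scalar state and a tanh activation. The weights, the zero initial state and the function names are assumptions made for illustration, and the bias term is omitted for brevity; real RNN modules operate on vectors and learn their weights.

```python
import math

def rnn_step(x, prev_state, w_in, w_state):
    """One module of the chain: combine the external input with the previous module's output."""
    return math.tanh(w_in * x + w_state * prev_state)

def run_rnn(sequence, w_in=1.0, w_state=0.5):
    """Feed a sequence through the chain, returning the state after every step."""
    state = 0.0  # assumed initial state
    states = []
    for x in sequence:
        state = rnn_step(x, state, w_in, w_state)
        states.append(state)
    return states

with_pulse = run_rnn([1.0, 0.0, 0.0])
all_zero = run_rnn([0.0, 0.0, 0.0])
print(with_pulse, all_zero)
```

Both sequences end with the same input value (0.0), yet the final states differ: the first chain still ‘remembers’ the earlier pulse, although the memory fades with each step.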
Many applications of RNNs are in the field of Natural Language Processing (NLP), such as language translation, speech-to-text and sentiment analysis. RNNs can also perform ‘creative’ tasks such as automatically generating image captions and composing music.
The most commonly used type of RNN is currently the LSTM (Long Short-Term Memory) network, which is capable of learning long-term dependencies.