Machine Learning Concepts with Java and DeepLearning4J

Introduction to Supervised Learning

Prerequisites: None.

What is Machine Learning?

When we say, ‘Machine Learning’, the ‘Machine’ part usually insinuates a computer program that receives inputs and produces outputs. We often refer to this program as the ‘model’.

Our goal is to ‘train’ that model to produce the correct outputs for given inputs, without explicitly programming it for that particular set of inputs.

During this training process, the model ‘learns’ the mapping between the inputs and the outputs by adjusting its internal parameters. More about it in the next section.

What is Supervised Learning?

One common way to ‘train’ the model is by providing it with a set of inputs, for which the desired output is known. For each of these inputs, we ‘tell’ the model what the desired output is, so it can adjust, or ‘tune’ itself, aiming to eventually produce the desired output for each of the given inputs. This ‘tuning’ is at the heart of the learning process.

But what is actually being tuned? well, each model has parameters that can affect the mapping between the input and the output, and the values of these parameters can be tuned.
For example, if the model was implementing a decision tree, it could contain several IF-THEN sentences formulated as
“IF <input x value> IS LARGER THEN <some threshold value> THEN <go to some target branch>”.

Both the threshold value and the identity of the target branch are parameters that can be adjusted, or ‘tuned’, during the learning process.

Learning, then, is the process of tuning the parameters of the model to achieve the desired results. Each type of model has an accompanying learning algorithm that carries out this tuning.

Since our goal is to have the actual output of the model match the desired output, a typical learning algorithm will measure the difference (or ‘error’) between the actual output and the desired output; the algorithm then will seek to minimize this error.

To summarize, the process of learning is no more than a pretty name for an optimization process, in which a specially tailored algorithm is tuning the parameters of a model, to minimize the error between desired output to actual output.


Classification tasks are a common case of supervised learning. Here we need to decide to which category a certain input belongs. Each category is represented by a single output (called ‘label’), while the inputs are called ‘features’.

For example, in the well-known Iris Flower dataset, there are four features: Petal Length, Petal Width, Sepal Length and Sepal width, representing measurements manually taken off of actual Iris flowers.

At the output end, there are three labels: ‘Iris Setosa’, ‘Iris Virginica’ and ‘Iris Versicolor’, representing three different types of Iris.

When input values are present, representing measurements that were taken from a given Iris flower, we expect the output of the correct label to go ‘high’ and the other two to go ‘low’:

While in the case of the Iris Flower classification task there were three possible outcomes, a particularly common case is binary classification, where only two labels exist. These are sometimes represented by a single ‘binary’ output, where a ‘high’ level output denotes one label, and a ‘low’ output denotes the other label.

Real life examples of classification tasks include approval of bank loans and credit cards, email spam detection, handwritten digit recognition, face recognition and many more.

A classic case of a multi-value classification is handwritten digit recognition. For example, the MNIST database of handwritten digits contains a total of 70,000 samples of handwritten samples of the ten digits. The dimensions of each sample are 28×28 pixels (total of 784 pixels), and each pixel is represented with a grey-level between 0 and 255.

A classifier for this task will have 784 inputs (one for each pixel), whose values can be between 0-255, and ten outputs (one for each digit), whose values are either ‘0’ or ‘1’. For each given sample (784 pixels), we expect the output representing the correct digit to be at ‘1’ and the rest at ‘0’:



In contrast to classification tasks, the model for regression tasks has only one output, providing a continuous value; given the input values, the model is expected to predict the value of the output.


Real life examples of regression include predicting the value of stocks, the quality of wine, or the market price of a house:

Supervised Learning Algorithms

As mentioned above, each supervised learning model consists of a set of tunable parameters and an algorithm that tunes these parameters to achieve the required result.

Some common supervised learning models/algorithms include:

  • Decision trees
  • Random forests
  • Support Vector Machines
  • Neural Networks

The next post will cover Neural Networks in detail.

Tagged , , , ,

About Eyal Wirsansky

Eyal Wirsansky is a senior software developer, an artificial intelligence consultant and a genetic algorithms specialist, helping developers and startup companies learn and utilize artificial intelligence techniques. Eyal is the author of the book 'Hands-On Genetic Algorithms with Python' (Packt).
View all posts by Eyal Wirsansky →

Leave a Reply

Your email address will not be published. Required fields are marked *