Machine Learning Concepts with Java and DeepLearning4J

First DL4J Example – MNIST Classifier with a Single Layer MLP

Prerequisites: Introduction to DeepLearning4J, knowledge of Java.

DL4J comes with a large collection of examples. Based on one of them, our first neural-network code example is an MLP classifier for handwritten digit recognition.

MNIST Classification Task

The neural network in this example takes on the classification task posed by the MNIST database of handwritten digits. This database consists of numerous handwritten samples of the ten digits. Each sample is 28×28 pixels (784 pixels in total), and each pixel is represented by a grey-level value between 0 and 255.

The MNIST database is divided into two sets: 60,000 samples used for training a model, and 10,000 used to test the trained model.

The Suggested Model

The model used in this example is a Multilayer Perceptron (MLP) with a single hidden layer. The input layer consists of 784 nodes, one for each pixel, and the output layer has 10 neurons representing the ten digit classes. Between them resides the hidden layer, consisting of 1,000 neurons. This setup is depicted in the diagram below:

The Code

Our sample code is part of the Ai4Java Github repository.

The code is set up as a Maven project, and the pom.xml file contains dependencies for DL4J (our deep learning library of choice), ND4J (the underlying mathematical library) and SLF4J (for logging).

The main file used for this example is the class SingleLayerMLP; let's take a closer look at it.

Numerical Settings

The first few lines in the main() method are responsible for setting up various numerical values. The rngSeed value serves as a seed for the random number generator used for initializing the model weights as well as for shuffling the dataset samples. Setting the seed to a predetermined value enables us to reproduce the same results whenever we run this example.
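As a plain-Java illustration of why a fixed seed yields reproducible runs (the seed value 123 below is an assumed example, not necessarily the one used in the class): two generators created from the same seed produce identical sequences, so weight initialization and sample shuffling repeat exactly across runs.

```java
import java.util.Random;

public class SeedDemo {
    // A fixed seed (the value here is illustrative) makes every run repeatable.
    static final int RNG_SEED = 123;

    // Two generators seeded identically produce the exact same number stream.
    public static boolean sameSequence() {
        Random a = new Random(RNG_SEED);
        Random b = new Random(RNG_SEED);
        for (int i = 0; i < 1000; i++) {
            if (a.nextDouble() != b.nextDouble()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("Reproducible: " + sameSequence());
    }
}
```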


The next variables control the batch size and the number of epochs used during the training of the model:
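A sketch of these settings: the batch size of 125 matches the article's description, while the epoch count of 15 is an assumed placeholder. The iteration arithmetic falls out directly:

```java
public class TrainingSettings {
    static final int BATCH_SIZE = 125;     // samples per mini-batch (from the article)
    static final int NUM_EPOCHS = 15;      // full passes over the training set (assumed value)
    static final int TRAIN_SAMPLES = 60_000;

    // One epoch covers all training samples, one batch per iteration.
    public static int iterationsPerEpoch() {
        return TRAIN_SAMPLES / BATCH_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(iterationsPerEpoch() + " iterations per epoch"); // 480
    }
}
```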


Each epoch is a complete pass over all the training samples; the samples, however, are grouped into batches (sometimes referred to as ‘mini-batches’), and each iteration of the training process consumes a single batch. In our case, since we have 60,000 training samples and the batch size is set to 125, it takes 480 iterations to complete one epoch.

Data Preparation

To prepare the data, samples are first taken from the MNIST database. The classes DataSet and DataSetIterator, which are part of the ND4J library, enable us to create and manipulate datasets; these datasets are then used for training and testing the DL4J neural network. The MnistDataSetIterator class enables us to access the MNIST database directly:
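Reconstructed here as a sketch, assuming the standard three-argument MnistDataSetIterator constructor (batch size, train/test flag, shuffle seed); the variable names are ours:

```java
// One iterator over the 60,000 training samples, one over the 10,000 test samples
DataSetIterator mnistTrain = new MnistDataSetIterator(batchSize, true, rngSeed);
DataSetIterator mnistTest  = new MnistDataSetIterator(batchSize, false, rngSeed);
```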


The second argument in the constructor selects between the train and test portions of the database, while the third argument serves as the seed for the random shuffling of the samples.

Creating the Neural Network

In the create() method, the MultiLayerConfiguration class encapsulates the information used to create the actual neural network. It uses the Builder class, which presents a fluent interface that helps keep the code concise and readable. After setting the random seed, a couple of parameters are set for the algorithm that is used to train the network. The updater() call determines the algorithm used to update the weights, while l2() sets a regularization coefficient for the weights, helping to keep their values small.
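A sketch of the start of that configuration, assuming the stock DL4J builder API of the time; the Nesterovs momentum updater and the specific coefficients are illustrative, borrowed from DL4J's own MNIST example rather than taken from this class:

```java
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(rngSeed)                       // reproducible weight initialization
        .updater(new Nesterovs(0.006, 0.9))  // SGD with Nesterov momentum (illustrative values)
        .l2(1e-4)                            // L2 weight-regularization coefficient (illustrative)
        // ... layer definitions follow, then .build()
```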


Next comes the part that determines the architecture of the neural network, describing it layer by layer. The call to list() returns a ListBuilder instance, ready to accept the list of layer configurations.
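The layer definitions, sketched under the same assumptions; the layer sizes (784, 1000, 10) come from the article, and the enum names follow DL4J's Activation, WeightInit and LossFunctions classes:

```java
.list()
.layer(0, new DenseLayer.Builder()
        .nIn(28 * 28)                  // 784 inputs, one per pixel
        .nOut(1000)                    // hidden layer of 1000 neurons
        .activation(Activation.RELU)
        .weightInit(WeightInit.XAVIER)
        .build())
.layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
        .nIn(1000)
        .nOut(10)                      // one output per digit class
        .activation(Activation.SOFTMAX)
        .weightInit(WeightInit.XAVIER)
        .build())
```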


The first layer described, layer 0, is the hidden layer. It implicitly contains the definition of the input layer by setting the nIn() value to the number of inputs. The nOut() value determines the number of nodes in this layer. The activation function of the neurons in this layer is the rectified linear unit (ReLU), currently the most common choice for hidden layers. The weights of this layer are initialized using the Xavier algorithm, ensuring that the initial weights are neither too small nor too large, so they can propagate signals through the layers without shutting down or saturating the neurons.

The output layer’s parameters differ in two aspects from those of the hidden layer:

  • This layer is configured with a ‘loss function’ (also called a ‘cost function’), which is used to calculate the ‘error’ between the actual outputs and the desired outputs. The training algorithm attempts to minimize this ‘error’ by propagating it back through the network’s layers and adjusting the various weights according to their contribution to the error. Here, the loss function used is Negative Log Likelihood.
  • The activation function is Softmax rather than ReLU. Softmax is often used in classifiers’ output layers, as it results in each output neuron producing a value between 0 and 1, with the sum of all outputs being exactly 1. This enables us to interpret each output value as the probability that the class represented by that output is the chosen class, or as the network’s ‘confidence’ in that class for the given inputs. The Negative Log Likelihood loss function works well with Softmax, as it amplifies errors where the network’s confidence in the (wrong) output is high.
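To make the Softmax behavior concrete, here is a small stand-alone sketch in plain Java (independent of DL4J), showing that the outputs all fall in (0, 1) and sum to exactly 1:

```java
public class SoftmaxDemo {
    // Numerically stable softmax: subtract the max before exponentiating.
    public static double[] softmax(double[] z) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : z) max = Math.max(max, v);
        double sum = 0;
        double[] out = new double[z.length];
        for (int i = 0; i < z.length; i++) {
            out[i] = Math.exp(z[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum; // normalize so the outputs sum to 1
        return out;
    }

    public static void main(String[] args) {
        double[] p = softmax(new double[]{2.0, 1.0, 0.1});
        double sum = 0;
        for (double v : p) sum += v;
        System.out.println("sum = " + sum);
    }
}
```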

Next come two training-related settings:
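These two calls, as they appeared in the DL4J builder API of that era (both were deprecated in later versions), close the builder chain:

```java
.pretrain(false)  // skip unsupervised pretraining
.backprop(true)   // train with backpropagation
.build();
```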


Pretraining means setting the initial values of the weights based on previous training (e.g. with a small initial dataset). In feed-forward networks such as the one in this example, it is not particularly useful. Backpropagation is the algorithm of choice for our type of network.

Training the Neural Network

In the train() method, a listener is first set that will output the network’s score every 100 iterations. Then the training is carried out by repeatedly calling the fit() method of the model with the same training set, once per epoch:
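A sketch of that method, assuming the standard DL4J calls (ScoreIterationListener and fit()), with numEpochs and mnistTrain as set up earlier:

```java
model.setListeners(new ScoreIterationListener(100)); // log the score every 100 iterations

for (int i = 0; i < numEpochs; i++) {
    model.fit(mnistTrain); // one complete pass over the training set
}
```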


When running the program (by launching the main() method), the relevant output will look similar to this:


The ScoreIterationListener is called every 100 iterations, as configured; and as calculated earlier, it takes 480 iterations to complete one epoch, which is reflected in the output.

The score value shown at each line represents the value of the loss function, or ‘error’. This value is expected to generally decrease during the training, but will not necessarily decrease at every step of the way.

Evaluating the Trained Neural Network

In the evaluate() method, the trained model is evaluated using the test set, which was not used until this point:


The evaluation is done batch-by-batch, while the results are accumulated and aggregated, using the Evaluation instance:
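A sketch of that loop, assuming DL4J's Evaluation class and the model.output() call:

```java
Evaluation eval = new Evaluation(10); // 10 output classes, one per digit
while (mnistTest.hasNext()) {
    DataSet ds = mnistTest.next();
    INDArray output = model.output(ds.getFeatures()); // the network's predictions for the batch
    eval.eval(ds.getLabels(), output);                // accumulate against the true labels
}
```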


Then, the stats() method of that instance is called, and outputs a multitude of information. First, the classification results for each of the digits (0..9) are printed out. This is sometimes referred to as the Confusion Matrix:


Finally, the results are summarized using the standard measures of Accuracy, Precision, Recall and F1 Score, which are calculated from the confusion matrix above:
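As a concrete illustration of how these measures derive from a confusion matrix, here is a plain-Java sketch using a made-up two-class matrix (the counts are hypothetical, not the article's actual results):

```java
public class Metrics {
    // Hypothetical binary confusion matrix counts:
    // TN = true negatives, FP = false positives, FN = false negatives, TP = true positives
    static final int TN = 90, FP = 10, FN = 5, TP = 95;

    public static double accuracy()  { return (double) (TP + TN) / (TP + TN + FP + FN); }
    public static double precision() { return (double) TP / (TP + FP); }
    public static double recall()    { return (double) TP / (TP + FN); }
    // F1 is the harmonic mean of precision and recall.
    public static double f1()        { return 2 * precision() * recall() / (precision() + recall()); }

    public static void main(String[] args) {
        System.out.printf("accuracy=%.3f precision=%.3f recall=%.3f f1=%.3f%n",
                accuracy(), precision(), recall(), f1());
    }
}
```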


In our case, these measures are all around 98.4% (or a 1.6% error rate), which is considered reasonable for this type of network on the MNIST benchmark.

What’s Next?

A good way to get a better feel for the way Neural Networks work is to experiment with the values of the various parameters of the model and the training algorithm. For example, the number of nodes in the hidden layer, the l2() and updater() values, the weight-initialization algorithm and the number of epochs. It could also be interesting to add another hidden layer (or several hidden layers) to the network.

In future posts we will see how to visualize the training process and the results achieved, and try out more advanced networks to perform the same task.

About Eyal Wirsansky

Eyal Wirsansky is a senior software developer, an artificial intelligence consultant and a genetic algorithms specialist, helping developers and startup companies learn and utilize artificial intelligence techniques. Eyal is the author of the book 'Hands-On Genetic Algorithms with Python' (Packt).

2 thoughts on “First DL4J Example – MNIST Classifier with a Single Layer MLP”

  1. Thank you for the article. I’m an experienced Java developer and have an interest in using DL4J for an application idea. I’ve been using the MnistClassifier example in DL4J to train a set of images I’ve extracted from golf scorecards. (First thing I did was write some code to extract the scores into 28×28 images for each player.) Anyway, I’ve had some success with a very small training set – I just need to gather more images from other scorecards to have good data for training. But once that is done, my goal is to preload the model and evaluate “real” scores against the model to then calculate the golfer’s score. So if I have 18 scores (images of scores) but at runtime do not “know” whether they are 4, 5, 6, etc. – so I don’t have the labels for them yet – can you advise as to what API or object/method/approach to utilize? Thanks.

  2. Hello Mike, and thank you for reading and commenting!
    Can you provide more information about your experiment?
    I am not completely clear about the issue you described. Do you mean that only some of the labels are available at training time?

