# How to Train an Artificial Neural Network Part 3 by Ray Sulewski

*Published on: 12/6/2018*

*Author: Ray Sulewski, Senior Consultant, Award Solutions*

This is Part 3 of a four-part series on training an ANN model. In Part 1 of this series, we explored the data that would be used to train the model. In Part 2, we looked at the training methodology. In Part 3, we will explore the role of the activation function in the model and the role of backpropagation for training the weights.

## Activation of the Data

We put the data activation function in the output layer. We have three neurons in the output layer: one for each of the three possible critter classifications.

So, what’s happening in these output layer neurons?

During machine learning, the output layer takes data from the input layer and performs calculations to guess (“predict”) the critter described by the data inputs. In this simple ANN, the output layer is doing two calculations: a summation and an activation function.

The first calculation in the output layer is a weighted sum. For each training data sample and each category/class, the neuron multiplies every feature value by its associated feature weight for that class and adds up the products. This sum is used as input into the activation function.

The activation function helps to separate the important feature data values from the less important feature data values in the training data to learn how to determine the classification of the critters. This is called “activating the data”. The output of the activation function is sent to the next layer. In the critters machine, the output is the final output of the machine.

Let’s start with the summation formula. For each sample in the training data set, the following summation is performed (e.g. for the Dog neuron in the output layer).

For simplicity, we have highlighted the calculations below that are executed for each training data sample for each possible class.

Where $X$ is the sample data, $k$ is the sample number, $i$ is the feature, and $W_{\text{dog},i}$, $W_{\text{cat},i}$ and $W_{\text{sqr},i}$ are the associated feature weights for the indicated class (dog, cat and squirrel):
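Written out with these symbols, the three weighted sums would take roughly the following form. The labels $z$ (for each sum) and $n$ (for the number of features per sample) are introduced here for illustration only.

$$
\begin{aligned}
z_{\text{dog},k} &= \sum_{i=1}^{n} X_{k,i}\, W_{\text{dog},i} \\
z_{\text{cat},k} &= \sum_{i=1}^{n} X_{k,i}\, W_{\text{cat},i} \\
z_{\text{sqr},k} &= \sum_{i=1}^{n} X_{k,i}\, W_{\text{sqr},i}
\end{aligned}
$$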

Each of the calculations is performed in each of the neurons in the output layer.
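As a rough illustration of the summation, the sketch below computes one weighted sum per output neuron in plain Python. The feature values, the weights, and the names `sample`, `weights` and `weighted_sum` are hypothetical and are not taken from the critter data set.

```python
# Hypothetical feature values for one training sample; the real critter
# features are described in Part 1 of the series.
sample = [0.5, 1.0, 0.2]

# Hypothetical per-class feature weights, one weight per feature.
weights = {
    "dog": [0.4, 0.1, 0.3],
    "cat": [0.2, 0.6, 0.1],
    "squirrel": [0.1, 0.2, 0.9],
}


def weighted_sum(features, class_weights):
    """Sum of feature * weight over all features for one output neuron."""
    return sum(x * w for x, w in zip(features, class_weights))


# One weighted sum per output-layer neuron.
sums = {name: weighted_sum(sample, w) for name, w in weights.items()}
print(sums)
```

Each entry in `sums` corresponds to the value one output neuron would pass on to its activation function.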

Next, the activation function is executed. The activation function we chose to use is Softmax. The Softmax function is:
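In its standard form, with $z_c$ denoting the weighted sum for class $c$ and $j$ running over all classes (the symbol names are chosen here for illustration), Softmax can be written as:

$$
\operatorname{softmax}(z_c) = \frac{e^{z_c}}{\sum_{j} e^{z_j}}
$$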

This function is often used for classification problems. The Softmax activation function squashes each input to a value between 0 and 1, and the values calculated for all of the classes add up to 1, so the output is a probability distribution. The Softmax output values are used to determine the probability that the training data describes a dog, cat or squirrel. The highest number wins.
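A minimal sketch of Softmax in plain Python may help make these properties concrete. The input values are hypothetical weighted sums, and subtracting the maximum before exponentiating is a common numerical-stability step that is assumed here rather than taken from the article.

```python
import math


def softmax(sums):
    """Turn a list of weighted sums into probabilities that add up to 1."""
    # Subtracting the maximum does not change the result but avoids
    # overflow in math.exp for large inputs.
    shift = max(sums)
    exps = [math.exp(s - shift) for s in sums]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weighted sums for the dog, cat and squirrel neurons.
probabilities = softmax([0.36, 0.72, 0.43])
print(probabilities)       # three values, each between 0 and 1
print(sum(probabilities))  # very close to 1.0
```

The largest weighted sum always receives the largest probability, which is why the highest number can be read off directly as the predicted class.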

This Softmax function is executed on each data sample. Here is an example for a dog, where k = sample number.
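Writing $z_{\text{dog},k}$, $z_{\text{cat},k}$ and $z_{\text{sqr},k}$ for the weighted sums that the three output neurons compute for sample $k$, and $P_{\text{dog},k}$ for the resulting probability (all four names are assumed here for illustration), the dog calculation would look like this:

$$
P_{\text{dog},k} = \frac{e^{z_{\text{dog},k}}}{e^{z_{\text{dog},k}} + e^{z_{\text{cat},k}} + e^{z_{\text{sqr},k}}}
$$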

This is also calculated for cat and squirrel on each data sample.

This formula calculates the probability that the data sample belongs to each of the three classes that the machine is learning to classify. Each of the three output layer neurons outputs one of these probabilities: dog, cat or squirrel.

The model now looks like this:

## Processing the Output

Let’s look at what is happening with the final machine output.

Training software outside the ANN takes the probability values that the output layer calculates with Softmax for each data sample in each epoch and compares the estimated result to the correct result. The estimated result is the class with the highest of the three calculated probabilities.

For example, if the estimated probability results were the following:

Then the machine assumes that the training data sample is describing a cat. This is the data received from the output layer.
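Picking the estimate from the three probabilities can be sketched in a line of Python. The probability values below are hypothetical and chosen only so that cat comes out highest.

```python
# Hypothetical Softmax outputs for one training sample.
probabilities = {"dog": 0.25, "cat": 0.60, "squirrel": 0.15}

# The estimate is the class with the highest probability.
estimate = max(probabilities, key=probabilities.get)
print(estimate)  # cat
```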

Since this machine is using supervised learning, the training software receiving the output compares the calculated result to the provided correct result and determines if any weight adjustments need to be made.
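The comparison step could be sketched as follows. How the training software actually decides whether an adjustment is needed is not spelled out here, so this sketch assumes a simple check of whether the estimated class matches the correct label. The function name `needs_adjustment` and all of the values are hypothetical.

```python
def needs_adjustment(probabilities, correct_label):
    """Return True when the highest-probability class is not the label."""
    estimate = max(probabilities, key=probabilities.get)
    return estimate != correct_label


# Hypothetical (Softmax output, correct label) pairs for one epoch.
epoch_outputs = [
    ({"dog": 0.70, "cat": 0.20, "squirrel": 0.10}, "dog"),
    ({"dog": 0.25, "cat": 0.60, "squirrel": 0.15}, "squirrel"),
    ({"dog": 0.10, "cat": 0.15, "squirrel": 0.75}, "squirrel"),
]

# Collect the correct labels of the samples the machine got wrong.
misclassified = [label for probs, label in epoch_outputs
                 if needs_adjustment(probs, label)]
print(len(misclassified))  # 1
```

In this example only the second sample is estimated incorrectly (cat instead of squirrel), so it is the one that would call for a weight adjustment.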

## About the Author

Ray Sulewski is a Senior Consultant at Award Solutions. He joined Award Solutions in 2006, bringing his expertise in CDMA technologies and overall experience in real-time product development, delivery and support of wireless telecommunications systems. Ray has over 36 years of experience in the wireless telecom industry.


You can also check out Parts 1 and 2 of this article at www.awardsolutions.com.