What are Artificial Neural Networks (ANNs)?
- We know that our brain has billions of interconnected neurons that process information received through our ears, eyes, and other sensory organs as inputs and produce a response as an output.
- Similarly, artificial neural networks have layers of interconnected neurons that process the inputs, learn the task to perform, and produce an output.
- An artificial neural network can be thought of as a combination of linear and nonlinear functions that we train on our dataset to produce the desired output. We can expect our neural network to learn the underlying relationships between the input / independent variables and the output / dependent variable.
- We are going to study fully connected neural networks, where every neuron in a layer is connected to every neuron in the next layer. The below image shows a simple fully connected neural network:
Input Layer – Takes Inputs
Hidden Layer – Responsible for processing information
Output Layer – Gives Output
How Does A Neural Network Get Trained?
- In the training stage of the neural network, weights are assigned to each connection between neurons. These weights are learnable parameters that are updated to find the optimal values.
Here, we can see that wa1, wa2, wa3, and wa4 are the weights assigned to the connections of the 1st node of the input layer; wb1, wb2, wb3, and wb4 are the weights assigned to the connections of the 2nd node of the input layer; and so on.
Training Of A Neural Network:
Training a neural network involves two main steps:
- Forward propagation is how neural networks make predictions. Input data is “forward propagated” through the network layer by layer to the output layer, which makes a prediction.
- In backpropagation, we propagate through the neural network backward, i.e., from the output layer to the input layer, and update the weights and biases of the neural network.
Let’s understand this in detail with the help of an example.
Let’s consider a neural network with “N” inputs and a single neuron in the hidden layer.
Step 1: Forward Propagation
- In forward propagation, the data points from the input layer are propagated to a single neuron, where each input is multiplied by its respective weight and the products are summed together. Each neuron also has an additive parameter called a bias. The sum of the bias term and the linear combination of inputs and weights is the input to the single neuron, as shown in the below image:
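As an illustration, this weighted sum for a single neuron can be sketched in NumPy (the input values, weights, and bias below are made-up numbers, not from the original example):

```python
import numpy as np

# Hypothetical example: 4 inputs feeding a single neuron.
x = np.array([0.5, -1.0, 2.0, 0.1])   # input values
w = np.array([0.4, 0.3, -0.2, 0.8])   # one weight per connection
b = 0.1                                # bias term of the neuron

# Linear combination: z = w·x + b, the input to the neuron
z = np.dot(w, x) + b
print(z)
```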
- In this step, we apply a nonlinear function to this linear combination. The functions we apply to these linear combinations are known as Activation Functions. Activation functions introduce nonlinearity into our neural network: a stack of purely linear functions can only ever represent another linear function, so we use non-linear activation functions to be able to learn complex patterns in our data.
In the image above, we have applied a sigmoid function which is one of the activation functions.
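A minimal sketch of the sigmoid function mentioned above, which squashes any real number into the range (0, 1):

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

a = sigmoid(0.0)   # sigmoid(0) = 0.5, the midpoint of its range
print(a)
```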
What if we have multiple neurons and multiple layers?
If we have multiple layers, each neuron receives the outputs of the previous layer as its inputs. For example, you can see in the below image that the neurons ‘A11’, ‘A12’, ‘A13’, and ‘A14’ receive ‘x1’, ‘x2’, and ‘x3’ as inputs. In turn, the neurons ‘A21’, ‘A22’, ‘A23’, and ‘A24’ receive the outputs of ‘A11’, ‘A12’, ‘A13’, and ‘A14’ as their inputs. Each neuron drawn in the below image is an encapsulated representation of the image above, i.e., each neuron represents the linear combination and the activation function combined.
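A layer-wise forward pass matching this picture can be sketched with matrices (the weight values below are random placeholders, not trained values):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.2, -0.5, 1.0])   # inputs x1, x2, x3

# Layer 1: 4 neurons (A11..A14), each connected to all 3 inputs.
W1 = rng.normal(size=(4, 3))     # one row of weights per neuron
b1 = np.zeros(4)                 # one bias per neuron
a1 = sigmoid(W1 @ x + b1)        # outputs of A11..A14

# Layer 2: 4 neurons (A21..A24) take a1 as their inputs.
W2 = rng.normal(size=(4, 4))
b2 = np.zeros(4)
a2 = sigmoid(W2 @ a1 + b2)       # outputs of A21..A24
print(a2.shape)
```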
Step 2: Calculate the Loss Function
After obtaining the output from forward propagation, we calculate the loss using a loss function. The weights and biases are updated in such a way that the loss function is minimized. There are different types of loss functions depending on the nature of the problem. For example, for regression we usually use mean squared error, and for classification we use cross-entropy.
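As a sketch, the two loss functions mentioned above can be written as follows (the labels and predictions below are made-up values):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error, typical for regression.
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred):
    # Cross-entropy for binary classification; y_pred must lie in (0, 1).
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])   # actual labels
y_pred = np.array([0.9, 0.2, 0.8])   # model predictions
print(mse(y_true, y_pred))
print(binary_cross_entropy(y_true, y_pred))
```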
Step 3: Backpropagation
We try to reflect the error or cost term onto the weights of our neural network. To do this, we take the derivative of the cost with respect to a particular weight and then shift the weight in the direction opposite to that derivative, as covered in the Functions And Derivatives section of the Pre-Reads.
Here, C is the cost term and w is the weight we want to modify.
The algorithms used to update the weights and biases are known as Optimizers.
A few well-known optimizers are Gradient Descent, Stochastic Gradient Descent (SGD), Mini-Batch Gradient Descent, etc.
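The core idea behind these optimizers can be sketched with plain gradient descent on a made-up one-dimensional cost, C(w) = (w - 3)^2, whose minimizer is w = 3:

```python
# Toy cost C(w) = (w - 3)^2, with derivative dC/dw = 2 * (w - 3).
w = 0.0                 # initial weight
learning_rate = 0.1

for _ in range(100):
    grad = 2.0 * (w - 3.0)          # dC/dw at the current weight
    w = w - learning_rate * grad    # step opposite to the gradient

print(w)  # approaches 3, the minimizer of C
```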
Step 4: Repeat Forward Propagation and Backpropagation
We repeat forward propagation and backpropagation until the cost/objective function is minimized.
The below graphic representation shows a single iteration of forward and backward propagation. In forward propagation, we first calculate the value of each node using the inputs and the activation functions, then make predictions at the output layer and compute the error/loss function using the predicted and actual labels. In backward propagation, the weights and biases are updated using derivatives to minimize the loss function.
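Putting the pieces together, the full loop can be sketched for a single sigmoid neuron trained on made-up toy data (a simplified stand-in, not the exact network from the images above):

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary-classification data: label is 1 when x1 + x2 > 1.
X = rng.uniform(0, 1, size=(200, 2))
y = (X[:, 0] + X[:, 1] > 1.0).astype(float)

w = np.zeros(2)   # weights, one per input
b = 0.0           # bias
lr = 0.5          # learning rate

for epoch in range(500):
    # Forward propagation: linear combination + sigmoid activation.
    y_hat = sigmoid(X @ w + b)
    # Backpropagation: gradients of binary cross-entropy w.r.t. w and b.
    error = y_hat - y
    grad_w = X.T @ error / len(y)
    grad_b = error.mean()
    # Update step: move opposite to the gradient (plain gradient descent).
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = np.mean((y_hat > 0.5) == (y == 1.0))
print(accuracy)
```

Each pass through the loop is one iteration of forward propagation, loss-gradient computation, and weight update, repeated until the loss is (approximately) minimized.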