Backpropagation is a fundamental algorithm in training artificial neural networks. It is used to adjust the weights of the neural network based on the errors it makes during training.
A neural network is composed of layers of interconnected neurons, and each connection has an associated weight. During training, the network takes input data, makes predictions, compares those predictions to the actual target values, calculates the errors, and then updates the weights to reduce those errors. This process is repeated over many iterations, typically until the error stops decreasing or a preset number of passes over the data is reached.
Training with backpropagation involves two main steps: the forward pass, which computes the prediction, and the backward pass, which computes the gradients.
Forward Pass: In the forward pass, the input data is fed into the neural network, and the activations are computed layer by layer until the output layer is reached. This process involves a series of weighted sums and activation functions.
Backward Pass: In the backward pass, the errors are propagated backward through the network, and the gradients of the error with respect to each weight are calculated. These gradients indicate how much the error would change if we made small adjustments to the corresponding weight. The goal is to find the direction in which each weight should be adjusted to reduce the overall error.
Now, let's dive into the calculus used in backpropagation with a simple example of a single-layer neural network.
Example: Single-Layer Neural Network
Consider a neural network with a single neuron (perceptron) and one input. Let's denote the input as x, the weight of the connection between the input and the neuron as w, the output of the neuron as y, and the target output as t. The activation function of the neuron is represented by the function f.
Forward Pass: The forward pass involves calculating the output of the neuron based on the given input and weight:
y = f(wx)
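The forward pass above can be sketched in a few lines of Python. This is a minimal illustration, assuming a sigmoid for the activation f; the function names and the numeric values of w and x are made up for the example.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid, assumed here as the activation function f."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(w: float, x: float) -> float:
    """Forward pass of the single neuron: y = f(w * x)."""
    return sigmoid(w * x)

# Illustrative values (assumptions, not taken from the text).
w, x = 0.5, 2.0
y = forward(w, x)  # sigmoid(1.0), roughly 0.731
print(f"y = {y:.4f}")
```

With a weight of zero the neuron outputs 0.5 for any input, since sigmoid(0) = 0.5.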
Backward Pass: In the backward pass, we calculate the gradient of the error with respect to the weight (dw). This gradient tells us how the error changes as we change the weight.
The error (E) between the output y and the target t is defined using a loss function. A common choice for a single training example is the squared error, scaled by 0.5 so that the factor of 2 cancels when differentiating:
E = 0.5 * (t - y)^2
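As a quick sanity check, the loss can be evaluated directly. The sketch below uses a hypothetical helper name, squared_error, and made-up values for t and y.

```python
def squared_error(t: float, y: float) -> float:
    """Half squared error for one example: E = 0.5 * (t - y)^2."""
    return 0.5 * (t - y) ** 2

# Illustrative values (assumptions): target 1.0, prediction 0.75.
print(squared_error(1.0, 0.75))  # 0.5 * 0.25^2 = 0.03125
print(squared_error(0.3, 0.3))   # a perfect prediction gives 0.0
```

The error is zero only when the output matches the target, and it grows quadratically as the two move apart.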
Now, we want to find dw, the derivative of the error with respect to the weight w:
dw = dE/dw
Using the chain rule of calculus, we can calculate dw step by step:
dw = dE/dy * dy/dw
Calculate dE/dy: dE/dy = d(0.5 * (t - y)^2)/dy = 0.5 * 2 * (t - y) * (-1) = -(t - y)
Calculate dy/dw: dy/dw = d(f(wx))/dw
Here, we need to consider the derivative of the activation function f with respect to its argument wx and the derivative of wx with respect to w.
Let's assume f(wx) is a sigmoid activation function: f(wx) = 1 / (1 + e^(-wx))
Then, the derivative of f with respect to its argument is: df/d(wx) = f(wx) * (1 - f(wx))
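One way to convince yourself of the sigmoid derivative formula is to compare it with a numerical estimate. The following is a small sketch; the helper names and the step size h are assumptions chosen for illustration.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z: float) -> float:
    """Analytic derivative: f(z) * (1 - f(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def central_difference(fn, z: float, h: float = 1e-6) -> float:
    """Numerical derivative estimate: (fn(z + h) - fn(z - h)) / (2h)."""
    return (fn(z + h) - fn(z - h)) / (2.0 * h)

# Compare the two at a few arbitrary points.
for z in (-2.0, 0.0, 1.5):
    print(z, sigmoid_prime(z), central_difference(sigmoid, z))
```

The two columns should agree to several decimal places. At z = 0 the derivative is 0.5 * (1 - 0.5) = 0.25, which is its maximum value.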
Now, we have dy/dw: dy/dw = df/d(wx) * d(wx)/dw = f(wx) * (1 - f(wx)) * d(wx)/dw
Calculate d(wx)/dw: since wx = w * x, we have d(wx)/dw = x
Now, putting it all together: dw = dE/dy * dy/dw = -(t - y) * f(wx) * (1 - f(wx)) * x. Since y = f(wx), this can also be written as dw = -(t - y) * y * (1 - y) * x.
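The chain-rule result can be checked numerically. The sketch below computes the analytic gradient for the sigmoid neuron and compares it with a finite-difference estimate of dE/dw; the function names and the values of w, x, t, and h are assumptions for illustration.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def loss(w: float, x: float, t: float) -> float:
    """E = 0.5 * (t - y)^2 with y = sigmoid(w * x)."""
    y = sigmoid(w * x)
    return 0.5 * (t - y) ** 2

def grad_w(w: float, x: float, t: float) -> float:
    """Analytic gradient from the chain rule: -(t - y) * y * (1 - y) * x."""
    y = sigmoid(w * x)
    return -(t - y) * y * (1.0 - y) * x

# Illustrative values (assumptions, not taken from the text).
w, x, t = 0.5, 2.0, 0.8
h = 1e-6

analytic = grad_w(w, x, t)
numeric = (loss(w + h, x, t) - loss(w - h, x, t)) / (2.0 * h)
print(analytic, numeric)  # the two should agree closely
```

Here the output y is below the target, so the gradient is negative: increasing w would reduce the error.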
With this gradient, we can update the weight w to reduce the error. Because the gradient points in the direction of increasing error, we step in the opposite direction, scaled by a learning rate (η):
w_new = w_old - η * dw
The learning rate is a hyperparameter that controls the step size in the weight update.
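To see the update rule in action, here is a minimal gradient-descent loop for the single sigmoid neuron. It is a sketch under stated assumptions: one fixed training example, a made-up learning rate of 0.5, and 200 iterations. None of these values come from the text.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def loss(w: float, x: float, t: float) -> float:
    y = sigmoid(w * x)
    return 0.5 * (t - y) ** 2

def grad_w(w: float, x: float, t: float) -> float:
    y = sigmoid(w * x)
    return -(t - y) * y * (1.0 - y) * x

# Illustrative setup (assumptions): one example and a fixed learning rate.
w, x, t = 0.5, 2.0, 0.8
eta = 0.5

initial_loss = loss(w, x, t)
for step in range(200):
    dw = grad_w(w, x, t)   # backward pass: dE/dw
    w = w - eta * dw       # update: w_new = w_old - eta * dw
final_loss = loss(w, x, t)

print(f"w = {w:.4f}, y = {sigmoid(w * x):.4f}")
print(f"loss went from {initial_loss:.6f} to {final_loss:.6f}")
```

A larger learning rate takes bigger steps and can overshoot, while a smaller one converges more slowly. The value used here is only one workable choice for this toy problem.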
This is the basic idea of backpropagation for a single neuron. In practice, neural networks have multiple layers and more complex architectures, but the core calculus principles remain the same: the chain rule is applied layer by layer, from the output back toward the input. The process is repeated for each training sample (or batch of samples) to adjust the weights and improve the network's performance.
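To illustrate how the same chain rule extends beyond one neuron, the sketch below chains two sigmoid neurons in series and propagates the error backward through both. The names w1, w2, and h (the hidden activation), as well as all numeric values, are hypothetical and chosen only for this example.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def forward(w1: float, w2: float, x: float):
    """Two neurons in series: h = f(w1 * x), then y = f(w2 * h)."""
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    return h, y

def loss(w1: float, w2: float, x: float, t: float) -> float:
    _, y = forward(w1, w2, x)
    return 0.5 * (t - y) ** 2

def gradients(w1: float, w2: float, x: float, t: float):
    """Backward pass: apply the chain rule from the output toward the input."""
    h, y = forward(w1, w2, x)
    delta_out = -(t - y) * y * (1.0 - y)           # dE/d(w2 * h)
    dw2 = delta_out * h                            # dE/dw2
    delta_hidden = delta_out * w2 * h * (1.0 - h)  # dE/d(w1 * x)
    dw1 = delta_hidden * x                         # dE/dw1
    return dw1, dw2

# Illustrative values (assumptions), with a finite-difference check.
w1, w2, x, t = 0.5, -0.3, 2.0, 0.8
eps = 1e-6
dw1, dw2 = gradients(w1, w2, x, t)
num_dw1 = (loss(w1 + eps, w2, x, t) - loss(w1 - eps, w2, x, t)) / (2.0 * eps)
num_dw2 = (loss(w1, w2 + eps, x, t) - loss(w1, w2 - eps, x, t)) / (2.0 * eps)
print(dw1, num_dw1)
print(dw2, num_dw2)
```

Note how delta_out is computed once and reused for both gradients. Reusing these intermediate terms is what makes the backward pass efficient in deeper networks.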