The Mechanics of Backpropagation

Computational Graph & Gradient Flow
Task Description: Deriving gradients by hand becomes cumbersome for complex functions. In neural networks, the most common way to compute the gradients of the loss function with respect to the model weights is backpropagation (BP). BP applies the chain rule systematically, which lets it compute gradients of very complex functions, and it forms the backbone of optimisation in deep learning. In this task, you can track the individual operations in the forward pass and see how the chain rule is applied in the backward pass to compute the gradients.

1. Forward Pass

Logit: $z = x \cdot w$
Activation: $a = \sigma(z)$
Loss: $L = \frac{1}{2}(a - y)^2$
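
The three forward-pass steps above can be sketched in a few lines of Python. This is a minimal illustration, not the task's own implementation: it assumes scalar $x$ and $w$, takes $\sigma$ to be the logistic sigmoid, and uses hypothetical input values (`x = 2.0`, `w = 0.5`, `y = 1.0`) and function names (`sigmoid`, `forward`) chosen only for the example.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w, y):
    """Run the forward pass and return every intermediate value."""
    z = x * w                  # logit
    a = sigmoid(z)             # activation
    loss = 0.5 * (a - y) ** 2  # squared-error loss
    return z, a, loss


# Hypothetical example inputs (assumed, not taken from the task).
z, a, loss = forward(x=2.0, w=0.5, y=1.0)
print(f"z = {z}, a = {a:.6f}, L = {loss:.6f}")
```

Returning the intermediate values `z` and `a` rather than only the loss mirrors what the backward pass needs: each local derivative is evaluated at the values computed on the way forward.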

2. Backward Pass

$$\frac{\partial L}{\partial a} = (a - y)$$
$$\frac{\partial a}{\partial z} = \sigma(z)(1 - \sigma(z))$$
$$\frac{\partial z}{\partial w} = x$$
$$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w} = (a - y) \cdot \sigma(z)(1 - \sigma(z)) \cdot x$$
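
The backward pass can be sketched the same way: compute each local derivative, multiply them together, and compare the product against a numerical estimate. This is an illustrative sketch under the same assumptions as the formulas above (scalar inputs, logistic sigmoid). The names `backward` and `loss_fn`, the input values, and the step size `eps` are hypothetical choices for the example.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def loss_fn(x, w, y):
    """Forward pass collapsed into one expression: L = 0.5 * (sigma(x*w) - y)^2."""
    return 0.5 * (sigmoid(x * w) - y) ** 2


def backward(x, w, y):
    """Return dL/dw as the product of the three local derivatives."""
    # Recompute the forward values the local derivatives depend on.
    z = x * w
    a = sigmoid(z)

    dL_da = a - y          # derivative of 0.5 * (a - y)^2 with respect to a
    da_dz = a * (1.0 - a)  # sigmoid derivative, written in terms of a = sigma(z)
    dz_dw = x              # derivative of x * w with respect to w
    return dL_da * da_dz * dz_dw


# Hypothetical example inputs (assumed, not taken from the task).
x, w, y = 2.0, 0.5, 1.0
grad = backward(x, w, y)

# Sanity check: central finite difference approximation of dL/dw.
eps = 1e-6
numeric = (loss_fn(x, w + eps, y) - loss_fn(x, w - eps, y)) / (2 * eps)
print(f"analytic = {grad:.8f}, numeric = {numeric:.8f}")
```

The finite-difference comparison is a common way to check a hand-derived gradient: the two values should agree to several decimal places. Because $\frac{\partial z}{\partial w} = x$, the gradient is exactly zero whenever $x = 0$, and it also vanishes when $a = y$.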