In neural network gradients are effectively error signals used to adjust weight. They travel backward through a neural network during backpropagation and two main type of issues arise:

  • Vanishing Gradients: When gradients become too small as they propagate back, layers stop learning effectively. This can occur in very deep networks where gradient values diminish layer by layer.
  • Exploding Gradients: When gradients grow too large, they can cause erratic updates and instability in training, making it hard to converge on an optimal solution.

Effective gradient flow is essential for training deep networks, allowing each layer to learn by adjusting its parameters based on the error signals. For example in Introduction to Transformers, the [[Introduction to Transformers#Layer normalization (Add and Norm)#Residual connections]] helps to improve the gradient flow