Gradient Descent

How machine learning models improve through iterative optimisation

Prerequisites:

Derivatives and Gradients (required): Gradient descent is applied calculus. The descent direction is the negative gradient, and the chain rule is what makes multi-layer training possible; both are developed in the Derivatives topic.
Vectors (required): The gradient is itself a vector in weight space; each update subtracts a scaled vector from the current weights. Vector operations (addition, scalar multiplication, magnitude) underpin every step.
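
As a rough illustration of how the two prerequisites combine, the update described above can be written as a single vector equation. The symbols are assumptions chosen for this sketch, not notation taken from the chapters: \mathbf{w}_t is the weight vector at step t, \eta > 0 is the step size, and L is the loss being minimised.

```latex
% One gradient descent step: subtract a scaled gradient vector from the weights.
% \mathbf{w}_t, \eta and L are illustrative symbols, assumed for this sketch.
\mathbf{w}_{t+1} = \mathbf{w}_t - \eta \, \nabla L(\mathbf{w}_t)
```

The gradient \nabla L(\mathbf{w}_t) comes from calculus; the subtraction and the scaling by \eta are ordinary vector operations.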

foundational

12 min total · 1 chapter · 1 playground

Chapters

foundational · 12 min

The Learning Rate

Every gradient descent step has a size, and the learning rate controls that size. Too small and training crawls; too large and the updates overshoot the minimum and can diverge. Understanding this one parameter is the foundation for much of what follows in optimisation.
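
The effect of the step size can be seen in a minimal sketch. It assumes a toy one-dimensional loss f(w) = w**2 (gradient 2*w); the function name `gradient_descent` and the three learning rates are hypothetical choices for illustration, not material from the chapter.

```python
def gradient_descent(learning_rate, steps=20, w0=1.0):
    """Run plain gradient descent on the toy loss f(w) = w**2.

    Assumption for illustration: a single scalar weight starting at w0.
    Returns the weight after `steps` updates.
    """
    w = w0
    for _ in range(steps):
        grad = 2.0 * w                  # derivative of w**2 at the current w
        w = w - learning_rate * grad    # step against the gradient
    return w


if __name__ == "__main__":
    # Small, moderate, and overly large learning rates (illustrative values).
    for lr in (0.01, 0.4, 1.1):
        print(f"learning rate {lr}: final w = {gradient_descent(lr):.6g}")
```

On this particular loss each update multiplies w by (1 - 2 * learning_rate), so the weight shrinks slowly toward the minimum at 0 when the rate is tiny, shrinks quickly for a moderate rate, and grows in magnitude once the rate exceeds 1. Real losses are not this simple, but the same three regimes appear.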

5 sections