Gradient Descent
[Live readouts shown during the run: loss · |grad| · conf · step]
1
Initialize
Random weights are assigned. The model knows nothing — it starts at a random point on the loss surface.
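A minimal Python sketch of this step; the weight count, scale, and seed are illustrative assumptions, not values from the visualization:

```python
import random

def init_weights(n, scale=0.1, seed=0):
    """Start at a random point on the loss surface: small random
    values, with no knowledge of the data yet."""
    rng = random.Random(seed)
    return [rng.uniform(-scale, scale) for _ in range(n)]

w = init_weights(4)
```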
2
Compute gradient
At each position, the model measures the slope — which direction makes the error decrease fastest?
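The slope measurement can be sketched with central finite differences; the bowl-shaped example loss below is an assumption for illustration:

```python
def grad(loss, w, eps=1e-6):
    """Numerically estimate the slope of the loss at w
    (central finite differences, one coordinate at a time)."""
    g = []
    for i in range(len(w)):
        up = list(w); up[i] += eps
        dn = list(w); dn[i] -= eps
        g.append((loss(up) - loss(dn)) / (2 * eps))
    return g

# Example: a bowl-shaped loss with its minimum at (1, -2).
loss = lambda w: (w[0] - 1) ** 2 + (w[1] + 2) ** 2
g = grad(loss, [0.0, 0.0])  # the slope points away from the minimum
```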
3
Update weights
Weights shift in the direction of steepest descent. The learning rate controls how big each step is.
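The update rule described above, as a short sketch (the learning rate value is illustrative):

```python
def sgd_step(w, g, lr=0.1):
    """Move each weight against its gradient; lr sets the step size."""
    return [wi - lr * gi for wi, gi in zip(w, g)]

# One step from w=1.0 with gradient 2.0 and lr=0.5 lands at 0.0.
w_next = sgd_step([1.0], [2.0], lr=0.5)
```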
4
Momentum carries through
Like a ball rolling downhill, momentum helps the model push through small bumps and flat regions.
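A sketch of the classic momentum update; `beta` (how much past velocity is kept) and `lr` are assumed values:

```python
def momentum_step(w, g, v, lr=0.1, beta=0.9):
    """Accumulate a velocity from past gradients; like a rolling ball,
    it keeps the weights moving through flat regions and small bumps."""
    v = [beta * vi - lr * gi for vi, gi in zip(v, g)]
    w = [wi + vi for wi, vi in zip(w, v)]
    return w, v

# Starting from rest, the first step is just -lr * gradient.
w1, v1 = momentum_step([0.0], [1.0], [0.0])
```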
5
Approaching a minimum
The gradient shrinks. Steps get smaller. The model is fine-tuning — making precise adjustments to its parameters.
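The shrinking steps can be seen on a one-dimensional quadratic; the starting point and learning rate are illustrative:

```python
# On loss(w) = w**2 the gradient is 2*w, so with lr = 0.25 each
# update halves w: the gradient shrinks and the steps shrink with it.
w, lr = 8.0, 0.25
steps = []
for _ in range(4):
    g = 2 * w
    delta = lr * g
    steps.append(delta)
    w -= delta
# steps is [4.0, 2.0, 1.0, 0.5] — each step half the size of the last
```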
6
Converged
The model has settled into a valley. Weights are stable. This configuration minimizes error — the model has learned.
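Putting the steps together, a minimal convergence loop might look like this; the tolerance, learning rate, and example loss are assumptions:

```python
import math

def descend(grad, w, lr=0.1, tol=1e-6, max_iter=10_000):
    """Run gradient descent until the gradient norm falls below tol,
    i.e. the weights have settled into a valley and stopped moving."""
    for _ in range(max_iter):
        g = grad(w)
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

# The minimum of (w0 - 1)^2 + (w1 + 2)^2 is at (1, -2).
w_star = descend(lambda w: [2 * (w[0] - 1), 2 * (w[1] + 2)], [0.0, 0.0])
```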
Made by Zero State Reflex