From optimization intuition to output-layer probabilities, this section turns deep learning's core math into something you can actually reason about.
Optimization intuition
Gradient descent, backpropagation, and why gradient signals can still flow through deep stacks.
Modeling foundations
Activation functions, loss design, and the assumptions hidden inside familiar formulas.
Probability at the output layer
Softmax, cross-entropy, and the gradient identities that make classification trainable.
Imagine you're blindfolded and dropped into a valley. Your only goal: reach the bottom.
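The blindfolded descent can be sketched in a few lines. This is a toy illustration, not code from the series: the valley here is assumed to be the quadratic f(x) = x², and `lr` is an illustrative step size.

```python
# Minimal gradient descent on a toy 1-D "valley" f(x) = x**2.
# Each step moves against the local slope; the learning rate
# controls how far each blindfolded step goes.
def f_prime(x):
    return 2 * x  # derivative of f(x) = x**2

x = 5.0   # dropped somewhere on the slope
lr = 0.1  # learning rate (step size), chosen for illustration
for _ in range(100):
    x -= lr * f_prime(x)  # step downhill

print(abs(x) < 1e-3)  # True: very close to the bottom at x = 0
```

Each update shrinks x by a constant factor (1 - 2·lr), so the iterate converges geometrically to the valley floor.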
How local derivatives multiply along a path and add across branches in a neural network.
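A tiny two-branch graph makes the multiply-along-paths, add-across-branches rule concrete. The graph and values below are illustrative, checked against a finite-difference estimate:

```python
# Chain rule in a tiny two-branch graph:
#   u = x**2,  v = 3*x,  L = u * v
# dL/dx multiplies local derivatives along each path (x -> u -> L
# and x -> v -> L) and adds the two path contributions.
x = 2.0
u, v = x**2, 3 * x
# path through u: (dL/du) * (du/dx) = v * 2x
# path through v: (dL/dv) * (dv/dx) = u * 3
grad = v * (2 * x) + u * 3  # 24 + 12 = 36

# Numeric check by central finite differences.
eps = 1e-6
L = lambda t: (t**2) * (3 * t)  # L(x) = 3x**3, so dL/dx = 9x**2 = 36 at x = 2
numeric = (L(x + eps) - L(x - eps)) / (2 * eps)
print(grad, round(numeric, 3))  # 36.0 36.0
```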
The loss looks simple because the statistical story behind it is simple.
Stack enough affine layers without a nonlinearity between them and the whole network collapses into a single affine layer.
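The collapse is easy to verify numerically. A sketch with arbitrary shapes and random weights (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

# Two affine layers with no nonlinearity in between...
two_layer = W2 @ (W1 @ x + b1) + b2

# ...equal one affine layer with composed weight and bias.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layer, one_layer))  # True
```

Inserting any nonlinearity (ReLU, tanh, ...) between the two layers breaks this identity, which is exactly why activations give depth its expressive power.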
Why logits become probabilities, why the exponential shows up, and why the gradient becomes prediction minus target.
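The "prediction minus target" identity can be checked directly. Below is a minimal sketch (logits and target are made up), comparing the analytic gradient of softmax cross-entropy with a finite-difference estimate:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for stability; softmax is shift-invariant
    e = np.exp(z)
    return e / e.sum()

z = np.array([2.0, -1.0, 0.5])  # logits
y = np.array([1.0, 0.0, 0.0])   # one-hot target
p = softmax(z)
loss = -np.log(p @ y)           # cross-entropy

# Analytic gradient of the loss w.r.t. the logits: prediction minus target.
grad = p - y

# Finite-difference check of dL/dz_i.
eps = 1e-6
num = np.zeros_like(z)
for i in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    num[i] = (-np.log(softmax(zp) @ y) + np.log(softmax(zm) @ y)) / (2 * eps)

print(np.allclose(grad, num, atol=1e-4))  # True
```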
A compact reference for losses, Softmax curvature, LogSumExp, and the calculus facts that keep resurfacing.
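One of those recurring facts, LogSumExp with the max-shift trick, fits in a few lines. A sketch with illustrative inputs:

```python
import numpy as np

def logsumexp(z):
    m = z.max()
    # Subtracting the max keeps every exponent <= 0, so nothing overflows;
    # adding m back outside the log restores the exact value.
    return m + np.log(np.sum(np.exp(z - m)))

z = np.array([1000.0, 1001.0, 1002.0])
print(logsumexp(z))  # ~1002.41, finite

# The naive formula overflows on the same input.
with np.errstate(over="ignore"):
    naive = np.log(np.sum(np.exp(z)))
print(naive)  # inf
```

The same shift is what makes softmax and cross-entropy safe to compute on large logits.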