Foundational Series

Deep Learning Explorations

From optimization intuition to output-layer probability, this section turns deep learning's core math into something you can actually reason about.

Optimization intuition

Gradient descent, backpropagation, and why gradient signals can travel through deep stacks.

Modeling foundations

Activation functions, loss design, and the assumptions hidden inside familiar formulas.

Probability at the output layer

Softmax, cross-entropy, and the gradient identities that make classification trainable.

Gradient Descent
10 min
Gradient Descent, Explained from First Principles

Imagine you're blindfolded and dropped into a valley. Your only goal: reach the bottom.

Gradient Descent
Deep Learning Basics
Optimization
Beginner
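
As a sketch of the rule this article builds toward, the update is w ← w − η∇L(w). Below is a minimal, hypothetical one-dimensional example; the loss L(w) = (w − 3)² and the learning rate are illustrative choices, not taken from the article.

    # Minimal gradient descent on L(w) = (w - 3)^2, whose minimum is w = 3.
    def grad(w):
        return 2.0 * (w - 3.0)   # dL/dw

    w, lr = 0.0, 0.1             # start on the valley wall; eta is the step size
    for _ in range(100):
        w -= lr * grad(w)        # step against the gradient, i.e. downhill
    print(w)                     # approaches 3.0
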
Chain Rule
11 min
The Chain Rule in Deep Learning: What Backpropagation Is Really Computing

How local derivatives multiply along a path and add across branches in a neural network.

Chain Rule
Backpropagation
Jacobian
Beginner
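
In symbols, the teaser's claim reads (a standard statement, not a quote from the article): if x reaches the loss L through intermediate values u_1, ..., u_k, then

    \frac{\partial L}{\partial x} = \sum_{i=1}^{k} \frac{\partial L}{\partial u_i}\,\frac{\partial u_i}{\partial x}

Each product multiplies local derivatives along one path; the sum adds the paths across branches.
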
Linear Regression
12 min
Why Linear Regression Uses Squared Error

The loss looks simple because the statistical story behind it is simple.

Linear Regression
Squared Error
Maximum Likelihood
Beginner
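
The usual one-line version of that story, under the standard Gaussian-noise assumption y = w^T x + ε with ε ~ N(0, σ²):

    \log p(y \mid x) = -\frac{(y - w^\top x)^2}{2\sigma^2} + \text{const}

so maximizing the likelihood over a dataset is the same as minimizing the sum of squared errors.
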
Activation Functions
10 min
Why Deep Networks Need Activation Functions

Stack enough affine layers without a nonlinearity and the whole network collapses into one layer.

Activation Functions
ReLU
Expressivity
Beginner
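
The collapse is one line of algebra; for two affine layers (and, by induction, any number):

    W_2 (W_1 x + b_1) + b_2 = (W_2 W_1)\,x + (W_2 b_1 + b_2) = W x + b

A composition of affine maps is affine, so without a nonlinearity, depth adds no expressive power.
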
Softmax
13 min
Softmax and Cross-Entropy, Without the Hand-Waving

Why logits become probabilities, why the exponential shows up, and why the gradient becomes prediction minus target.

Softmax
Cross-Entropy
Classification
Intermediate
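
The identities the title points at, stated in their standard form (logits z, probabilities p, one-hot target y):

    p_i = \frac{e^{z_i}}{\sum_j e^{z_j}}, \qquad L = -\sum_i y_i \log p_i, \qquad \frac{\partial L}{\partial z_i} = p_i - y_i

The last equality is the "prediction minus target" gradient the teaser mentions.
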
PyTorch
11 min
PyTorch Softmax Regression: Why the Code Never Calls Softmax

Why logits go straight into CrossEntropyLoss, and why that is the numerically stable thing to do.

PyTorch
Softmax Regression
CrossEntropyLoss
Intermediate
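
A minimal sketch of the pattern the title refers to; the shapes and values below are placeholders:

    import torch
    import torch.nn as nn

    logits = torch.randn(4, 3)             # raw scores; no softmax applied
    targets = torch.tensor([0, 2, 1, 0])   # class indices

    # CrossEntropyLoss applies log-softmax internally via the
    # log-sum-exp trick, so exp() never overflows on large logits.
    loss = nn.CrossEntropyLoss()(logits, targets)
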
Activation Functions
12 min
The Line That Nearly Froze AI: Why Activation Functions Matter

How XOR exposed the limits of linear perceptrons, and why nonlinear activations made deep learning viable.

Activation Functions
XOR
ReLU
Beginner
PyTorch
13 min
Pure PyTorch MLP Training, Line by Line

From weight initialization to backward() and optimizer.step(): what the training loop is doing, mathematically.

PyTorch
MLP
Autograd
Intermediate
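
A compressed sketch of the loop the article walks through; the layer sizes, batch, and learning rate here are illustrative:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    x = torch.randn(32, 784)            # placeholder batch
    y = torch.randint(0, 10, (32,))     # placeholder labels

    opt.zero_grad()                     # clear stale gradients
    loss = loss_fn(model(x), y)         # forward pass builds the graph
    loss.backward()                     # autograd fills every dL/dtheta
    opt.step()                          # theta <- theta - lr * dL/dtheta
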
Regularization
13 min
From Overfitting to Regularization: Weight Decay and Dropout

Two regularizers, two mechanisms, one goal: reducing the gap between training performance and real generalization.

Regularization
Weight Decay
Dropout
Intermediate
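
In PyTorch terms, the two mechanisms live in different places; a minimal sketch, with sizes and rates chosen only for illustration:

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(20, 64), nn.ReLU(),
        nn.Dropout(p=0.5),              # randomly zeroes activations
        nn.Linear(64, 2),
    )
    # weight_decay adds an L2 penalty: each step also shrinks the weights.
    opt = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

    model.train()   # dropout is active during training
    model.eval()    # and disabled at evaluation time
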
L1 Loss
14 min
A Field Guide to Deep Learning Math

A compact reference for losses, Softmax curvature, LogSumExp, and the calculus facts that keep resurfacing.

L1 Loss
L2 Loss
Softmax
Advanced
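
For reference, the LogSumExp identity the guide covers, with m = max_i z_i:

    \log \sum_i e^{z_i} = m + \log \sum_i e^{z_i - m}

Subtracting the max makes every exponent at most zero, so the sum cannot overflow.
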
Backpropagation
12 min
Neural Network Training: A Step-by-Step Computation Graph Example

Tracing forward propagation, branching gradients, and parameter updates by hand.

Backpropagation
Computation Graph
Gradient Descent
Beginner
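
A tiny example in the same spirit, small enough to check by hand (the function here is hypothetical, not the article's graph):

    # f(x, y) = (x + y) * x: x feeds two branches, so its gradients add.
    x, y = 2.0, 3.0
    a = x + y                  # forward: a = 5
    f = a * x                  # forward: f = 10

    df_da = x                  # local derivative of f = a * x w.r.t. a: 2
    df_dx = a + df_da * 1.0    # direct path (a) plus path through a: 5 + 2 = 7
    # Check: f = x^2 + x*y, so df/dx = 2x + y = 7.
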
Initialization
13 min
Why Deep Networks Fail to Train: Vanishing Gradients and Initialization

A deep multilayer perceptron can fail long before the optimizer gets a fair chance.

Initialization
Vanishing Gradients
Xavier Initialization
Intermediate
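
The condition Xavier/Glorot initialization targets, in its common symmetric form for a layer with n_in inputs and n_out outputs:

    \mathrm{Var}(W) = \frac{2}{n_{\text{in}} + n_{\text{out}}}

Scaling the weight variance this way keeps activation and gradient magnitudes roughly constant across layers, instead of shrinking or blowing up with depth.
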
Distribution Shift
10 min
Distribution Shift: Why Models Fail in the Real World

The model did not get worse; the world just changed.

Distribution Shift
Machine Learning in Production
Covariate Shift
Intermediate
CNN
11 min
Why MLPs Struggle with Images, and Why CNNs Succeed

Once you flatten an image, you force the model to relearn basic visual structure from scratch.

CNN
Computer Vision
Multilayer Perceptron
Beginner
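
One way to see the cost, with illustrative numbers not taken from the article: flattening a 224 × 224 × 3 image into a single 1000-unit dense layer already requires 224 · 224 · 3 · 1000 ≈ 150 million weights, while a 3 × 3 convolution producing 64 channels uses 3 · 3 · 3 · 64 = 1,728 weights shared across every location.
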
CNN
14 min
CNN Forward and Backward Passes: Why Convolution Is Actually Cross-Correlation

The operation we call 'convolution' is technically cross-correlation, and the backward pass reveals a beautiful mathematical symmetry.

CNN
Cross-Correlation
Backpropagation
Advanced
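
A one-dimensional sketch of the distinction; the arrays are made up for illustration:

    import numpy as np

    def cross_correlate(x, k):
        # What deep learning frameworks actually compute as "convolution".
        n, m = len(x), len(k)
        return np.array([np.dot(x[i:i + m], k) for i in range(n - m + 1)])

    def convolve(x, k):
        # True convolution flips the kernel before sliding it.
        return cross_correlate(x, k[::-1])

    x = np.array([1.0, 2.0, 3.0, 4.0])
    k = np.array([1.0, 0.0, -1.0])
    print(cross_correlate(x, k))   # [-2. -2.]
    print(convolve(x, k))          # [ 2.  2.]
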
CNN
12 min
Padding, Stride, Channels, and Pooling in CNNs

How CNNs purposefully compress spatial information to build high-level semantic understanding.

CNN
Pooling
Channels
Beginner
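
The bookkeeping behind those knobs is one formula; for input size n, kernel k, padding p, and stride s, the output size is

    n_{\text{out}} = \left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1

For example, n = 32, k = 3, p = 1, s = 2 gives ⌊31/2⌋ + 1 = 16: the spatial map halves.
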
AlexNet
10 min
From LeNet to AlexNet: Why Deep CNNs Finally Won

LeNet proved CNNs worked on simple tasks, but AlexNet proved they could conquer the real visual world.

AlexNet
LeNet
Deep Learning History
Beginner
VGG
12 min
VGG and the Power of Block-Based Network Design

VGG proved that stacking small 3x3 filters was a scalable, systematic way to build deep networks.

VGG
CNN Architecture
Deep Learning
Intermediate
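
A sketch of the block pattern the article describes, in PyTorch; the layer counts per block vary by VGG variant, so this helper is illustrative:

    import torch.nn as nn

    def vgg_block(num_convs, in_ch, out_ch):
        # Repeated 3x3 convs (padding 1 preserves size), then halve spatially.
        layers = []
        for _ in range(num_convs):
            layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                       nn.ReLU()]
            in_ch = out_ch
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        return nn.Sequential(*layers)
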
NIN
11 min
Network In Network (NIN): Upgrading the Local Receptive Field

NIN asked a simple question: What if each local convolution filter were actually a tiny neural network?

NIN
1x1 Convolution
Global Average Pooling
Intermediate
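
The idea in PyTorch form; a minimal sketch, not NIN's exact configuration:

    import torch.nn as nn

    def nin_block(in_ch, out_ch, kernel_size, stride, padding):
        # One spatial conv, then two 1x1 convs: a tiny MLP at every location.
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, kernel_size=1), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, kernel_size=1), nn.ReLU(),
        )
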