MathIsimple

CNNs & Computer Vision

6 articles

Convolutional networks, image-recognition architectures, and the engineering choices behind modern visual models.

RNNs & Sequence Modeling

5 articles

Recurrent networks, gated cells, attention, and sequence-to-sequence models for language and time series.

Foundations

13 articles

Gradients, losses, activations, and the math you keep needing once you go past the textbook chapter.

Gradient Descent
10 min
Gradient Descent, Explained from First Principles

Imagine you're blindfolded and dropped into a valley. Your only goal: reach the bottom.

Gradient Descent
Deep Learning Basics
Optimization
Beginner
Chain Rule
11 min
The Chain Rule in Deep Learning: What Backpropagation Is Really Computing

How local derivatives multiply along a path and add across branches in a neural network.

Chain Rule
Backpropagation
Jacobian
Beginner
Linear Regression
12 min
Why Linear Regression Uses Squared Error

The loss looks simple because the statistical story behind it is simple.

Linear Regression
Squared Error
Maximum Likelihood
Beginner
Activation Functions
10 min
Why Deep Networks Need Activation Functions

Stack enough affine layers without a nonlinearity and the whole network collapses into one layer.

Activation Functions
ReLU
Expressivity
Beginner
Softmax
13 min
Softmax and Cross-Entropy, Without the Hand-Waving

Why logits become probabilities, why the exponential shows up, and why the gradient becomes prediction minus target.

Softmax
Cross-Entropy
Classification
Intermediate
PyTorch
11 min
PyTorch Softmax Regression: Why the Code Never Calls Softmax

Why logits go straight into CrossEntropyLoss, and why that is the numerically stable thing to do.

PyTorch
Softmax Regression
CrossEntropyLoss
Intermediate
Activation Functions
12 min
The Line That Nearly Froze AI: Why Activation Functions Matter

How XOR exposed the limits of linear perceptrons, and why nonlinear activations made deep learning viable.

Activation Functions
XOR
ReLU
Beginner
PyTorch
13 min
Pure PyTorch MLP Training, Line by Line

From weight initialization to backward() and optimizer.step(), what the training loop is mathematically doing.

PyTorch
MLP
Autograd
Intermediate
Regularization
13 min
From Overfitting to Regularization: Weight Decay and Dropout

Two regularizers, two mechanisms, one goal: reducing the gap between training performance and real generalization.

Regularization
Weight Decay
Dropout
Intermediate
L1 Loss
14 min
A Field Guide to Deep Learning Math

A compact reference for losses, Softmax curvature, LogSumExp, and the calculus facts that keep resurfacing.

L1 Loss
L2 Loss
Softmax
Advanced
Backpropagation
12 min
Neural Network Training: A Step-by-Step Computation Graph Example

Tracing forward propagation, branching gradients, and parameter updates by hand.

Backpropagation
Computation Graph
Gradient Descent
Beginner
Initialization
13 min
Why Deep Networks Fail to Train: Vanishing Gradients and Initialization

A deep multilayer perceptron can fail long before the optimizer gets a fair chance.

Initialization
Vanishing Gradients
Xavier Initialization
Intermediate
Distribution Shift
10 min
Distribution Shift: Why Models Fail in the Real World

The model did not get worse; the world just changed.

Distribution Shift
Machine Learning in Production
Covariate Shift
Intermediate

Optimization & Training

3 articles

Why deep loss surfaces are hard, and how SGD, momentum, adaptive methods, and schedulers tame them.