Learn how to transform dense data into sparse representations through dictionary learning. Master the optimization framework that learns dictionaries and sparse coefficients simultaneously for efficient high-dimensional data processing.
Sparse representation expresses data as a linear combination of a small number of basis vectors (atoms) from a dictionary. Most coefficients are zero, making the representation efficient to store and compute with.
Given a sample $x \in \mathbb{R}^d$ and a dictionary $B \in \mathbb{R}^{d \times k}$, find a sparse coefficient vector $\alpha \in \mathbb{R}^k$ such that:

$$x \approx B\alpha$$

where $\alpha$ has mostly zeros (sparse).
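To make this concrete, here is a minimal numeric sketch (the sizes and names are illustrative): a sample in $\mathbb{R}^8$ built from only 2 of 16 atoms.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 8, 16                      # sample dimension, dictionary size (overcomplete: k > d)
B = rng.standard_normal((d, k))   # dictionary: one atom per column

alpha = np.zeros(k)               # sparse coefficient vector: only 2 of 16 entries nonzero
alpha[3], alpha[11] = 1.5, -0.7

x = B @ alpha                     # the sample is a linear combination of just 2 atoms
print(np.count_nonzero(alpha), "of", k, "coefficients are nonzero")
```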
Given $m$ dense samples $x_1, x_2, \dots, x_m \in \mathbb{R}^d$, learn:
Dictionary $B \in \mathbb{R}^{d \times k}$ with $k$ atoms (columns). Each atom is a basis vector.
$k$ is user-specified (dictionary size). Typically $k > d$ (overcomplete dictionary).
For each sample $x_i$, find a sparse coefficient $\alpha_i \in \mathbb{R}^k$ such that $x_i \approx B\alpha_i$.
Most elements of $\alpha_i$ should be zero (sparse).
Dictionary learning minimizes reconstruction error plus a sparsity penalty:

$$\min_{B,\,\{\alpha_i\}} \sum_{i=1}^{m} \|x_i - B\alpha_i\|_2^2 + \lambda \sum_{i=1}^{m} \|\alpha_i\|_1$$

First term: reconstruction error. Second term: L1 sparsity penalty (same as LASSO).
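Once samples, dictionary, and codes are stacked into matrices, the objective is easy to evaluate; a small sketch (names are illustrative):

```python
import numpy as np

def dict_learning_objective(X, B, A, lam):
    """Reconstruction error plus L1 penalty.

    X: (d, m) samples as columns; B: (d, k) dictionary; A: (k, m) codes as columns.
    """
    recon_error = np.linalg.norm(X - B @ A, "fro") ** 2   # sum_i ||x_i - B a_i||_2^2
    sparsity = lam * np.abs(A).sum()                      # lam * sum_i ||a_i||_1
    return recon_error + sparsity
```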
Dictionary learning has no closed-form solution. We use alternating optimization: fix one variable, optimize the other, and iterate.
Step 1 (sparse coding): with $B$ fixed, for each sample $x_i$, solve:

$$\min_{\alpha_i} \|x_i - B\alpha_i\|_2^2 + \lambda \|\alpha_i\|_1$$

This is a LASSO problem! Use Proximal Gradient Descent (PGD) to solve it.
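A minimal PGD sketch for this subproblem, in the ISTA (iterative soft-thresholding) form; the step-size rule, iteration count, and function names are illustrative choices:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_code(x, B, lam, n_iter=200):
    """Solve min_a ||x - B a||_2^2 + lam * ||a||_1 by proximal gradient descent."""
    step = 1.0 / (2 * np.linalg.norm(B, 2) ** 2)  # 1/L, L = Lipschitz const of the gradient
    a = np.zeros(B.shape[1])
    for _ in range(n_iter):
        grad = 2 * B.T @ (B @ a - x)              # gradient of the smooth (quadratic) part
        a = soft_threshold(a - step * grad, step * lam)
    return a
```

Each iteration takes a gradient step on the smooth reconstruction term, then applies soft thresholding, which is exactly the proximal step for the L1 penalty.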
Step 2 (dictionary update): with all $\alpha_i$ fixed, optimize:

$$\min_B \|X - BA\|_F^2$$

where $X = (x_1, x_2, \dots, x_m) \in \mathbb{R}^{d \times m}$, $A = (\alpha_1, \alpha_2, \dots, \alpha_m) \in \mathbb{R}^{k \times m}$, and $\|\cdot\|_F$ is the Frobenius norm.
Update each dictionary atom $b_j$ one at a time using SVD, as detailed below.
Repeat steps 1-2 until reconstruction error converges. The algorithm alternates between sparse coding (step 1) and dictionary update (step 2).
To update dictionary atom $b_j$, we decompose the optimization:

For each atom $b_j$ (the j-th column of $B$), fix all other atoms and optimize:

$$\min_{b_j} \|X - BA\|_F^2 = \min_{b_j} \|E_j - b_j \alpha^j\|_F^2$$

where $E_j = X - \sum_{i \neq j} b_i \alpha^i$ is the residual matrix (the reconstruction error when using all atoms except $b_j$), and $\alpha^j$ is the j-th row of $A$ (the coefficients for atom $b_j$ across all samples).

Solution: Perform SVD on $E_j$. The optimal $b_j$ is the left singular vector corresponding to the largest singular value. This ensures the dictionary atom best explains the residual.
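Putting steps 1 and 2 together, here is a compact sketch of the full alternating loop, assuming the `sparse_code` routine from the PGD sketch above is in scope. One detail worth flagging: following the K-SVD refinement, the SVD below is restricted to the samples that actually use atom $j$, and the matching row of $A$ is refit from the same rank-1 SVD, so the codes stay sparse. All function names and hyperparameters are illustrative.

```python
import numpy as np

def update_dictionary(X, B, A):
    """Step 2: refit each atom b_j (and its code row) via a rank-1 SVD of the residual."""
    for j in range(B.shape[1]):
        users = np.nonzero(A[j])[0]      # samples whose codes use atom j
        if users.size == 0:
            continue                     # atom unused this round; leave it unchanged
        # Residual E_j restricted to those samples: error with atom j removed
        E_j = X[:, users] - B @ A[:, users] + np.outer(B[:, j], A[j, users])
        U, s, Vt = np.linalg.svd(E_j, full_matrices=False)
        B[:, j] = U[:, 0]                # optimal atom = top left singular vector
        A[j, users] = s[0] * Vt[0]       # refit code row from the same SVD
    return B, A

def dictionary_learning(X, k, lam, n_outer=20):
    """Alternate sparse coding (step 1) and dictionary update (step 2)."""
    rng = np.random.default_rng(0)
    B = rng.standard_normal((X.shape[0], k))
    B /= np.linalg.norm(B, axis=0)       # start from unit-norm atoms
    for _ in range(n_outer):
        A = np.column_stack([sparse_code(X[:, i], B, lam)   # step 1: LASSO per sample
                             for i in range(X.shape[1])])
        B, A = update_dictionary(X, B, A)                   # step 2: SVD per atom
    return B, A
```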
Dictionary learning is widely used in image processing. Consider learning a dictionary for 8×8 image patches (64-dimensional vectors).
Extract 10,000 image patches (8×8 pixels = 64 dimensions) from natural images. Learn an overcomplete dictionary with $k > 64$ atoms.
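A sketch of this setup using scikit-learn's built-in learner; the sample image and the choice of 256 atoms are illustrative, not values fixed by the text:

```python
import numpy as np
from sklearn.datasets import load_sample_image
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d

# 10,000 random 8x8 grayscale patches -> rows of a (10000, 64) matrix
image = load_sample_image("china.jpg").mean(axis=2)          # convert to grayscale
patches = extract_patches_2d(image, (8, 8), max_patches=10_000, random_state=0)
X = patches.reshape(len(patches), -1)
X -= X.mean(axis=1, keepdims=True)                           # center each patch

# Overcomplete dictionary: 256 atoms for 64-dimensional patches
learner = MiniBatchDictionaryLearning(n_components=256, alpha=1.0, random_state=0)
learner.fit(X)

codes = learner.transform(X)                                 # sparse coefficients
print("dictionary shape:", learner.components_.shape)        # (256, 64)
print("avg nonzeros per patch:", np.count_nonzero(codes) / len(codes))
```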
After alternating optimization converges, the learned dictionary enables image denoising, compression, and inpainting. Sparse representation separates signal from noise and enables efficient storage.
Sparse representation expresses data as a linear combination of few dictionary atoms, enabling efficient storage and computation.
Dictionary learning optimizes both dictionary and sparse coefficients simultaneously through alternating optimization.
Sparse coding (step 1) is a LASSO problem solved with PGD; dictionary update (step 2) uses SVD to find optimal atoms.
Dictionary learning is widely used in image processing, text analysis, and signal processing for denoising and compression.
The L1 sparsity penalty ensures most coefficients are zero, creating interpretable and efficient representations.
Overcomplete dictionaries ($k > d$) provide flexibility for sparse representation of diverse data patterns.