The Core Problem
Traditional autoencoders compress an image into a single deterministic point in a low-dimensional “latent space.”

A Probabilistic Approach
Instead of mapping an input to a single point, a VAE maps it to a probability distribution (specifically a Gaussian).

- The Encoder: Predicts the parameters of that distribution: the mean ($\mu$) and the variance ($\sigma^2$).
- The Latent Space: By representing data as overlapping “clouds” rather than points, the space becomes continuous.
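As a minimal sketch of the idea above (NumPy, with hypothetical layer sizes), the encoder ends in two linear heads, one predicting the mean and one the log-variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 8-dim encoder features, 2-dim latent space.
FEAT_DIM, LATENT_DIM = 8, 2

# Two linear heads on top of shared encoder features.
# Predicting log-variance keeps the implied variance positive.
W_mu = rng.normal(size=(FEAT_DIM, LATENT_DIM))
W_logvar = rng.normal(size=(FEAT_DIM, LATENT_DIM))

def encode(h):
    """Map encoder features h to Gaussian parameters (mu, log sigma^2)."""
    return h @ W_mu, h @ W_logvar

h = rng.normal(size=(4, FEAT_DIM))   # a batch of 4 feature vectors
mu, logvar = encode(h)
print(mu.shape, logvar.shape)        # (4, 2) (4, 2)
```

Each input thus gets its own “cloud” $\mathcal{N}(\mu, \sigma^2)$ rather than a single coordinate.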
The Objective Function: ELBO
To train a VAE, we maximize the Evidence Lower Bound (ELBO). This objective balances reconstruction accuracy against latent-space organization.

1. Reconstruction Loss ($\mathcal{L}_{\text{recon}}$)
This term measures how well the decoded image matches the original. Under a Gaussian assumption, it is typically implemented as Mean Squared Error (MSE):

$$\mathcal{L}_{\text{recon}} = \| x - \hat{x} \|^2$$

2. KL Divergence ($D_{\text{KL}}$)
This term forces the predicted distribution $\mathcal{N}(\mu, \sigma^2)$ to be as close as possible to a standard normal prior $\mathcal{N}(0, 1)$. For a univariate Gaussian, the closed-form solution is:

$$D_{\text{KL}} = -\tfrac{1}{2}\left(1 + \log \sigma^2 - \mu^2 - \sigma^2\right)$$

The Tug-of-War: $\mathcal{L}_{\text{recon}}$ wants to separate data points to ensure accuracy (scattering), while $D_{\text{KL}}$ wants to pull every distribution toward the center (overlapping). This tension creates a smooth, navigable latent space.
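The two ELBO terms can be sketched numerically; here is a minimal NumPy version (function names are illustrative, with the KL term summed over latent dimensions):

```python
import numpy as np

def recon_loss(x, x_hat):
    """Reconstruction term as mean squared error."""
    return np.mean((x - x_hat) ** 2)

def kl_divergence(mu, logvar):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dims."""
    return -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar))

# Sanity check: if the encoder already predicts the prior
# (mu = 0, log sigma^2 = 0), the KL penalty vanishes.
print(kl_divergence(np.zeros(2), np.zeros(2)))
```

The total loss minimized in training is then `recon_loss + kl_divergence` (often with a weighting factor on the KL term).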
The Reparameterization Trick
In standard backpropagation, gradients cannot flow through a random sampling operation ($z \sim \mathcal{N}(\mu, \sigma^2)$). To solve this, we move the randomness into an external variable $\epsilon \sim \mathcal{N}(0, 1)$.

Mathematical Deduction

We define the latent vector as a deterministic function:

$$z = \mu + \sigma \odot \epsilon$$

By treating $\epsilon$ as a constant during the backward pass, we can calculate gradients for $\mu$ and $\sigma$ directly: $\partial z / \partial \mu = 1$ and $\partial z / \partial \sigma = \epsilon$.

Capabilities & Trade-offs
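A minimal NumPy sketch of the trick (names are illustrative; a real VAE would do this inside an autodiff framework so gradients flow through $\mu$ and $\sigma$ automatically):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    All randomness lives in eps, so z is a deterministic
    function of (mu, sigma): dz/dmu = 1, dz/dsigma = eps.
    """
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * logvar) * eps

mu, logvar = np.array([1.0, -1.0]), np.array([0.0, 0.0])
z = reparameterize(mu, logvar)
print(z.shape)  # (2,)
```

Averaged over many draws, the samples recover the predicted mean, which is what makes the trick an unbiased way to sample during training.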
Smooth Interpolation
You can “walk” between two latent vectors to seamlessly blend features
(e.g., changing a smile to a frown).
Data Generation
Generate entirely new samples by drawing random vectors from the standard
normal prior.
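Both capabilities reduce to simple operations on latent vectors. In this hedged sketch, `decode` is an identity stand-in for a trained decoder network:

```python
import numpy as np

rng = np.random.default_rng(0)

def decode(z):
    # Placeholder for a trained decoder; identity keeps the sketch runnable.
    return z

# Smooth interpolation: walk the line between two latent codes.
z_a, z_b = rng.standard_normal(2), rng.standard_normal(2)
blends = [decode((1 - t) * z_a + t * z_b) for t in np.linspace(0, 1, 5)]

# Data generation: draw a fresh vector from the standard normal prior.
z_new = rng.standard_normal(2)
sample = decode(z_new)
```

Because the KL term keeps the latent space dense around the origin, points along the interpolation path and fresh prior samples both decode to plausible outputs.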
Limitations
- Blurriness: VAEs tend to produce softer images than GANs, because the MSE reconstruction loss encourages the model to “average” its predictions when uncertain.
- Fidelity: While foundational for models like Stable Diffusion, vanilla VAEs struggle to produce high-resolution, sharp details without advanced modifications such as VQ-VAE.
Resources
- Original Paper: Auto-Encoding Variational Bayes (Kingma & Welling, 2013)
- Concepts: ELBO, Reparameterization Trick, Latent Variables.