
Principal Component Analysis (PCA)

Spectral Decomposition & Variance Maximization
Task Description: Computing distances in high-dimensional spaces is problematic because of the curse of dimensionality. One way to alleviate this is dimensionality reduction, which is most effective when the dimensions are correlated to some degree. In that case, the information spread across dimensions can be linearly combined into fewer dimensions. One of the most popular methods for this is principal component analysis (PCA), which projects the data onto the eigenvectors of its covariance matrix and shows how much information (variance) is retained or discarded. In this example, adjust the amount of correlation between the dimensions to see how much information can be captured in just one dimension.
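To make the effect of correlation concrete, here is a minimal sketch under a simplifying assumption: two variables with unit variance and correlation `rho`, so that the covariance matrix is [[1, rho], [rho, 1]]. Its eigenvalues are 1 + |rho| and 1 - |rho|, which gives the share of variance each principal component captures. The function name `explained_variance_2d` is hypothetical and chosen only for illustration.

```python
def explained_variance_2d(rho):
    """Share of total variance captured by PC1 and PC2.

    Assumes two unit-variance variables with correlation rho,
    i.e. the covariance matrix [[1, rho], [rho, 1]].
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    # Eigenvalues of [[1, rho], [rho, 1]] are 1 + rho and 1 - rho;
    # abs() keeps the larger one first for negative correlations.
    lam1 = 1.0 + abs(rho)
    lam2 = 1.0 - abs(rho)
    total = lam1 + lam2  # equals the trace of the matrix, 2
    return lam1 / total, lam2 / total


for rho in (0.0, 0.5, 0.9, 1.0):
    pc1, pc2 = explained_variance_2d(rho)
    print(f"rho={rho:.1f}  PC1={pc1:.0%}  PC2={pc2:.0%}")
```

With no correlation, each component carries half of the variance and nothing is gained by dropping one. At `rho = 0.5` the first component already carries 75%, and as `rho` approaches 1 a single dimension captures nearly everything.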

Principal Components $\mathbf{v}_1$ (Red) and $\mathbf{v}_2$ (Blue) overlaid on centered data.

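Eigenvalue Spectrum: For centered data $\mathbf{X}$, we decompose the covariance matrix $\mathbf{\Sigma}$: $$ \mathbf{\Sigma} \mathbf{v}_i = \lambda_i \mathbf{v}_i $$ Each eigenvalue $\lambda_i$ is the variance of the data along its eigenvector $\mathbf{v}_i$, so the fraction of variance explained by the $i$-th component is $\lambda_i / \sum_j \lambda_j$.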
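The decomposition can be sketched on synthetic 2D data using only the Python standard library. This is an illustration under stated assumptions, not the implementation behind this page: the sample size, seed, and correlation `rho = 0.8` are arbitrary choices, the helper names `covariance_2d` and `eigen_sym_2x2` are hypothetical, and the closed-form eigen-solution used here applies only to symmetric 2x2 matrices.

```python
import math
import random


def covariance_2d(points):
    """Sample covariance entries (s_xx, s_xy, s_yy) of 2D points, after centering."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy


def eigen_sym_2x2(sxx, sxy, syy):
    """Eigenpairs of [[sxx, sxy], [sxy, syy]], largest eigenvalue first."""
    mean = (sxx + syy) / 2.0
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    # Angle of the principal axis of a symmetric 2x2 matrix
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))  # orthogonal to v1
    return (mean + radius, v1), (mean - radius, v2)


# Synthetic correlated data (assumed parameters, for illustration only)
random.seed(0)
rho = 0.8
points = []
for _ in range(2000):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    points.append((z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2))

sxx, sxy, syy = covariance_2d(points)
(lam1, v1), (lam2, v2) = eigen_sym_2x2(sxx, sxy, syy)

# Check the eigenvalue equation Sigma v1 = lam1 v1
residual = math.hypot(
    sxx * v1[0] + sxy * v1[1] - lam1 * v1[0],
    sxy * v1[0] + syy * v1[1] - lam1 * v1[1],
)

total = lam1 + lam2
print(f"PC1 explains {lam1 / total:.1%}, PC2 explains {lam2 / total:.1%}")
print(f"eigen-equation residual: {residual:.2e}")
```

The two eigenvalues sum to the trace of the covariance matrix, which is the total variance, so the ratios printed here correspond to the bars of a scree plot. For data with more than two dimensions, a general symmetric eigen-solver would replace the closed-form step.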

Scree Plot (Explained Variance): PC1, PC2