Principal Component Analysis (PCA)
A statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
PCA is widely used for dimensionality reduction, feature extraction, and data visualization. It works by identifying the directions (principal components) along which the variance in the data is maximal.
The first principal component accounts for the largest possible variance in the data, and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components.
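To make this variance ordering concrete, here is a minimal sketch using scikit-learn's PCA on synthetic data; the dataset, seed, and feature construction are illustrative assumptions, not part of the original text:

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative synthetic data: feature 2 is strongly correlated with
# feature 1, while feature 3 is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([
    x,
    2 * x + rng.normal(scale=0.1, size=200),
    rng.normal(scale=0.5, size=200),
])

pca = PCA().fit(X)

# Ratios are sorted in decreasing order: the first component captures
# the shared variance of the two correlated features.
print(pca.explained_variance_ratio_)
```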
Covariance Matrix
$$\Sigma = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top$$

where $\Sigma$ is the covariance matrix, $x_i$ is an observation, and $\bar{x}$ is the mean of the observations.
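A small sketch of this formula in NumPy (the data and variable names are illustrative); the hand-computed matrix can be checked against np.cov:

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(100, 3))  # 100 observations, 3 features

x_bar = X.mean(axis=0)                          # mean of the observations
centered = X - x_bar                            # x_i - x_bar for each observation
sigma = centered.T @ centered / (len(X) - 1)    # covariance matrix Sigma

# np.cov expects variables in columns when rowvar=False, matching our layout.
assert np.allclose(sigma, np.cov(X, rowvar=False))
```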
Eigenvectors and Eigenvalues
$$\Sigma v = \lambda v$$

Here, $v$ represents the eigenvectors (principal components) and $\lambda$ represents the eigenvalues (the variance explained by each component).
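Combining the two formulas gives a from-scratch PCA sketch like the one below; the function name and random data are illustrative assumptions:

```python
import numpy as np

def pca_from_scratch(X, n_components=2):
    """Project X onto its top principal components via eigendecomposition."""
    centered = X - X.mean(axis=0)
    sigma = np.cov(centered, rowvar=False)             # covariance matrix Sigma
    eigenvalues, eigenvectors = np.linalg.eigh(sigma)  # solves Sigma v = lambda v
    order = np.argsort(eigenvalues)[::-1]              # sort by variance, largest first
    components = eigenvectors[:, order[:n_components]]
    return centered @ components                       # projected data

X = np.random.default_rng(1).normal(size=(50, 5))
print(pca_from_scratch(X).shape)  # (50, 2)
```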
Interactive PCA Visualization
[Interactive plot: in the original page, readers could hover, zoom, and pan to explore the projection.]
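The interactive plot does not survive in this format. As a stand-in, here is a sketch of a static version using matplotlib and the Iris dataset; both choices are assumptions for illustration:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
X_2d = PCA(n_components=2).fit_transform(iris.data)

# Scatter the data in the plane spanned by the first two principal components.
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=iris.target, cmap="viridis")
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.title("Iris data projected onto the first two principal components")
plt.show()
```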
Advantages of PCA
- ✅ Reduces dimensionality, combating the curse of dimensionality.
- ✅ Improves model performance by removing noise and redundant features.
- ✅ Facilitates data visualization in lower dimensions.
- ✅ Can speed up machine learning algorithms (see the sketch after this list).
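As one illustration of the last point, PCA can be placed ahead of a classifier in a scikit-learn pipeline; the digits dataset and the choice of 20 components are illustrative assumptions:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)  # 64 pixel features per image

# Reducing 64 features to 20 components shrinks the classifier's input,
# which typically makes training faster.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=20),
    LogisticRegression(max_iter=1000),
)
model.fit(X, y)
print(model.score(X, y))
```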
Disadvantages of PCA
- ❌ Loss of interpretability: Principal components are linear combinations of the original features.
- ❌ Information loss: Some variance is discarded, potentially losing important details.
- ❌ Sensitive to scaling: PCA is affected by the scale of features (see the sketch after this list).
- ❌ Assumes linearity: PCA works best when relationships between variables are linear.
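A short sketch of the scaling sensitivity noted above (the 1000x feature scale is a made-up example): without standardization, the first component locks onto whichever feature has the largest scale.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X_scaled_up = X * [1.0, 1000.0]  # blow up the second feature's scale

# Without standardization, PC 1 aligns almost entirely with the large feature.
print(PCA(n_components=1).fit(X_scaled_up).components_)

# After standardization, both features contribute again.
X_std = StandardScaler().fit_transform(X_scaled_up)
print(PCA(n_components=1).fit(X_std).components_)
```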
Key Takeaways
- ✓ PCA transforms data into a new coordinate system based on variance.
- ✓ Principal components are orthogonal and ordered by explained variance.
- ✓ Useful for dimensionality reduction, visualization, and noise reduction.
- ✓ Requires feature scaling and can lead to loss of interpretability.