How are eigenvectors used in machine learning?

Principal Component Analysis (PCA) computes the eigenvectors of the data's covariance matrix. Each eigenvector is a direction in feature space; its eigenvalue is the variance of the data projected onto that direction. Sorting by eigenvalue and keeping only the top k eigenvectors gives the best k-dimensional linear approximation of the original data in the least-squares sense. In transformers, attention heads can also be analysed through the eigenstructure of the attention weight matrices to understand which directions in token embedding space they amplify.

Eigenvectors — Directions That Don't Rotate

Q: What is an eigenvector?

An eigenvector of a matrix A is a nonzero vector v such that Av = λv: applying A to v gives back a vector on the same line, scaled by λ. Geometrically, the eigenvector stays on its line through the origin: the transformation only stretches, shrinks, or flips it, never rotating it to a different direction. An n×n matrix has at most n linearly independent eigenvectors.

Q: What does the eigenvalue tell you?

The eigenvalue λ measures how much the eigenvector stretches. λ = 2 doubles the vector's length in that direction. λ = 0.5 halves it. λ = −1 flips the direction without changing length. λ = 0 collapses the eigenvector to zero — the matrix destroys all information in that direction, making it singular.

Q: Does every matrix have eigenvectors?

Every 2×2 matrix has eigenvalues, but they may not be real numbers. A pure rotation matrix (other than 0° or 180°) has complex eigenvalues — no real eigenvector exists because every real vector changes direction under rotation. Symmetric matrices are the important special case: they always have real eigenvalues and their eigenvectors are perpendicular to each other.

Some vectors only stretch — never rotate. Those directions encode the geometry of every matrix.

Principal Component Analysis — the algorithm behind face recognition, genomic analysis, and data visualisation — finds the eigenvectors of a dataset's covariance matrix. Each principal component is an eigenvector, and its eigenvalue is the variance of the data along that direction.

Learning Objectives

1 Recognise the eigenvector equation Av = λv and identify what each symbol represents: A is the matrix, v is the eigenvector, and λ is the eigenvalue.
2 Explain geometrically what an eigenvector is: a direction that stays on the same line through the origin after the transformation, only changing in length.
3 Interpret the sign and magnitude of an eigenvalue to predict how the transformation acts on its eigenvector: positive stretches or shrinks, negative flips, zero collapses to zero.
4 Explain how PCA uses eigenvectors: the principal components are the eigenvectors of the data's covariance matrix, sorted by eigenvalue from largest to smallest.

¶ Narrative

Directions That Don't Rotate

Apply a matrix to most vectors and something changes: the vector rotates to a new direction, not just a new length. But a few special directions survive the transformation with their direction intact — they only stretch or shrink, never rotating. These are eigenvectors.

The word “eigen” is German for “own” or “characteristic.” An eigenvector is a matrix’s own direction — the direction the transformation acts along most simply.

The eigenvector equation

If v is an eigenvector of matrix A with eigenvalue λ, then:

The eigenvector equation

A v = λ v

Reading this left to right: multiply the matrix A by the vector v, and you get back the same vector v, scaled by the scalar λ. The line through v is unchanged (a negative λ reverses the vector along that line), and its length is multiplied by |λ|.

A is the matrix, v is the eigenvector (a direction preserved by the transformation), and λ is the eigenvalue (the scalar stretch factor). To see this concretely, take A = [[2, 1], [1, 2]] and test three directions. Applying A to (1, 1):

Worked example: eigenvector check

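$$A \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \cdot 1 + 1 \cdot 1 \\ 1 \cdot 1 + 2 \cdot 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 3 \end{pmatrix} = 3 \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$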

The output (3, 3) is exactly 3 times the input (1, 1) — same direction, three times longer. So (1, 1) is an eigenvector with λ = 3. Testing (1, 0): the output is (2, 1), which points in a different direction — not an eigenvector. Testing (1, −1): A(1, −1) = (1, −1), so λ = 1 and the vector is completely unchanged.

Second eigenvector

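$$A \begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 2 \cdot 1 + 1 \cdot (-1) \\ 1 \cdot 1 + 2 \cdot (-1) \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \end{pmatrix} = 1 \cdot \begin{pmatrix} 1 \\ -1 \end{pmatrix}$$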

A = [[2, 1], [1, 2]] applied to three test directions. (1, 0) changes direction — not an eigenvector. (1, 1) stretches by λ = 3. (1, −1) is unchanged at λ = 1.
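The three checks above can also be run numerically. The sketch below is one possible way to do it, not part of the original lesson: the helper name `eigen_check` is made up for illustration, and it relies on the fact that in 2D a vector is an eigenvector exactly when A·v is parallel to v.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])

def eigen_check(A, v, tol=1e-12):
    """Return the eigenvalue if v is an eigenvector of the 2x2 matrix A, else None.

    Hypothetical helper for illustration: v is an eigenvector exactly when
    A @ v is parallel to v, i.e. when their 2D cross product is zero.
    """
    v = np.asarray(v, dtype=float)
    w = A @ v
    cross = v[0] * w[1] - v[1] * w[0]
    if abs(cross) > tol:
        return None                      # direction changed: not an eigenvector
    return float(w @ v / (v @ v))        # scale factor along v

results = {d: eigen_check(A, d) for d in [(1, 0), (1, 1), (1, -1)]}
for direction, lam in results.items():
    label = "not an eigenvector" if lam is None else f"eigenvector, lambda = {lam}"
    print(direction, "->", label)
```

The same helper can be pointed at any other 2D direction to test a guess before computing the full eigendecomposition.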

What eigenvalues mean

The eigenvalue λ determines the effect on the eigenvector’s length and direction:

λ value	Effect on eigenvector	Example
λ > 1	Stretches in that direction	λ = 2 doubles the length
0 < λ < 1	Shrinks in that direction	λ = 0.5 halves the length
λ = 1	Eigenvector is fixed — no change	Identity matrix: every vector is an eigenvector
λ = −1	Flips direction, same length	Reflection through origin along that axis
λ = 0	Collapses to zero — direction is destroyed	Singular matrix: that direction is lost

Why this matters in machine learning: eigenvalues tell you which directions carry the most information and which carry the least. In Principal Component Analysis, the main application we will develop at the end of this chapter, you keep the eigenvectors with the largest eigenvalues and discard the rest, which gives you the most faithful lower-dimensional version of your data. The same logic reappears when analysing the landscape of a loss function during training. There the matrix in question is the Hessian, the matrix of second derivatives of the loss, and its eigenvalues measure curvature: directions of large eigenvalue curve sharply (take small steps there to avoid overshooting), and directions of eigenvalue near zero are almost flat (the model can drift along them without much penalty). In both cases the eigenvalue is a measurement of how much “pull” the transformation has along a given direction.
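To make the loss-landscape remark concrete, here is a minimal sketch on a toy quadratic loss. Everything in it is an assumption chosen for illustration (the loss 0.5·wᵀHw, the matrix `H`, the starting point, and the step sizes); real training losses are not quadratic. For this toy case, plain gradient descent is stable only when the step size stays below 2 divided by the largest eigenvalue.

```python
import numpy as np

# Toy curvature matrix (assumed): eigenvalue 10 is a sharp direction, 0.1 a flat one.
H = np.array([[10.0, 0.0], [0.0, 0.1]])

def descend(step, n_steps=50):
    """Run plain gradient descent on the toy loss 0.5 * w @ H @ w."""
    w = np.array([1.0, 1.0])
    for _ in range(n_steps):
        w = w - step * (H @ w)           # the gradient of the toy loss is H @ w
    return w

lam_max = np.linalg.eigvalsh(H).max()
safe = descend(0.9 * 2 / lam_max)        # just below the stability limit 2 / lam_max
unsafe = descend(1.1 * 2 / lam_max)      # just above it

print("safe step:  ", safe)              # sharp coordinate shrinks fast, flat one drifts slowly
print("unsafe step:", unsafe)            # sharp coordinate overshoots and grows
```

With the safe step the sharp coordinate is driven close to zero while the flat coordinate has barely moved; with the slightly larger step the sharp coordinate oscillates with growing amplitude.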

The same eigenvector (0.6, 0.8) under five eigenvalues: stretched, compressed, unchanged, flipped, and collapsed.
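The five cases in the table can be reproduced in a few lines. This is a sketch under an assumption: the matrix `A` below is built by hand so that (0.6, 0.8) is an eigenvector with the chosen eigenvalue (and the perpendicular direction has eigenvalue 1); it is not a matrix taken from the lesson.

```python
import numpy as np

v = np.array([0.6, 0.8])     # unit-length eigenvector used in the figure
u = np.array([-0.8, 0.6])    # unit vector perpendicular to v

lengths = []
for lam in [2.0, 0.5, 1.0, -1.0, 0.0]:
    # Hypothetical matrix constructed to have eigenvector v with eigenvalue lam.
    A = lam * np.outer(v, v) + np.outer(u, u)
    w = A @ v
    length = float(np.linalg.norm(w))
    lengths.append(length)
    if length < 1e-12:
        direction = "collapsed"
    elif w @ v > 0:
        direction = "same direction"
    else:
        direction = "flipped"
    print(f"lambda = {lam:4}: A v = {w}, length = {length:.2f}, {direction}")
```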

💡 Insight

A diagonal matrix’s eigenvectors are exactly the standard basis vectors. For A = [[3, 0], [0, 1]], multiplying A by (1, 0) gives (3, 0) — same direction, scaled by 3. Multiplying A by (0, 1) gives (0, 1) — unchanged. The diagonal entries 3 and 1 are the eigenvalues. Reading eigenvectors off a diagonal matrix requires no computation at all.
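The diagonal case is easy to confirm directly. The short sketch below multiplies the matrix from the paragraph above by each standard basis vector and then compares with what `np.linalg.eig` reports.

```python
import numpy as np

D = np.array([[3.0, 0.0], [0.0, 1.0]])
e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

print(D @ e1)    # e1 scaled by the first diagonal entry
print(D @ e2)    # e2 scaled by the second diagonal entry

# The eigenvalues reported by NumPy are the diagonal entries themselves.
eigenvalues, eigenvectors = np.linalg.eig(D)
print(eigenvalues)
```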

Not every matrix has real eigenvectors

A pure rotation matrix — one that turns every vector by the same angle θ (with θ ≠ 0° and θ ≠ 180°) — has no real eigenvectors. Every real vector changes direction under such a rotation; there is no stable direction in the plane. The eigenvalues of a rotation matrix are complex numbers that encode the rotation angle — they correspond to circular motion in the complex plane, not to any real direction in 2D space.
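As a quick check, the sketch below asks NumPy for the eigenvalues of a 45° rotation (the angle is an arbitrary choice for illustration). The result is a complex-conjugate pair of the form cos θ ± i·sin θ, so both eigenvalues have magnitude 1 and their angle in the complex plane is the rotation angle.

```python
import numpy as np

theta = np.deg2rad(45)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

eigenvalues = np.linalg.eigvals(R)
print(eigenvalues)                          # a complex-conjugate pair
print(np.abs(eigenvalues))                  # magnitudes: no stretching
print(np.rad2deg(np.angle(eigenvalues)))    # angles: plus and minus the rotation angle
```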

Symmetric matrices (where the entry at row i, column j equals the entry at row j, column i) are special: they always have real eigenvalues and their eigenvectors are always perpendicular to each other. The covariance matrices used in PCA — tables that encode how each pair of features in a dataset varies together — are symmetric by construction, and this is exactly why PCA always yields real, perpendicular principal components.
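Both properties can be checked numerically. The sketch below uses the covariance matrix [[4, 2], [2, 2]] that appears later in this chapter; `np.linalg.eigh` is NumPy's routine for symmetric matrices and returns eigenvalues in ascending order with unit-length eigenvectors as columns.

```python
import numpy as np

S = np.array([[4.0, 2.0], [2.0, 2.0]])     # symmetric: S equals its transpose
eigenvalues, V = np.linalg.eigh(S)

print(eigenvalues)                          # real, ascending: 3 - sqrt(5) and 3 + sqrt(5)
print(V.T @ V)                              # identity matrix: the columns are perpendicular unit vectors
print(V @ np.diag(eigenvalues) @ V.T)       # rebuilds S from its eigenvectors and eigenvalues
```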

PCA: eigenvectors of a dataset

📖 History

Karl Pearson introduced principal component analysis in 1901 as a method for fitting lines and planes to point clouds in multi-dimensional space. The same mathematical structure now operates on datasets with hundreds of thousands of features — from face-recognition systems that represent faces as high-dimensional vectors and compress them with PCA, to genomic studies where PCA of genetic variation data separates population clusters across continents.

Real World

Principal Component Analysis finds the eigenvectors of a dataset’s covariance matrix — the matrix that encodes how much each pair of features varies together. Each eigenvector is a direction in feature space. The corresponding eigenvalue is the variance of the data projected onto that direction: the larger the eigenvalue, the more information that direction carries.

Compressing a 1000-feature dataset to 10 dimensions via PCA retains the 10 eigenvectors with the largest eigenvalues and discards the rest. This is the best 10-dimensional linear approximation of the data in the least-squares sense. In practice, PCA is usually computed via a singular value decomposition of the mean-centred data matrix rather than by explicitly forming the covariance matrix, which is more numerically stable and avoids building a large features × features matrix.
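The two routes to PCA mentioned above agree, which the following sketch illustrates on synthetic data (the data, its shape, and the random seed are assumptions made for the example). With n mean-centred samples, the squared singular values divided by n − 1 equal the covariance eigenvalues, and the right singular vectors match the eigenvectors up to sign.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))   # synthetic correlated data
Xc = X - X.mean(axis=0)                                           # centre each feature

# Route 1: eigendecomposition of the covariance matrix, sorted largest first.
evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
evals, evecs = evals[::-1], evecs[:, ::-1]

# Route 2: SVD of the centred data matrix (singular values come sorted largest first).
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_evals = s**2 / (len(Xc) - 1)

print(evals)
print(svd_evals)                  # same values as above
print(np.abs(evecs.T @ Vt.T))     # close to the identity: same directions up to sign
```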

80 data points drawn from a two-feature distribution with covariance [[4, 2], [2, 2]]. PC1 captures most variance; PC2 is perpendicular and captures the remainder.

eigenvectors.py

python

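import numpy as np

A = np.array([[2, 1], [1, 2]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)    # [3. 1.]
print(eigenvectors)   # columns are unit eigenvectors along (1, 1) and (1, −1); their signs may be flipped

# PCA via NumPy: eigenvectors of the covariance matrix
# Multiplying standard normal samples by the Cholesky factor gives covariance close to [[4, 2], [2, 2]]
data = np.random.randn(100, 2) @ np.linalg.cholesky([[4, 2], [2, 2]]).T
cov = np.cov(data.T)
eigenvalues_pca, principal_components = np.linalg.eigh(cov)
# eigh is for symmetric matrices: it always returns real eigenvalues, in ascending order

# Reverse the order so the first column is PC1 (the direction of largest variance)
eigenvalues_pca = eigenvalues_pca[::-1]
principal_components = principal_components[:, ::-1]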

Common Mistake

Eigenvectors are not always the x and y axes. This is only true for diagonal matrices. For a symmetric matrix like [[2, 1], [1, 2]], the eigenvectors point along the diagonal directions (1, 1) and (1, −1), normalised — not along the coordinate axes. The eigenvectors depend entirely on the specific entries of the matrix.

Key Terms

Eigenvector Eigenvalue PCA

◎ Intuition

Look at the matrix [[2, 1], [1, 2]]. Pick a direction — any direction — and predict whether a vector pointing that way will stay pointing that way after the matrix is applied. Try the horizontal direction (1, 0), the diagonal (1, 1), and the anti-diagonal (1, −1). Which of those three do you think is an eigenvector? Make a prediction before you interact with the playground. Now predict the eigenvalues. For each direction you identified as an eigenvector, how much does it stretch or compress under [[2, 1], [1, 2]]? Does it double in length, halve, stay the same, or flip? Try to give a numeric guess for λ₁ and λ₂ before checking.

↺ Reflection

What Eigenvectors Reveal

A vector v is an eigenvector of matrix A when applying A leaves v on the same line through the origin. Formally, Av = λv for some scalar λ. The transformation scales v by λ but never rotates it to a new direction. Geometrically, the eigenvector is a fixed direction of the transformation: an axis along which the matrix acts purely as a stretch.

The sign of λ determines whether the vector flips. If λ = 3, the eigenvector triples in length pointing the same way. If λ = −2, it doubles in length and points the opposite way. If λ = 0, the eigenvector is crushed to zero — the matrix destroys that direction entirely, making it singular.
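The link between a zero eigenvalue and singularity can be seen on a small example. The matrix below is an assumption chosen for illustration: its second row is twice its first, so one direction is sent to zero and the determinant vanishes.

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 4.0]])      # second row is twice the first
eigenvalues, eigenvectors = np.linalg.eigh(A)   # A is symmetric, so eigh applies

null_direction = eigenvectors[:, 0]         # eigenvector for the smallest eigenvalue
print(eigenvalues)                          # approximately 0 and 5
print(A @ null_direction)                   # approximately the zero vector
print(np.linalg.det(A))                     # approximately 0: A is singular
```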

A rotation matrix [[cos θ, −sin θ], [sin θ, cos θ]] for θ ≠ 0° and θ ≠ 180° has no real eigenvectors. Every vector in the plane rotates by θ; none remains on its original line. The eigenvalues are complex numbers encoding the rotation angle — they correspond to no real direction in 2D space. This is why the vector field of a rotation matrix shows all arrows turning uniformly with no stable axis.

Symmetric matrices, where A[i][j] = A[j][i] for all i, j, are the most important special case. The spectral theorem guarantees that they always have real eigenvalues and that eigenvectors belonging to different eigenvalues are perpendicular. A 2×2 symmetric matrix [[a, b], [b, d]] therefore always has two real eigenvalues (counted with multiplicity) and a pair of perpendicular real eigenvectors, regardless of the specific entries.

The covariance matrix of a dataset is always symmetric. Its eigenvectors are the principal components — the directions in feature space along which the data has the most spread. The eigenvalues measure the variance along each eigenvector. Sorting eigenvectors by eigenvalue from largest to smallest and projecting the data onto the top k gives the best possible k-dimensional representation in the least-squares sense. This is why PCA consistently outperforms arbitrary linear projections for data compression: it finds the mathematically optimal directions by solving an eigenvalue problem.
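The least-squares claim can be tested on synthetic data. In the sketch below, the dataset, the seed, the choice k = 2, and the helper name `reconstruction_error` are all assumptions made for illustration. It projects the data onto the top k eigenvectors, reconstructs it, and compares the error with that of a random k-dimensional projection. For PCA, the error (normalised by n − 1) equals the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 4)) * np.array([3.0, 2.0, 0.5, 0.1])   # synthetic data
Xc = X - X.mean(axis=0)

evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(evals)[::-1]              # largest eigenvalue first
evals, evecs = evals[order], evecs[:, order]

def reconstruction_error(W):
    """Squared error after projecting onto the orthonormal columns of W and back."""
    X_hat = (Xc @ W) @ W.T
    return float(np.sum((Xc - X_hat) ** 2) / (len(Xc) - 1))

k = 2
pca_error = reconstruction_error(evecs[:, :k])

# A random orthonormal basis for some other k-dimensional subspace, for comparison.
Q, _ = np.linalg.qr(rng.standard_normal((4, k)))
random_error = reconstruction_error(Q)

print(pca_error, evals[k:].sum())            # equal: error = sum of discarded eigenvalues
print(random_error)                          # never smaller than the PCA error
```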

Key Points

An eigenvector v satisfies Av = λv. Geometrically, it stays on the same line through the origin after transformation — only its length changes. The eigenvalue λ is the stretch factor: positive means same direction, negative means flipped, zero means collapsed.

A pure rotation matrix in 2D (by any angle other than 0° or 180°) has no real eigenvectors. Every real vector changes direction under rotation; there is no stable axis in the plane. This is fundamentally different from scaling or shearing, both of which preserve at least one direction.

Symmetric matrices always have real eigenvalues and perpendicular eigenvectors. The covariance matrices used in PCA are always symmetric — which is why PCA always yields real, orthogonal principal components.

The first principal component of a dataset is the eigenvector of the covariance matrix with the largest eigenvalue — the direction along which the data varies most. Projecting onto the top k eigenvectors gives the best k-dimensional summary of the data.

✓ Checkpoint

Check Your Understanding

Four questions on eigenvectors, eigenvalues, and PCA.

A matrix A has an eigenvector v with eigenvalue λ = −2. Which statement correctly describes what happens when A is applied to v?

True or false: a pure 2D rotation matrix (rotating by 45°) has two real eigenvectors.

In Principal Component Analysis (PCA), what are the principal components?

In the eigenvector equation Av = λv, what does the eigenvalue λ represent?