Mathematics for machine learning is the formal language underlying every algorithm that learns from data. It spans linear algebra, multivariate calculus, probability theory, information theory, and optimization: topics that recur across papers, frameworks, and model architectures. Without this foundation it is still possible to use ML tools as black boxes, but understanding it is what separates practitioners who can debug, design, and innovate from those who can only configure. The crucial mental model: most ML is the optimization of a scalar loss function over a high-dimensional parameter space, and every concept here, from eigenvectors to KL divergence, is a tool for understanding or improving that optimization.
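To make the "optimize a scalar loss" mental model concrete, here is a minimal sketch of gradient descent on a one-parameter least-squares problem. The data, the learning rate, and the function names `loss` and `grad` are illustrative assumptions, and a real model would have many parameters rather than one.

```python
# Minimal gradient descent on the scalar loss L(w) = mean((w*x - y)^2).
# The data and hyperparameters below are illustrative assumptions.

def loss(w, xs, ys):
    """Mean squared error of the one-parameter model y ≈ w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w, xs, ys):
    """Derivative of the loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2x, so the minimizer is w = 2

w = 0.0                    # initial parameter value
learning_rate = 0.05       # step size
for _ in range(200):
    w -= learning_rate * grad(w, xs, ys)  # step against the gradient

print(w, loss(w, xs, ys))  # w ends up very close to 2, loss very close to 0
```

Each step moves `w` against the gradient of the loss. The same loop structure carries over to models with millions of parameters, where `w` becomes a vector or tensor and the gradient is typically computed by automatic differentiation.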
What This Cheat Sheet Covers
This topic spans 18 focused tables and 133 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Vectors, Matrices, and Tensors
Linear algebra gives ML its fundamental data structures. A scalar is a single number, a vector is an ordered array, a matrix is a 2-D grid, and a tensor generalizes to arbitrary dimensions. These are the objects that represent data, parameters, and transformations throughout every ML pipeline.
| Concept | Example | Description |
|---|---|---|
| Vector | $\mathbf{x} = [x_1, x_2, \ldots, x_n]^\top \in \mathbb{R}^n$ | • Ordered list of $n$ real numbers • Represents a data point, a weight vector, or an embedding |
| Matrix | $A \in \mathbb{R}^{m \times n}$, element $A_{ij}$ | • 2-D array of numbers • A dataset of $m$ examples with $n$ features forms an $m \times n$ matrix |
| Tensor | $\mathcal{T} \in \mathbb{R}^{H \times W \times C}$ (e.g., RGB image) | • Generalization of matrices to $k$ dimensions • The native data type in deep learning frameworks |
| Dot product | $\mathbf{u} \cdot \mathbf{v} = \sum_i u_i v_i = \lVert \mathbf{u} \rVert \lVert \mathbf{v} \rVert \cos\theta$ | • Produces a scalar • Cosine similarity uses the normalized form $\cos\theta = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \lVert \mathbf{v} \rVert}$ to measure directional similarity between vectors |
| Matrix multiplication | $(AB)_{ij} = \sum_{p=1}^{k} A_{ip} B_{pj}$, $A \in \mathbb{R}^{m \times k}$, $B \in \mathbb{R}^{k \times n}$ | • Core neural-network operation • Each layer applies an affine transformation $A\mathbf{x} + \mathbf{b}$ via a matrix-vector product |
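
The formulas in this table can be checked with a few lines of plain Python. The sketch below uses nested lists as stand-ins for vectors and matrices. The helper names (`dot`, `norm`, `cosine_similarity`, `matmul`, `layer_transform`) and the sample values are hypothetical, chosen only for illustration. In practice an array library would handle this far more efficiently.

```python
import math

def dot(u, v):
    """Dot product: sum of elementwise products of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))

def cosine_similarity(u, v):
    """Dot product normalized by both lengths, i.e. cos(theta)."""
    return dot(u, v) / (norm(u) * norm(v))

def matmul(A, B):
    """(AB)_ij = sum over p of A_ip * B_pj, for A (m x k) and B (k x n)."""
    k = len(B)
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions must match")
    n = len(B[0])
    return [[sum(row[p] * B[p][j] for p in range(k)) for j in range(n)]
            for row in A]

def layer_transform(A, x, b):
    """Computes A x + b, the pre-activation of a single layer."""
    return [dot(row, x) + bi for row, bi in zip(A, b)]

u = [1.0, 0.0]
v = [1.0, 1.0]
print(dot(u, v))                # 1.0
print(cosine_similarity(u, v))  # about 0.7071, the cosine of 45 degrees

A = [[1, 2], [3, 4], [5, 6]]    # 3 x 2
B = [[1, 0, 2], [0, 1, 3]]      # 2 x 3
print(matmul(A, B))             # [[1, 2, 8], [3, 4, 18], [5, 6, 28]]

print(layer_transform(A, [1.0, -1.0], [0.5, 0.5, 0.5]))  # [-0.5, -0.5, -0.5]
```

Note that `matmul` requires the inner dimensions to agree: a 3 x 2 matrix times a 2 x 3 matrix yields a 3 x 3 result, while reversing the operands would yield a 2 x 2 result.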