ColBERT (Contextualized Late Interaction over BERT) is a neural information retrieval model that encodes queries and documents into per-token embedding matrices and scores relevance by comparing individual token vectors at query time. Unlike bi-encoders, which compress an entire text into a single vector, ColBERT's late interaction preserves token-level granularity, yielding retrieval accuracy close to that of cross-encoders while remaining orders of magnitude faster at scale. The key insight is that document embeddings can be pre-computed offline, leaving only query encoding and a lightweight MaxSim aggregation step online: the heavy document-encoding work is done ahead of time, and the only part deferred to query time is cheap enough for interactive search. Understanding where ColBERT fits in the spectrum from BM25 through dense single-vector retrieval to full cross-encoder reranking is the prerequisite for choosing the right retrieval architecture.
What This Cheat Sheet Covers
This topic spans 14 focused tables and 93 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Core Late Interaction Concepts
The fundamental ideas behind late interaction define how ColBERT differs from every other retrieval paradigm. Mastering these concepts unlocks the rest of the architecture: compression strategies, indexing engines, and training objectives all follow from the same underlying design choices.
| Concept | Example | Description |
|---|---|---|
| Late interaction | Documents embed offline; MaxSim runs at query time | Queries and documents are encoded independently into token-embedding matrices; a lightweight interaction step computes relevance at search time rather than inside a joint encoder. |
| MaxSim operator | $\text{score}(Q,D) = \sum_{i} \max_{j} \cos(Q_i, D_j)$ | For each query token $Q_i$, finds its maximum cosine similarity across all document tokens $D_j$, then sums those per-token maxima into a final relevance score. |
| Token-level embeddings | 200-token doc → matrix of shape (200, 128) | Each token in a text gets its own 128-dimensional contextualized vector; this preserves token-level detail that single-vector compression discards. |
| Shared BERT encoder | Query encoder + document encoder (shared BERT weights) | ColBERT uses a shared BERT encoder for both the query and document sides, distinguished by special [Q] and [D] marker tokens prepended to the input. |
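
The MaxSim formula in the table can be made concrete with a minimal pure-Python sketch. This is only an illustration of the scoring step, not ColBERT's actual implementation: the function names `cosine` and `maxsim_score` and the tiny 2-dimensional toy vectors are assumptions made for this example (real embeddings would be 128-dimensional encoder outputs, and all vectors are assumed to be non-zero).

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def maxsim_score(query_emb: list[list[float]], doc_emb: list[list[float]]) -> float:
    """Late-interaction score: for each query token vector, take its best
    cosine match among the document token vectors, then sum those maxima."""
    return sum(max(cosine(q, d) for d in doc_emb) for q in query_emb)


# Hypothetical toy embeddings: 2 query tokens and 2 document tokens, 2 dims each.
# In a real system doc_emb would be computed offline and loaded from an index.
query_emb = [[1.0, 0.0], [0.0, 1.0]]
doc_emb = [[1.0, 0.0], [1.0, 1.0]]

score = maxsim_score(query_emb, doc_emb)
print(round(score, 4))
```

In this toy case the first query token matches the first document token exactly (similarity 1), while the second query token's best match is the diagonal vector (similarity 1/√2), so the score is their sum. Because each query token contributes at most 1, the score is bounded above by the number of query tokens.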