Explainable AI (XAI) comprises methods that make AI model decisions transparent and interpretable to humans. As machine learning models, particularly deep neural networks and large language models, grow more complex, their internal decision-making often becomes an opaque "black box." XAI addresses this by providing tools to determine which features drive predictions, how a model arrives at a specific output, and whether its behavior aligns with human intuition and fairness principles. Interpretability matters not only for debugging and improving models but also for building trust, meeting regulatory requirements (such as the GDPR provisions often described as a "right to explanation," or the EU AI Act), and detecting bias in high-stakes domains like healthcare, finance, and criminal justice. A major emerging frontier is mechanistic interpretability, named one of MIT Technology Review's breakthrough technologies for 2026, which goes beyond explaining outputs to reverse-engineering the internal circuits and representations of AI systems.
What This Cheat Sheet Covers
This cheat sheet spans 22 focused tables and 160 indexed concepts. The outline below proceeds table by table, from foundational concepts to advanced details.
Table 1: Core XAI Concepts
| Concept | Example | Description |
|---|---|---|
| Interpretability | Linear regression coefficients | • The degree to which a human can understand the cause of a model's decision • intrinsic to simple models such as linear regression or decision trees. |
| Explainability | SHAP values for a neural network | • The ability to provide post-hoc explanations for complex model predictions • applied after training to black-box models. |
| Global explanation | Feature importance ranking across a dataset | Describes overall model behavior across the entire feature space and reveals which features are generally most influential. |
| Local explanation | LIME explanation for one prediction | Explains a single prediction by identifying which features contributed most to that specific instance's output. |
| Model-agnostic methods | LIME, SHAP | • Work with any model type • treat the model as a black box and analyze input-output relationships. |
| Model-specific methods | Attention visualization for Transformers | • Leverage a model's internal architecture (e.g., attention weights, gradients) to produce explanations • tailored to specific model families. |
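The local/global and model-agnostic distinctions in Table 1 can be made concrete with a small sketch. The Python below computes exact Shapley values (the quantity SHAP approximates) for one instance by calling the model only through its inputs and outputs, then averages their magnitudes over a dataset to get a global ranking. Assumptions: a feature "left out" of a coalition is replaced by a single baseline value rather than SHAP's expectation over background data; enumerating every coalition is exponential in the number of features, so this only suits toy inputs; and `shapley_values`, `global_importance`, and `black_box` are hypothetical names, not the `shap` library's API.

```python
from collections.abc import Callable, Sequence
from itertools import combinations
from math import factorial

# A "model" is any function from a feature vector to a score; the
# explainer below only calls it, which is what makes it model-agnostic.
Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float],
                   baseline: Sequence[float]) -> list[float]:
    """Exact Shapley attributions for one instance (a local explanation).

    Assumption: a feature outside coalition S is set to its baseline value.
    SHAP instead averages over a background dataset. Enumerating every
    coalition costs O(2^n) model calls, so this only suits a few features.
    """
    n = len(x)

    def value(coalition: set[int]) -> float:
        # Coalition members keep their real value; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight: |S|! * (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def global_importance(model: Model, data: Sequence[Sequence[float]],
                      baseline: Sequence[float]) -> list[float]:
    """A global explanation: mean absolute local attribution per feature."""
    totals = [0.0] * len(baseline)
    for row in data:
        for i, p in enumerate(shapley_values(model, row, baseline)):
            totals[i] += abs(p)
    return [t / len(data) for t in totals]


# Hypothetical black box with an interaction between features 0 and 2.
def black_box(z: Sequence[float]) -> float:
    return 3.0 * z[0] - 2.0 * z[1] + z[0] * z[2]


x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(black_box, x, base)
print([round(p, 6) for p in phi])  # [4.5, -4.0, 1.5]
print(round(sum(phi), 6), black_box(x) - black_box(base))  # 2.0 2.0
```

Features 0 and 2 split their interaction term equally (1.5 each), and the attributions sum to f(x) minus f(baseline), the Shapley efficiency property. For a purely linear model, each attribution collapses to w_i (x_i - baseline_i), which echoes why linear regression coefficients count as intrinsically interpretable.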