Explainable AI (XAI) Cheat Sheet

Updated 2026-04-28

Explainable AI (XAI) comprises techniques that make AI model decisions transparent and interpretable to humans. As machine learning models, particularly deep neural networks and large language models, grow more complex, their internal decision-making often becomes an opaque "black box." XAI addresses this by providing tools to understand which features drive predictions, how a model arrives at a specific output, and whether its behavior aligns with human intuition and fairness principles. Interpretability matters not only for debugging and improving models but also for building trust, supporting regulatory compliance (such as the GDPR's often-cited "right to explanation" or the EU AI Act), and detecting bias in high-stakes domains like healthcare, finance, and criminal justice. A major emerging frontier is mechanistic interpretability (named an MIT Technology Review breakthrough technology for 2026), which goes beyond explaining outputs to reverse-engineering the internal circuits and representations of AI systems.

What This Cheat Sheet Covers

This topic spans 22 focused tables and 160 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced techniques.

Table 1: Core XAI Concepts
Table 2: SHAP (SHapley Additive exPlanations)
Table 3: LIME (Local Interpretable Model-Agnostic Explanations)
Table 4: Feature Importance Methods
Table 5: Gradient-Based Explanations (Deep Learning)
Table 6: Attention-Based Explanations (Transformers & NLP)
Table 7: Mechanistic Interpretability
Table 8: Partial Dependence & Effect Plots
Table 9: Counterfactual Explanations
Table 10: Intrinsically Interpretable Models
Table 11: Model Distillation & Surrogate Models
Table 12: Feature Interaction Detection
Table 13: Concept-Based Explanations
Table 14: Anchors & Rule-Based Explanations
Table 15: Example-Based Explanations
Table 16: Evaluation Metrics for XAI
Table 17: XAI Libraries & Tools
Table 18: Application Domains & Use Cases
Table 19: Regulatory & Ethical Considerations
Table 20: XAI for LLMs & Generative AI
Table 21: Challenges & Limitations
Table 22: Advanced & Emerging Techniques

Table 1: Core XAI Concepts

Before reaching for any specific technique, it helps to fix the vocabulary that organizes the whole field. The distinctions here are the ones you will keep returning to: interpretability versus explainability, global versus local scope, model-agnostic versus model-specific, and ante-hoc (interpretable by design) versus post-hoc (explained after training). Each pair captures a real design choice about how and when you extract understanding from a model. Mechanistic interpretability sits at the far end, raising the bar from "why this output?" to "what algorithm ran inside?"

Concept	Example	Description
Interpretability	Linear regression coefficients	• The degree to which a human can understand the cause of a model's decision • intrinsic to simple models like linear regression or decision trees.
Explainability	SHAP values for a neural network	• The ability to provide post-hoc explanations for complex model predictions • applied after training to black-box models.
Global explanations	Feature importance ranking across dataset	• Describe overall model behavior across the entire feature space • reveal which features are generally most influential.
Local explanations	LIME explanation for one prediction	• Explain a single prediction • identify which features contributed most to that specific instance's output.
Model-agnostic methods	LIME, KernelSHAP	• Techniques that work with any model type • treat the model as a black box and analyze input-output relationships • note that some SHAP variants (e.g., TreeSHAP) are model-specific.
Model-specific methods	Attention visualization for Transformers	• Leverage a model's internal architecture (e.g., attention weights, gradients) for explanations • tailored to specific model families.
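
The global/local and model-agnostic distinctions in the table can be made concrete with a small sketch. Everything here is illustrative and assumed rather than taken from any library: `black_box` is a hypothetical stand-in for an opaque model, `permutation_importance` is a hand-rolled global measure (the error increase when one feature column is shuffled), and `local_attribution` is a simple single-instance measure (the output change when one feature is reset to a dataset-mean baseline). It is not SHAP or LIME, only the same black-box style of probing through inputs and outputs.

```python
import random

def black_box(x):
    """Hypothetical opaque model: we only call it, never inspect its internals."""
    # Feature 2 is deliberately given zero weight so its importance should be zero.
    return 3.0 * x[0] + 0.5 * x[1] * x[1] + 0.0 * x[2]

def mse(model, X, y):
    """Mean squared error of the model on rows X against targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(X)

def permutation_importance(model, X, y, rng):
    """Global explanation: error increase when one feature column is shuffled."""
    base = mse(model, X, y)
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between feature j and the target
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        scores.append(mse(model, X_perm, y) - base)
    return scores

def local_attribution(model, x, baseline):
    """Local explanation: output change when one feature of x is set to its baseline."""
    full = model(x)
    return [full - model(x[:j] + [baseline[j]] + x[j + 1:]) for j in range(len(x))]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = [black_box(row) for row in X]  # targets come from the model itself
baseline = [sum(row[j] for row in X) / len(X) for j in range(3)]

global_scores = permutation_importance(black_box, X, y, rng)
local_scores = local_attribution(black_box, [1.0, 0.0, 0.5], baseline)

print("global (whole dataset):", global_scores)
print("local (one instance):  ", local_scores)
```

Both functions touch the model only through calls to `model(...)`, which is what makes them model-agnostic. The global scores summarize behavior over all 200 rows, while the local scores describe one input and can rank features differently from the global view.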
