OpenRefine (formerly Google Refine) is a powerful, open-source desktop application for cleaning, transforming, and extending messy datasets. Originally developed by Metaweb and later supported by Google before becoming an independent open-source project, OpenRefine runs locally on your computer and is operated through a browser-based interface, so your data stays on your machine unless you choose to send it to an external service such as a reconciliation endpoint. Its clustering algorithms find near-duplicate entries and help you merge them, it supports reconciliation against external services like Wikidata and VIAF, and it keeps a complete undo/redo history that makes transformations reversible and reproducible. Two further strengths are the faceting and filtering system, which lets you slice data along multiple dimensions simultaneously, and GREL (General Refine Expression Language), which handles complex data transformations. OpenRefine can work in rows mode, where each row is independent, or in records mode, where multiple rows are linked together into one record. Understanding the difference is fundamental to mastering multi-valued cell operations and to preserving relational structure during transformations.
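To make the clustering idea concrete, here is a minimal Python sketch of key-collision clustering with a fingerprint-style key. It is an approximation for illustration, not OpenRefine's own code: the function names `fingerprint` and `cluster` and the sample values are assumptions, and the normalization steps are simplified.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Approximate a fingerprint key: normalize, tokenize, dedupe, sort."""
    text = value.strip().lower()
    # Fold accented characters to their ASCII base letters.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Drop ASCII punctuation, then split on whitespace.
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values whose keys collide; keep groups with 2+ distinct spellings."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Hypothetical sample data.
names = ["Acme Corp.", "acme corp", "Corp Acme", "Café Noir", "Cafe noir", "Widget Ltd"]
clusters = cluster(names)
print(clusters)
```

Values that differ only in case, punctuation, accents, or word order end up under the same key, so they are proposed as one cluster to merge, while "Widget Ltd" stays on its own.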
What This Cheat Sheet Covers
This cheat sheet spans 20 focused tables and 236 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.
Table 1: Core Facet Types
Facets are the primary way to explore and filter data in OpenRefine — they group values in a column and let you narrow your working set before applying transformations. Choosing the right facet type for your data's structure (text, number, date, or expression-based) determines what patterns you can find and what you can bulk-edit.
| Facet | Example | Description |
|---|---|---|
| Text facet | Column → Facet → Text facet | • Groups all unique text values with counts • hover over a value to bulk-edit it or to include/exclude it |
| Numeric facet | Column → Facet → Numeric facet | • Creates a draggable range slider for filtering numbers • displays the value distribution • checkboxes separate numeric, non-numeric, blank, and error cells |
| Timeline facet | Column → Facet → Timeline facet | • Visualizes date/time data on a draggable timeline slider • cells must be of date type |
| Scatterplot facet | Column → Facet → Scatterplot facet | • Plots two numeric columns as X/Y coordinates • drag a rectangle to select and filter correlated rows |
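
As a rough Python analogy for what the text and numeric facets compute, the sketch below counts distinct values and sorts cells into the numeric facet's four groups. The helper names `text_facet` and `numeric_facet_bucket` and the sample cells are assumptions made for illustration; this is not how OpenRefine implements facets internally.

```python
from collections import Counter


def text_facet(cells):
    """Count each distinct value, as a text facet's value list does."""
    labels = ("(blank)" if c in (None, "") else str(c) for c in cells)
    return dict(Counter(labels))


def numeric_facet_bucket(cell):
    """Classify one cell into a numeric-facet group."""
    if cell is None or cell == "":
        return "blank"
    if isinstance(cell, Exception):  # stand-in for an error cell
        return "error"
    if isinstance(cell, (int, float)) and not isinstance(cell, bool):
        return "numeric"
    return "non-numeric"


# Hypothetical column contents.
cities = ["Boston", "boston", "Boston", "", "NYC", None]
city_counts = text_facet(cities)

amounts = [3, 4.5, "7", "", None, ValueError("bad")]
buckets = [numeric_facet_bucket(a) for a in amounts]

print(city_counts)
print(buckets)
```

Two details carry over to real use: "Boston" and "boston" are counted as different values until you merge them, and the string "7" lands in the non-numeric group because a number stored as text is not numeric until it is converted.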