OpenRefine (formerly Google Refine) is a powerful, open-source desktop application for cleaning, transforming, and extending messy datasets. Originally developed by Metaweb and later supported by Google before becoming an independent open-source project, OpenRefine presents a browser-based interface but runs locally on your computer, so your data stays on your machine unless you choose to send it to an external service. The tool provides clustering algorithms for finding and merging near-duplicate entries, supports reconciliation against external services such as Wikidata and VIAF, and keeps a complete undo/redo history that makes transformations reversible and reproducible. Two further strengths are its faceting and filtering system, which lets you slice data along multiple dimensions simultaneously, and GREL (General Refine Expression Language), which handles complex data transformations. OpenRefine works in either rows mode (where each row is independent) or records mode (where multiple rows can be linked together into one record); understanding the difference is fundamental to mastering multi-valued cell operations and preserving relational structure during transformations.
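To make the clustering idea concrete, here is a minimal Python sketch of key-collision clustering in the style of a "fingerprint" key. It is an approximation written for illustration, not OpenRefine's own code: the function names `fingerprint` and `cluster` and the sample values are assumptions, and the exact normalization steps OpenRefine applies may differ in detail.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build an approximate fingerprint key for a cell value (illustrative only)."""
    # Trim surrounding whitespace and lowercase the text.
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII form.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Remove ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, drop duplicate tokens, and sort them.
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values whose keys collide; keep only groups with 2+ distinct values."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(members) for key, members in groups.items() if len(members) > 1}


# Hypothetical sample data: three spellings of one name plus an unrelated value.
names = ["Café Réfine", "cafe refine", "Refine, Cafe", "Open Data Ltd."]
clusters = cluster(names)

for key, members in clusters.items():
    print(key, "->", members)
```

Values that differ only in case, accents, punctuation, or word order collapse to the same key, so they end up in one candidate cluster that a person can then review and merge.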
What This Cheat Sheet Covers
This topic spans 17 focused tables and 127 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Core Facet Types
| Facet | Example | Description |
|---|---|---|
| Text facet | Column → Facet → Text facet | • Groups all unique text values in a column with counts • Allows bulk editing by clicking value names, selecting multiple choices, and including or excluding specific entries |
| Numeric facet | Column → Facet → Numeric facet | • Creates a draggable range slider for filtering numbers • Displays min, max, and distribution • Offers checkboxes to include or exclude non-numeric, blank, and error cells |
| Timeline facet | Column → Facet → Timeline facet | • Visualizes date/time data on a draggable timeline slider • Requires the column's cells to be stored as dates • Useful for filtering temporal ranges |
| Scatterplot facet | Column → Facet → Scatterplot facet | • Plots two numeric columns against each other as X/Y coordinates • Allows selection by dragging rectangular regions to filter correlated data |