R for Data Science and Tidyverse Cheat Sheet

Updated 2026-05-15

R for Data Science combines the R programming language with the tidyverse, a collection of packages built on a shared grammar and workflow for data manipulation, visualization, and analysis. The tidyverse uses tidy data as its unifying structure (each variable is a column, each observation is a row, and each value is a cell) and favors readable code built from pipes and verb-like functions. The core toolkit includes dplyr for data transformation, tidyr for reshaping, purrr for functional programming, ggplot2 for visualization, readr for fast I/O, stringr for text, lubridate for dates, forcats for factors, and broom for tidying model output, and all of them fit into R Markdown and Quarto documents for reproducible reporting. Keep in mind that the native pipe |> (R ≥ 4.1) behaves slightly differently from magrittr's %>%: the native pipe has no `.` placeholder (R ≥ 4.2 offers `_`, which must be passed as a named argument), and its right-hand side must be an explicit function call, such as `x |> f()` rather than `x |> f`.
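To make the pipe difference concrete, here is a minimal sketch contrasting the two operators. It assumes R ≥ 4.2 (the `_` placeholder arrived in 4.2; `|>` itself in 4.1) and that the magrittr package is installed; the vector `x` and data frame `dat` are made-up illustrative data.

```r
# Minimal sketch: requires R >= 4.2 (for `_`) and the magrittr package
library(magrittr)

x <- c(1, 4, 9)

# Both pipes insert the left-hand side as the first argument
via_magrittr <- x %>% sqrt() %>% sum()
via_native   <- x |> sqrt() |> sum()

# magrittr accepts a bare function name; the native pipe needs a call
bare_call <- x %>% sqrt
# x |> sqrt   # parse error: the RHS of |> must be a function call

# Sending the piped value to a non-first argument:
# `.` for magrittr, `_` (named arguments only) for the native pipe
dat <- data.frame(x = c(1, 2, 3), y = c(2, 4, 6))
fit_magrittr <- dat %>% lm(y ~ x, data = .)
fit_native   <- dat |> lm(y ~ x, data = _)
```

Both fits are identical; only the placeholder syntax differs.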

What This Cheat Sheet Covers

This topic spans 31 focused tables and 242 indexed concepts. The outline below lists every table, from foundational concepts to advanced details.

Table 1: dplyr Core Verbs for Row and Column Operations
Table 2: dplyr Grouping and Aggregation
Table 3: dplyr Joins for Combining Data Frames
Table 4: dplyr Advanced Column and Row Operations
Table 5: dplyr Select Helpers for Column Selection
Table 6: tidyr Reshaping Data with Pivots
Table 7: tidyr Nested and List-Column Data
Table 8: tidyr Missing Data Handling
Table 9: purrr Iteration and Mapping
Table 10: purrr List Manipulation and Selection
Table 11: purrr Functional Programming Utilities
Table 12: lubridate Parsing Date-Times
Table 13: lubridate Extracting Date-Time Components
Table 14: lubridate Date-Time Arithmetic and Manipulation
Table 15: lubridate Time Spans
Table 16: stringr Detection and Extraction
Table 17: stringr Modification and Replacement
Table 18: stringr String Manipulation and Assembly
Table 19: stringr Advanced String Operations
Table 20: forcats Factor Reordering
Table 21: forcats Factor Modification
Table 22: forcats Factor Utilities
Table 23: readr Data Import Functions
Table 24: readr Column Type Specifications
Table 25: readr Data Export Functions
Table 26: tibble Creation and Conversion
Table 27: tibble Manipulation and Inspection
Table 28: broom for Tidy Model Output
Table 29: Base R Statistical Functions with Formula Interface
Table 30: R Markdown and Quarto for Reproducible Reporting
Table 31: Pipe Operators - Native vs Magrittr

Table 1: dplyr Core Verbs for Row and Column Operations

These are the workhorse verbs you reach for in almost every analysis. Each one does a single, predictable thing to a data frame, and chaining them with the pipe is the heart of the dplyr grammar: filter rows, select and transform columns, sort, deduplicate, and collapse to summaries. Learn these and most everyday wrangling falls into place.

Verb	Example	Description
filter	`df %>% filter(age > 30, city == "NYC")`	• Keeps rows that satisfy logical conditions • multiple conditions combine with AND by default.
select	`df %>% select(name, age, starts_with("val"))`	• Picks columns by name or helper • can rename inline (e.g., `new = old`).
mutate	`df %>% mutate(total = price * quantity)`	• Creates new columns or modifies existing ones • expressions are vectorised over whole columns (or whole groups after `group_by()`), so `mean(x)` uses the entire column; use `rowwise()` for per-row calculations.
summarise / summarize	`df %>% summarise(avg = mean(value), n = n())`	• Collapses rows into summary statistics • often combined with `group_by()`.
arrange	`df %>% arrange(desc(date), name)`	• Sorts rows by one or more columns • use `desc()` for descending order.
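The verbs compose naturally into a single pipeline. Below is a minimal sketch, assuming dplyr is installed; the `orders` tibble and its columns are invented for illustration. It also shows what "vectorised" means for `mutate()`: `sum(total)` inside the call refers to the whole column, not to the current row.

```r
library(dplyr)

# Hypothetical example data (not from the original text)
orders <- tibble(
  customer = c("Ana", "Ben", "Ana", "Cy", "Ben", "Cy"),
  price    = c(10, 5, 20, 8, 5, 8),
  quantity = c(1, 4, 2, 1, 2, 3)
)

# mutate() is vectorised: sum(total) is the total over ALL rows,
# so share_of_revenue is each row's fraction of overall revenue
with_totals <- orders %>%
  mutate(
    total            = price * quantity,
    share_of_revenue = total / sum(total)
  ) %>%
  select(customer, total, share_of_revenue)

# Chain the core verbs: filter -> group -> summarise -> arrange
by_customer <- with_totals %>%
  filter(total >= 10) %>%               # drops the single order worth 8
  group_by(customer) %>%
  summarise(revenue = sum(total), n = n()) %>%
  arrange(desc(revenue))                # Ana (50), Ben (30), Cy (24)
```

With a single grouping variable, `summarise()` returns an ungrouped result, so `arrange()` sorts across all customers.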
