R for Data Science combines the R programming language with the tidyverse, a collection of packages built around a consistent grammar and workflow for data manipulation, visualization, and analysis. The tidyverse uses tidy data as its unifying structure (each observation is a row, each variable is a column) and emphasizes readable code through pipes and verb-based functions. The main packages are dplyr for data transformation, tidyr for reshaping, purrr for functional programming, ggplot2 for visualization, readr for fast I/O, stringr for text, lubridate for dates, and forcats for factors, with broom for tidying model output; all of them integrate with R Markdown and Quarto for reproducible reporting. Keep in mind that the native pipe |> (R ≥ 4.1) behaves slightly differently from magrittr's %>%: the native pipe does not expose the left-hand side as . and requires the right-hand side to be an explicit function call (x |> f(), not x |> f).
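The pipe differences described above can be sketched as follows. This is a minimal illustration, assuming R ≥ 4.1 with the magrittr package installed; the vector `x` and the strings are made-up inputs, and the anonymous-function wrapper is just one way to place the left-hand side somewhere other than the first argument with the native pipe.

```r
library(magrittr)  # provides %>%

x <- c(1, 4, 9)  # hypothetical input

# Both pipes pass the left-hand side as the first argument of the call.
magrittr_sum <- x %>% sqrt() %>% sum()
native_sum   <- x |> sqrt() |> sum()

# magrittr also accepts a bare function name on the right-hand side.
bare_name <- x %>% sqrt
# The native pipe does not: `x |> sqrt` is a syntax error,
# because the right-hand side must be a function call.

# magrittr exposes the left-hand side as `.`, so it can go in any position.
magrittr_dot <- "world" %>% paste("hello", .)

# The native pipe has no `.`; one option is to wrap the step in an
# anonymous function and call it.
native_lambda <- "world" |> (\(s) paste("hello", s))()
# On R >= 4.2 a `_` placeholder is also available, but only as a
# named argument, e.g. `df |> lm(y ~ x, data = _)`.
```

For simple chains where the value flows into the first argument, the two pipes are interchangeable; the differences only matter once `.` or a bare function name is involved.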
What This Cheat Sheet Covers
This topic spans 31 focused tables and 242 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: dplyr Core Verbs for Row and Column Operations
These are the workhorse verbs you reach for in almost every analysis—each one does a single, predictable thing to a data frame, and chaining them with the pipe is the heart of the dplyr grammar. Filter rows, select and reshape columns, sort, deduplicate, and collapse to summaries; learn these and most everyday wrangling falls into place.
| Verb | Example | Description |
|---|---|---|
| filter() | df %>% filter(age > 30, city == "NYC") | • Keeps rows that satisfy logical conditions • multiple conditions combine with AND by default |
| select() | df %>% select(name, age, starts_with("val")) | • Picks columns by name or helper • can rename inline (e.g., new = old) |
| mutate() | df %>% mutate(total = price * quantity) | • Creates new columns or modifies existing ones • expressions are vectorized over whole columns, producing one value per row |
| summarise() | df %>% summarise(avg = mean(value), n = n()) | • Collapses rows into summary statistics • often combined with group_by() |
| arrange() | df %>% arrange(desc(date), name) | • Sorts rows by one or more columns • use desc() for descending order |
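To see how the verbs in the table above chain together, here is a small sketch, assuming dplyr is installed. The `orders` data frame and its values are invented purely for illustration.

```r
library(dplyr)

# Hypothetical toy data, not from any real dataset
orders <- data.frame(
  name     = c("Ann", "Bob", "Cy", "Dee"),
  age      = c(34, 28, 45, 31),
  city     = c("NYC", "NYC", "LA", "NYC"),
  price    = c(10, 20, 5, 8),
  quantity = c(2, 1, 4, 3)
)

result <- orders %>%
  filter(age > 30, city == "NYC") %>%    # both conditions must hold (AND)
  select(name, price, quantity) %>%      # keep only the columns needed
  mutate(total = price * quantity) %>%   # computed over whole columns
  arrange(desc(total))                   # largest total first

# Collapse the filtered rows to a single summary row
summary_tbl <- result %>%
  summarise(avg = mean(total), n = n())
```

With this data, only Ann and Dee pass the filter, so `result` has two rows (Dee first, with a total of 24) and `summary_tbl` reports an average of 22 across 2 rows.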