R for Data Science combines the R programming language with the tidyverse, a collection of packages designed around a consistent grammar and shared workflow principles for data manipulation, visualization, and analysis. The tidyverse uses tidy data as its unifying structure (each observation is a row, each variable is a column) and emphasizes readable code through pipes and verb-based functions. Its main packages are dplyr for data transformation, tidyr for reshaping, purrr for functional programming, ggplot2 for visualization, readr for fast rectangular file I/O, stringr for text, lubridate for dates, and forcats for factors. broom, which tidies model output, is installed with the tidyverse but has to be loaded separately. All of these work with R Markdown and Quarto for reproducible reporting. Keep in mind that the native pipe `|>` (R ≥ 4.1) behaves slightly differently from magrittr's `%>%`: it has no `.` placeholder (R ≥ 4.2 provides `_` instead, which must be passed to a named argument), and its right-hand side must be a function call with parentheses rather than a bare function name.
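The pipe differences described above can be sketched as follows. This is a minimal illustration, assuming R ≥ 4.2 and an installed magrittr; the vector `x` and the model formula are hypothetical examples, and `mtcars` is the dataset that ships with R.

```r
library(magrittr)  # provides %>%; the native |> needs no package

x <- c(4, 9, 16)

# Both pipes pass the left-hand side as the first argument of the call.
a <- x %>% sqrt() %>% sum()
b <- x |> sqrt() |> sum()

# magrittr exposes the left-hand side as `.` anywhere in the call.
m1 <- mtcars %>% lm(mpg ~ wt, data = .)

# The native pipe has no `.`; in R >= 4.2 the `_` placeholder
# can be used, but only as the value of a named argument.
m2 <- mtcars |> lm(mpg ~ wt, data = _)

# magrittr accepts a bare function name on the right-hand side;
# the native pipe requires an explicit call with parentheses.
c1 <- x %>% sum
c2 <- x |> sum()
# x |> sum   # would be a syntax error under the native pipe
```

Both models are fitted from the same data and formula, so `coef(m1)` and `coef(m2)` should agree. In code that only ever pipes into the first argument, the two operators are usually interchangeable.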
What This Cheat Sheet Covers
This cheat sheet spans 31 focused tables and 242 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.
Table 1: dplyr Core Verbs for Row and Column Operations
| Verb | Example | Description |
|---|---|---|
| `filter()` | `df %>% filter(age > 30, city == "NYC")` | Keeps rows that satisfy logical conditions; multiple conditions combine with AND by default. |
| `select()` | `df %>% select(name, age, starts_with("val"))` | Picks columns by name or helper; can rename inline (e.g., `new = old`). |
| `mutate()` | `df %>% mutate(total = price * quantity)` | Creates new columns or modifies existing ones; expressions are vectorized over whole columns, so each row gets its own result. |
| `summarise()` | `df %>% summarise(avg = mean(value), n = n())` | Collapses rows into summary statistics, one row per group (or one row overall); often combined with `group_by()`. |
| `arrange()` | `df %>% arrange(desc(date), name)` | Sorts rows by one or more columns; use `desc()` for descending order. |
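
The five verbs in Table 1 are typically chained in a single pipeline. The sketch below assumes dplyr is installed; the `orders` tibble and its values are hypothetical and exist only to make the example runnable.

```r
library(dplyr)

# Hypothetical data, made up for illustration
orders <- tibble(
  name     = c("Ana", "Ben", "Cy", "Dee", "Eli"),
  age      = c(34, 28, 45, 31, 52),
  city     = c("NYC", "NYC", "LA", "NYC", "NYC"),
  price    = c(10, 20, 5, 8, 16),
  quantity = c(2, 1, 4, 3, 1)
)

result <- orders %>%
  filter(age > 30, city == "NYC") %>%     # keep rows meeting both conditions
  select(name, age, price, quantity) %>%  # drop the city column
  mutate(total = price * quantity) %>%    # vectorized over the two columns
  arrange(desc(total))                    # largest total first

# Collapse the filtered rows to a single summary row
summary_tbl <- result %>%
  summarise(avg = mean(total), n = n())
```

Here `result` holds the three NYC customers over 30, ordered by `total`, and `summary_tbl` holds their mean total and row count. Adding `group_by(city)` before `summarise()` would instead produce one summary row per city.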