Session 4
PMAP 8921: Data Visualization with R
Andrew Young School of Policy Studies
Summer 2023
Reproducibility
Amounts
Proportions
Pivot Tables do the same thing!
More powerful
Free and open source
Reproducibility
Debt:GDP ratio
90%+ → −0.1% growth
Debt:GDP ratio = 90%+ → 2.2% growth (!!)
Septin 2
Membrane-Associated Ring Finger (C3HC4) 1
2310009E13
20% of genetics papers between 2005–2015 (!!!)
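One way to guard against this kind of silent conversion is to read identifier columns explicitly as text. The sketch below is only an illustration: the gene symbols SEPT2 and MARCH1 are the usual short forms of the first two names above, the expression values and the file name are made up, and the file is written to a temporary folder so the example runs on its own.

```r
library(readr)

# Hypothetical file, created here only for illustration
gene_path <- file.path(tempdir(), "genes.csv")
writeLines(
  c("gene,expression", "SEPT2,1.5", "MARCH1,2.0", "2310009E13,0.7"),
  gene_path
)

# Declare the column types up front so identifiers stay as text
genes <- read_csv(
  gene_path,
  col_types = cols(gene = col_character(), expression = col_double())
)

genes$gene
```

Declaring the types in code also documents the decision, so anyone rerunning the script gets the same columns.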
Don't touch the raw data
If you do, explain what you did!
Use self-documenting, reproducible code
R Markdown!
Use open formats
Use .csv, not .xlsx
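A minimal sketch of what these rules can look like in practice, assuming a made-up survey file: the raw .csv is only ever read, the cleaning step is written (and commented) in code, and the cleaned data goes to a separate file. The file names and values are hypothetical.

```r
library(readr)
library(dplyr)

# Hypothetical raw file, created here only so the example runs on its own
raw_path <- file.path(tempdir(), "survey_raw.csv")
writeLines(c("id,score", "1,10", "2,NA", "3,30"), raw_path)

# Read the raw data; never edit this file by hand
raw <- read_csv(raw_path, show_col_types = FALSE)

# Cleaning step, documented in code: drop rows with a missing score
clean <- raw %>%
  filter(!is.na(score))

# Save the cleaned version to a different file, in an open format
clean_path <- file.path(tempdir(), "survey_clean.csv")
write_csv(clean, clean_path)
```

Put the same code in an R Markdown document and the explanation of what you did travels with the analysis.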
We are a lot better at visualizing
line lengths than angles and areas
The entire line length matters,
so don't truncate it!
Always start at 0
(Or don't use bars)
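To see why truncation matters, here is a small sketch with two made-up values. The data frame and the axis limits are assumptions chosen for illustration only.

```r
library(ggplot2)

# Hypothetical values, made up for illustration
scores <- data.frame(group = c("A", "B"), value = c(95, 100))

# geom_col() draws bars from 0, so bar length is proportional to the value
full_axis <- ggplot(scores, aes(x = group, y = value)) +
  geom_col()

# Zooming in on the y-axis cuts off the bottom of the bars
truncated_axis <- full_axis +
  coord_cartesian(ylim = c(90, 100))

# B is about 5% larger than A...
true_ratio <- 100 / 95

# ...but with the axis starting at 90, B's visible bar is twice as long as A's
apparent_ratio <- (100 - 90) / (95 - 90)
```

Print `full_axis` and `truncated_axis` side by side to compare the two impressions.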
#barbarplots
library(ggplot2)

# `animals` is the example data frame used in these slides;
# it has animal_type and weight columns

# Jittered points
ggplot(animals, aes(x = animal_type, y = weight, color = animal_type)) +
  geom_point(position = position_jitter(height = 0), size = 1) +
  labs(x = NULL, y = "Weight") +
  guides(color = "none")

# Beeswarm
library(ggbeeswarm)

ggplot(animals, aes(x = animal_type, y = weight, color = animal_type)) +
  geom_beeswarm(size = 1) +
  # Or try this too:
  # geom_quasirandom() +
  labs(x = NULL, y = "Weight") +
  guides(color = "none")

# Boxplot with jittered points on top
ggplot(animals, aes(x = animal_type, y = weight, color = animal_type)) +
  geom_boxplot(width = 0.5) +
  geom_point(position = position_jitter(height = 0), size = 1, alpha = 0.5) +
  labs(x = NULL, y = "Weight") +
  guides(color = "none")

# Violin plot with jittered points on top
ggplot(animals, aes(x = animal_type, y = weight, color = animal_type)) +
  geom_violin(width = 0.5) +
  geom_point(position = position_jitter(height = 0), size = 1, alpha = 0.5) +
  labs(x = NULL, y = "Weight") +
  guides(color = "none")

# Ridgeline plot
library(ggridges)

ggplot(animals, aes(x = weight, y = animal_type, fill = animal_type)) +
  geom_density_ridges() +
  labs(x = "Weight", y = NULL) +
  guides(fill = "none")
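The `animals` data frame in the plots above is not defined in these slides. The stand-in below is hypothetical (the animal types and weights are made up), but it lets the code run and shows why a bar of averages can mislead: both groups have the same mean and very different distributions.

```r
library(dplyr)

# Hypothetical stand-in for `animals`; the types and weights are made up
animals <- data.frame(
  animal_type = rep(c("cat", "dog"), each = 6),
  weight = c(9, 10, 10, 10, 10, 11, 2, 3, 4, 16, 17, 18)
)

# A bar chart of means would show two identical bars
animal_summary <- animals %>%
  group_by(animal_type) %>%
  summarize(mean_weight = mean(weight), sd_weight = sd(weight))

animal_summary
```

Plotting the individual points, as in the examples above, makes the difference in spread visible.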
Bar charts always start at zero
Don't use bars for summary statistics.
You throw away too much information.
The end of the bar is often all that matters
We'll use a summarized version of the gapminder dataset as an example
library(tidyverse)  # For %>%, filter(), count(), fct_inorder(), and ggplot()
library(gapminder)

gapminder_continents <- gapminder %>%
  filter(year == 2007) %>%  # Only look at 2007
  count(continent) %>%  # Count the countries in each continent
  arrange(desc(n)) %>%  # Sort in descending order by count
  # Order the continent factor levels to match the sorted rows
  mutate(continent = fct_inorder(continent))

ggplot(gapminder_continents, aes(x = continent, y = n, fill = continent)) +
  geom_col() +
  guides(fill = "none") +
  labs(x = NULL, y = "Number of countries")
Since the end of the bar is important, emphasize it the most
ggplot(gapminder_continents, aes(x = continent, y = n, color = continent)) +
  geom_pointrange(aes(ymin = 0, ymax = n)) +
  guides(color = "none") +
  labs(x = NULL, y = "Number of countries")
Show the individual observations as squares
# This has to be installed in a special way--you can't use the Packages panel.
# Run this in your console:
# devtools::install_github("hrbrmstr/waffle")
library(waffle)

ggplot(gapminder_continents, aes(x = continent, y = n, fill = continent)) +
  geom_waffle(aes(values = n),  # geom_waffle() needs a special values aesthetic
              n_rows = 9,  # It has lots of other options too
              flip = TRUE, na.rm = TRUE) +
  labs(fill = NULL) +
  coord_equal() +  # Make all the squares square
  theme_void()  # Use a completely empty theme
If exact counts are less important,
try a heatmap with geom_tile()
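As a rough sketch of that idea, the example below counts gapminder countries in 2007 by continent and life expectancy band, then maps the count to tile color. The 10-year bands and the axis labels are assumptions chosen for illustration.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Count countries by continent and 10-year life expectancy band (assumed bins)
gapminder_bins <- gapminder %>%
  filter(year == 2007) %>%
  mutate(life_bin = cut(lifeExp, breaks = seq(30, 90, by = 10))) %>%
  count(continent, life_bin)

# Each tile's fill color encodes the number of countries in that cell
life_heatmap <- ggplot(gapminder_bins,
                       aes(x = life_bin, y = continent, fill = n)) +
  geom_tile() +
  labs(x = "Life expectancy (years)", y = NULL, fill = "Countries")

life_heatmap
```

Color is harder to read precisely than bar length, which is why this works best when the overall pattern matters more than the exact counts.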
Sometimes we want to compare values across
a whole population instead of looking at raw counts
Only do this when it makes analytical sense!
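A minimal sketch of turning counts into proportions, reusing the 2007 gapminder continent counts. The object name `continent_props` and the percent axis labels are choices made for this illustration.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Convert raw counts into each continent's share of all countries
continent_props <- gapminder %>%
  filter(year == 2007) %>%
  count(continent) %>%
  mutate(prop = n / sum(n))

# Plot the shares, with the y-axis labeled as percentages
prop_plot <- ggplot(continent_props, aes(x = continent, y = prop)) +
  geom_col() +
  scale_y_continuous(labels = scales::label_percent()) +
  labs(x = NULL, y = "Share of countries")

prop_plot
```

Here the denominator (all countries in the dataset) is an obvious one. With other data, picking the right denominator is the analytical decision that needs thought.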
COVID-19 amounts vs. proportions
Perceptual issues with angle and fill space
Only okay(ish) if there are a few easily distinguishable categories
Bar plots
Any of the alternatives to bar plots
Treemaps and mosaic plots
(but these can still be really hard to interpret)
Treemaps with the {treemapify} package
Mosaic plots with the {ggmosaic} package
Specialized figures like parliament plots
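As one example from the list above, here is a rough sketch of a treemap with {treemapify}, using the 2007 gapminder continent counts. The text color and placement are styling choices made for this illustration.

```r
library(dplyr)
library(ggplot2)
library(gapminder)
library(treemapify)

# Number of countries per continent in 2007
continent_counts <- gapminder %>%
  filter(year == 2007) %>%
  count(continent)

# Each rectangle's area is proportional to the number of countries
continent_treemap <- ggplot(continent_counts,
                            aes(area = n, fill = continent, label = continent)) +
  geom_treemap() +
  geom_treemap_text(colour = "white", place = "centre") +
  guides(fill = "none")

continent_treemap
```

Because the labels sit inside the rectangles, the legend can be dropped. Comparing areas is still harder than comparing bar lengths, so this is best with only a few categories.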