Amounts and Proportions

Session 4

PMAP 8921: Data Visualization with R
Andrew Young School of Policy Studies
Summer 2023

1 / 34

Plan for today

Reproducibility

Amounts

Proportions

2 / 34

Reproducibility

3 / 34

Why am I making you learn R?

Pivot Tables do the same thing!

Lord of the Rings PivotTable
4 / 34

Why am I making you learn R?

More powerful

Free and open source

Reproducibility

5 / 34

Austerity and Excel

Reinhart and Rogoff abstract

Debt:GDP ratio = 90%+ → −0.1% growth

The Path to Prosperity 2013 house budget
Paul Ryan's 2013 House budget resolution
6 / 34

Austerity and Excel

Thomas Herndon
7 / 34

Austerity and Excel

Reinhart Rogoff Table 1

Debt:GDP ratio = 90%+ → 2.2% growth (!!)

8 / 34

Genes and Excel

Septin 2

Membrane-Associated Ring Finger (C3HC4) 1

2310009E13

Numbers in Excel

Gene-name conversion errors in about 20% of genetics papers with Excel gene lists, 2005–2015 (!!!)

9 / 34

General guidelines

Don't touch the raw data

If you do, explain what you did!

Use self-documenting, reproducible code

R Markdown!

Use open formats

Use .csv, not .xlsx
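
A minimal sketch of these guidelines, assuming a hypothetical raw file named genes_raw.csv (the file, its columns, and the filtering rule are made up for illustration). Declaring column types explicitly keeps identifiers like 2310009E13 as text instead of letting them be read as numbers, and the cleaned data goes to a new file so the raw file is never modified.

```r
library(readr)

# Hypothetical raw file, created here only so the sketch is self-contained.
# In a real project this file already exists and is never edited by hand.
raw_path <- file.path(tempdir(), "genes_raw.csv")
write_lines(
  c("gene_id,expression", "SEPT2,1.5", "MARCH1,2.25", "2310009E13,0.75"),
  raw_path
)

# Declare column types so gene IDs stay as text
genes_raw <- read_csv(
  raw_path,
  col_types = cols(
    gene_id = col_character(),
    expression = col_double()
  )
)

# Do all cleaning in code (this filter is an arbitrary example) ...
genes_clean <- genes_raw[genes_raw$expression >= 1, ]

# ... and save the result to a separate open-format file
clean_path <- file.path(tempdir(), "genes_clean.csv")
write_csv(genes_clean, clean_path)
```

Putting this in an R Markdown chunk keeps the explanation of each step right next to the code that does it.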

10 / 34

R Markdown in real life

11 / 34

Amounts

12 / 34

Yay bar plots!

We are a lot better at judging
line lengths than angles and areas

13 / 34

Oh no bar plots!

Fox News Obamacare enrollment
Obama graduation rate
14 / 34

Start at zero

The entire line length matters,
so don't truncate it!

Always start at 0

(Or don't use bars)
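
A rough sketch of both options, using made-up numbers (the enrollment data frame and its values are hypothetical, not taken from the charts on the previous slide). geom_col() draws bars from zero by default, so the safest move is to leave that baseline alone. If the small differences are what matter, switching to points lets you zoom in without misleading bar lengths.

```r
library(ggplot2)

# Made-up illustrative numbers, not real data
enrollment <- data.frame(
  period = c("Current", "Goal"),
  millions = c(6.0, 7.1)
)

# Option 1: bars, with the default zero baseline left intact
bars_from_zero <- ggplot(enrollment, aes(x = period, y = millions)) +
  geom_col() +
  labs(x = NULL, y = "Enrollment (millions)")

# Option 2: no bars, so zooming in on the y-axis is fine
points_zoomed <- ggplot(enrollment, aes(x = period, y = millions)) +
  geom_point(size = 3) +
  coord_cartesian(ylim = c(5.5, 7.5)) +
  labs(x = NULL, y = "Enrollment (millions)")
```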

15 / 34

Bar plots and summary statistics

#barbarplots

16 / 34

Bar plots and summary statistics

17 / 34

Show more data with strip plots

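library(tidyverse)

# Assumes `animals` has an animal_type column and a weight column
ggplot(animals,
       aes(x = animal_type,
           y = weight,
           color = animal_type)) +
  # Jitter horizontally only so the weights stay accurate
  geom_point(position = position_jitter(height = 0),
             size = 1) +
  labs(x = NULL, y = "Weight") +
  guides(color = "none")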

18 / 34

Show more data with beeswarm plots

library(ggbeeswarm)

ggplot(animals, aes(x = animal_type,
                    y = weight,
                    color = animal_type)) +
  geom_beeswarm(size = 1) +
  # Or try this too:
  # geom_quasirandom() +
  labs(x = NULL, y = "Weight") +
  guides(color = "none")

19 / 34

Combine boxplots with points

ggplot(animals, aes(x = animal_type,
                    y = weight,
                    color = animal_type)) +
  geom_boxplot(width = 0.5) +
  geom_point(position = position_jitter(height = 0),
             size = 1, alpha = 0.5) +
  labs(x = NULL, y = "Weight") +
  guides(color = "none")

20 / 34

Combine violins with points

ggplot(animals, aes(x = animal_type,
                    y = weight,
                    color = animal_type)) +
  geom_violin(width = 0.5) +
  geom_point(position = position_jitter(height = 0),
             size = 1, alpha = 0.5) +
  labs(x = NULL, y = "Weight") +
  guides(color = "none")

21 / 34

Overlapping ridgeplots

library(ggridges)

ggplot(animals, aes(x = weight,
                    y = animal_type,
                    fill = animal_type)) +
  geom_density_ridges() +
  labs(x = "Weight", y = NULL) +
  guides(fill = "none")

22 / 34

General rules

Bar charts always start at zero

Don't use bars for summary statistics.
You throw away too much information.

The end of the bar is often all that matters
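
To see why bars of summary statistics throw away information, here is a small sketch with made-up values (the scores data and both group names are hypothetical). The two groups have exactly the same mean, so bars of the means look identical, while plotting every point shows how different they are.

```r
library(dplyr)
library(ggplot2)

# Made-up values for illustration: same mean, very different spread
scores <- tibble(
  group = rep(c("A", "B"), each = 5),
  value = c(48, 49, 50, 51, 52,
            10, 30, 50, 70, 90)
)

# Collapse each group to a single number
group_means <- scores %>%
  group_by(group) %>%
  summarize(avg = mean(value), spread = sd(value))

# Bars of the means: both groups look the same
means_plot <- ggplot(group_means, aes(x = group, y = avg)) +
  geom_col()

# Every observation: the difference in spread is obvious
points_plot <- ggplot(scores, aes(x = group, y = value)) +
  geom_point(position = position_jitter(width = 0.1, height = 0))
```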

23 / 34

Lots of alternatives

We'll use a summarized version of the gapminder dataset as an example

library(tidyverse)
library(gapminder)

gapminder_continents <- gapminder %>%
  filter(year == 2007) %>%  # Only look at 2007
  count(continent) %>%  # Count the countries in each continent
  arrange(desc(n)) %>%  # Sort in descending order by count
  # Make the factor levels follow the current row order
  mutate(continent = fct_inorder(continent))

ggplot(gapminder_continents,
       aes(x = continent, y = n, fill = continent)) +
  geom_col() +
  guides(fill = "none") +
  labs(x = NULL, y = "Number of countries")

24 / 34

Alternatives: Lollipop charts

Since the end of the bar is important, emphasize it the most

ggplot(gapminder_continents,
       aes(x = continent, y = n,
           color = continent)) +
  geom_pointrange(aes(ymin = 0, ymax = n)) +
  guides(color = "none") +
  labs(x = NULL, y = "Number of countries")

25 / 34

Alternatives: Waffle charts

Show the individual observations as squares

# This has to be installed in a special way--you can't use the Packages panel.
# Run this in your console:
# devtools::install_github("hrbrmstr/waffle")
library(waffle)

ggplot(gapminder_continents,
       aes(x = continent, y = n,
           fill = continent)) +
  geom_waffle(aes(values = n),  # geom_waffle() needs a special values aesthetic
              n_rows = 9,  # It has lots of other options too
              flip = TRUE, na.rm = TRUE) +
  labs(fill = NULL) +
  coord_equal() +  # Make all the squares square
  theme_void()  # Use a completely empty theme

26 / 34

Alternatives: Heatmaps

If exact counts are less important,
try a heatmap with geom_tile()

US births heatmap
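
A minimal geom_tile() sketch, using the mpg dataset that ships with ggplot2 as a stand-in (it is not the births data shown in the figure). Each tile is one class and drive-train combination, and the fill color encodes the count.

```r
library(dplyr)
library(ggplot2)

# Count cars in each class/drive-train combination
class_drive_counts <- mpg %>%
  count(class, drv)

# One tile per combination, colored by the count
heatmap_plot <- ggplot(class_drive_counts,
                       aes(x = drv, y = class, fill = n)) +
  geom_tile(color = "white") +
  scale_fill_viridis_c() +
  labs(x = "Drive train", y = NULL, fill = "Number of cars")
```

Combinations that never occur are simply left blank, which is useful information in its own right.
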
27 / 34

Proportions

28 / 34

Why proportions?

Sometimes we want to compare values across
a whole population instead of looking at raw counts

Only do this when it makes analytical sense!

COVID-19 amounts vs. proportions
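
One way to turn raw counts into proportions, sketched with the same 2007 gapminder counts used earlier (the prop column name is just a choice made here). Dividing each count by the total gives each continent's share of all countries.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

continent_props <- gapminder %>%
  filter(year == 2007) %>%
  count(continent) %>%
  mutate(prop = n / sum(n))  # Share of all countries

# Same bar chart as before, but with shares on the y-axis
props_plot <- ggplot(continent_props, aes(x = continent, y = prop)) +
  geom_col() +
  scale_y_continuous(labels = scales::label_percent()) +
  labs(x = NULL, y = "Share of countries")
```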

29 / 34

Pie charts

Perceptual issues with angle and fill space

Only okay(ish) if there are a few easily distinguishable categories

30 / 34

Alternatives

Bar plots

Any of the alternatives to bar plots

Treemaps and mosaic plots
(but these can still be really hard to interpret)

31 / 34

Treemaps and mosaic plots

Treemaps with the {treemapify} package

Treemap

Mosaic plots with the {ggmosaic} package

Mosaic
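
A short {treemapify} sketch, assuming the package is installed and reusing the 2007 gapminder continent counts (this is not the code behind the treemap pictured above). The area aesthetic sizes each rectangle by its count.

```r
library(dplyr)
library(ggplot2)
library(gapminder)
library(treemapify)

continent_counts <- gapminder %>%
  filter(year == 2007) %>%
  count(continent)

# Rectangle area is proportional to the number of countries
treemap_plot <- ggplot(continent_counts,
                       aes(area = n, fill = continent, label = continent)) +
  geom_treemap() +
  geom_treemap_text(colour = "white", place = "centre") +
  guides(fill = "none")
```
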
32 / 34

Alternatives

Bar plots

Any of the alternatives to bar plots

Treemaps and mosaic plots
(but these can still be really hard to interpret)

Specialized figures like parliament plots

33 / 34

Parliament plots

Parliament plots with the {ggparliament} package

US Senate parliament plot
UK parliament plot
34 / 34
