Slow down, simplify and do small things

advice
Posted Wednesday, July 19, 2023 at 12:33 PM

Hi everyone!

As we’re nearing the end of the course, your plots and data manipulation are becoming way more detailed and complex, which is good! Remember exercise 1, so long ago? All you had to do was this:

library(tidyverse)

ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point()

That’s all! Literally 3 lines of code.

But now with interactivity, maps, text analysis, your mini projects, your final project, and so on, your code is getting longer and more complex. You’ll have lots and lots of ggplot layers and functions chained together with %>%. You’ve learned so much!

It is incredibly tempting to write out all the code you want in one go, run the complete chunk, and hope that you got it all correct. Then when it’s not correct, you change a bunch of things at once, hoping they’ll fix it. They don’t, and you stay stuck and frustrated.

Over the weekend while helping you all on Slack and e-mail, this was a very common sight! You’d have a chunk of code that was 20–30 lines long with an error somewhere in it, and you couldn’t find what went wrong or what was broken.

Don’t do this!

Here’s my best piece of advice for making more complex plots and for figuring out how to fix errors:

Slow down, simplify, and do small things

Run your code incrementally (see this past post for some video examples of how to run things incrementally). Start with a super basic plot and run it, then add a layer for labels and run it, then add a layer to change the fill gradient and run it, then add a layer to change the theme and run it, and so on. It feels slow, but it helps you understand what’s going on and helps you fix things when they break.
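Here’s a rough sketch of what that rhythm can look like. This example isn’t from any of the exercises: the binned plot, the object names p1 through p4, and the specific colors are all made up for illustration. The idea is to run each step and look at the result before moving on to the next one.

```r
# ggplot2 is also loaded by library(tidyverse); it is loaded directly here
# only to keep this sketch small
library(ggplot2)

# Step 1: the most basic version of the plot. Run it and look at it.
# (geom_bin2d() counts points in rectangular bins and maps the count to fill)
p1 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_bin2d()
p1

# Step 2: add ONE layer for labels, then run it again
p2 <- p1 +
  labs(x = "Displacement", y = "Highway MPG", fill = "Number of cars")
p2

# Step 3: add ONE layer to change the fill gradient, then run it again
# (these two colors are arbitrary choices for the illustration)
p3 <- p2 +
  scale_fill_gradient(low = "grey90", high = "darkblue")
p3

# Step 4: add ONE layer to change the theme, then run it again
p4 <- p3 +
  theme_minimal()
p4
```

Saving each step under its own name is just one way to do this. The point is that if the plot suddenly looks wrong at step 3, you know the problem is in the one layer you just added, and you can always go back and look at p2 to see the last version that worked.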

This is not just my advice. Julia Evans’s fantastic The Pocket Guide to Debugging has the same piece of advice:

Page 39 from Julia Evans’s The Pocket Guide to Debugging

When something doesn’t work as expected, change just one thing at a time. Or even better, simplify it and then change one thing at a time.

Here’s a quick common example. Let’s say you have a plot like this and you want to use the plasma viridis scale for the colors of the points. It looks like it should work, but the colors aren’t right! Those are just the default colors!

library(tidyverse)

ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(x = "Displacement",
       y = "Highway MPG",
       color = "Drive") +
  scale_fill_viridis_d(option = "plasma", end = 0.9) +
  theme_minimal() +
  theme(legend.position = "bottom")

Here’s the process I would go through to figure out what’s wrong and fix it: