This is only the beginning!
As they say, this is only the beginning. This course covered an often-overlooked area of statistics, missing data, and within it a topic that is overlooked just as often: how to handle, explore, and visualise missing values.
To continue your journey and learn more about missing data, check out the naniar package, which contains many useful functions for exploring and evaluating missing data, as well as numerous vignettes.
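As a reminder of the kind of summaries naniar offers, here is a minimal sketch. It assumes naniar is installed, and it uses the built-in airquality dataset purely for illustration; that dataset is an assumption of this example rather than something taken from the course.

```r
library(naniar)

# airquality ships with base R and has missing values in Ozone and Solar.R
# (an illustrative choice, not a dataset from this course)
total_missing <- n_miss(airquality)            # total number of missing values
by_variable   <- miss_var_summary(airquality)  # one row per variable, most missing first

total_missing
by_variable

# Plot the number of missing values in each variable
gg_miss_var(airquality)
```

The naniar vignettes walk through many more functions than these three.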
The visdat package provides more than just heatmaps of missing data, and is well worth looking into to learn more about preliminary exploratory visualisation of data.
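A short sketch of two visdat functions follows, assuming visdat is installed. As before, the airquality dataset is only an illustrative stand-in.

```r
library(visdat)

# vis_dat() shows the class of every cell, with missing values marked
p_types <- vis_dat(airquality)

# vis_miss() shows only missingness, with the percentage missing per column
p_missing <- vis_miss(airquality)

# Both return ggplot objects, so printing them draws the plots
p_types
p_missing
```

Because the results are ggplot objects, they can be customised further with the usual ggplot2 layers and themes.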
From here, you might want to explore other workflows for imputing your missing data.
There are many ways to decide how to impute data. We didn’t have time for it in this course, but multiple imputation is another great area to explore. To learn more about it, I highly recommend Stef van Buuren’s package, mice, and his book, Flexible Imputation of Missing Data.
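To give a flavour of what a multiple imputation workflow can look like, here is a minimal sketch using mice. It was not covered in the course, so treat it as one possible starting point: the dataset (airquality), the number of imputations, the imputation method, the seed, and the model formula are all illustrative assumptions.

```r
library(mice)

# Create m = 5 imputed datasets using predictive mean matching ("pmm").
# The dataset, m, method, and seed are illustrative choices.
imp <- mice(airquality, m = 5, method = "pmm", seed = 2022, printFlag = FALSE)

# Fit the same model to each of the imputed datasets
fits <- with(imp, lm(Ozone ~ Wind + Temp))

# Combine the five sets of estimates using Rubin's rules
pooled <- pool(fits)
summary(pooled)

# Extract the first completed dataset.
# Namespaced because tidyr also exports a function called complete().
first_completed <- mice::complete(imp, 1)
```

The key idea is that the analysis is run on every imputed dataset and the results are pooled, rather than treating a single filled-in dataset as if it were observed data.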
naniar: naniar.njtierney.com
visdat: visdat.njtierney.com
mice R package
Flexible Imputation of Missing Data