The Missing Book

This book contains both practical guides on exploring missing data and some of the deeper details of how naniar works, to help you better explore your missing data. A large component of this book is the exercises that accompany each section in each chapter.

Author

Nicholas Tierney & Allison Horst

Published

April 7, 2022

Preface

Welcome

Welcome to The Missing (Data) Book! Through this book you will learn concepts and tools to explore, consider, and deal with missing values in your data.

What you will learn

After reading and completing the exercises in this book, you will be able to answer the following questions and apply them to your own data:

  • What are missing values, and why do we care about them?
  • How can I find and explore missing values in data?
  • How can I wrangle and tidy missing data?
  • How can I investigate why values are missing?
  • How can I impute missing values?

Prerequisites

For this book, we assume you have:

  • Basic to intermediate experience with R
  • Experience creating plots using ggplot2
  • Experience using dplyr to manipulate data
  • Basic experience fitting linear models in R

Narrative story / example

Why care about missing data?

The best thing to do with missing data is to not have any

–Gertrude Mary Cox

However true statistician Gertrude Mary Cox's words are, they do not describe the world we live in. Working with data means working with missing data. To be a great analyst, you need to know how to deal with missing values.

So why should we care about missing data? Understanding how missing data work is important because they can have unexpected effects on your analysis. For example, by default, fitting a linear model in R on data with missing values drops every row that has a missing value in any of the model's variables. This can remove large chunks of your data, which means your decisions might not be based on the right evidence. Similarly, we need to take care when we replace missing values, a process called imputation. If we insert the wrong values, we can end up with poor estimates and poor decisions. Imagine substituting salt for sugar in a cake: the result is disastrous!

How to read this book

We have broken this book into seven parts. Most of these parts have accompanying exercises for you to complete online. The seven parts are:

  1. Introduction to Missing Data
  2. Missing Data Gotchas
  3. Explore Missing Values
  4. Cleaning Missing Data
  5. Representing Missing Data
  6. Mechanisms of Missingness
  7. Single Imputation of Missing Data

The book is designed to be read in this order, as each part builds on material from the parts before it. And while seven parts might sound like a lot, they are all quite short!