# swirl Lesson 5: Missing Values

| Please choose a course, or type 0 to exit swirl.
1: R Programming
2: Take me to the swirl course repository!
Selection: 1
1: Basic Building Blocks      2: Workspace and Files
3: Sequences of Numbers       4: Vectors
5: Missing Values             6: Subsetting Vectors
7: Matrices and Data Frames   8: Logic
9: Functions                 10: lapply and sapply
11: vapply and tapply         12: Looking at Data
13: Simulation                14: Dates and Times
15: Base Graphics
Selection: 5
|                                                     |   0%
| Missing values play an important role in statistics and data
| analysis. Often, missing values must not be ignored, but
| rather they should be carefully studied to see if there's an
| underlying pattern or cause for their missingness.
...
|===                                                  |   5%
| In R, NA is used to represent any value that is 'not
| available' or 'missing' (in the statistical sense). In this
| lesson, we'll explore missing values further.
...
|=====                                                |  10%
| Any operation involving NA generally yields NA as the
| result. To illustrate, let's create a vector c(44, NA, 5,
| NA) and assign it to a variable x.
> x<-c(44,NA,5,NA)
|========                                             |  15%
| Now, let's multiply x by 3.
> 3*x
[1] 132  NA  15  NA
| Excellent job!
|===========                                          |  20%
| Notice that the elements of the resulting vector that
| correspond with the NA values in x are also NA.
...
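Here is a short sketch of what "generally" means, reusing the `x` from the lesson. Arithmetic and most summaries propagate NA unless you ask them to drop it with `na.rm = TRUE`. A few operations return a known value because the answer would be the same whatever the missing value turned out to be.

```r
x <- c(44, NA, 5, NA)

3 * x                  # 132  NA  15  NA: each NA stays NA
sum(x)                 # NA: one unknown term makes the total unknown
sum(x, na.rm = TRUE)   # 49: drop the NAs first, then add 44 + 5
mean(x, na.rm = TRUE)  # 24.5

# Exceptions: the result does not depend on what the missing value is
NA^0                   # 1
NA || TRUE             # TRUE
FALSE && NA            # FALSE
```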
|=============                                        |  25%
| To make things a little more interesting, let's create a
| vector containing 1000 draws from a standard normal
| distribution with y <- rnorm(1000).
> y<-rnorm(1000)
| Keep up the great work!
|================                                     |  30%
| Next, let's create a vector containing 1000 NAs with z <-
| rep(NA, 1000).
> z<-rep(NA,1000)
| You're the best!
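A bare `NA` in R is of type logical, so `z` is a logical vector. When `c()` combines it with the numeric draws in `y`, everything is coerced to the most flexible type present, and the NAs become numeric NAs. The name `combined` below is only for illustration.

```r
# rep(NA, 1000) holds logical NAs, because a bare NA is logical
z <- rep(NA, 1000)
class(z)               # "logical"

# c() coerces to the most flexible type present (logical < integer < double),
# so combining with the rnorm() draws turns every NA into a numeric NA
y <- rnorm(1000)
combined <- c(y, z)    # hypothetical name, used only in this sketch
class(combined)        # "numeric"
length(combined)       # 2000
sum(is.na(combined))   # 1000: rnorm() itself never produces NA
```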
|===================                                  |  35%
| Finally, let's select 100 elements at random from these 2000
| values (combining y and z) such that we don't know how many
| NAs we'll wind up with or what positions they'll occupy in
| our final vector -- my_data <- sample(c(y, z), 100).
> my_data <- sample(c(y, z), 100)
| You are really on a roll!
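Because `rnorm()` and `sample()` are random, each run produces a different number of NAs. Half of the 2000 pooled values are NA, so a draw of 100 contains about 100 * 1000 / 2000 = 50 NAs on average, though any single run will differ. Calling `set.seed()` first makes a run reproducible. The seed value 2024 and the name `again` below are arbitrary choices for this sketch.

```r
# Any fixed integer works as a seed; 2024 is arbitrary
set.seed(2024)
y <- rnorm(1000)
z <- rep(NA, 1000)
my_data <- sample(c(y, z), 100)   # draws without replacement by default

# About 50 NAs are expected, but the exact count varies from seed to seed
sum(is.na(my_data))

# Resetting the same seed and making the same random calls in the same
# order reproduces exactly the same vector
set.seed(2024)
again <- sample(c(rnorm(1000), rep(NA, 1000)), 100)
identical(my_data, again)         # TRUE
```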
|=====================                                |  40%
| Let's first ask the question of where our NAs are located in
| our data. The is.na() function tells us whether each element
| of a vector is NA. Call is.na() on my_data and assign the
| result to my_na.
> my_na<- is.na(my_data)
| You are doing so well!
|========================                             |  45%
| Now, print my_na to see what you came up with.
> my_na
[1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE
[10]  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE
[19]  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
[28]  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE
[37] FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE
[46] FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE
[55] FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE
[64]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE FALSE
[73] FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE
[82] FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE
[91] FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE
[100]  TRUE
| You nailed it! Good job!
|==========================                           |  50%
| Everywhere you see a TRUE, you know the corresponding
| element of my_data is NA. Likewise, everywhere you see a
| FALSE, you know the corresponding element of my_data is one
| of our random draws from the standard normal distribution.
...
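To answer "where are the NAs?" directly, you can pass the logical vector from `is.na()` to `which()`, which returns the indices of the TRUE elements. The sketch below uses the small `x` from earlier in place of `my_data` so that the output is predictable.

```r
x <- c(44, NA, 5, NA)
my_na <- is.na(x)
my_na           # FALSE  TRUE FALSE  TRUE
which(my_na)    # 2 4: positions of the NAs
which(!my_na)   # 1 3: positions of the observed values
```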
|=============================                        |  55%
| In our previous discussion of logical operators, we
| introduced the == operator as a method of testing for
| equality between two objects. So, you might think the
| expression my_data == NA yields the same results as is.na().
| Give it a try.
> my_data == NA
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[20] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[39] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[58] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[77] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[96] NA NA NA NA NA
| You got it!
|================================                     |  60%
| The reason you got a vector of all NAs is that NA is not
| really a value, but just a placeholder for a quantity that
| is not available. Comparing any element with an unknown
| quantity can only give an unknown answer, so R returns a
| vector of the same length as my_data that contains all NAs.
...
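The sketch below compares `==` with `is.na()` on a few small inputs. Any `==` comparison involving NA returns NA, including `NA == NA`, while `is.na()` answers the question you actually meant to ask. By contrast, `identical()` compares the two objects themselves rather than testing values for equality.

```r
NA == NA            # NA: comparing two unknowns gives an unknown
5 == NA             # NA
is.na(NA)           # TRUE: the dedicated test for missingness
identical(NA, NA)   # TRUE: compares the objects, not unknown values

x <- c(44, NA, 5, NA)
x == NA             # NA NA NA NA, whatever x contains
is.na(x)            # FALSE  TRUE FALSE  TRUE
```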
|==================================                   |  65%
| Don't worry if that's a little confusing. The key takeaway
| is to be cautious when using logical expressions anytime NAs
| might creep in, since a single NA value can derail the
| entire thing.
...
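Here is one concrete way an NA can derail a logical expression, shown with the `x` from earlier. When a comparison yields NA and you use it as an index, each NA in the index selects an NA element. Summaries such as `any()` can also come back NA. Combining the test with `!is.na()` is one way to guard against this.

```r
x <- c(44, NA, 5, NA)

x > 10                   # TRUE    NA FALSE    NA
x[x > 10]                # 44 NA NA: the NA indices pull in NA elements
which(x > 10)            # 1: which() treats NA as "not TRUE"
any(x > 100)             # NA: no element is known to exceed 100, but the NAs might
x[!is.na(x) & x > 10]    # 44: exclude missing values explicitly
```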
|=====================================                |  70%
| So, back to the task at hand. Now that we have a vector,
| my_na, that has a TRUE for every NA and FALSE for every
| numeric value, we can compute the total number of NAs in our
| data.
...
|========================================             |  75%
| The trick is to recognize that when logical values are used
| in arithmetic, R treats TRUE as the number 1 and FALSE as the
| number 0. Therefore, if we take the sum of a bunch of TRUEs
| and FALSEs, we get the total number of TRUEs.
...
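This sketch shows the logical-to-number coercion at work. `sum()` counts the TRUEs, and `mean()` gives their proportion, which is a convenient way to report the share of missing values. It reuses the small `x` from earlier.

```r
as.numeric(c(TRUE, FALSE))          # 1 0
sum(c(TRUE, FALSE, TRUE, TRUE))     # 3: the number of TRUEs
mean(c(TRUE, FALSE, TRUE, TRUE))    # 0.75: the proportion of TRUEs

x <- c(44, NA, 5, NA)
sum(is.na(x))    # 2 NAs
mean(is.na(x))   # 0.5: half of x is missing
```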
|==========================================           |  80%
| Let's give that a try here. Call the sum() function on my_na
| to count the total number of TRUEs in my_na, and thus the
| total number of NAs in my_data. Don't assign the result to a
| new variable.
> sum(my_na)
[1] 46
| Great job!
|=============================================        |  85%
| Pretty cool, huh? Finally, let's take a look at the data to
| convince ourselves that everything 'adds up'. Print my_data
| to the console.
> my_data
[1]  1.28833738 -1.71566792 -2.10643156  0.52646288
[5]          NA -0.43369979 -0.73588709          NA
[9]          NA          NA          NA  0.04198942
[13]          NA          NA -0.07881950          NA
[17] -0.95076395  2.31028848          NA -0.73152858
[21]          NA          NA          NA  1.05294332
[25]          NA          NA          NA          NA
[29]          NA -0.59012503 -0.38427819          NA
[33]          NA          NA -0.48309105          NA
[37]  0.32180094 -1.05520535  0.27521195          NA
[41] -1.62668329          NA          NA -2.18910058
[45]  1.74314263 -0.07327235  1.49444965          NA
[49]  1.05747803          NA  0.71787615 -1.11394933
[53] -0.08661972  0.87982505 -0.51738201          NA
[57]  0.40314933          NA          NA  0.17416071
[61] -0.46598802          NA  0.01918326          NA
[65] -0.47643745          NA  0.96464973          NA
[69]  1.07043774          NA          NA  0.55960535
[73]  0.41772515          NA          NA -0.41373806
[77]  0.80355818 -1.01706160 -1.36218341 -1.55671571
[81]          NA  1.42576312 -0.49953533 -0.52994970
[85]  0.62437351          NA -0.53641307          NA
[89]          NA -0.08968300  0.18669337          NA
[93]          NA  1.05551995  1.19631709 -0.47725033
[97]          NA          NA -1.49508777          NA
| You're the best!
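Instead of scanning the printout by eye, you can check that everything "adds up" in code. The sketch below rebuilds data shaped like the lesson's `my_data` (the seed is arbitrary, so the counts will differ from the transcript). It then confirms that the NA count and the observed count together equal the vector's length. The names `n_missing`, `n_observed`, and `observed` are illustrative.

```r
# Rebuild data shaped like the lesson's my_data; the seed is arbitrary
set.seed(1)
my_data <- sample(c(rnorm(1000), rep(NA, 1000)), 100)
my_na <- is.na(my_data)

n_missing  <- sum(my_na)     # count of NAs
n_observed <- sum(!my_na)    # count of standard normal draws
n_missing + n_observed == length(my_data)   # TRUE

# The non-missing elements are exactly the random draws
observed <- my_data[!my_na]
length(observed) == n_observed   # TRUE
anyNA(observed)                  # FALSE
```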
|================================================     |  90%
| Now that we've got NAs down pat, let's look at a second type
| of missing value -- NaN, which stands for 'not a number'. To
| generate NaN, try dividing (using a forward slash) 0 by 0
| now.
> 0/0
[1] NaN
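NaN and NA are related but not interchangeable. `is.na()` flags both, while `is.nan()` flags only NaN. The sketch below shows the difference. The vector `v` is a made-up example.

```r
0 / 0            # NaN
is.nan(0 / 0)    # TRUE
is.na(0 / 0)     # TRUE: is.na() treats NaN as missing too
is.nan(NA)       # FALSE: a plain NA is not NaN
class(0 / 0)     # "numeric": NaN is a special double value

v <- c(1, NA, 0 / 0)   # hypothetical example vector
is.na(v)         # FALSE  TRUE  TRUE
is.nan(v)        # FALSE FALSE  TRUE
```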