| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

1: Basic Building Blocks
2: Workspace and Files
3: Sequences of Numbers
4: Vectors
5: Missing Values
6: Subsetting Vectors
7: Matrices and Data Frames
8: Logic
9: Functions
10: lapply and sapply
11: vapply and tapply
12: Looking at Data
13: Simulation
14: Dates and Times
15: Base Graphics

Selection: 5

| Missing values play an important role in statistics and data analysis. Often, missing values must not be ignored, but rather they should be carefully studied to see if there's an underlying pattern or cause for their missingness.

...

| In R, NA is used to represent any value that is 'not available' or 'missing' (in the statistical sense). In this lesson, we'll explore missing values further.

...

| Any operation involving NA generally yields NA as the result. To illustrate, let's create a vector c(44, NA, 5, NA) and assign it to a variable x.

> x <- c(44, NA, 5, NA)

| Perseverance, that's the answer.

| Now, let's multiply x by 3.

> 3 * x
[1] 132 NA 15 NA

| Excellent job!

| Notice that the elements of the resulting vector that correspond with the NA values in x are also NA.

...

| To make things a little more interesting, let's create a vector containing 1000 draws from a standard normal distribution with y <- rnorm(1000).

> y <- rnorm(1000)

| Keep up the great work!

| Next, let's create a vector containing 1000 NAs with z <- rep(NA, 1000).

> z <- rep(NA, 1000)

| You're the best!

| Finally, let's select 100 elements at random from these 2000 values (combining y and z) such that we don't know how many NAs we'll wind up with or what positions they'll occupy in our final vector -- my_data <- sample(c(y, z), 100).

> my_data <- sample(c(y, z), 100)

| You are really on a roll!

| Let's first ask the question of where our NAs are located in our data. The is.na() function tells us whether each element of a vector is NA. Call is.na() on my_data and assign the result to my_na.

> my_na <- is.na(my_data)

| You are doing so well!

| Now, print my_na to see what you came up with.

> my_na
[1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE TRUE
[10] TRUE TRUE FALSE TRUE TRUE FALSE TRUE FALSE FALSE
[19] TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
[28] TRUE TRUE FALSE FALSE TRUE TRUE TRUE FALSE TRUE
[37] FALSE FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE
[46] FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
[55] FALSE TRUE FALSE TRUE TRUE FALSE FALSE TRUE FALSE
[64] TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE
[73] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE
[82] FALSE FALSE FALSE FALSE TRUE FALSE TRUE TRUE FALSE
[91] FALSE TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE
[100] TRUE

| You nailed it! Good job!

| Everywhere you see a TRUE, you know the corresponding element of my_data is NA. Likewise, everywhere you see a FALSE, you know the corresponding element of my_data is one of our random draws from the standard normal distribution.

...

| In our previous discussion of logical operators, we introduced the `==` operator as a method of testing for equality between two objects. So, you might think the expression my_data == NA yields the same results as is.na(). Give it a try.

> my_data == NA
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[20] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[39] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[58] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[77] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[96] NA NA NA NA NA

| You got it!

| The reason you got a vector of all NAs is that NA is not really a value, but just a placeholder for a quantity that is not available. Therefore the logical expression is incomplete and R has no choice but to return a vector of the same length as my_data that contains all NAs.

...

| Don't worry if that's a little confusing. The key takeaway is to be cautious when using logical expressions anytime NAs might creep in, since a single NA value can derail the entire thing.

...

| So, back to the task at hand. Now that we have a vector, my_na, that has a TRUE for every NA and FALSE for every numeric value, we can compute the total number of NAs in our data.

...

| The trick is to recognize that underneath the surface, R represents TRUE as the number 1 and FALSE as the number 0. Therefore, if we take the sum of a bunch of TRUEs and FALSEs, we get the total number of TRUEs.

...

| Let's give that a try here. Call the sum() function on my_na to count the total number of TRUEs in my_na, and thus the total number of NAs in my_data. Don't assign the result to a new variable.

> sum(my_na)
[1] 46

| Great job!

| Pretty cool, huh? Finally, let's take a look at the data to convince ourselves that everything 'adds up'. Print my_data to the console.

> my_data
[1] 1.28833738 -1.71566792 -2.10643156 0.52646288
[5] NA -0.43369979 -0.73588709 NA
[9] NA NA NA 0.04198942
[13] NA NA -0.07881950 NA
[17] -0.95076395 2.31028848 NA -0.73152858
[21] NA NA NA 1.05294332
[25] NA NA NA NA
[29] NA -0.59012503 -0.38427819 NA
[33] NA NA -0.48309105 NA
[37] 0.32180094 -1.05520535 0.27521195 NA
[41] -1.62668329 NA NA -2.18910058
[45] 1.74314263 -0.07327235 1.49444965 NA
[49] 1.05747803 NA 0.71787615 -1.11394933
[53] -0.08661972 0.87982505 -0.51738201 NA
[57] 0.40314933 NA NA 0.17416071
[61] -0.46598802 NA 0.01918326 NA
[65] -0.47643745 NA 0.96464973 NA
[69] 1.07043774 NA NA 0.55960535
[73] 0.41772515 NA NA -0.41373806
[77] 0.80355818 -1.01706160 -1.36218341 -1.55671571
[81] NA 1.42576312 -0.49953533 -0.52994970
[85] 0.62437351 NA -0.53641307 NA
[89] NA -0.08968300 0.18669337 NA
[93] NA 1.05551995 1.19631709 -0.47725033
[97] NA NA -1.49508777 NA

| You're the best!

| Now that we've got NAs down pat, let's look at a second type of missing value -- NaN, which stands for 'not a number'. To generate NaN, try dividing (using a forward slash) 0 by 0 now.

> 0/0
[1] NaN

| Your dedication is inspiring!

| Let's do one more, just for fun. In R, Inf stands for infinity. What happens if you subtract Inf from Inf?

> Inf - Inf
[1] NaN

| All that hard work is paying off!
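
The lesson's first claims about NA can be checked outside swirl in a few lines. This sketch reuses the lesson's vector `x`; the names `times_three`, `eq_na` and `na_flags` are illustrative only. It shows arithmetic propagating NA, `==` failing to detect NA, and `is.na()` detecting it correctly.

```r
# The vector from the lesson: two numbers and two missing values
x <- c(44, NA, 5, NA)

# Arithmetic propagates NA element by element
times_three <- 3 * x

# Comparing with NA is itself unknown, so every element comes back NA
eq_na <- x == NA

# is.na() is the reliable test: TRUE wherever a value is missing
na_flags <- is.na(x)

print(times_three)  # 132, NA, 15, NA
print(eq_na)        # NA, NA, NA, NA
print(na_flags)     # FALSE, TRUE, FALSE, TRUE
```

This is why the lesson builds `my_na` with `is.na(my_data)` rather than `my_data == NA`.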
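
Because R stores TRUE as 1 and FALSE as 0, `sum(is.na(v))` counts the missing values in a vector and `mean(is.na(v))` gives their proportion. The sketch below rebuilds `my_data` the way the lesson does, but adds a `set.seed()` call that the lesson did not use, so the draw is reproducible; the count it reports will generally differ from the 46 seen above. It also shows `na.rm = TRUE`, the argument that tells summary functions such as `mean()` to drop NAs.

```r
set.seed(42)  # assumption: fixed seed for reproducibility; the lesson used none

y <- rnorm(1000)                 # 1000 standard normal draws
z <- rep(NA, 1000)               # 1000 missing values
my_data <- sample(c(y, z), 100)  # 100 values with an unknown number of NAs

my_na <- is.na(my_data)

n_missing    <- sum(my_na)   # TRUE counts as 1, FALSE as 0
prop_missing <- mean(my_na)  # share of the 100 values that are NA

# A single NA makes the plain mean NA; na.rm = TRUE drops missing values first
avg_with_na  <- mean(my_data)
avg_observed <- mean(my_data, na.rm = TRUE)

cat("NAs:", n_missing, "of", length(my_data), "\n")
cat("Proportion missing:", prop_missing, "\n")
cat("Mean of observed values:", avg_observed, "\n")
```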
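
NaN is a separate kind of missing value from NA, and R's test functions treat the two differently: `is.na()` returns TRUE for both, while `is.nan()` returns TRUE only for NaN. A short check, using an illustrative vector `vals` built from the lesson's two NaN-producing expressions:

```r
# One ordinary number, a true NA, and the two NaN results from the lesson
vals <- c(1, NA, 0 / 0, Inf - Inf)

na_test  <- is.na(vals)   # NA and NaN both count as missing here
nan_test <- is.nan(vals)  # only genuine NaN values

print(vals)      # 1, NA, NaN, NaN
print(na_test)   # FALSE, TRUE, TRUE, TRUE
print(nan_test)  # FALSE, FALSE, TRUE, TRUE

# Not every operation on infinity is undefined
inf_sum  <- Inf + Inf  # Inf
one_over <- 1 / 0      # Inf
```

In practice, `is.na()` is the safer filter when any kind of missing value should be excluded, and `is.nan()` is useful when an undefined computation needs to be told apart from data that was simply never recorded.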

### You may also like

- swirl Lesson 1: Basic Building Blocks
- swirl Lesson 2: Workspace and Files
- swirl Lesson 3: Sequences of Numbers
- swirl Lesson 4: Vectors
- swirl Lesson 5: Missing Values
- swirl Lesson 6: Subsetting Vectors
- swirl Lesson 7: Matrices and Data Frames
- swirl Lesson 8: Logic
- swirl Lesson 9: Functions
- swirl Lesson 10: lapply and sapply
- swirl Lesson 11: vapply and tapply
- swirl Lesson 12: Looking at Data
- swirl Lesson 13: Simulation
- swirl Lesson 14: Dates and Times
- swirl Lesson 15: Base Graphics