
swirl Lesson 5: Missing Values

| Please choose a course, or type 0 to exit swirl.
1: R Programming
2: Take me to the swirl course repository!
Selection: 1
| Please choose a lesson, or type 0 to return to course menu.
 1: Basic Building Blocks      2: Workspace and Files     
 3: Sequences of Numbers       4: Vectors                 
 5: Missing Values             6: Subsetting Vectors      
 7: Matrices and Data Frames   8: Logic                   
 9: Functions                 10: lapply and sapply       
11: vapply and tapply         12: Looking at Data         
13: Simulation                14: Dates and Times         
15: Base Graphics             
Selection: 5
  |                                                     |   0%
| Missing values play an important role in statistics and data
| analysis. Often, missing values must not be ignored, but
| rather they should be carefully studied to see if there's an
| underlying pattern or cause for their missingness.
...
  |===                                                  |   5%
| In R, NA is used to represent any value that is 'not
| available' or 'missing' (in the statistical sense). In this
| lesson, we'll explore missing values further.
...
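As a small illustration (a sketch outside the swirl lesson; the names `num_vec` and `chr_vec` are made up), the same `NA` can mark a missing number or a missing string. On its own `NA` is a logical constant. Inside a vector it takes on that vector's type, and R also provides typed constants such as `NA_real_` and `NA_character_`.

```r
# NA on its own is a logical constant
typeof(NA)             # "logical"

# Inside a vector, NA is coerced to the vector's type,
# so it can stand in for a missing number or a missing string
num_vec <- c(44, NA, 5)
chr_vec <- c("a", NA, "c")
typeof(num_vec)        # "double"
typeof(chr_vec)        # "character"

# Typed constants make the intended type explicit
typeof(NA_real_)       # "double"
typeof(NA_character_)  # "character"
```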
  |=====                                                |  10%
| Any operation involving NA generally yields NA as the
| result. To illustrate, let's create a vector c(44, NA, 5,
| NA) and assign it to a variable x.
> x <- c(44, NA, 5, NA)
| Perseverance, that's the answer.
  |========                                             |  15%
| Now, let's multiply x by 3.
> 3*x
[1] 132  NA  15  NA
| Excellent job!
  |===========                                          |  20%
| Notice that the elements of the resulting vector that
| correspond with the NA values in x are also NA.
...
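The same propagation applies well beyond multiplication. The sketch below reuses the lesson's `x`. It shows that comparisons keep `NA` in the same positions and that whole-vector summaries collapse to a single `NA`. The `na.rm = TRUE` argument of `sum()` and `mean()` is standard base R, but it is an addition here and not part of this lesson.

```r
x <- c(44, NA, 5, NA)

# Element-wise arithmetic: the NA positions stay NA
x * 3          # c(132, NA, 15, NA)
x + 1          # c(45, NA, 6, NA)

# Comparisons propagate NA too
x > 10         # c(TRUE, NA, FALSE, NA)

# Summaries over the whole vector collapse to a single NA
sum(x)         # NA
mean(x)        # NA

# Many summary functions accept na.rm = TRUE to drop NAs first
sum(x, na.rm = TRUE)   # 49
mean(x, na.rm = TRUE)  # 24.5
```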
  |=============                                        |  25%
| To make things a little more interesting, let's create a
| vector containing 1000 draws from a standard normal
| distribution with y <- rnorm(1000).
> y <- rnorm(1000)
| Keep up the great work!
  |================                                     |  30%
| Next, let's create a vector containing 1000 NAs with z <-
| rep(NA, 1000).
> z <- rep(NA, 1000)
| You're the best!
  |===================                                  |  35%
| Finally, let's select 100 elements at random from these 2000
| values (combining y and z) such that we don't know how many
| NAs we'll wind up with or what positions they'll occupy in
| our final vector -- my_data <- sample(c(y, z), 100).
> my_data <- sample(c(y, z), 100)
| You are really on a roll!
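Although the exact count is unknown in advance, it is predictable on average. Half of the 2000 pooled values are NA, so a sample of 100 drawn without replacement should contain about 100 × 1000 / 2000 = 50 NAs. The sketch below repeats the lesson's sampling step many times to check this. The seed and the helper `count_nas()` are hypothetical and are only there to make the run reproducible.

```r
set.seed(123)  # hypothetical seed, only so the sketch is reproducible

# c() coerces the logical NAs from rep(NA, ...) to double, so the pool is numeric
typeof(c(rnorm(3), rep(NA, 3)))   # "double"

# Hypothetical helper: rebuild the lesson's data once and count its NAs
count_nas <- function() {
  y <- rnorm(1000)
  z <- rep(NA, 1000)
  sum(is.na(sample(c(y, z), 100)))
}

# Repeat the experiment many times
counts <- replicate(1000, count_nas())

# Expected count: 100 * 1000 / 2000 = 50
mean(counts)    # close to 50
range(counts)   # individual draws scatter around that value
```

A single run, such as the one in this transcript, can land a few NAs above or below 50.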
  |=====================                                |  40%
| Let's first ask the question of where our NAs are located in
| our data. The is.na() function tells us whether each element
| of a vector is NA. Call is.na() on my_data and assign the
| result to my_na.
> my_na <- is.na(my_data)
| You are doing so well!
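`is.na()` answers "is this element missing?" one element at a time. To answer "where are the NAs?" as a list of positions, its result can be passed to `which()`. The sketch below uses the short vector from earlier in the lesson. `which()` and `anyNA()` are standard base R functions, but they are additions here and not part of this swirl lesson.

```r
x <- c(44, NA, 5, NA)

# is.na() returns one logical per element
is.na(x)          # c(FALSE, TRUE, FALSE, TRUE)

# which() turns the logical vector into the positions of the TRUEs,
# i.e. the indices where the NAs sit
which(is.na(x))   # c(2, 4)

# anyNA() answers the yes/no question without building the full vector
anyNA(x)          # TRUE
```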
  |========================                             |  45%
| Now, print my_na to see what you came up with.
> my_na
  [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE
 [10]  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE
 [19]  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
 [28]  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE
 [37] FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE
 [46] FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE
 [55] FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE
 [64]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE FALSE
 [73] FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE
 [82] FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE
 [91] FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE
[100]  TRUE
| You nailed it! Good job!
  |==========================                           |  50%
| Everywhere you see a TRUE, you know the corresponding
| element of my_data is NA. Likewise, everywhere you see a
| FALSE, you know the corresponding element of my_data is one
| of our random draws from the standard normal distribution.
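Because the logical vector lines up element by element with the data, it can also be used to pull the two groups apart. The sketch below uses a short stand-in vector and a hypothetical `na_flags` in place of `my_data` and `my_na`. Logical subsetting like this gets fuller treatment in the next lesson, Subsetting Vectors.

```r
x <- c(44, NA, 5, NA)
na_flags <- is.na(x)   # plays the role of my_na

# Keep the elements whose flag is FALSE: the actual values
x[!na_flags]   # c(44, 5)

# Keep the elements whose flag is TRUE: only the NAs
x[na_flags]    # c(NA, NA)
```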
...
  |=============================                        |  55%
| In our previous discussion of logical operators, we
| introduced the `==` operator as a method of testing for
| equality between two objects. So, you might think the
| expression my_data == NA yields the same results as is.na().
| Give it a try.
> my_data == NA
  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [20] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [39] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [58] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [77] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [96] NA NA NA NA NA
| You got it!
  |================================                     |  60%
| The reason you got a vector of all NAs is that NA is not
| really a value, but just a placeholder for a quantity that
| is not available. Comparing any number with an unknown
| quantity cannot come out TRUE or FALSE, so each comparison
| is NA, and R returns a vector of NAs the same length as
| my_data.
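A few one-line checks make the point concrete: even `NA == NA` is `NA`, because two unknown values are not known to be equal. `is.na()` is the dedicated test and always returns TRUE or FALSE. The vector `v` below is a made-up stand-in for `my_data`.

```r
# Comparing anything with NA cannot be decided, so the result is NA
5 == NA       # NA
NA == NA      # NA: two unknowns are not known to be equal

# is.na() always returns a definite TRUE or FALSE
is.na(NA)     # TRUE
is.na(5)      # FALSE

# On a vector, == NA gives all NAs, while is.na() gives usable flags
v <- c(1.2, NA, -0.3)
v == NA       # c(NA, NA, NA)
is.na(v)      # c(FALSE, TRUE, FALSE)
```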
...
  |==================================                   |  65%
| Don't worry if that's a little confusing. The key takeaway
| is to be cautious when using logical expressions anytime NAs
| might creep in, since a single NA value can derail the
| entire thing.
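Here is one way a single NA can derail a logical check. This sketch is not from the lesson, and the vector `flags` is made up. `all()` cannot confirm that every element is TRUE, so it returns NA. If that NA then reaches an `if ()` condition, R stops with an error instead of quietly choosing a branch.

```r
flags <- c(TRUE, TRUE, NA)

# all() cannot confirm every element is TRUE, so it returns NA
all(flags)                 # NA
# any() already sees a TRUE, so the NA does not change the answer
any(flags)                 # TRUE
# With na.rm = TRUE the NA is ignored
all(flags, na.rm = TRUE)   # TRUE

# An NA reaching an if() condition is an error, not just a wrong answer
result <- tryCatch(
  if (all(flags)) "all true" else "not all true",
  error = function(e) conditionMessage(e)
)
result   # "missing value where TRUE/FALSE needed"
```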
...
  |=====================================                |  70%
| So, back to the task at hand. Now that we have a vector,
| my_na, that has a TRUE for every NA and FALSE for every
| numeric value, we can compute the total number of NAs in our
| data.
...
  |========================================             |  75%
| The trick is to recognize that, under the hood, R treats
| TRUE as the number 1 and FALSE as the number 0 in
| arithmetic. Therefore, if we take the sum of a bunch of
| TRUEs and FALSEs, we get the total number of TRUEs.
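A small sketch of that coercion follows; the vector `flags` is made up. `sum()` counts the TRUEs. `mean()` divides that count by the length, which gives the proportion of TRUEs. In this transcript, for example, `mean(my_na)` would be 46 / 100 = 0.46.

```r
flags <- c(TRUE, FALSE, TRUE, TRUE)

# Coercion to numeric makes the 1/0 encoding visible
as.numeric(flags)   # c(1, 0, 1, 1)

# sum() adds up the 1s, giving the number of TRUEs
sum(flags)          # 3

# mean() divides that count by the length: the proportion of TRUEs
mean(flags)         # 0.75

# The same idea counts missing values in any vector
x <- c(44, NA, 5, NA)
sum(is.na(x))       # 2
mean(is.na(x))      # 0.5
```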
...
  |==========================================           |  80%
| Let's give that a try here. Call the sum() function on my_na
| to count the total number of TRUEs in my_na, and thus the
| total number of NAs in my_data. Don't assign the result to a
| new variable.
> sum(my_na)
[1] 46
| Great job!
  |=============================================        |  85%
| Pretty cool, huh? Finally, let's take a look at the data to
| convince ourselves that everything 'adds up'. Print my_data
| to the console.
> my_data
  [1]  1.28833738 -1.71566792 -2.10643156  0.52646288
  [5]          NA -0.43369979 -0.73588709          NA
  [9]          NA          NA          NA  0.04198942
 [13]          NA          NA -0.07881950          NA
 [17] -0.95076395  2.31028848          NA -0.73152858
 [21]          NA          NA          NA  1.05294332
 [25]          NA          NA          NA          NA
 [29]          NA -0.59012503 -0.38427819          NA
 [33]          NA          NA -0.48309105          NA
 [37]  0.32180094 -1.05520535  0.27521195          NA
 [41] -1.62668329          NA          NA -2.18910058
 [45]  1.74314263 -0.07327235  1.49444965          NA
 [49]  1.05747803          NA  0.71787615 -1.11394933
 [53] -0.08661972  0.87982505 -0.51738201          NA
 [57]  0.40314933          NA          NA  0.17416071
 [61] -0.46598802          NA  0.01918326          NA
 [65] -0.47643745          NA  0.96464973          NA
 [69]  1.07043774          NA          NA  0.55960535
 [73]  0.41772515          NA          NA -0.41373806
 [77]  0.80355818 -1.01706160 -1.36218341 -1.55671571
 [81]          NA  1.42576312 -0.49953533 -0.52994970
 [85]  0.62437351          NA -0.53641307          NA
 [89]          NA -0.08968300  0.18669337          NA
 [93]          NA  1.05551995  1.19631709 -0.47725033
 [97]          NA          NA -1.49508777          NA
| You're the best!
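To check that things "add up" without counting by eye, count both groups and compare the total to the length. Every element is either NA or not, so the two counts must sum to `length()`. Here that means 46 NAs plus 54 numbers makes 100. The sketch uses a short made-up vector in place of `my_data`.

```r
x <- c(44, NA, 5, NA, -1.3)

n_missing <- sum(is.na(x))    # 2
n_present <- sum(!is.na(x))   # 3

# The two counts together cover the whole vector
n_missing + n_present == length(x)   # TRUE

# table() reports both counts at once, labelled FALSE and TRUE
table(is.na(x))
```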
  |================================================     |  90%
| Now that we've got NAs down pat, let's look at a second type
| of missing value -- NaN, which stands for 'not a number'. To
| generate NaN, try dividing (using a forward slash) 0 by 0
| now.
> 0/0
[1] NaN
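NaN and NA overlap but are not interchangeable. `is.na()` returns TRUE for both. `is.nan()` returns TRUE only for NaN. The sketch below shows the difference; `nan_val` and `v` are made-up names.

```r
# 0/0 has no defined value, so R returns NaN
nan_val <- 0 / 0

# is.na() treats NaN as missing as well...
is.na(nan_val)    # TRUE
# ...but is.nan() is TRUE only for NaN, not for NA
is.nan(nan_val)   # TRUE
is.nan(NA)        # FALSE

# So a vector can be checked for each kind separately
v <- c(1, NA, 0 / 0)
is.na(v)          # c(FALSE, TRUE, TRUE)
is.nan(v)         # c(FALSE, FALSE, TRUE)
```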
| Your dedication is inspiring!
  |==================================================   |  95%
| Let's do one more, just for fun. In R, Inf stands for
| infinity. What happens if you subtract Inf from Inf?
> Inf - Inf
[1] NaN
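For comparison, not every operation involving Inf gives NaN. Dividing a nonzero number by zero gives a signed infinity. NaN appears only when the result has no defined value. The sketch below, not part of the lesson, also uses `is.infinite()` and `is.finite()` to separate these cases. `vals` is a made-up name.

```r
# Dividing a nonzero number by zero gives signed infinity, not NaN
1 / 0          # Inf
-1 / 0         # -Inf

# Forms with no defined value give NaN
Inf - Inf      # NaN
Inf / Inf      # NaN
0 * Inf        # NaN

# Arithmetic that stays well defined keeps Inf
Inf + 1        # Inf

# is.infinite() and is.finite() separate these cases
vals <- c(1, Inf, -Inf, NaN, NA)
is.infinite(vals)   # c(FALSE, TRUE, TRUE, FALSE, FALSE)
is.finite(vals)     # c(TRUE, FALSE, FALSE, FALSE, FALSE)
```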
| All that hard work is paying off!
  |=====================================================| 100%


Reprint notice: please credit 满忘近 » swirl Lesson 5: Missing Values
