| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

1: Basic Building Blocks
2: Workspace and Files
3: Sequences of Numbers
4: Vectors
5: Missing Values
6: Subsetting Vectors
7: Matrices and Data Frames
8: Logic
9: Functions
10: lapply and sapply
11: vapply and tapply
12: Looking at Data
13: Simulation
14: Dates and Times
15: Base Graphics

Selection: 6

| | 0%
| In this lesson, we'll see how to extract elements from a vector
| based on some conditions that we specify.

...

|= | 3%
| For example, we may only be interested in the first 20 elements
| of a vector, or only the elements that are not NA, or only those
| that are positive or correspond to a specific variable of
| interest. By the end of this lesson, you'll know how to handle
| each of these scenarios.

...

|=== | 5%
| I've created for you a vector called x that contains a random
| ordering of 20 numbers (from a standard normal distribution) and
| 20 NAs. Type x now to see what it looks like.

> x
[1] -0.33405234 -0.49484149 NA NA NA
[6] NA NA 0.61599044 NA -0.44508310
[11] -0.38448303 NA NA NA -0.82861770
[16] NA NA -1.17988002 NA NA
[21] -1.61544872 0.42726934 -0.05340495 1.09906686 NA
[26] NA NA NA -0.28733649 0.03352711
[31] 0.78400054 -1.85444549 NA -0.71313966 NA
[36] 0.58638413 0.68991587 -2.03005637 NA -0.53542368

| That's correct!

|==== | 8%
| The way you tell R that you want to select some particular
| elements (i.e. a 'subset') from a vector is by placing an 'index
| vector' in square brackets immediately following the name of the
| vector.

...

|====== | 10%
| For a simple example, try x[1:10] to view the first ten elements
| of x.

> x[1:10]
[1] -0.3340523 -0.4948415 NA NA NA
[6] NA NA 0.6159904 NA -0.4450831

| You nailed it! Good job!
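
To follow along outside swirl, a vector similar to x can be built by hand. The transcript does not show how swirl constructs x, so the construction below (and the seed value) is an assumption, and the numbers it produces will differ from the ones printed above.

```r
# Assumption: swirl's own code for x is not shown in the transcript;
# this is one plausible way to get 20 normal draws and 20 NAs, shuffled.
set.seed(42)                             # arbitrary seed, for reproducibility only
x <- sample(c(rnorm(20), rep(NA, 20)))   # random ordering of the 40 values

length(x)            # total number of elements
sum(is.na(x))        # number of missing values

# An index vector of positive integers selects by position.
first_ten <- x[1:10]
length(first_ten)
```

Here 1:10 is itself a vector, so x[1:10] is simply x indexed by the integer vector 1, 2, ..., 10.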
|======= | 13%
| Index vectors come in four different flavors -- logical vectors,
| vectors of positive integers, vectors of negative integers, and
| vectors of character strings -- each of which we'll cover in this
| lesson.

...

|========= | 15%
| Let's start by indexing with logical vectors. One common scenario
| when working with real-world data is that we want to extract all
| elements of a vector that are not NA (i.e. missing data). Recall
| that is.na(x) yields a vector of logical values the same length
| as x, with TRUEs corresponding to NA values in x and FALSEs
| corresponding to non-NA values in x.

...

|========== | 18%
| What do you think x[is.na(x)] will give you?

1: A vector of TRUEs and FALSEs
2: A vector with no NAs
3: A vector of all NAs
4: A vector of length 0

Selection: 3

| Perseverance, that's the answer.

|============ | 21%
| Prove it to yourself by typing x[is.na(x)].

> x[is.na(x)]
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

| Nice work!

|============= | 23%
| Recall that `!` gives us the negation of a logical expression, so
| !is.na(x) can be read as 'is not NA'. Therefore, if we want to
| create a vector called y that contains all of the non-NA values
| from x, we can use y <- x[!is.na(x)]. Give it a try.

> y<-x[!is.na(x)]

| All that hard work is paying off!

|=============== | 26%
| Print y to the console.

> y
[1] -0.33405234 -0.49484149 0.61599044 -0.44508310 -0.38448303
[6] -0.82861770 -1.17988002 -1.61544872 0.42726934 -0.05340495
[11] 1.09906686 -0.28733649 0.03352711 0.78400054 -1.85444549
[16] -0.71313966 0.58638413 0.68991587 -2.03005637 -0.53542368

| Excellent work!

|================ | 28%
| Now that we've isolated the non-missing values of x and put them
| in y, we can subset y as we please.

...
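
The logical-indexing steps above are easier to check by eye on a short vector. This is a small sketch; the vector v and its values are made up for illustration and are not part of the lesson.

```r
# Hypothetical five-element vector with two missing values.
v <- c(1.5, NA, -0.3, NA, 2.0)

mask <- is.na(v)          # FALSE TRUE FALSE TRUE FALSE
only_missing <- v[mask]   # keeps positions where mask is TRUE: NA NA
w <- v[!mask]             # keeps positions where mask is FALSE: 1.5 -0.3 2.0

length(only_missing)
w
```

The logical vector acts as a position-by-position filter: an element is kept exactly when the matching entry of the index is TRUE.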
|================== | 31%
| Recall that the expression y > 0 will give us a vector of logical
| values the same length as y, with TRUEs corresponding to values
| of y that are greater than zero and FALSEs corresponding to
| values of y that are less than or equal to zero. What do you
| think y[y > 0] will give you?

1: A vector of all the negative elements of y
2: A vector of all NAs
3: A vector of TRUEs and FALSEs
4: A vector of length 0
5: A vector of all the positive elements of y

Selection: 5

| Your dedication is inspiring!

|=================== | 33%
| Type y[y > 0] to see that we get all of the positive elements of
| y, which are also the positive elements of our original vector x.

> y[y>0]
[1] 0.61599044 0.42726934 1.09906686 0.03352711 0.78400054
[6] 0.58638413 0.68991587

| Excellent job!

|===================== | 36%
| You might wonder why we didn't just start with x[x > 0] to
| isolate the positive elements of x. Try that now to see why.

> x[x>0]
[1] NA NA NA NA NA
[6] 0.61599044 NA NA NA NA
[11] NA NA NA NA 0.42726934
[16] 1.09906686 NA NA NA NA
[21] 0.03352711 0.78400054 NA NA 0.58638413
[26] 0.68991587 NA

| That's correct!

|====================== | 38%
| Since NA is not a value, but rather a placeholder for an unknown
| quantity, the expression NA > 0 evaluates to NA. Hence we get a
| bunch of NAs mixed in with our positive numbers when we do this.

...

|======================== | 41%
| Combining our knowledge of logical operators with our new
| knowledge of subsetting, we could do this -- x[!is.na(x) & x >
| 0]. Try it out.

> x[!is.na(x) & x>0]
[1] 0.61599044 0.42726934 1.09906686 0.03352711 0.78400054
[6] 0.58638413 0.68991587

| You got it right!

|========================= | 44%
| In this case, we request only values of x that are both
| non-missing AND greater than zero.

...

|=========================== | 46%
| I've already shown you how to subset just the first ten values of
| x using x[1:10]. In this case, we're providing a vector of
| positive integers inside of the square brackets, which tells R to
| return only the elements of x numbered 1 through 10.

...

|============================ | 49%
| Many programming languages use what's called 'zero-based
| indexing', which means that the first element of a vector is
| considered element 0. R uses 'one-based indexing', which (you
| guessed it!) means the first element of a vector is considered
| element 1.

...

|============================== | 51%
| Can you figure out how we'd subset the 3rd, 5th, and 7th elements
| of x? Hint -- Use the c() function to specify the element numbers
| as a numeric vector.

> x[c(3,5,7)]
[1] NA NA NA

| You are doing so well!

|=============================== | 54%
| It's important that when using integer vectors to subset our
| vector x, we stick with the set of indexes {1, 2, ..., 40} since
| x only has 40 elements. What happens if we ask for the zeroth
| element of x (i.e. x[0])? Give it a try.

> x[0]
numeric(0)

| Nice work!

|================================= | 56%
| As you might expect, we get nothing useful. Unfortunately, R
| doesn't prevent us from doing this. What if we ask for the 3000th
| element of x? Try it out.

> x[3000]
[1] NA

| You got it!

|================================== | 59%
| Again, nothing useful, but R doesn't prevent us from asking for
| it. This should be a cautionary tale. You should always make sure
| that what you are asking for is within the bounds of the vector
| you're working with.

...

|==================================== | 62%
| What if we're interested in all elements of x EXCEPT the 2nd and
| 10th? It would be pretty tedious to construct a vector containing
| all numbers 1 through 40 EXCEPT 2 and 10.

...

|===================================== | 64%
| Luckily, R accepts negative integer indexes. Whereas x[c(2, 10)]
| gives us ONLY the 2nd and 10th elements of x, x[c(-2, -10)] gives
| us all elements of x EXCEPT for the 2nd and 10th elements. Try
| x[c(-2, -10)] now to see this.

> x[c(-2,-10)]
[1] -0.33405234 NA NA NA NA
[6] NA 0.61599044 NA -0.38448303 NA
[11] NA NA -0.82861770 NA NA
[16] -1.17988002 NA NA -1.61544872 0.42726934
[21] -0.05340495 1.09906686 NA NA NA
[26] NA -0.28733649 0.03352711 0.78400054 -1.85444549
[31] NA -0.71313966 NA 0.58638413 0.68991587
[36] -2.03005637 NA -0.53542368

| That's a job well done!

|======================================= | 67%
| A shorthand way of specifying multiple negative numbers is to put
| the negative sign out in front of the vector of positive numbers.
| Type x[-c(2, 10)] to get the exact same result.

> x[-c(2,10)]
[1] -0.33405234 NA NA NA NA
[6] NA 0.61599044 NA -0.38448303 NA
[11] NA NA -0.82861770 NA NA
[16] -1.17988002 NA NA -1.61544872 0.42726934
[21] -0.05340495 1.09906686 NA NA NA
[26] NA -0.28733649 0.03352711 0.78400054 -1.85444549
[31] NA -0.71313966 NA 0.58638413 0.68991587
[36] -2.03005637 NA -0.53542368

| That's the answer I was looking for.

|======================================== | 69%
| So far, we've covered three types of index vectors -- logical,
| positive integer, and negative integer. The only remaining type
| requires us to introduce the concept of 'named' elements.

...

|========================================== | 72%
| Create a numeric vector with three named elements using vect <-
| c(foo = 11, bar = 2, norf = NA).

> vect<-c(foo=11,bar=2,norf=NA)

| Keep working like that and you'll get there!

|=========================================== | 74%
| When we print vect to the console, you'll see that each element
| has a name. Try it out.

> vect
 foo  bar norf
  11    2   NA

| That's the answer I was looking for.

|============================================= | 77%
| We can also get the names of vect by passing vect as an argument
| to the names() function. Give that a try.

> names(vect)
[1] "foo" "bar" "norf"

| All that hard work is paying off!
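
Before moving on to names, the integer-index behavior seen earlier in the lesson (NAs leaking through a comparison, x[0], an out-of-range index, and negative indexes) can be reproduced on a six-element vector. This is only a sketch; the vector v and the index i are hypothetical and chosen so the results are easy to verify by hand.

```r
# Hypothetical data: six elements, two of them missing.
v <- c(10, NA, -3, 7, NA, 2)

# NA > 0 is NA, and an NA in a logical index yields NA in the result.
leaky <- v[v > 0]                 # 10 NA 7 NA 2
clean <- v[!is.na(v) & v > 0]     # 10 7 2

# Index 0 selects nothing; an index past the end returns NA without an error.
zeroth <- v[0]                    # numeric(0)
beyond <- v[100]                  # NA

# A simple guard before indexing with a single position i.
i <- 100
in_bounds <- i >= 1 && i <= length(v)   # FALSE

# Negative indexes drop positions; both spellings are equivalent.
dropped_a <- v[c(-2, -5)]         # 10 -3 7 2
dropped_b <- v[-c(2, 5)]          # 10 -3 7 2
```

Because R stays silent on both v[0] and v[100], a check like in_bounds is one way to catch a bad index before it turns into unexpected NAs downstream.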
|============================================== | 79%
| Alternatively, we can create an unnamed vector vect2 with c(11,
| 2, NA). Do that now.

> vect2<-c(11,2,NA)

| Excellent job!

|================================================ | 82%
| Then, we can add the `names` attribute to vect2 after the fact
| with names(vect2) <- c("foo", "bar", "norf"). Go ahead.

> names(vect2)<-c("foo","bar","norf")

| All that practice is paying off!

|================================================= | 85%
| Now, let's check that vect and vect2 are the same by passing them
| as arguments to the identical() function.

> identical(vect,vect2)
[1] TRUE

| That's a job well done!

|=================================================== | 87%
| Indeed, vect and vect2 are identical named vectors.

...

|==================================================== | 90%
| Now, back to the matter of subsetting a vector by named elements.
| Which of the following commands do you think would give us the
| second element of vect?

1: vect["2"]
2: vect[bar]
3: vect["bar"]

Selection: 3

| You're the best!

|====================================================== | 92%
| Now, try it out.

> vect["bar"]
bar
  2

| Great job!

|======================================================= | 95%
| Likewise, we can specify a vector of names with vect[c("foo",
| "bar")]. Try it out.

> vect[c("foo","bar")]
foo bar
 11   2

| Nice work!

|========================================================= | 97%
| Now you know all four methods of subsetting data from vectors.
| Different approaches are best in different scenarios and when in
| doubt, try it out!

...

|==========================================================| 100%
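
The name-based steps can be repeated on a vector of your own, which also shows why the first quiz option does not work. This is a sketch with made-up names and values (scores, alice, bob, carol are all hypothetical).

```r
# Hypothetical named vector, built with names inline.
scores <- c(alice = 90, bob = 72, carol = NA)

# The same vector, built unnamed and named after the fact.
scores2 <- c(90, 72, NA)
names(scores2) <- c("alice", "bob", "carol")
same <- identical(scores, scores2)        # TRUE

# Character index vectors select by name and keep the names.
one <- scores["bob"]                      # bob 72
two <- scores[c("alice", "bob")]          # alice 90, bob 72

# "2" is treated as a name, not a position; no element has that name,
# so the result is a single NA.
no_match <- scores["2"]
```

The unquoted form scores[bob] fails for a different reason: R looks for a variable called bob, and stops with an error if none exists.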
