Welcome to the exciting world of R! Whether this is your first time ever using R or you’re just a little rusty and wanting a refresher, this tutorial will help kick-start your R journey.
In this tutorial, we’ll cover some of the crucial basics for effectively using R for data analysis, such as arithmetic, variables, functions, vectors, factors, data frames, lists, subsetting, and packages.
Matrices won’t be explicitly covered in this tutorial; however, another tutorial on this site (Matrix Algebra in R) covers how to create and work with matrices.
## [1] "R version 4.0.2 (2020-06-22)"
Opening R for the very first time can be a bit daunting, especially if you’re used to point-and-click software (e.g., SPSS). Let’s ease into using R by first realizing that it can function like a simple calculator. In the following code chunks (pieces of code embedded in this document), we’ll see what happens when we plug some simple arithmetic problems into R.
1 + 1
## [1] 2
15 - 8
## [1] 7
6 * 7
## [1] 42
18 / 9
## [1] 2
5^3
## [1] 125
We can see from this that entering a simple arithmetic expression into R returns the correct answer. How does R handle more complicated expressions and the order of operations? As you can see below, R follows the standard order of operations: the exponent is evaluated first, then the multiplication, and finally the addition.
5 + 3 * 2^2
## [1] 17
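As a small sketch of how grouping changes the result, parentheses can be used to override the default order of operations. The second expression is an illustrative addition, not one of the original examples.

```r
# Without parentheses: exponent first, then multiplication, then addition
5 + 3 * 2^2
## [1] 17

# Parentheses are evaluated first, so the addition now happens before the multiplication
(5 + 3) * 2^2
## [1] 32
```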
Additionally, we can compare different values together.
5 > 3
## [1] TRUE
15 < 12
## [1] FALSE
5 == 7 # This asks if the two are equal to each other
## [1] FALSE
13 != 14 # This asks if the two aren't equal to each other
## [1] TRUE
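One caveat is worth a quick sketch: == checks for exact equality, which can be surprising with decimal numbers, because many decimals can’t be stored exactly by the computer. The example below is an illustrative addition that uses base R’s all.equal() and isTRUE() to compare within a small tolerance instead.

```r
0.1 + 0.2 == 0.3   # exact comparison fails because of floating-point rounding
## [1] FALSE

isTRUE(all.equal(0.1 + 0.2, 0.3))   # compares within a small tolerance
## [1] TRUE
```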
Basic arithmetic is nice, but we’re looking to do things that are a bit more complicated than that. The first step to accessing the more advanced functionality of R is getting comfortable with variables. In R, we can store values as variables. We can do this by assigning the value to the variable name. For example, if we wanted x to be equal to 15, we could use the following code:
x <- 15
x # Input the name of a variable to print its value
## [1] 15
As you can see, x is now equal to 15. We can now begin using x in place of the number 15 in our work.
x + 7
## [1] 22
x / 3
## [1] 5
x^3
## [1] 3375
Let’s make another variable, y, with a value of 5.
y <- 5
y
## [1] 5
We can now perform the same arithmetic and comparisons with both of our variables.
x + y
## [1] 20
x - y
## [1] 10
x * y
## [1] 75
x / y
## [1] 3
x < y
## [1] FALSE
x == y * 3
## [1] TRUE
You may be wondering if it’s possible to store multiple values inside of a variable. We can, but we give it a different name - vector. Before we get into vectors, we have another topic that we should talk about.
To effectively use R, it is critical that you have a good working understanding of how to utilize functions. Functions are how you will accomplish a large portion of the actions that you likely wish to perform inside of R. R comes with a large variety of functions inside the pre-installed core packages that are automatically loaded when you open R. However, these functions likely won’t meet all of your needs. In these cases, you can install additional packages written by other R users.
Functions in R follow a general format:
name_of_function(argument1 = blah, argument2 = blah blah, etc.)
To start, we need to know which function we’re wanting to use, and sometimes this can be the most difficult part. Usually a quick “How do I ___ in R?” Google search will point you in the right direction. From here, we need to determine which arguments the function needs us to specify in order to run. R has a handy way of making the documentation on functions very accessible. Simply submit the following code:
?name_of_function
# If that returns nothing, ?? searches all installed documentation for the term
??name_of_function
Running the above code will open up the R Documentation for the function in question. This documentation often offers a detailed look at the function, including all of the arguments as well as information on what the arguments mean. Let’s take a look at some R Documentation. In case you aren’t following along inside of R, this is what the following code will show you.
?sum
As you can see, we get a description of the function - basically, it adds together the elements passed to it. We can see that it has two arguments: ... and na.rm. In this case, ... is how we’ll specify all of the things that we’re wanting to sum. na.rm allows us to specify if we want missing values to be removed; if not, then the function will return NA if any missing values are present. Let’s try out the function. We’re going to sum the numbers from 1 to 5. In this code, the colon, :, indicates “x, y, and all of the numbers in between.”
sum(... = 1:5, na.rm = FALSE)
## [1] 15
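We can also leave out the name of ..., because any unnamed values are collected there. The same is not true of na.rm: it comes after ..., so it must always be written by name. An unnamed FALSE would simply be treated as one more value to add (it counts as 0), not as the na.rm setting.
sum(1:5, na.rm = FALSE)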
## [1] 15
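To see what na.rm actually changes, here is a minimal sketch with a missing value. The scores vector is a made-up example, not data from this tutorial.

```r
scores <- c(4, 8, NA, 15)   # hypothetical vector with one missing value

sum(scores)                 # the default, na.rm = FALSE, returns NA
## [1] NA

sum(scores, na.rm = TRUE)   # drops the missing value before summing
## [1] 27
```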
When first getting started with new functions (or if you’re going to be sharing your code with other people), I’d strongly recommend specifying the argument names, however. It makes your code much easier to read and interpret - especially for other people who may not be familiar with the functions you’re using.
If you want to learn how to write your own functions, we have a tutorial available for that as well - Writing Functions in R.
In R, a vector is a unidimensional array of values. These values can be numeric (as we’ve explored already), character (such as words or phrases), or logical (TRUE or FALSE).
c()
To create a vector, we (often) need to use the concatenate function, c(). We’ll start off by making a character vector that includes various dog breeds. Notice that we need to put character data inside of quotation marks. We’ll also make vectors of the average weights (pounds) and heights (inches), as well as size category, for the females of each breed. We’ll use these later in the tutorial when discussing factors and data frames.
breed <- c("German Shepherd", "Siberian Husky", "Beagle", "Golden Retriever", "Boston Terrier", "Shiba Inu")
breed
## [1] "German Shepherd" "Siberian Husky" "Beagle" "Golden Retriever"
## [5] "Boston Terrier" "Shiba Inu"
weight <- c(60, 43, 25, 60, 18, 17)
weight
## [1] 60 43 25 60 18 17
height <- c(23, 21, 14, 22, 16, 15)
height
## [1] 23 21 14 22 16 15
size <- c(3, 2, 1, 3, 1, 1)
size
## [1] 3 2 1 3 1 1
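If you’re ever unsure which type of vector you ended up with, base R’s class() function will tell you. The sketch below recreates the weight vector so that it runs on its own; the mixed vector is a made-up example showing that a vector holds only one type, so mixing types converts everything to character.

```r
weight <- c(60, 43, 25, 60, 18, 17)

class(weight)
## [1] "numeric"
class(c("Beagle", "Shiba Inu"))
## [1] "character"
class(weight > 40)   # comparisons produce logical vectors
## [1] "logical"

mixed <- c(1, "Beagle", TRUE)   # hypothetical mixed input
class(mixed)                    # everything was converted to character
## [1] "character"
```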
There are a number of cases in which we don’t need to use c() when making vectors. For example, if we wish to make a vector out of the numbers from 1 to 6, we could simply do the following:
a <- 1:6
a
## [1] 1 2 3 4 5 6
In this code, the colon, :, denotes all of the numbers from 1 to 6. However, if we wanted to make another vector with all of the numbers from 1 to 10, excluding 7, we’d need to go back to using c() as shown below.
b <- c(1:6, 8:10)
b
## [1] 1 2 3 4 5 6 8 9 10
As you can see, this code concatenates the numbers from 1 to 6 and the numbers from 8 to 10.
rep()
The replicate function, rep(), is useful for creating vectors that contain replicated values. For example, if we needed a vector that included 1 repeated 10 times, we could use either of the following:
d <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
d
## [1] 1 1 1 1 1 1 1 1 1 1
e <- rep(x = 1, times = 10)
e
## [1] 1 1 1 1 1 1 1 1 1 1
This function becomes much more useful as the size of our needed vectors increases. For example, if we needed a vector that included the numbers 1 - 5 repeated three times:
g <- c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5)
g
## [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
h <- rep(x = 1:5, times = 3)
h
## [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
If you look at the R Documentation for rep(), you can see that it has two other possible arguments: each = and length.out =. These arguments are important for getting the most value out of rep(). For example, instead of replicating x as a whole a certain number of times (as with vector h), we can use each = to say that we want each element of x replicated that number of times, as with vector i below.
i <- rep(x = 1:5, each = 3)
i
## [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
We can also use the length.out = argument to specify how long we want the resulting vector to be. For example, if we want to replicate 1:5 until the vector is 25 elements long, we can specify that with the following code for vector j. Note that if the length specified in length.out isn’t perfectly divisible by the length of x, then rep() will repeat x until the vector is filled and then stop, regardless of the last element of x replicated - as shown with vector k.
j <- rep(x = 1:5, length.out = 25)
j
## [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
k <- rep(x = 1:5, length.out = 23)
k
## [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3
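The arguments of rep() can also be combined. As a minimal sketch (the vector name m is just a placeholder for this example), using each = and times = together first repeats each element and then repeats that whole result:

```r
# each is applied first (1 1 2 2), then that result is repeated 3 times
m <- rep(x = 1:2, each = 2, times = 3)
m
## [1] 1 1 2 2 1 1 2 2 1 1 2 2
length(m)
## [1] 12
```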
In R, a factor refers to a type of data that is used with categorical variables. If you think back to the section on vectors, you may remember that we created a vector that represents the dogs’ size classification. In this vector we used the numbers 1, 2, and 3 to represent these size classifications. But what do those numbers mean? It’s time to find out (and also let R know)!
In this section, we’ll be using a new function, factor(), to make sense of the values that we assigned earlier. Here’s the R Documentation. With this function, we have a number of arguments that we can use to specify various aspects of our categorical variables. We’ll use x = to specify the vector we’re wanting to factor, levels = to specify the unique values in the vector, labels = to specify the labels that correspond to the values in levels, and we can set ordered = to TRUE if the labels have an ordinal hierarchy. The original size vector is also repeated here for reference.
size
## [1] 3 2 1 3 1 1
size.f <- factor(x = size, levels = c(1, 2, 3), labels = c("small", "medium", "large"), ordered = TRUE)
size.f
## [1] large medium small large small small
## Levels: small < medium < large
We can see that our new variable, size.f (shorthand for the factored version of size), contains the corresponding labels for the size vector. Additionally, because we specified that there is an order to the values, we can use logical comparisons on the values. For example, we can see if the first value in size.f is larger than the second value in size.f. This code uses some simple subsetting, which we’ll discuss later.
size.f[1] > size.f[2]
## [1] TRUE
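A few base R functions are handy for checking what a factor contains. The sketch below recreates size.f so that it runs on its own, then looks at the labels, the underlying integer codes, and a count of each category.

```r
size <- c(3, 2, 1, 3, 1, 1)
size.f <- factor(x = size, levels = c(1, 2, 3),
                 labels = c("small", "medium", "large"), ordered = TRUE)

levels(size.f)       # the labels, from lowest to highest
## [1] "small"  "medium" "large"
as.integer(size.f)   # the integer codes stored behind the labels
## [1] 3 2 1 3 1 1
table(size.f)        # how many dogs fall into each category
## size.f
##  small medium  large 
##      3      1      2
```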
Now that we have all of our vectors the way that we want them to be, we should talk about data frames. Whether you’ve used R before or not, you’re probably familiar with the concept of a data frame - and you’ve probably even seen them or worked with them before. A data frame is R’s version of a dataset, such as the data you’d work with inside of SPSS or Excel. R comes with some built-in data frames, so we’ll take a look at one of those while also exploring some functions for interacting with data frames. After that, we’ll use the vectors we made earlier to build our own data frame.
As mentioned earlier, R comes with some built-in data frames. One of the most commonly used for illustrations is the mtcars data; this is data from the 1974 Motor Trend US magazine. If you wish to view the full data frame in your own R window, you can use the following code, but since it’s a large amount of data, we’ll use some other functions for viewing portions of it here.
View(mtcars)
There are two very similar functions for getting a quick look at your data: head() and tail(). As you may have guessed, head() shows you the first rows in the data, and tail() shows you the last rows. The n = argument in both functions defaults to showing 6 rows; however, you are free to change that to however many rows you wish.
head(x = mtcars, n = 2)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
tail(x = mtcars, n = 2)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Maserati Bora 15.0 8 301 335 3.54 3.57 14.6 0 1 5 8
## Volvo 142E 21.4 4 121 109 4.11 2.78 18.6 1 1 4 2
Another useful function for getting a feel for your data is str(), which shows you the structure of the data frame. As you can see, this includes all of the variable names, the type of each variable (e.g., numeric, character), and a preview of the data.
str(object = mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
The last function we’ll look at here is summary(). This will also show you the variable names, but with this function you can get an idea of the distributions of the variables, as well as an indication of missingness if there is any (there isn’t here).
summary(object = mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
One final note before moving on to the next section: As you continue your R journey, it will be important to know that you can specify a vector inside of a data frame by using the dollar sign, $. In the code below, we’ll use $ to look at the mpg variable in the mtcars data frame.
mtcars$mpg
## [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
## [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
## [31] 15.0 21.4
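Because $ hands back an ordinary vector, you can pass it straight into other functions. A minimal sketch using the built-in mtcars data (the mean matches the value reported by summary() above):

```r
mean(mtcars$mpg)   # average miles per gallon
## [1] 20.09062
max(mtcars$mpg)    # best miles per gallon in the data
## [1] 33.9

# Double brackets with the column name return the same vector as $
identical(mtcars$mpg, mtcars[["mpg"]])
## [1] TRUE
```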
In this section, we’ll use our vectors that we created earlier to make a data frame. For this, we will use the data.frame() function. In this function, we’ll specify the vectors that we want to include in our data frame.
doggos <- data.frame(breed, weight, height, size)
doggos
## breed weight height size
## 1 German Shepherd 60 23 3
## 2 Siberian Husky 43 21 2
## 3 Beagle 25 14 1
## 4 Golden Retriever 60 22 3
## 5 Boston Terrier 18 16 1
## 6 Shiba Inu 17 15 1
We can also add additional vectors to our existing data frames by using the column bind function, cbind().
doggos2 <- cbind(doggos, size.f)
doggos2
## breed weight height size size.f
## 1 German Shepherd 60 23 3 large
## 2 Siberian Husky 43 21 2 medium
## 3 Beagle 25 14 1 small
## 4 Golden Retriever 60 22 3 large
## 5 Boston Terrier 18 16 1 small
## 6 Shiba Inu 17 15 1 small
Instead of using cbind(), we could choose to assign a new vector to our data frame with the $ that we learned about just a little bit ago. Notice that in this code we’re assigning size.f to doggos$size.f, which will create a new variable in the data frame because there is not currently one that matches that name.
doggos$size.f <- size.f
doggos
## breed weight height size size.f
## 1 German Shepherd 60 23 3 large
## 2 Siberian Husky 43 21 2 medium
## 3 Beagle 25 14 1 small
## 4 Golden Retriever 60 22 3 large
## 5 Boston Terrier 18 16 1 small
## 6 Shiba Inu 17 15 1 small
If we don’t want to make our vectors separately, we could also choose to make the data frame in one chunk of code, as shown below.
doggos3 <- data.frame(breed = c("German Shepherd", "Siberian Husky", "Beagle", "Golden Retriever", "Boston Terrier", "Shiba Inu"),
                      weight = c(60, 43, 25, 60, 18, 17),
                      height = c(23, 21, 14, 22, 16, 15),
                      size = c(3, 2, 1, 3, 1, 1),
                      size.f = factor(x = size, levels = c(1, 2, 3),
                                      labels = c("small", "medium", "large"),
                                      ordered = TRUE))
doggos3
## breed weight height size size.f
## 1 German Shepherd 60 23 3 large
## 2 Siberian Husky 43 21 2 medium
## 3 Beagle 25 14 1 small
## 4 Golden Retriever 60 22 3 large
## 5 Boston Terrier 18 16 1 small
## 6 Shiba Inu 17 15 1 small
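One detail of the one-chunk version is easy to miss: inside data.frame(), the size in the factor() call refers to the size vector we created earlier, not to the size column being defined in the same call. A minimal sketch that doesn’t depend on an existing object builds the data frame first and then adds the factor from its own column; doggos4 is a hypothetical name used only for this example.

```r
doggos4 <- data.frame(
  breed  = c("German Shepherd", "Siberian Husky", "Beagle",
             "Golden Retriever", "Boston Terrier", "Shiba Inu"),
  weight = c(60, 43, 25, 60, 18, 17),
  height = c(23, 21, 14, 22, 16, 15),
  size   = c(3, 2, 1, 3, 1, 1)
)

# The factor is now built from the column that already lives in the data frame
doggos4$size.f <- factor(x = doggos4$size, levels = c(1, 2, 3),
                         labels = c("small", "medium", "large"),
                         ordered = TRUE)
doggos4$size.f
## [1] large  medium small  large  small  small 
## Levels: small < medium < large
```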
If you already have a dataset, we have a tutorial on how to import a wide range of files into R, as well as how to export your data from R.
Up to this point, we’ve covered three main types of data structures: variables, vectors, and data frames. The last type of data structure that we’ll discuss in this tutorial is the list. Lists are the most versatile type of data structure, because they can contain any other data structure. For example, we could have a list that contains a variable of my name, a vector of board game names, and a data frame of dog information. One of the beautiful things about lists is that the information inside of them doesn’t necessarily have to be related. You can use lists to store just about anything your heart desires. So let’s get started!
name <- "Kyle R. Ripley"
board_games <- c("Risk", "Gloomhaven", "Monopoly", "Island of El Dorado", "Scrabble", "Forbidden Sky")
Now that we have our variable (name), vector (board_games), and data frame (doggos), we can make our list. We’ll do this with the list() function. Notice that in the code, we’re giving the elements of our list names by specifying name_in_list = original_name. We can also use the same name, as with doggos.
random_list <- list(my_name = name, cool_games = board_games, doggos = doggos)
random_list
## $my_name
## [1] "Kyle R. Ripley"
##
## $cool_games
## [1] "Risk" "Gloomhaven" "Monopoly"
## [4] "Island of El Dorado" "Scrabble" "Forbidden Sky"
##
## $doggos
## breed weight height size size.f
## 1 German Shepherd 60 23 3 large
## 2 Siberian Husky 43 21 2 medium
## 3 Beagle 25 14 1 small
## 4 Golden Retriever 60 22 3 large
## 5 Boston Terrier 18 16 1 small
## 6 Shiba Inu 17 15 1 small
As you can see, list() creates one object that has all of the other data stored inside of it. Did I mention that you can also store lists inside of other lists?
listception <- list(new_list = list("this", "is", "cool"), original_list = random_list)
listception
## $new_list
## $new_list[[1]]
## [1] "this"
##
## $new_list[[2]]
## [1] "is"
##
## $new_list[[3]]
## [1] "cool"
##
##
## $original_list
## $original_list$my_name
## [1] "Kyle R. Ripley"
##
## $original_list$cool_games
## [1] "Risk" "Gloomhaven" "Monopoly"
## [4] "Island of El Dorado" "Scrabble" "Forbidden Sky"
##
## $original_list$doggos
## breed weight height size size.f
## 1 German Shepherd 60 23 3 large
## 2 Siberian Husky 43 21 2 medium
## 3 Beagle 25 14 1 small
## 4 Golden Retriever 60 22 3 large
## 5 Boston Terrier 18 16 1 small
## 6 Shiba Inu 17 15 1 small
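Printing a nested list gets long quickly, so it can help to ask for a summary instead. The sketch below uses a small stand-in list (small_list is a hypothetical name, with shortened contents) and base R’s length(), names(), and str().

```r
# A shortened stand-in for the lists above
small_list <- list(my_name = "Kyle R. Ripley",
                   cool_games = c("Risk", "Gloomhaven", "Monopoly"))

length(small_list)   # number of top-level elements
## [1] 2
names(small_list)    # names of those elements
## [1] "my_name"    "cool_games"

# str() prints a compact overview of each element's type and contents
str(small_list)
```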
In the next section, subsetting, you’ll learn how to reference specific elements of your list, and this is where the real power of using lists resides.
We saw a very brief example of subsetting earlier in the factor section of this tutorial. Let’s revisit that example as an introduction to this section. Remember that we had just factored our size variable, and we were wanting to compare two elements from our factored vector, size.f. Here is our original code.
size.f[1] > size.f[2]
## [1] TRUE
In this code, you can see that we used brackets ([ ]) to specify the element in the vector that we wanted to subset. We could use the same format to see which dog was third in our breed vector.
breed[3]
## [1] "Beagle"
As with most things in R, we can choose to assign our subset to a new variable name.
third_dog <- breed[3]
third_dog
## [1] "Beagle"
Because vectors are unidimensional, we only need to specify one position in order to fully describe which element we’re wanting to subset. However, that isn’t the case for all types of data structures.
We can subset data frames in a similar fashion to how we subset vectors, but we need to specify the difference between rows and columns. In the following code we’re going to specify that we want the value in the 2nd row and 3rd column (Siberian Husky height). In most situations, including subsetting, you need to specify rows before columns - I remember this because of RC Cola, but there are many different mnemonic devices that I’ve heard for this.
husky_height <- doggos[2, 3]
husky_height
## [1] 21
Now that we know how to specify a single value, we can extrapolate that to single rows or columns. For example, if we wanted all of the info on huskies, we could use the following code:
doggos[2, ]
## breed weight height size size.f
## 2 Siberian Husky 43 21 2 medium
Because we didn’t specify a certain column, R will return all columns for that row. We can also specify that we don’t want a specific row or column by putting a - in front of the number. With this code, we’ll get rid of the numeric value for size.
doggos[2, -4]
## breed weight height size.f
## 2 Siberian Husky 43 21 medium
This functionality also works for specifying columns. As shown below, we’ll subset the weight of all the breeds.
doggos[, 2]
## [1] 60 43 25 60 18 17
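Columns can also be picked by name instead of position, which keeps working if the column order ever changes. The sketch below uses a small stand-in data frame (dogs is a hypothetical name) and also shows base R’s drop = FALSE option, which keeps a single column as a data frame instead of simplifying it to a vector.

```r
dogs <- data.frame(breed  = c("German Shepherd", "Siberian Husky", "Beagle"),
                   weight = c(60, 43, 25),
                   height = c(23, 21, 14))

dogs[, "weight"]               # same result as dogs[, 2]
## [1] 60 43 25
dogs[, 2, drop = FALSE]        # stays a one-column data frame
##   weight
## 1     60
## 2     43
## 3     25
dogs[, c("breed", "height")]   # several columns by name
##             breed height
## 1 German Shepherd     23
## 2  Siberian Husky     21
## 3          Beagle     14
```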
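As you get more comfortable with subsetting, and with R in general, you’ll be able to write much more useful subsets. For example, suppose we want the breed, weight, and height of dogs that aren’t classified as small. A comparison such as doggos$size != 1 returns TRUE or FALSE for every row, and R keeps only the rows where it is TRUE.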
doggos
## breed weight height size size.f
## 1 German Shepherd 60 23 3 large
## 2 Siberian Husky 43 21 2 medium
## 3 Beagle 25 14 1 small
## 4 Golden Retriever 60 22 3 large
## 5 Boston Terrier 18 16 1 small
## 6 Shiba Inu 17 15 1 small
doggos[doggos$size != 1, 1:3]
## breed weight height
## 1 German Shepherd 60 23
## 2 Siberian Husky 43 21
## 4 Golden Retriever 60 22
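The same kind of filter can be written against the factor labels rather than the numeric codes, which tends to be easier to read. A minimal sketch with a small stand-in data frame (dogs is a hypothetical name for this example):

```r
dogs <- data.frame(breed  = c("German Shepherd", "Siberian Husky", "Beagle"),
                   weight = c(60, 43, 25),
                   size   = c(3, 2, 1))
dogs$size.f <- factor(dogs$size, levels = c(1, 2, 3),
                      labels = c("small", "medium", "large"), ordered = TRUE)

# Rows: dogs whose label is not "small"; columns: picked by name
dogs[dogs$size.f != "small", c("breed", "weight")]
##             breed weight
## 1 German Shepherd     60
## 2  Siberian Husky     43

# Because size.f is ordered, we can also ask for anything above "small"
dogs[dogs$size.f > "small", "weight"]
## [1] 60 43
```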
As mentioned earlier, the real benefit of using lists comes from being able to subset out any piece of information stored in the list. Subsetting lists is a crucial skill, because many statistical analyses in R return their results as a list. If you’re comfortable subsetting, you’ll have a much easier time extracting your results.
For the sake of saving some scrolling, let’s go ahead and revisit our list from earlier.
random_list
## $my_name
## [1] "Kyle R. Ripley"
##
## $cool_games
## [1] "Risk" "Gloomhaven" "Monopoly"
## [4] "Island of El Dorado" "Scrabble" "Forbidden Sky"
##
## $doggos
## breed weight height size size.f
## 1 German Shepherd 60 23 3 large
## 2 Siberian Husky 43 21 2 medium
## 3 Beagle 25 14 1 small
## 4 Golden Retriever 60 22 3 large
## 5 Boston Terrier 18 16 1 small
## 6 Shiba Inu 17 15 1 small
As a forewarning, subsetting lists is more complicated than the other subsetting we’ve been doing up to this point, but it is definitely worth it.
Let’s get started with some of the ways that we can go about subsetting our list. We’ll start off simply by looking at the variable component. We can use the same single bracket, [ ], method as before to subset out the name variable, but if we do it this way, we get another list, not the actual value.
my_name1 <- random_list[1]
my_name1
## $my_name
## [1] "Kyle R. Ripley"
class(my_name1)
## [1] "list"
Because we used a single bracket, we’re telling R that we just want the first element of the list, which is a list that contains my name - we’re not actually asking for my name. In fact, we can continue to use single brackets indefinitely and make no progress, because we’ll just keep asking for the same list over and over again.
my_name1 <- random_list[1][1][1][1][1][1][1][1][1][1]
my_name1
## $my_name
## [1] "Kyle R. Ripley"
class(my_name1)
## [1] "list"
If we want the actual value stored in the list we need to either begin with different subsetting operators or continue subsetting. We’ll do both to show the difference. We’ll start by continuing with our current path.
my_name1
## $my_name
## [1] "Kyle R. Ripley"
my_name2 <- random_list[1][[1]]
my_name2
## [1] "Kyle R. Ripley"
class(my_name2)
## [1] "character"
As you can see, we used double brackets, [[ ]], to pull out the actual element inside of the list. Let’s try subsetting a different way to see if we can get there easier.
my_name3 <- random_list[[1]]
my_name3
## [1] "Kyle R. Ripley"
class(my_name3)
## [1] "character"
For this, we can simply start by using double brackets. We’re telling R that we want to extract the first element in our random_list and have it returned with its actual class (in this case, a single-element character vector), not as a list (though sometimes we will use double brackets to extract lists from other lists).
Now that we know about double bracket subsetting, let’s take a look at the cool_games vector in our list.
games <- random_list[[2]]
games
## [1] "Risk" "Gloomhaven" "Monopoly"
## [4] "Island of El Dorado" "Scrabble" "Forbidden Sky"
class(games)
## [1] "character"
As expected, we receive a character vector back, but what if we wanted to pull out a particular element of that vector? If you think back to subsetting vectors, this will work exactly the same. We’ll simply use single brackets to subset the vector returned by the double bracket subsetting. Let’s pull out Gloomhaven.
Gloomhaven <- random_list[[2]][2]
Gloomhaven
## [1] "Gloomhaven"
class(Gloomhaven)
## [1] "character"
So, if subsetting vectors inside of lists works the same as subsetting vectors, then subsetting data frames inside of lists probably functions the same too? Let’s try it out. How much does an average female German Shepherd weigh? Here we’re going to extract the data frame (the third element in our list) and then pull out the value in the 1st row and 2nd column.
GS_weight <- random_list[[3]][1, 2]
GS_weight
## [1] 60
class(GS_weight)
## [1] "numeric"
You’re really getting the hang of this! Now, remember that lists can also be inside of other lists? Let’s take a look at subsetting that. After this, you’ll be ready to subset anything thrown your way!
We’re going to go back to listception - our list with 2 lists in it - and extract the character size (size.f) of a Shiba Inu. I’ll also show you another handy trick for subsetting lists that may help if you tend to forget the order that your list elements are in. For this, we’re going to use double brackets to subset out original_list from listception. We’ll then use double brackets again to extract doggos from original_list.
shiba_size <- listception[["original_list"]][["doggos"]][6, 5]
shiba_size
## [1] small
## Levels: small < medium < large
As shown, you can use the names of elements to subset lists. Once you get down to a vector or data frame, you subset it the same way as before; here, that means single brackets with a row and column position.
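The $ operator from the data frame section works on named list elements too, and it can be chained in the same way as double brackets. A minimal sketch with small stand-in objects (nested, inner_list, and dogs are hypothetical names for this example):

```r
dogs <- data.frame(breed = c("Beagle", "Shiba Inu"), weight = c(25, 17))
inner_list <- list(my_name = "Kyle R. Ripley", doggos = dogs)
nested <- list(original_list = inner_list)

# $ is shorthand for [[ ]] with a name
nested$original_list$my_name
## [1] "Kyle R. Ripley"
nested$original_list$doggos$weight[2]   # weight of the second dog
## [1] 17
identical(nested$original_list, nested[["original_list"]])
## [1] TRUE
```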
We’ll wrap up this tutorial with a brief discussion of packages. As mentioned earlier, R users are able to write their own functions in R. Some users choose to turn groups of functions that they’ve written (typically along a common theme, such as Bayesian latent variable analysis) into a package that other users can download, install, and use on their own computer.
If you find a package that you want to use, it’s easy to install it using the install.packages() function. Simply include the name of the package, in quotation marks, in the function, and R will handle the rest.
install.packages("blavaan")
Once you’ve installed a package, you can choose when you want to use it. Packages won’t automatically be ready when you open R (unless they’re part of the core R packages mentioned earlier), so you need to let R know you want to use installed packages with the library() function.
library(blavaan)
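If you decide that you don’t need the package anymore during a session, you can use the detach() function to remove it from R’s search path for the current session. The package stays installed, so you can load it again with library() whenever you need it.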
detach("package:blavaan", unload = TRUE)
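Two related base R tools are worth a quick, hedged sketch. The :: operator calls a single function from a package without attaching the whole package, and requireNamespace() reports whether a package is installed. The example uses the stats package, which ships with R, and only checks for blavaan rather than assuming it is installed; has_blavaan is a hypothetical variable name.

```r
# package::function() calls a function without needing library() first
stats::median(c(1, 5, 3))
## [1] 3

# TRUE if blavaan is installed on this computer, FALSE otherwise
has_blavaan <- requireNamespace("blavaan", quietly = TRUE)
has_blavaan
```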