1 Overview

Welcome to the exciting world of R! Whether this is your first time ever using R or you’re just a little rusty and wanting a refresher, this tutorial will help kick-start your R journey.

In this tutorial, we’ll cover some of the crucial basics for effectively using R for data analysis, such as:

  • Variables
  • Functions
  • Vectors
  • Factors
  • Data Frames
  • Lists
  • Subsetting
  • Packages

Matrices won’t be explicitly covered in this tutorial; however, another tutorial on this site, Matrix Algebra in R, covers how to create and work with matrices.

1.1 Packages (and versions) used in this document

## [1] "R version 4.0.2 (2020-06-22)"
## [1] Package Version
## <0 rows> (or 0-length row.names)

2 Basic Functionality

Opening R for the very first time can be a bit daunting, especially if you’re used to point-and-click software (e.g., SPSS). Let’s ease into using R by first realizing that it can function like a simple calculator. In the following code chunks (pieces of code embedded in this document), we’ll see what happens when we plug some simple arithmetic problems into R.

1 + 1
## [1] 2
15 - 8
## [1] 7
6 * 7
## [1] 42
18 / 9
## [1] 2
5^3
## [1] 125

We can see from this that entering a simple arithmetic expression into R returns the correct answer. How does R handle more complicated expressions and the order of operations? As you can see below, R follows the standard order of operations: exponents first, then multiplication and division, then addition and subtraction.

5 + 3 * 2^2
## [1] 17
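To make the order of operations concrete, here is a short sketch showing how parentheses change the result of the same expression. The specific numbers are only illustrative; the last two lines show a common surprise, which is that exponentiation is applied before a leading minus sign.

```r
# Without parentheses: 2^2 first, then 3 * 4, then 5 + 12
5 + 3 * 2^2
## [1] 17

# Parentheses force the addition to happen first: 8 * 4
(5 + 3) * 2^2
## [1] 32

# Parentheses force the multiplication to happen before the exponent: 5 + 6^2
5 + (3 * 2)^2
## [1] 41

# The exponent is applied before the minus sign, so this is -(2^2)
-2^2
## [1] -4
(-2)^2
## [1] 4
```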

Additionally, we can compare values with one another. Each comparison returns either TRUE or FALSE.

5 > 3
## [1] TRUE
15 < 12
## [1] FALSE
5 == 7 # This asks if the two are equal to each other
## [1] FALSE
13 != 14 # This asks if the two aren't equal to each other
## [1] TRUE
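A few more comparison tools are worth a quick look. The sketch below uses only base R operators: >= and <= for "or equal to" comparisons, & (and), | (or), and ! (not) for combining results. The final lines are an optional aside about comparing decimal numbers, where all.equal() is usually a safer choice than ==.

```r
5 >= 5                   # greater than OR equal to
## [1] TRUE
3 <= 2                   # less than OR equal to
## [1] FALSE

(5 > 3) & (15 < 12)      # TRUE only if both comparisons are TRUE
## [1] FALSE
(5 > 3) | (15 < 12)      # TRUE if at least one comparison is TRUE
## [1] TRUE
!(5 == 7)                # ! flips TRUE to FALSE and vice versa
## [1] TRUE

# Decimal numbers are stored approximately, so == can be surprising
(0.1 + 0.2) == 0.3
## [1] FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # compares within a small tolerance
## [1] TRUE
```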

4 Functions

To use R effectively, it is critical that you have a good working understanding of functions. Functions are how you will accomplish most of what you want to do in R. R comes with a large variety of functions in the pre-installed core packages that are automatically loaded when you open R. However, these functions likely won’t meet all of your needs. In those cases, you’ll need to install additional packages written by other R users.

Functions in R follow a general format:

name_of_function(argument1 = blah, argument2 = blah blah, etc.)
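As a concrete illustration of this format, here is a small sketch using three functions that ship with base R: round(), seq(), and paste(). The values passed in are made up purely for the example.

```r
# round() takes the number to round (x) and how many decimal places to keep (digits)
round(x = 3.14159, digits = 2)
## [1] 3.14

# seq() builds a sequence starting at `from`, ending at `to`, in steps of `by`
seq(from = 2, to = 8, by = 2)
## [1] 2 4 6 8

# paste() glues pieces of text together, separated by `sep`
paste("R", "is", "fun", sep = "-")
## [1] "R-is-fun"
```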

To start, we need to know which function we’re wanting to use, and sometimes this can be the most difficult part. Usually a quick “How do I ___ in R?” Google search will point you in the right direction. From here, we need to determine which arguments the function needs us to specify in order to run. R has a handy way of making the documentation on functions very accessible. Simply submit the following code:

?name_of_function
# Or, if the previous returns nothing, search all installed help pages for the term
??name_of_function

Running the above code will open the R Documentation for the function in question. This documentation usually offers a detailed look at the function, including all of its arguments and what they mean. Let’s take a look at some R Documentation by running the following code, which opens the help page for sum().

?sum

On the help page, we get a description of the function: basically, it adds together the elements passed to it. We can see that it has two arguments, ... and na.rm. The ... argument is where we supply all of the things that we want to sum. na.rm lets us specify whether missing values should be removed before summing; it defaults to FALSE, in which case the function returns NA if any missing values are present. Let’s try out the function. We’re going to sum the numbers from 1 to 5. In this code, the colon in x:y indicates “x, y, and all of the numbers in between,” so 1:5 means 1, 2, 3, 4, and 5.

sum(... = 1:5, na.rm = FALSE)
## [1] 15
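To see what the colon and na.rm actually do, here is a brief sketch. The vector containing NA is a made-up example, and c() is simply the base R function for combining values into a vector.

```r
# The colon expands to every whole number from 1 to 5
1:5
## [1] 1 2 3 4 5

# With a missing value present and na.rm left at its default (FALSE), the result is NA
sum(c(1, NA, 3))
## [1] NA

# Setting na.rm = TRUE drops the missing value before summing: 1 + 3
sum(c(1, NA, 3), na.rm = TRUE)
## [1] 4
```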

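For many functions, we don’t have to specify the argument names, as long as we supply the values in the proper order. Be careful with sum(), though: any argument that comes after ... in a function’s definition must be named. In the call below, FALSE is not matched to na.rm. Instead, it is treated as one more value to add, and because FALSE counts as 0, the result is still 15.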

sum(1:5, FALSE)
## [1] 15
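The following sketch shows why naming matters for arguments that come after ... and contrasts that with round(), a base R function whose arguments can safely be matched by position. The values are illustrative only.

```r
# TRUE is NOT matched to na.rm here; it is added to the total as the number 1
sum(1:5, TRUE)
## [1] 16

# Naming the argument is the only way to set na.rm
sum(1:5, na.rm = TRUE)
## [1] 15

# round() has no ... before digits, so positional matching works as expected:
# the first value is x and the second is digits
round(3.14159, 2)
## [1] 3.14
```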

When first getting started with new functions (or if you’re going to be sharing your code with other people), I’d strongly recommend specifying the argument names, however. It makes your code much easier to read and interpret - especially for other people who may not be familiar with the functions you’re using.

If you want to learn how to write your own functions, we have a tutorial available for that as well - Writing Functions in R.

8 Lists

Up to this point, we’ve covered three main types of data structures: variables, vectors, and data frames. The last type of data structure that we’ll discuss in this tutorial is the list. Lists are the most versatile type of data structure because they can contain any other data structure. For example, we could have a list that contains a variable of my name, a vector of board game names, and a data frame of dog information. One of the beautiful things about lists is that the information inside of them doesn’t have to be related. You can use lists to store just about anything your heart desires. So let’s get started!

name <- "Kyle R. Ripley"
board_games <- c("Risk", "Gloomhaven", "Monopoly", "Island of El Dorado", "Scrabble", "Forbidden Sky")

Now that we have our variable (name), our vector (board_games), and the data frame we created earlier in this tutorial (doggos), we can make our list. We’ll do this with the list() function. Notice that in the code, we’re giving the elements of our list names by specifying name_in_list = original_name. We can also reuse the original name, as with doggos.

random_list <- list(my_name = name, cool_games = board_games, doggos = doggos)
random_list
## $my_name
## [1] "Kyle R. Ripley"
## 
## $cool_games
## [1] "Risk"                "Gloomhaven"          "Monopoly"           
## [4] "Island of El Dorado" "Scrabble"            "Forbidden Sky"      
## 
## $doggos
##              breed weight height size size.f
## 1  German Shepherd     60     23    3  large
## 2   Siberian Husky     43     21    2 medium
## 3           Beagle     25     14    1  small
## 4 Golden Retriever     60     22    3  large
## 5   Boston Terrier     18     16    1  small
## 6        Shiba Inu     17     15    1  small

As you can see, list() creates one object that has all of the other data stored inside of it. Did I mention that you can also store lists inside of other lists?

listception <- list(new_list = list("this", "is", "cool"), original_list = random_list)
listception
## $new_list
## $new_list[[1]]
## [1] "this"
## 
## $new_list[[2]]
## [1] "is"
## 
## $new_list[[3]]
## [1] "cool"
## 
## 
## $original_list
## $original_list$my_name
## [1] "Kyle R. Ripley"
## 
## $original_list$cool_games
## [1] "Risk"                "Gloomhaven"          "Monopoly"           
## [4] "Island of El Dorado" "Scrabble"            "Forbidden Sky"      
## 
## $original_list$doggos
##              breed weight height size size.f
## 1  German Shepherd     60     23    3  large
## 2   Siberian Husky     43     21    2 medium
## 3           Beagle     25     14    1  small
## 4 Golden Retriever     60     22    3  large
## 5   Boston Terrier     18     16    1  small
## 6        Shiba Inu     17     15    1  small
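Printing a large list can produce a lot of output, so it can help to summarize what a list holds instead. The sketch below uses base R functions (names(), length(), sapply(), and lengths()) for that purpose. To keep it self-contained, it rebuilds the objects from this section with a shortened, hypothetical stand-in for the doggos data frame rather than the full version used in this tutorial.

```r
# Stand-ins for the objects used in this section (doggos is a shortened, hypothetical version)
name <- "Kyle R. Ripley"
board_games <- c("Risk", "Gloomhaven", "Monopoly")
doggos <- data.frame(breed = c("Beagle", "Shiba Inu"), weight = c(25, 17))

random_list <- list(my_name = name, cool_games = board_games, doggos = doggos)
listception <- list(new_list = list("this", "is", "cool"), original_list = random_list)

# The names we gave each element
names(random_list)
## [1] "my_name"    "cool_games" "doggos"

# How many elements the list holds
length(random_list)
## [1] 3

# The type of each element: two character objects and one data frame
element_classes <- sapply(random_list, class)
element_classes

# How many elements sit inside each element of the nested list (3 in both cases)
nested_sizes <- lengths(listception)
nested_sizes
```

For a compact overview of everything at once, str(listception) prints the structure of the whole nested list.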

In the next section, Subsetting, you’ll learn how to reference specific elements of your list, and this is where the real power of lists lies.