1 Overview

In this tutorial, we’ll cover the basics of importing data from some of the most common data file formats and sources:

  • .csv
  • .txt
  • .xls
  • .xlsx
  • SAS files
  • SPSS files
  • STATA files
  • and importing any of the previously mentioned data structures directly from the internet

To this end, we’ll explore how to import files using the functions in base R as well as the functions inside of the tidyverse.

1.1 Packages (and versions) used in this document

## [1] "R version 4.0.2 (2020-06-22)"
##    Package Version
##      utils   4.0.2
##  tidyverse   1.3.0
##      readr   1.3.1
##      haven   2.3.1
##     readxl   1.3.1
##       httr   1.4.2
##      dplyr   1.0.1
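
If you want to follow along, one way to load these packages before running the examples is shown below. Loading the tidyverse attaches readr and dplyr; readxl, haven, and httr are installed alongside the tidyverse but need to be loaded on their own.

library(tidyverse) # attaches readr and dplyr (among others)
library(readxl)    # Excel files
library(haven)     # SPSS, Stata, and SAS files
library(httr)      # pulling files from the internet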

2 Using Base R to Import Data

R comes with a number of functions that are helpful for importing data without having to install any additional packages (though you may wish to, depending on the type of data you’re importing). We won’t cover all of these functions, but the commonly used ones we do cover will also help you understand the ones we skip.

2.1 read.csv()

The first function that we’ll look at is read.csv(). This function is a wrapper around the more general read.table(), meaning that it works the same way but has different defaults. These defaults make read.csv() ready to read .csv files without you needing to specify any of the read.table() arguments. Below you’ll find sample code for importing a .csv file with read.csv(); the code spells out the default arguments that make this function handy for importing .csv files: the header, the field separator, and the decimal mark.

dat <- read.csv(file = "your/file/path/goes/here.csv", header = TRUE, sep = ",", dec = ".")
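
Because read.csv() is just read.table() with different defaults, you could get roughly the same result by supplying those values to read.table() yourself - on its own, read.table() defaults to header = FALSE and sep = "". A sketch of the equivalent call:

# roughly equivalent read.table() call; read.csv() also changes a few other
# defaults (e.g. fill = TRUE and comment.char = "")
dat <- read.table(file = "your/file/path/goes/here.csv", header = TRUE, sep = ",", dec = ".")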

You may wonder if you need to specify all of those arguments each time you import a file. The answer is “no - unless the defaults don’t meet your current needs.” The following code would work just fine for most .csv files.

dat <- read.csv(file = "your/file/path/goes/here.csv")

2.1.1 read.csv2()

You may have noticed that there is an argument specifying the character used as the decimal separator. If you’re working with a data file that was generated in a region that uses a comma as the decimal separator - such as some South American or European countries - then there is a different wrapper function that you should consider: read.csv2(). Its defaults, shown in the example code below, use a semicolon as the field separator and a comma as the decimal mark.

dat <- read.csv2(file = "your/file/path/goes/here.csv", header = TRUE, sep = ";", dec = ",")

Similar to read.csv(), if the defaults meet your needs, you only need to specify the location of the file you wish to import.

dat <- read.csv2(file = "your/file/path/goes/here.csv")

2.2 read.delim()

The next function that we’ll look at is read.delim(). This function is also a wrapper around read.table() - this time optimized for reading in tab-delimited (tab-separated values) files. In the following code, we can see that the only difference between read.delim() and read.csv() is the separator.

dat <- read.delim(file = "your/file/path/goes/here.txt", header = TRUE, sep = "\t", dec = ".")

As always, if the defaults are acceptable, then you don’t need to specify them.

dat <- read.delim(file = "your/file/path/goes/here.txt")

2.2.1 read.delim2()

As with read.csv(), read.delim() has a counterpart for our friends who use the comma as their decimal separator - read.delim2(). For this function, only the default decimal separator is changed.

dat <- read.delim2(file = "your/file/path/goes/here.txt", header = TRUE, sep = "\t", dec = ",")

3 Using Readr to Import Data

The first few functions that we’re going to look at are very similar to those that are available in base R. You’ll notice that the tidyverse functions use an underscore (“_”) instead of a period (“.”) in the function names.

3.1 read_delim()

The most general of the tidyverse functions for importing these types of files is read_delim(). This function allows you to specify the delimiter that you’d like to use. It also has some other handy options that we haven’t gone into yet - some of these are available in the base R functions as well. Notice that some of the argument names are different - for example, “sep” in the base R functions is replaced with “delim”.

In this code, we’re creating a vector of names that we will assign to the columns in our imported data; this is only necessary if your data doesn’t already have a row of names. In the read_delim() call, we’re specifying that we want the new_names vector to be used as the column names in the data. We’re also skipping the first 10 rows of the file with the skip = 10 argument, and we only want to import the next 15 rows: n_max = 15.

new_names <- c("name1", "name2", "name3")
dat <- read_delim(file = "your/file/path/goes/here.txt", delim = "\t", col_names = new_names, skip = 10, n_max = 15)

Like with the base R functions, we also have wrapper functions available that make importing certain types of data a bit easier.

3.2 read_csv()

The first of these wrapper functions that we’ll look at is read_csv(). This function has all of the same optional arguments that we saw with read_delim(), but we can ignore those if we don’t plan on using them. As with read_delim(), the first row is assumed to contain the column names, so we only need to supply col_names ourselves when that isn’t the case.

dat <- read_csv(file = "your/file/path/goes/here.csv")
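
If you do need the extra options, they work just like they did with read_delim(). For example, this sketch reuses the new_names vector from above (a made-up set of column names) along with skip and n_max:

new_names <- c("name1", "name2", "name3")
dat <- read_csv(file = "your/file/path/goes/here.csv", col_names = new_names, skip = 10, n_max = 15)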

3.2.1 read_csv2()

We also have a separate function for those datasets that use a comma as the decimal separator (and a semicolon as the field separator).

dat <- read_csv2(file = "your/file/path/goes/here.csv")

3.3 read_tsv()

If we want to import tab-separated values files, we can use read_tsv(), which has the correct defaults for these files (you can always change them inside the function arguments if need be).

dat <- read_tsv(file = "your/file/path/goes/here.txt")

4 Using Readxl to Import Data

The second package from the tidyverse that we’re going to use for importing data is readxl. This package provides a variety of functions that are useful for working with Excel files inside of R.

4.1 read_excel()

This function can be used to read in files of type .xls or .xlsx, and it will determine the type of file based on the extension of the file you’re attempting to import. Its arguments are very similar to those used in the readr functions - however, the file = argument is replaced with path =. We also have the sheet = argument that we can use to specify which sheet of the Excel file we wish to import - the sheet can be identified either by its name (a string) or by its position (an integer). If neither is supplied, the first sheet is imported.

dat <- read_excel(path = "your/file/path/goes/here.xlsx", sheet = 3)
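
If you would rather refer to the sheet by name, you can pass a string instead; the sheet name below is just a placeholder.

dat <- read_excel(path = "your/file/path/goes/here.xlsx", sheet = "your_sheet_name")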

4.2 excel_sheets()

If you can’t remember the names of the sheets in your Excel file, you can use this function to retrieve a character vector of the sheet names in the file.

excel_sheets(path = "your/file/path/goes/here.xlsx")
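
The two functions pair naturally: look up the sheet names first, then pass one of them to read_excel(). The index used here is just for illustration.

sheet_names <- excel_sheets(path = "your/file/path/goes/here.xlsx")
dat <- read_excel(path = "your/file/path/goes/here.xlsx", sheet = sheet_names[2]) # second sheet, for example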

5 Using Haven to Import Data

The third package from the tidyverse that we’ll be using is haven. This package provides functions that are useful for importing foreign data files - such as those from SPSS, Stata, and SAS.

5.1 For SPSS

The haven package has functions for importing .sav, .zsav, and .por files.

5.1.1 read_sav()

This function is used for importing both .sav and .zsav files.

dat <- read_sav(file = "your/file/path/goes/here.sav")
dat <- read_sav(file = "your/file/path/goes/here.zsav")

5.1.2 read_por()

This function is used for importing .por files.

dat <- read_por(file = "your/file/path/goes/here.por")

5.2 For Stata

5.2.1 read_dta()

This function is used for importing Stata data files into R.

dat <- read_dta(file = "your/file/path/goes/here.dta")

5.3 For SAS

5.3.1 read_sas()

This function is used for importing SAS data files (.sas7bdat) into R. If you also have a .sas7bcat catalog file containing the value labels, it can be supplied through the catalog_file = argument.

dat <- read_sas(data_file = "your/file/path/goes/here.sas7bdat")
dat <- read_sas(data_file = "your/file/path/goes/here.sas7bdat", catalog_file = "your/file/path/goes/here.sas7bcat")

6 Using Httr to Import Data Directly from the Internet

This is an incredibly handy feature of importing data in R: you can pull data directly from online repositories without having to manually download the files and store them locally. With the httr package loaded, you should be able to use most of the previously discussed functions to import data from an internet repository - just place the URL of the file in the file = or path = argument.

6.1 Example

dat <- read_csv(file = "https://yourwebsite.com/path/to/the/data.csv")
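
One caveat: read_excel() expects a local file, so spreadsheets hosted online need to be downloaded first. One approach is httr’s GET() combined with write_disk(), saving the file to a temporary location and then importing that copy; the URL below is a placeholder.

# download the spreadsheet to a temporary file, then import the local copy
GET("https://yourwebsite.com/path/to/the/data.xlsx",
    write_disk(tmp <- tempfile(fileext = ".xlsx")))
dat <- read_excel(path = tmp)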

6.2 download.file()

You can also use the download.file() function inside of base R to download files from the internet into your working directory.

download.file("https://yourwebsite.com/path/to/the/data.txt", 
              destfile = "local_data.txt")
dat <- read_tsv("local_data.txt")

7 Using Readr and Haven to Export Data

In addition to importing data into R, you may also wish to export your data. The tidyverse has a host of functions to make exporting datasets easy.

7.1 Using readr

The readr package has write_*() functions to accompany all of the read_*() functions that we examined earlier. We’ll first look at the most general function, write_delim().

In the following code, you can see that we specify the data frame we want to export with the x = argument and the location for the new file with the path = argument. We can also specify the delimiter we wish to use, as well as the string to be used for missing values.

write_delim(x = dat, path = "your/file/path/goes/here.txt", delim = " ", na = "NA")

The other write_*() functions follow the same pattern as their read_*() counterparts: the defaults are changed to reflect a specific data format. Because of this, we can use the following functions without needing to specify the delimiter.

write_csv(x = dat, path = "your/file/path/goes/here.csv", na = "NA")
write_csv2(x = dat, path = "your/file/path/goes/here.csv", na = "NA")
write_tsv(x = dat, path = "your/file/path/goes/here.txt", na = "NA")

7.2 Using Haven

The haven package can again be used for working with non-native data file types. We can write data files for SPSS, Stata, and SAS (as of haven version 2.1.0, writing SAS data is still experimental).

7.2.1 SPSS

Haven allows us to export both .sav and .zsav data files. If compress = TRUE, the data file will be exported as a .zsav file.

write_sav(data = dat, path = "your/file/path/goes/here.sav", compress = FALSE) # .sav file
write_sav(data = dat, path = "your/file/path/goes/here.zsav", compress = TRUE) # .zsav file

7.2.2 Stata

With haven, we can export .dta data files and specify which version of Stata to use.

write_dta(data = dat, path = "your/file/path/goes/here.dta", version = 14)

7.2.3 SAS

Finally, we can also export our data as a SAS-formatted data file. This function is currently experimental and may only work for a limited range of datasets.

write_sas(data = dat, path = "your/file/path/goes/here.sas7bdat")