• 1 Overview
    • 1.1 Packages (and versions) used in this document
  • 2 ANCOVA
    • 2.1 Getting Familiar with the Data
      • 2.1.1 Descriptive Statistics
    • 2.2 The Analysis
      • 2.2.1 Brief Overview of SS Types
      • 2.2.2 Performing the ANCOVA

1 Overview

In this tutorial, we’ll cover how to perform an ANCOVA in R using the lm() function. We’ll also cover the different types of sums of squares and how they affect the results of your ANCOVA. For our ANCOVA, we’ll continue with the model we built in the x-way ANOVA tutorial. In this tutorial, we’ll be using the mtcars dataset available in R to look at the differences in miles per gallon (mpg) of cars with different types of engines (vs) and transmission (am) after adjusting for vehicle weight (wt, our covariate).

1.1 Packages (and versions) used in this document

## [1] "R version 4.0.2 (2020-06-22)"
##  Package Version
##    dplyr   1.0.2
##      car   3.0-9
##  ggplot2   3.3.2

2 ANCOVA

2.1 Getting Familiar with the Data

It’s generally a good idea to get an idea of what’s in your data frame before attempting to analyze it. We’ll use str() to check the class of our variables, and we’ll use summary() to get an overview of our data frame - such as missingness, distribution information of variables, etc.

str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

From the str() output, we can see that our outcome variable (mpg) and covariate (wt) are of the numeric class, which they should be. However, we can see that vs and am are also numeric, but we want them to be factors. We’ll need to fix this.

mtcars$vs <- factor(mtcars$vs, levels = c(0, 1), labels = c("V-shaped", "straight"))
mtcars$am <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : Factor w/ 2 levels "V-shaped","straight": 1 1 2 2 1 2 1 2 2 2 ...
##  $ am  : Factor w/ 2 levels "automatic","manual": 2 2 2 1 1 1 1 1 1 1 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

That’s better. Now let’s move on to the summary() of our data.

summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec              vs             am    
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   V-shaped:18   automatic:19  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   straight:14   manual   :13  
##  Median :3.695   Median :3.325   Median :17.71                               
##  Mean   :3.597   Mean   :3.217   Mean   :17.85                               
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90                               
##  Max.   :4.930   Max.   :5.424   Max.   :22.90                               
##       gear            carb      
##  Min.   :3.000   Min.   :1.000  
##  1st Qu.:3.000   1st Qu.:2.000  
##  Median :4.000   Median :2.000  
##  Mean   :3.688   Mean   :2.812  
##  3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :5.000   Max.   :8.000

From the summary() output, we can see that we don’t have equal sample sizes in each level of vs and am. We can also see some distributional information about the numeric variables.

2.1.1 Descriptive Statistics

It’ll also be helpful in reporting our results to have some idea of what our descriptive statistics are. We’ll retrieve them the same way we did in the x-Way ANOVA tutorial. We’ll use the group_by() and summarise() functions from the dplyr package.

summarise(.data = group_by(mtcars, vs, am), mean_mpg = mean(mpg), sd_mpg = sd(mpg), mean_wt = mean(wt), sd_wt = sd(wt), n = n())
## # A tibble: 4 x 7
## # Groups:   vs [2]
##   vs       am        mean_mpg sd_mpg mean_wt sd_wt     n
##   <fct>    <fct>        <dbl>  <dbl>   <dbl> <dbl> <int>
## 1 V-shaped automatic     15.0   2.77    4.10 0.768    12
## 2 V-shaped manual        19.8   4.01    2.86 0.487     6
## 3 straight automatic     20.7   2.47    3.19 0.348     7
## 4 straight manual        28.4   4.76    2.03 0.440     7

2.2 The Analysis

We’ll be picking up where the analysis left off in the x-Way ANOVA tutorial. If you’ve read through that tutorial, you can probably skip to the performing the ANCOVA section, but if you haven’t, then we need to have a discussion about the different types of sums of squares.

2.2.1 Brief Overview of SS Types

Now that we’ll have more than one predictor variable, we need to be aware of the types of sums of squares and when they should be used.

2.2.1.1 Type I SS

Type I SS considers the predictors sequentially. This isn’t particularly useful for unbalanced studies (like the one we’re examining in this tutorial), but is useful for parsimonious quadratic models. The anova() function defaults to using Type I SS.

2.2.1.2 Type II SS

Type II SS adjusts the effects of a predictor (a) for an effect (b) if, and only if, (b) does not contain (a). For example:

In an A + B + A*B model, A would be adjusted for B, B would be adjusted for A, and A*B would be adjusted for A and B (because neither contains the term A*B).

If the model contains only main effects, then type II SS and type III SS will be the same.

2.2.1.3 Type III SS

For type III SS, every effect (predictor) is adjusted for all other effects. This allows a comparison of main effects in the presence of an interaction.