1 Overview

In this tutorial, we’ll cover how to perform moderated and nonlinear regressions in R using the lm() function. If you’re not sure how to do a Simple Linear Regression or Multiple Regression, you may want to look at those tutorials first. For this tutorial, we’ll be using the mtcars dataset available in R to predict the miles per gallon (mpg) of cars based on their weight (wt) and horsepower (hp). The analyses done in this tutorial will pick up where the Multiple Regression tutorial left off.

1.1 Packages (and versions) used in this document

## [1] "R version 4.0.2 (2020-06-22)"
##  Package Version
##  ggplot2   3.3.2
##    olsrr   0.5.3

1.2 Moderated Regression

In the Multiple Regression tutorial, we created the following model to examine if mpg can be predicted by wt and hp.

wthp_mod <- lm(mpg ~ wt + hp, data = mtcars)
summary(wthp_mod)
## 
## Call:
## lm(formula = mpg ~ wt + hp, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.941 -1.600 -0.182  1.050  5.854 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
## wt          -3.87783    0.63273  -6.129 1.12e-06 ***
## hp          -0.03177    0.00903  -3.519  0.00145 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared:  0.8268, Adjusted R-squared:  0.8148 
## F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12

We saw that wt and hp were both significant predictors of mpg. That’s all well and good, but you might be wondering how you’d go about adding a potential interaction term for wt and hp. If you weren’t wondering that, hopefully you are now. There are a number of ways to add terms to our models: we can spell each term out explicitly with +, as we did in the model above, or we can opt for a more compact solution.

1.2.1 Operators for Regression Formulae

In this section, we’ll use a, b, and c to denote predictors in our hypothetical model.

1.2.1.1 +

We can use + like we saw earlier to simply add a predictor to our model. For example, lm(Y ~ a + b + c) would produce the following model: \[Y_i=\beta_0+\beta_1a_i+\beta_2b_i+\beta_3c_i+\epsilon_i \]

As we saw in our example with the mtcars data, this doesn’t add any interaction terms.

1.2.1.2 :

To manually add an interaction term, we can use : in conjunction with +. If we were to run lm(Y ~ a + b + c + a:b), we’d get the following model: \[Y_i=\beta_0+\beta_1a_i+\beta_2b_i+\beta_3c_i+\beta_4a_ib_i+\epsilon_i \]

But there are easier ways to do this.

1.2.1.3 *

To get the same model shown above, we could instead run this code: lm(Y ~ a*b + c). Using * specifies that we want to include the main effects of the predictors as well as their interaction term. If we were to run lm(Y ~ a*b*c), we’d get the following model, because * includes all of the main effects along with every interaction among them: \[Y_i=\beta_0+\beta_1a_i+\beta_2b_i+\beta_3c_i+\beta_4a_ib_i+\beta_5a_ic_i+\beta_6b_ic_i+\beta_7a_ib_ic_i+\epsilon_i \]
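We can verify this equivalence directly. The sketch below, using the wt and hp predictors from this tutorial, fits the interaction model both ways and checks that the coefficients are identical:

```r
# a*b expands to a + b + a:b, so these two fits are the same model
star_mod <- lm(mpg ~ wt * hp, data = mtcars)
plus_mod <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)

# Identical coefficients (up to floating-point tolerance)
all.equal(coef(star_mod), coef(plus_mod))
# [1] TRUE
```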

1.2.1.4 ^

We could specify the above full model in another way: lm(Y ~ (a + b + c)^3). This code is saying that we want to include the predictors and their interactions up to a 3-way interaction. If we didn’t want the 3-way interaction, we could instead specify that we want only the main effects and 2-way interactions: lm(Y ~ (a + b + c)^2). This would yield: \[Y_i=\beta_0+\beta_1a_i+\beta_2b_i+\beta_3c_i+\beta_4a_ib_i+\beta_5a_ic_i+\beta_6b_ic_i+\epsilon_i\]
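If you ever want to see which terms a formula will expand to without fitting anything, terms() will show you. Here disp is used purely as an illustrative third predictor from mtcars:

```r
# List the terms generated by the ^2 shorthand, without fitting a model
# (disp stands in as a third example predictor)
attr(terms(mpg ~ (wt + hp + disp)^2), "term.labels")
# [1] "wt"      "hp"      "disp"    "wt:hp"   "wt:disp" "hp:disp"
```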

1.2.1.5 -

We could specify the above model in another way by using - to specify that we don’t want the 3-way interaction (remember that : specifies only an interaction term): lm(Y ~ (a + b + c)^3 - a:b:c)
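A quick check that subtracting the 3-way interaction really does leave the same terms as the ^2 shorthand (again using disp as a stand-in third predictor):

```r
# (a + b + c)^3 minus the 3-way term should match (a + b + c)^2
minus_terms <- attr(terms(mpg ~ (wt + hp + disp)^3 - wt:hp:disp), "term.labels")
sq_terms    <- attr(terms(mpg ~ (wt + hp + disp)^2), "term.labels")
setequal(minus_terms, sq_terms)
# [1] TRUE
```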

1.2.2 Interaction Effects

Now that we’ve seen many ways to include additional predictors in our model, let’s get back to predicting mpg from wt and hp. In the following R code, we’ll run our full model.

full_mod <- lm(mpg ~ wt * hp, data = mtcars)
summary(full_mod)
## 
## Call:
## lm(formula = mpg ~ wt * hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0632 -1.6491 -0.7362  1.4211  4.5513 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 49.80842    3.60516  13.816 5.01e-14 ***
## wt          -8.21662    1.26971  -6.471 5.20e-07 ***
## hp          -0.12010    0.02470  -4.863 4.04e-05 ***
## wt:hp        0.02785    0.00742   3.753 0.000811 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.153 on 28 degrees of freedom
## Multiple R-squared:  0.8848, Adjusted R-squared:  0.8724 
## F-statistic: 71.66 on 3 and 28 DF,  p-value: 2.981e-13

We can see from the summary that this model includes the main effects and interaction of our two predictor variables. All of our terms also appear to be significant predictors of mpg.
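One way to make sense of a significant interaction is to look at the conditional (simple) slope of one predictor at chosen levels of the other. Since the slope of wt in this model is \(\beta_{wt} + \beta_{wt:hp} \cdot hp\), we can sketch it from the fitted coefficients (the hp values 100 and 200 below are arbitrary illustrative levels):

```r
# Refit the interaction model from above
full_mod <- lm(mpg ~ wt * hp, data = mtcars)
b <- coef(full_mod)

# Simple slope of wt at two example hp levels:
# d(mpg)/d(wt) = b_wt + b_wt:hp * hp
hp_levels <- c(100, 200)
unname(b["wt"]) + unname(b["wt:hp"]) * hp_levels
# roughly -5.43 and -2.65
```

In words: the mpg penalty per 1000 lbs of weight shrinks as horsepower increases, which is exactly what the positive wt:hp coefficient implies.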

We may wish to run a simple model comparison to see if the trade-off between the added complexity of our additional term and the increase in PRE (Proportional Reduction in Error; Multiple R-squared) is worth it. We can do this with the anova() function.

anova(wthp_mod, full_mod)
## Analysis of Variance Table
## 
## Model 1: mpg ~ wt + hp
## Model 2: mpg ~ wt * hp
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     29 195.05                                  
## 2     28 129.76  1    65.286 14.088 0.0008108 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From this output, we can see that the full model is a significant improvement over the model with both main effects.
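To demystify the table a bit, we can reproduce the F statistic by hand from the residual sums of squares and degrees of freedom that anova() reported:

```r
# F = ((RSS_reduced - RSS_full) / df_difference) / (RSS_full / df_full)
rss_reduced <- 195.05  # mpg ~ wt + hp, 29 residual df
rss_full    <- 129.76  # mpg ~ wt * hp, 28 residual df
F_stat <- ((rss_reduced - rss_full) / 1) / (rss_full / 28)
F_stat                                  # ~14.09, matching the table
pf(F_stat, 1, 28, lower.tail = FALSE)   # ~0.0008, matching Pr(>F)
```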

1.3 Nonlinear Regression

You’ll sometimes find that for whatever (hopefully theoretical) reason, your data call for quadratic, cubic, or possibly even higher powers of a predictor. Modeling these in lm() is also very straightforward using the I() function. Because ^ already has the special formula meaning shown above, writing wt^2 directly in a formula would not square wt (mpg ~ wt + wt^2 simply collapses to mpg ~ wt); wrapping the term in I() tells R to treat ^ as ordinary arithmetic exponentiation. Be sure to remember that if you’re going to include a higher-order power in your model, you must also include all of the lower powers of that predictor.

nl_mod <- lm(mpg ~ wt + I(wt^2), data = mtcars)
summary(nl_mod)
## 
## Call:
## lm(formula = mpg ~ wt + I(wt^2), data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.483 -1.998 -0.773  1.462  6.238 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  49.9308     4.2113  11.856 1.21e-12 ***
## wt          -13.3803     2.5140  -5.322 1.04e-05 ***
## I(wt^2)       1.1711     0.3594   3.258  0.00286 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.651 on 29 degrees of freedom
## Multiple R-squared:  0.8191, Adjusted R-squared:  0.8066 
## F-statistic: 65.64 on 2 and 29 DF,  p-value: 1.715e-11

From the summary, we can see that both the linear (wt) and quadratic (I(wt^2)) terms are significant predictors of mpg.
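As an aside, the same quadratic model can be specified with poly() and raw = TRUE, which scales more gracefully to higher powers; a quick sketch confirming the two specifications agree:

```r
# Quadratic model from above, and the poly() equivalent
nl_mod   <- lm(mpg ~ wt + I(wt^2), data = mtcars)
poly_mod <- lm(mpg ~ poly(wt, 2, raw = TRUE), data = mtcars)

# Same fitted values (coefficient names differ)
all.equal(fitted(nl_mod), fitted(poly_mod))
# [1] TRUE
```

The default poly() (without raw = TRUE) uses orthogonal polynomials instead, which yields different coefficients but identical fitted values and can reduce collinearity between the power terms.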