In this tutorial, we’ll cover the basics of matrix algebra through manual calculations as well as by using functions available inside of R for working with matrices. The code presented should enable you to manipulate and calculate many different types of matrices, such as adjoined matrices, standardized score matrices, and deviation score matrices.
Additionally, we’ll use some simple example data to cover how to create correlation matrices and variance/covariance matrices. We’ll also use this data to calculate a regression through matrix algebra.
## [1] "R version 4.0.2 (2020-06-22)"
## Package Version
## psych 2.0.7
Making matrices is easy to do in R and doesn’t require any packages other than those that come with base R. We’ll start off by making two separate 2x2 matrices, `A` and `B`; matrix names are traditionally uppercase letters. Notice that R intuitively determines the matrix’s order based on the number of rows or columns specified - there’s no need to specify both. Can you tell what happens when we specify `byrow = TRUE` instead of the default `byrow = FALSE` for matrix `B`?
A <- matrix(data = 1:4, ncol = 2)
A
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
B <- matrix(data = 5:8, nrow = 2, byrow = TRUE)
B
## [,1] [,2]
## [1,] 5 6
## [2,] 7 8
Now that we know how to make matrices, we’ll make three more: a 2x3, a 3x2, and a 3x3. We’ll use all five of these matrices later to explore how matrix algebra is done in R.
C <- matrix(data = 1:6, ncol = 3)
C
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
D <- matrix(data = c(rep(x = 1:3, times = 2)), nrow = 3, byrow = TRUE)
D
## [,1] [,2]
## [1,] 1 2
## [2,] 3 1
## [3,] 2 3
E <- matrix(data = c(rep(x = 1:4, times = 2), 5), ncol = 3)
E
## [,1] [,2] [,3]
## [1,] 1 4 3
## [2,] 2 1 4
## [3,] 3 2 5
As an aside, never assign anything to `T` or `F`, because these are shorthand for the logical values `TRUE` and `FALSE`.
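To see why this matters, here’s a quick demonstration of how masking `T` can silently change the meaning of your code (the assignment below is purely for illustration):

T <- 0        # masks the built-in shorthand for TRUE
isTRUE(T)
## [1] FALSE
rm(T)         # removing the masking object restores the shorthand
isTRUE(T)
## [1] TRUE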
Adding and subtracting matrices in R is very straightforward. To perform element-wise addition and subtraction, you can simply add and subtract matrices like you would vectors or variables.
Z <- A + B
Z
## [,1] [,2]
## [1,] 6 9
## [2,] 9 12
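Subtraction works the same way; a quick sketch (the name `Y` is just for illustration):

Y <- A - B
Y
## [,1] [,2]
## [1,] -4 -3
## [2,] -5 -4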
Multiplying matrices in R isn’t as straightforward. If we try to multiply them like we would vectors or variables, we’ll perform element-wise multiplication, which probably isn’t what you’re intending to do. We can see this with matrix `W` below.
W <- A * B
W
## [,1] [,2]
## [1,] 5 18
## [2,] 14 32
To perform matrix multiplication, we must use a special operator that lets R know that we’re not intending element-wise multiplication. When doing this, you need to ensure that your matrices conform to the multiplication rule:
The number of columns of the pre-multiplied matrix must be equal to the number of rows of the post-multiplied matrix.
V <- A %*% B
V
## [,1] [,2]
## [1,] 26 30
## [2,] 38 44
Now let’s try a bit more complex of a matrix multiplication problem. Notice how the dimensions of these matrices conform to the multiplication rule:
[2x2] * [2x3] * [3x3] * [3x2] * [2x2] = [2x2]
U <- A %*% C %*% E %*% D %*% B
U
## [,1] [,2]
## [1,] 10048 11702
## [2,] 14752 17180
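If the matrices don’t conform, R will refuse to multiply them. For example, `C` is a 2x3 and `A` is a 2x2, so `C %*% A` violates the rule (3 columns versus 2 rows); wrapping the call in `try()` lets us see the error without halting the script:

try(C %*% A)
## Error in C %*% A : non-conformable arguments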
To calculate column sums for matrix `E`, we first need to make a row vector of the appropriate length containing only ones. Remember that vector names are traditionally lowercase letters.
ir <- matrix(c(rep(x = 1, times = 3)), nrow = 1)
ir
## [,1] [,2] [,3]
## [1,] 1 1 1
We’ll now pre-multiply `E` by `ir` to create a vector that contains the sums of `E`’s columns.
ecs <- ir %*% E
E
## [,1] [,2] [,3]
## [1,] 1 4 3
## [2,] 2 1 4
## [3,] 3 2 5
ecs
## [,1] [,2] [,3]
## [1,] 6 7 12
How might we calculate the sums of the rows?

We’ll make a column vector of 1s and, this time, post-multiply `E` by it - the same process as for the column sums, just from the other side.
ic <- matrix(c(rep(x = 1, times = 3)), ncol = 1)
ic
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
ers <- E %*% ic
E
## [,1] [,2] [,3]
## [1,] 1 4 3
## [2,] 2 1 4
## [3,] 3 2 5
ers
## [,1]
## [1,] 8
## [2,] 7
## [3,] 10
It’s good to know how to do things manually, but it’s not always necessary in practice to do things the long way. So now we’ll look at a couple of functions in R that will make our lives easier: `colSums()` and `rowSums()`.
ecs2 <- colSums(E)
ecs2
## [1] 6 7 12
ers2 <- rowSums(E)
ers2
## [1] 8 7 10
Bear in mind that these functions return a numeric vector, not a matrix. You can always check the class of an object with `class()`.
class(ecs2)
## [1] "numeric"
Let’s fix this by letting R know that we want them to be matrices.
ecs3 <- matrix(colSums(E), nrow = 1)
ers3 <- matrix(rowSums(E), ncol = 1)
class(ecs3)
## [1] "matrix" "array"
Similar to “regular algebra,” to get a mean, we need to divide by n. To do this in matrix algebra, we multiply by (1/n).
ecm <- (1/3)*ecs3
ecm
## [,1] [,2] [,3]
## [1,] 2 2.333333 4
erm <- (1/3)*ers3
erm
## [,1]
## [1,] 2.666667
## [2,] 2.333333
## [3,] 3.333333
Or we can use functions built into R to get our answer a bit more easily.
ecm2 <- colMeans(E)
ecm2
## [1] 2.000000 2.333333 4.000000
erm2 <- rowMeans(E)
erm2
## [1] 2.666667 2.333333 3.333333
The deviation score matrix consists of our matrix `E` less the respective column means. To do this, we first need to make a matrix of column means with the same order as `E`. We’ll just use our column of 1s from earlier to accomplish this.
ECM <- ic %*% ecm
ECM
## [,1] [,2] [,3]
## [1,] 2 2.333333 4
## [2,] 2 2.333333 4
## [3,] 2 2.333333 4
Now we can subtract this new matrix of means, `ECM`, from the original matrix, `E`. One way to check that this is correct is to take the sum of each column, which should be equal to 0.
devE <- E - ECM
devE
## [,1] [,2] [,3]
## [1,] -1 1.6666667 -1
## [2,] 0 -1.3333333 0
## [3,] 1 -0.3333333 1
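As mentioned above, we can verify the result by summing each column of `devE`; every sum should be (essentially) zero, with rounding used here only to clear out floating-point noise:

round(colSums(devE), digits = 10)
## [1] 0 0 0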
Before we get into the calculation, we should note that a determinant is a scalar (a single value) that has a number of purposes, but the most meaningful for us is that it helps in calculating the inverse of a matrix.
Let’s first make two example matrices. Note that `A` is a 2x2 and `B` is a 3x3. Both of these are square matrices, and only square matrices have determinants.
A <- matrix(data = c("a", "b", "c", "d"), nrow = 2, byrow = TRUE)
A
## [,1] [,2]
## [1,] "a" "b"
## [2,] "c" "d"
B <- matrix(data = c("a", "b", "c", "d", "e", "f", "g", "h", "i"), nrow = 3, byrow = TRUE)
B
## [,1] [,2] [,3]
## [1,] "a" "b" "c"
## [2,] "d" "e" "f"
## [3,] "g" "h" "i"
To calculate a determinant, we first calculate all of the left-to-right diagonal products and then subtract the right-to-left diagonal products. Calculating the determinant of a 2x2 is fairly straightforward. For matrix `A`, the determinant would be: `(a*d) - (b*c)`.

Unfortunately, it gets a bit more complicated the larger your matrices get. For `B`, the determinant is: `(a*e*i) + (b*f*g) + (c*d*h) - (c*e*g) - (b*d*i) - (a*f*h)`.
Now for an actual numeric example, we’ll create two matrices of the same dimensions as above.
D <- matrix(data = 1:4, nrow = 2, byrow = TRUE)
D
## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
E <- matrix(data = (c((1:8), 10)), nrow = 3, byrow = TRUE)
E
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 10
Thinking back to what we just learned, the determinant for matrix `D` should equal:
d <- (D[1,1]*D[2,2]) - (D[1,2]*D[2,1])
d
## [1] -2
Notice that this code is just calling on individual elements of matrix `D`. I’m sure you can see how this could quickly get out of hand for larger matrices.
e <- (E[1,1]*E[2,2]*E[3,3]) + (E[1,2]*E[2,3]*E[3,1]) + (E[1,3]*E[2,1]*E[3,2]) -
(E[1,3]*E[2,2]*E[3,1]) - (E[1,2]*E[2,1]*E[3,3]) - (E[1,1]*E[2,3]*E[3,2])
e
## [1] -3
Thankfully, R makes this a bit easier for us with the `det()` function.
d2 <- det(x = D)
d2
## [1] -2
e2 <- det(x = E)
e2
## [1] -3
Simply put, the adjugate is a matrix that yields the determinant (along the diagonal) when multiplied by the original matrix.
Using our matrix `A`, we can demonstrate how this works. Notice that the primary diagonal is flipped and the off-diagonal elements are negated.
adjA <- matrix(data = c("d", "-b", "-c", "a"), nrow = 2, byrow = TRUE)
adjA
## [,1] [,2]
## [1,] "d" "-b"
## [2,] "-c" "a"
Let’s plug in the corresponding numbers from matrix `D`.
adjD <- matrix(data = c(4, -2, -3, 1), nrow = 2, byrow = TRUE)
adjD
## [,1] [,2]
## [1,] 4 -2
## [2,] -3 1
Multiplying our original `D` by `adjD` should yield a 2x2 matrix with our determinant on the diagonal.
D %*% adjD
## [,1] [,2]
## [1,] -2 0
## [2,] 0 -2
det(x = D)
## [1] -2
Now, if we take what we did in the previous step, we can get our inverse: we divide our adjugate matrix by the determinant.
Di <- (1/d2) * adjD
Di
## [,1] [,2]
## [1,] -2.0 1.0
## [2,] 1.5 -0.5
It’s good to understand how all of this works together, but R can calculate an inverse for you (if one exists).
Di2 <- solve(D)
Di2
## [,1] [,2]
## [1,] -2.0 1.0
## [2,] 1.5 -0.5
Just to double check that everything worked correctly, we should be able to multiply our inverse by our original matrix and get the identity matrix (a matrix with 1s on the diagonal and 0s elsewhere) back.
I <- Di2 %*% D
I
## [,1] [,2]
## [1,] 1 4.440892e-16
## [2,] 0 1.000000e+00
This worked, aside from a tiny bit of floating-point error.
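If the stray 4.44e-16 bothers you, you can round it away for display:

round(Di2 %*% D, digits = 10)
## [,1] [,2]
## [1,] 1 0
## [2,] 0 1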
### t()

You will often need to transpose your matrices/vectors. You can do this with `t()`. I named the result `Ep` (for E-prime) in place of E’ because you can’t use that notation in R.
Ep <- t(x = E)
Ep
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 10
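One handy property to remember is that transposing a transpose returns the original matrix; we can confirm this quickly:

all.equal(t(Ep), E)
## [1] TRUE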
### diag()

The `diag()` function has multiple uses. First, we can use it to create a diagonal matrix with a value, `x`, on the diagonal; `x` can be a matrix, vector, or scalar.
S <- diag(x = 5, nrow = 3, ncol = 3)
S
## [,1] [,2] [,3]
## [1,] 5 0 0
## [2,] 0 5 0
## [3,] 0 0 5
Stemming from the first point, we can use `diag()` to create an identity matrix.
I <- diag(x = 1, nrow = 3, ncol = 3)
I
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 1 0
## [3,] 0 0 1
Finally, we can use it to extract a vector of the diagonal values of a matrix. As earlier, this returns a numeric vector; you can wrap the call in `matrix()` to change this.
s <- diag(x = S)
s
## [1] 5 5 5
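For instance, to turn the extracted diagonal back into a row vector (a 1x3 matrix), wrap the call in `matrix()`; the name `s2` here is just for illustration:

s2 <- matrix(data = diag(x = S), nrow = 1)
class(s2)
## [1] "matrix" "array"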
### solve()

Another handy function is `solve()`, which allows you to calculate an inverse.
Ei <- solve(E)
Ei
## [,1] [,2] [,3]
## [1,] -0.6666667 -1.333333 1
## [2,] -0.6666667 3.666667 -2
## [3,] 1.0000000 -2.000000 1
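As an aside, `solve(a, b)` can also solve a system of linear equations directly, returning the `x` that satisfies `a %*% x = b` without explicitly forming the inverse. A small sketch using `E` (the right-hand side is an arbitrary vector chosen for illustration, and the rounding just clears out floating-point noise):

round(solve(E, c(1, 1, 1)), digits = 10)
## [1] -1  1  0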
### eigen()

Finally, we’ll talk about the `eigen()` function. This function returns a list that contains both the eigenvalues and the eigenvectors of your specified matrix. To see the eigenvalues/eigenvectors, you can subset them out of the list.
Eeigen <- eigen(x = E)
Eeigen$values
## [1] 16.7074933 -0.9057402 0.1982469
Eeigen$vectors
## [,1] [,2] [,3]
## [1,] -0.2235134 -0.8658458 0.2782965
## [2,] -0.5039456 0.0856512 -0.8318468
## [3,] -0.8343144 0.4929249 0.4801895
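Two classic identities provide a quick check on this output: the eigenvalues of a square matrix sum to its trace (the sum of its diagonal, here 1 + 5 + 10 = 16) and multiply to its determinant (here -3, as we calculated above):

sum(Eeigen$values)
## [1] 16
prod(Eeigen$values)
## [1] -3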
For this illustration, we’ll use the lunch data discussed by Wood (2019). This data represents ratings of lunch items over the course of 5 school days - higher numbers indicate higher satisfaction with the item.

To start things off, we need to create our data matrix. We’ll also go ahead and create a column of 1s.
sandwich <- c(1:5)
beverage <- c(4, 7, 5, 8, 6)
dessert <- c(4, 2, 6, 10, 8)
L <- cbind(sandwich, beverage, dessert)
L
## sandwich beverage dessert
## [1,] 1 4 4
## [2,] 2 7 2
## [3,] 3 5 6
## [4,] 4 8 10
## [5,] 5 6 8
ic <- matrix(c(rep(x = 1, times = 5)), ncol = 1)
ic
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
## [4,] 1
## [5,] 1
First, we need to create our deviation score matrix like we did earlier. Notice that in this one line of code, we’re calculating the column means, making a matrix of the appropriate size, and then subtracting that matrix from the original matrix.
X <- L - (ic %*% matrix(colMeans(L), nrow = 1))
X
## sandwich beverage dessert
## [1,] -2 -2 -2
## [2,] -1 1 -4
## [3,] 0 -1 0
## [4,] 1 2 4
## [5,] 2 0 2
To calculate the variance/covariance matrix, we take our deviation matrix (`X`), pre-multiply it by its transpose, and divide by n - 1: \(\frac{1}{n-1}X'X\)
C <- 1/4*(t(X) %*% X)
C
## sandwich beverage dessert
## sandwich 2.50 1.25 4
## beverage 1.25 2.50 2
## dessert 4.00 2.00 10
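Note that the diagonal of `C` holds each variable’s variance, so taking the square root of the diagonal gives the standard deviations we’ll need in the next step:

sqrt(diag(C))
## sandwich beverage dessert
## 1.581139 1.581139 3.162278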
To calculate the correlation matrix, we first need to make a diagonal matrix to allow us to divide our deviation score matrix by the respective variable standard deviations.
S <- diag(x = c(1/sd(sandwich), 1/sd(beverage), 1/sd(dessert)), nrow = 3, ncol = 3)
S
## [,1] [,2] [,3]
## [1,] 0.6324555 0.0000000 0.0000000
## [2,] 0.0000000 0.6324555 0.0000000
## [3,] 0.0000000 0.0000000 0.3162278
Next, we need to post-multiply our deviation matrix by this new matrix, `S`.
Z <- X %*% S
Z
## [,1] [,2] [,3]
## [1,] -1.2649111 -1.2649111 -0.6324555
## [2,] -0.6324555 0.6324555 -1.2649111
## [3,] 0.0000000 -0.6324555 0.0000000
## [4,] 0.6324555 1.2649111 1.2649111
## [5,] 1.2649111 0.0000000 0.6324555
We’ll then pre-multiply this new matrix, `Z`, by its transpose and multiply the result by \(\frac{1}{n-1}\).
R <- (1/4)*(t(Z) %*% Z)
R
## [,1] [,2] [,3]
## [1,] 1.0 0.5 0.8
## [2,] 0.5 1.0 0.4
## [3,] 0.8 0.4 1.0
Sometimes you’ll need to calculate the squared multiple correlation (SMC) - the proportion of each variable’s variance accounted for by the remaining variables. This can be done by hand easily: each variable’s SMC is 1 minus the reciprocal of the corresponding diagonal element of the inverted correlation matrix. We’ll use our previous correlation matrix, `R`.
SMC <- 1 - 1/diag(solve(R))
SMC
## [1] 0.6785714 0.2500000 0.6400000
Now that we’ve had our fun, we can verify our matrix skills. Compare your variance/covariance matrix, `C`, with the result of either `var()` or `cov()` - both should return the variance/covariance matrix.
C2 <- var(x = X)
C3 <- cov(x = X)
all.equal(C, C2)
## [1] TRUE
all.equal(C, C3)
## [1] TRUE
Similarly, we can check our correlation matrix with `cor()`.
R2 <- cor(x = Z, method = "pearson")
all.equal(R, R2)
## [1] TRUE
There is a function, `smc()`, in the psych package that will do this for you (remember to load the package first).

library(psych)
SMC2 <- smc(R)
SMC2
## V1 V2 V3
## 0.6785714 0.2500000 0.6400000
SMC
## [1] 0.6785714 0.2500000 0.6400000
To calculate a regression in matrix algebra, you will get very familiar with the following equation, which comes from solving the normal equations \(X'Xb = X'Y\) for \(b\):

\((X'X)^{-1}X'Y\) (read as: x-prime x inverse x-prime y)
For this example, we’ll use the lunch data as before. First, we’ll set up our data for our friend `lm()` - we’re making a copy of the same data in matrix `L`, just in a data frame format.
dat <- as.data.frame(cbind(sandwich, beverage, dessert))
dat
## sandwich beverage dessert
## 1 1 4 4
## 2 2 7 2
## 3 3 5 6
## 4 4 8 10
## 5 5 6 8
Now, we’ll set it up for matrix algebra. Here we’re creating an `X` matrix that includes the predictor variables, sandwich and beverage, and a `Y` matrix that includes the outcome variable, dessert. Don’t forget that we also need a column of 1s in `X`.
X <- cbind(c(rep(x = 1, times = 5)), sandwich, beverage)
X
## sandwich beverage
## [1,] 1 1 4
## [2,] 1 2 7
## [3,] 1 3 5
## [4,] 1 4 8
## [5,] 1 5 6
Y <- matrix(data = dessert, ncol = 1)
Y
## [,1]
## [1,] 4
## [2,] 2
## [3,] 6
## [4,] 10
## [5,] 8
Now let’s put everything together and run our regression in matrix form. Remember: \((X'X)^{-1}X'Y\).

Note: the orders equal [3x5] * [5x3] * [3x5] * [5x1]. What should the order of b equal?
b <- solve((t(X) %*% X)) %*% t(X) %*% Y
b
## [,1]
## 1.200000e+00
## sandwich 1.600000e+00
## beverage -1.110223e-15
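As a side note, the same estimates can be computed a bit more compactly (and more stably) with `crossprod()`, which forms X'X and X'Y directly, and by letting `solve(a, b)` solve the system rather than explicitly inverting. This is a sketch of an equivalent call, not a different estimator (the name `b2` is just for illustration):

b2 <- solve(crossprod(X), crossprod(X, Y))
all.equal(b, b2)
## [1] TRUE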
Now we’ll run the same model (sandwich and beverage predicting dessert) in `lm()`.
m1 <- lm(dessert ~ sandwich + beverage, data = dat)
summary(m1)
##
## Call:
## lm(formula = dessert ~ sandwich + beverage, data = dat)
##
## Residuals:
## 1 2 3 4 5
## 1.200e+00 -2.400e+00 -6.106e-16 2.400e+00 -1.200e+00
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.200e+00 5.231e+00 0.229 0.840
## sandwich 1.600e+00 9.798e-01 1.633 0.244
## beverage -1.622e-16 9.798e-01 0.000 1.000
##
## Residual standard error: 2.683 on 2 degrees of freedom
## Multiple R-squared: 0.64, Adjusted R-squared: 0.28
## F-statistic: 1.778 on 2 and 2 DF, p-value: 0.36
Take a moment and compare the results. What do you notice?

As you can see, other than floating-point error, we get exactly the same regression coefficients.
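To make the comparison concrete, you can pull the coefficients out of the fitted model with `coef()` and check them against `b` directly; a quick sketch (stripping names so only the values are compared):

all.equal(as.numeric(b), unname(coef(m1)))
## [1] TRUE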