Eigenvalues and Eigenvectors, Matrix Calculus

Eigenvalues

Definition (4.3.1)

For any square matrix \(\pmb{A}\) a scalar \(\lambda\) and a nonzero vector \(\pmb{x}\) can be found such that

\[\pmb{A}\pmb{x}=\lambda \pmb{x}\]

\(\lambda\) is called an eigenvalue of \(\pmb{A}\) and \(\pmb{x}\) a corresponding eigenvector.


Note if \(\pmb{A}\pmb{x}=\lambda \pmb{x}\) we have \(\pmb{A}\pmb{x} -\lambda \pmb{x} = \pmb{0}\) or \(\left(\pmb{A} -\lambda \pmb{I}\right)\pmb{x} = \pmb{0}\). Because \(\pmb{x}\ne\pmb{0}\), \(\pmb{A} -\lambda \pmb{I}\) is a singular matrix and \(\vert \pmb{A} -\lambda \pmb{I}\vert=0\), which is called the characteristic equation.

Example (4.3.2)

\[ \begin{aligned} &\pmb{A} = \begin{pmatrix} 1 & 3 \\ -1 & 5 \end{pmatrix} \\ &\vert \pmb{A} -\lambda \pmb{I}\vert = \begin{vmatrix} 1-\lambda & 3 \\ -1 & 5-\lambda \end{vmatrix} =\\ &(1-\lambda)(5-\lambda)+3 = 0\\ &\lambda^2-6\lambda+8=(\lambda-2)(\lambda-4)=0 \end{aligned} \]

so we have eigenvalues \(\lambda_1=2\) and \(\lambda_2=4\). Now

\[ \begin{aligned} &\left(\pmb{A} -\lambda_1 \pmb{I}\right)\pmb{x} =\pmb{0} \\ &\begin{pmatrix} 1-2 & 3 \\ -1 & 5-2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}= \begin{pmatrix} 0 \\ 0 \end{pmatrix} \\ &-x_1+3x_2=0 \end{aligned} \]

set \(x_1=1\), then \(x_2=1/3\). Also

\[\sqrt{x_1^2+x_2^2} = \sqrt{1^2+(1/3)^2} = \sqrt{10}/3\]

so the normalized eigenvector is \((1\text{ }1/3)'/(\sqrt{10}/3) = (3\text{ }1)'/\sqrt{10}\).

And

\[ \begin{aligned} &\left(\pmb{A} -\lambda_2 \pmb{I}\right)\pmb{x}=\pmb{0} \\ &\begin{pmatrix} 1-4 & 3 \\ -1 & 5-4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \\ &-3x_1+3x_2=0 \end{aligned} \]

Again setting \(x_1=1\), then \(x_2=1\). Also

\[\sqrt{x_1^2+x_2^2} = \sqrt{1^2+1^2}=\sqrt{2}\]

so the normalized eigenvector is \((1\text{ }1)'/\sqrt{2}\).

Let’s check:

A=rbind(c(1, 3), c(-1, 5))
A
##      [,1] [,2]
## [1,]    1    3
## [2,]   -1    5
x=cbind(c(3, 1)/sqrt(10))
cbind(A%*%x, 2*x)
##           [,1]      [,2]
## [1,] 1.8973666 1.8973666
## [2,] 0.6324555 0.6324555
x=cbind(c(1, 1)/sqrt(2))
cbind(A%*%x, 4*x)
##          [,1]     [,2]
## [1,] 2.828427 2.828427
## [2,] 2.828427 2.828427

With R we can find eigenvalues and eigenvectors directly:

eigen(A)
## eigen() decomposition
## $values
## [1] 4 2
## 
## $vectors
##            [,1]       [,2]
## [1,] -0.7071068 -0.9486833
## [2,] -0.7071068 -0.3162278

Theorem (4.3.3)

Let \(\pmb{A}\) be a matrix with eigenvalue \(\lambda\) and eigenvector \(\pmb{x}\). Let \(c,k\) be scalars with \(c\ne0, k\ne0\). Then

  1. \(c\lambda\) is an eigenvalue of \(c\pmb{A}\)
  2. \(c\lambda+k\) is an eigenvalue of \(c\pmb{A}+k\pmb{I}\)
  3. \(\lambda^n\) is an eigenvalue of \(\pmb{A}^n\)
  4. \(1/\lambda\) is an eigenvalue of \(\pmb{A}^{-1}\) (if \(\pmb{A}^{-1}\) exists)

proof

  1. \[c\pmb{Ax}=c\lambda\pmb{x}\]
  2. \[(c\pmb{A}+k\pmb{I})\pmb{x}=c\pmb{Ax}+k\pmb{Ix}=c\lambda\pmb{x}+k\pmb{x}=(c\lambda+k)\pmb{x}\]
  3. \[\pmb{A}^2\pmb{x}=\pmb{A}(\pmb{A}\pmb{x}) = \pmb{A}(\lambda\pmb{x}) = \lambda(\pmb{A}\pmb{x}) = \lambda^2\pmb{x}\]

The statement for general \(n\) follows by repeating this calculation \(n\) times.

For 4., note that \(\lambda\ne0\) because \(\pmb{A}\) is nonsingular, and

\[ \begin{aligned} &\pmb{x} = \pmb{A}^{-1}\pmb{A}\pmb{x} =\pmb{A}^{-1}\lambda\pmb{x} = \lambda\pmb{A}^{-1}\pmb{x} \\ &\pmb{A}^{-1}\pmb{x} =\frac1{\lambda} \pmb{x} \end{aligned} \]
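Let’s verify some of these numerically, reusing the matrix from example (4.3.2) with eigenvalues 2 and 4 (a quick sketch):

A=rbind(c(1, 3), c(-1, 5))
eigen(3*A)$values       # 3*lambda: 12 and 6
eigen(A%*%A)$values     # lambda^2: 16 and 4
eigen(solve(A))$values  # 1/lambda: 1/2 and 1/4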

Comment

Note that if \(\lambda, \mu\) are eigenvalues of \(\pmb{A}\) and \(\pmb{B}\) respectively, in general \(\lambda+ \mu\) is NOT an eigenvalue of \(\pmb{A+B}\).

Theorem (4.3.4)

Let \(\pmb{A}\) be a matrix with eigenvalue \(\lambda\) and eigenvector \(\pmb{x}\). Let \(p(x)=\sum_{k=0}^n a_k x^k\) be a polynomial. Then

\[p(\pmb{A})\pmb{x}=p(\lambda)\pmb{x}\]

proof

\[ \begin{aligned} &p(\pmb{A})\pmb{x} =\left( \sum_{k=0}^n a_k \pmb{A}^k \right)\pmb{x} =\\ &\sum_{k=0}^n \left( a_k \pmb{A}^k \pmb{x} \right) = \\ &\sum_{k=0}^n a_k \left( \pmb{A}^k \pmb{x} \right) = \\ &\sum_{k=0}^n a_k \left( \lambda^k \pmb{x} \right) = \\ &\left( \sum_{k=0}^n a_k \lambda^k \right)\pmb{x} = \\ &p(\lambda)\pmb{x} \end{aligned} \]

If the respective series are convergent this can sometimes be extended to infinite series:

Example (4.3.5)

Say \(\lambda\) is an eigenvalue of \(\pmb{A}\); then \(1-\lambda\) is an eigenvalue of \(\pmb{I}-\pmb{A}\) by (4.3.3). If \(\pmb{I}-\pmb{A}\) is nonsingular, \(\frac1{1-\lambda}\) is an eigenvalue of \((\pmb{I-A})^{-1}\), also by (4.3.3). If \(\vert \lambda\vert<1\), then

\[\frac1{1-\lambda} =\sum_{k=0}^\infty \lambda^k\]

and so

\[(\pmb{I-A})^{-1} =\sum_{k=0}^\infty \pmb{A}^k\]
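Note that this requires all eigenvalues of \(\pmb{A}\) to have modulus less than 1, so that the series converges. Here is a minimal numerical sketch, assuming an arbitrary small matrix whose eigenvalues lie inside the unit circle:

A=matrix(c(0.2, 0.1, -0.3, 0.4), 2, 2)
abs(eigen(A)$values)     # both moduli are less than 1
S=diag(2); term=diag(2)  # partial sums of I + A + A^2 + ...
for(k in 1:50) { term=term%*%A; S=S+term }
round(S-solve(diag(2)-A), 6)  # should be (essentially) the zero matrix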

Theorem (4.3.6)

  1. The eigenvalues of \(\pmb{AB}\) are the same as the eigenvalues of \(\pmb{BA}\)

  2. If \(\pmb{P}\) is any nonsingular matrix, then \(\pmb{A}\) and \(\pmb{P^{-1}AP}\) have the same eigenvalues.

  3. If \(\pmb{C}\) is any orthogonal matrix, then \(\pmb{A}\) and \(\pmb{C'AC}\) have the same eigenvalues.

proof

  1. Say \(\lambda\) is an eigenvalue of \(\pmb{AB}\) with eigenvector \(\pmb{x}\), so \((\pmb{AB})\pmb{x} =\lambda\pmb{x}\). But then

\[\pmb{B}\pmb{A}(\pmb{B}\pmb{x})=\pmb{B}(\pmb{AB})\pmb{x} =\pmb{B}(\lambda\pmb{x}) = \lambda(\pmb{B}\pmb{x}) \]

so \(\lambda\) is an eigenvalue of \(\pmb{BA}\) with eigenvector \(\pmb{Bx}\).

  2. By 1., \(\pmb{P}^{-1}(\pmb{AP})\) has the same eigenvalues as \((\pmb{AP})\pmb{P}^{-1}= \pmb{A}\).

  3. Same as 2., because an orthogonal matrix satisfies \(\pmb{C}^{-1}=\pmb{C}'\).

Example (4.3.6a)

Let’s consider again the orthogonal matrix

\[ \pmb{C} = \begin{pmatrix} \cos x & \sin x \\ -\sin x & \cos x \end{pmatrix} \]

and let \(\pmb{A} =\begin{pmatrix} 1 & 2 \\ -1 & 3\end{pmatrix}\), \(\pmb{B} =\begin{pmatrix} 0 & -2 \\ 1 & 5\end{pmatrix}\), then

Cx=function(x) matrix(c(cos(x), -sin(x), sin(x), cos(x)), 2, 2)
A=matrix(c(1,-1,2,3), 2, 2)
B=matrix(c(0,1,-2,5), 2, 2)
eigen(A%*%B)$value
## [1] 18.4582364  0.5417636
eigen(B%*%A)$value
## [1] 18.4582364  0.5417636
eigen(A)$value
## [1] 2+1i 2-1i
eigen(t(Cx(1))%*%A%*%Cx(1))$value
## [1] 2+1i 2-1i
eigen(t(Cx(-0.5))%*%A%*%Cx(-0.5))$value
## [1] 2+1i 2-1i

Symmetric Matrices

Theorem (4.3.7)

Let \(\pmb{A}\) be an \(n\times n\) symmetric matrix. Then

  1. the eigenvalues of \(\pmb{A}\) are real.
  2. the eigenvectors corresponding to distinct eigenvalues are mutually orthogonal.

proof

  1. Let \(\lambda\) be an eigenvalue of \(\pmb{A}\) with eigenvector \(\pmb{x}\), then

\[ \begin{aligned} &\lambda\pmb{\bar{x}'x} = \\ &\pmb{\bar{x}'}(\lambda\pmb{x}) = \\ &\pmb{\bar{x}'Ax} = \\ &(\pmb{A'\bar{x}})'\pmb{x} = \\ &(\pmb{A\bar{x}})'\pmb{x} = \\ &(\bar{\lambda}\pmb{\bar{x}})'\pmb{x} = \\ &\bar{\lambda}\pmb{\bar{x}}'\pmb{x} \end{aligned} \] Because \(\pmb{x}\ne\pmb{0}\) we have \(\pmb{\bar{x}}'\pmb{x}>0\), and so \(\lambda=\bar{\lambda}\), that is, \(\lambda\) is real.

  2. omitted

Example (4.3.7a)

A=matrix(c(1, -1, 3, -1, 0, 5, 3, 5, 1), 3, 3)
A
##      [,1] [,2] [,3]
## [1,]    1   -1    3
## [2,]   -1    0    5
## [3,]    3    5    1
eigen(A)$value
## [1]  6.078042  1.617631 -5.695672
t(eigen(A)$vector[, 1])%*%eigen(A)$vector[, 2]
##               [,1]
## [1,] -4.597017e-17
t(eigen(A)$vector[, 1])%*%eigen(A)$vector[, 3]
##               [,1]
## [1,] -1.110223e-16
t(eigen(A)$vector[, 2])%*%eigen(A)$vector[, 3]
##               [,1]
## [1,] -1.569925e-16

Theorem (4.3.8)

Spectral Decomposition

Let \(\pmb{A}\) be an \(n\times n\) symmetric matrix. Let \(\lambda_1,..,\lambda_n\) and \(\pmb{x}_1,..,\pmb{x}_n\) be its eigenvalues and corresponding orthonormal eigenvectors. Let \(\pmb{D}=diag(\lambda_1,..,\lambda_n)\) and \(\pmb{C} = (\pmb{x}_1,..,\pmb{x}_n)\). Then

\[\pmb{A} = \pmb{C}\pmb{D}\pmb{C}' = \sum_{i=1}^n \lambda_i \pmb{x}_i\pmb{x}'_i\]

and

\[\pmb{C}'\pmb{A}\pmb{C}=\pmb{D}\]

proof

Because its columns are orthonormal, \(\pmb{C}\) is an orthogonal matrix. Therefore \(\pmb{I}=\pmb{C}\pmb{C}'\) and so \(\pmb{A}=\pmb{A}\pmb{C}\pmb{C}'\) and

\[ \begin{aligned} &\pmb{A} = \\ &\pmb{A}(\pmb{x}_1,..,\pmb{x}_n)\pmb{C}' = \\ &(\pmb{A}\pmb{x}_1,..,\pmb{A}\pmb{x}_n)\pmb{C}' = \\ &(\lambda_1\pmb{x}_1,..,\lambda_n\pmb{x}_n)\pmb{C}' = \\ &\pmb{C}\pmb{D}\pmb{C}' \end{aligned} \]
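Let’s check this numerically with the symmetric matrix from example (4.3.7a); the rounding just suppresses floating-point noise:

A=matrix(c(1, -1, 3, -1, 0, 5, 3, 5, 1), 3, 3)
C=eigen(A)$vectors
D=diag(eigen(A)$values)
round(C%*%D%*%t(C)-A, 6)   # A = CDC', so this should be the zero matrix
round(t(C)%*%A%*%C-D, 6)   # and C'AC = D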

Theorem (4.3.9)

Spectral Decomposition II

Let \(\pmb{A}\) be an \(n\times n\) matrix with \(n\) linearly independent eigenvectors \(\pmb{x}_1,..,\pmb{x}_n\) and eigenvalues \(\lambda_1,..,\lambda_n\). Let \(\pmb{D}=diag(\lambda_1,..,\lambda_n)\) and \(\pmb{P} = (\pmb{x}_1,..,\pmb{x}_n)\). Then

\[\pmb{A} = \pmb{P}\pmb{D}\pmb{P}^{-1}\]


The spectral decomposition can be used to define functions of matrices: say \(\pmb{A} = \pmb{P}\pmb{D}\pmb{P}^{-1}\), then

\[ \begin{aligned} &\pmb{A}^2 = \pmb{A}\pmb{A} = \\ &\pmb{P}\pmb{D}\pmb{P}^{-1}\pmb{P}\pmb{D}\pmb{P}^{-1} = \\ &\pmb{P}\pmb{D}\pmb{D}\pmb{P}^{-1} = \\ &\pmb{P}\pmb{D}^2\pmb{P}^{-1} \\ \end{aligned} \]

and the square of a diagonal matrix is easy to find. This immediately generalizes to

\[\pmb{A}^n = \pmb{P}\pmb{D}^n\pmb{P}^{-1}\]
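For instance (a small sketch with an arbitrary diagonalizable matrix):

A=rbind(c(1, 2), c(0, 2))
P=eigen(A)$vectors
D=diag(eigen(A)$values)
round(P%*%D^5%*%solve(P), 6)   # elementwise ^ is fine for a diagonal matrix
A%*%A%*%A%*%A%*%A              # A^5 directly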

Let’s say we have a function and we know its power series expansion \(f(x)=\sum_{i=0}^\infty a_i x^i\), then

\[ \begin{aligned} &f(\pmb{A}) = \sum_{i=0}^\infty a_i \pmb{A}^i = \\ &\sum_{i=0}^\infty a_i \pmb{P}\pmb{D}^i\pmb{P}^{-1} = \\ &\pmb{P}\left[ \sum_{i=0}^\infty a_i \pmb{D}^i \right]\pmb{P}^{-1} \end{aligned} \]

Example (4.3.10)

\[ \pmb{A} = \begin{pmatrix} 1 & 2 \\ 0 & 2 \end{pmatrix} \]

and we want to find \(\pmb{B}=\exp (\pmb{A})\).

Let’s find its eigenvalues and eigenvectors with R:

A=rbind( c(1, 2), c(0, 2) )
EA=eigen(A)
EA
## eigen() decomposition
## $values
## [1] 2 1
## 
## $vectors
##           [,1] [,2]
## [1,] 0.8944272    1
## [2,] 0.4472136    0

so \(\pmb{A}\) has eigenvalues 2 and 1 and eigenvectors \((0.894, 0.447)'\), \((1,0)'\). Recall that

\[\exp(x) = \sum_{i=0}^\infty \frac{x^i}{i!}\]

and so

\[\pmb{B}=\exp (\pmb{A}) = \pmb{P}\left[ \sum_{i=0}^\infty \pmb{D}^i/i! \right]\pmb{P}^{-1}\]

now

\[ \begin{aligned} &\sum_{i=0}^\infty \pmb{D}^i/i! = \\ &\sum_{i=0}^\infty \frac1{i!} \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}^i = \\ &\begin{pmatrix} \sum_{i=0}^\infty \frac{2^i}{i!} & 0 \\ 0 & \sum_{i=0}^\infty \frac{1^i}{i!} \end{pmatrix} = \\ &\begin{pmatrix} e^2 & 0 \\ 0 & e^1 \end{pmatrix} \end{aligned} \]

and so

P=EA$vectors
D1=diag(c(exp(2), exp(1)))
P%*%D1%*%solve(P)
##          [,1]     [,2]
## [1,] 2.718282 9.341549
## [2,] 0.000000 7.389056
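The Matrix package, which we use elsewhere, also provides the matrix exponential directly, so we can compare (assuming the package is installed):

library(Matrix)
round(as.matrix(expm(Matrix(A))), 6)  # should agree with the result above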

Theorem (4.3.11)

Let \(\pmb{A}\) be an \(n\times n\) matrix with eigenvalues \(\lambda_1,..,\lambda_n\). Then

  1. \(\vert \pmb{A} \vert =\prod_{i=1}^n \lambda_i\)
  2. tr\((\pmb{A}) =\sum_{i=1}^n \lambda_i\)

proof

If \(\pmb{A}\) is a symmetric matrix we have

\[\vert\pmb{A}\vert = \vert\pmb{CDC'}\vert = \vert\pmb{C}\vert\vert\pmb{D}\vert\vert\pmb{C}'\vert=\vert\pmb{D}\vert=\prod_{i=1}^n \lambda_i\]

because \(\vert\pmb{C}\vert\vert\pmb{C}'\vert=\vert\pmb{C}\pmb{C}'\vert=\vert\pmb{I}\vert=1\).

The general case can be found in any linear algebra textbook.

Example (4.3.12)

Say

\[ \pmb{A} = \begin{pmatrix} 1 & 3 \\ -1 & 5 \end{pmatrix} \]

we have previously found the eigenvalues to be 2 and 4. Now

\[det(\pmb{A}) = 1\cdot 5-3\cdot(-1)=8 = 2\cdot 4\]

and

\[tr(\pmb{A}) = 1+5 = 6 = 2+4\]
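Again this is easy to check with R:

A=rbind(c(1, 3), c(-1, 5))
c(det(A), prod(eigen(A)$values))       # determinant = product of eigenvalues
c(sum(diag(A)), sum(eigen(A)$values))  # trace = sum of eigenvalues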

Positive Definite Matrices

Theorem (4.3.13)

Let \(\pmb{A}\) be an \(n\times n\) matrix with eigenvalues \(\lambda_1,..,\lambda_n\). Then

  1. if \(\pmb{A}\) is positive definite then \(\lambda_i>0\)
  2. if \(\pmb{A}\) is positive semi-definite then \(\lambda_i\ge0\). The number of eigenvalues > 0 is the rank of \(\pmb{A}\).

proof

\[ \begin{aligned} &\pmb{A}\pmb{x} =\lambda_i\pmb{x} \\ &\pmb{x}'\pmb{A}\pmb{x} =\pmb{x}'\lambda_i\pmb{x} = \lambda_i\pmb{x}'\pmb{x}\\ &\lambda_i = \frac{\pmb{x}'\pmb{A}\pmb{x}}{\pmb{x}'\pmb{x}} >0\\ \end{aligned} \]

because an eigenvector \(\pmb{x}\ne\pmb{0}\), so \(\pmb{x}'\pmb{x}>0\) and, by positive definiteness, \(\pmb{x}'\pmb{A}\pmb{x}>0\).

proof of ii omitted

Say \(\pmb{A}\) is positive-definite. Let \(\pmb{A}=\pmb{C}\pmb{D}\pmb{C}'\) be the spectral decomposition of \(\pmb{A}\). Define \(\pmb{D}^{1/2} = diag(\sqrt{\lambda_1},..,\sqrt{\lambda_n})\) (which we can do because all eigenvalues are positive) and define \(\pmb{A}^{1/2}=\pmb{C}\pmb{D}^{1/2}\pmb{C}'\). Now

\[ \begin{aligned} &\pmb{A}^{1/2}\pmb{A}^{1/2} = \\ &\pmb{C}\pmb{D}^{1/2}\pmb{C}'\pmb{C}\pmb{D}^{1/2}\pmb{C}' = \\ &\pmb{C}\pmb{D}^{1/2}\pmb{D}^{1/2}\pmb{C}' = \\ &\pmb{C}\pmb{D}\pmb{C}'=\pmb{A} \end{aligned} \]

and so \(\pmb{A}^{1/2}\) is the square root of \(\pmb{A}\)!

Example (4.3.13a)

Say

\[ \pmb{A} = \begin{pmatrix} 2 & -1 & 0\\ -1 & 2 &-1\\ 0 & -1 & 2 \end{pmatrix} \]

A=matrix(c(2,-1, 0, -1, 2, -1, 0, -1, 2), 3, 3)
A
##      [,1] [,2] [,3]
## [1,]    2   -1    0
## [2,]   -1    2   -1
## [3,]    0   -1    2
lambda=eigen(A)$value
C=eigen(A)$vector
sqA = C%*%diag(sqrt(lambda))%*%t(C)
round(sqA, 5)
##          [,1]     [,2]     [,3]
## [1,]  1.36039 -0.38268 -0.05383
## [2,] -0.38268  1.30656 -0.38268
## [3,] -0.05383 -0.38268  1.36039
round(sqA%*%sqA, 5)
##      [,1] [,2] [,3]
## [1,]    2   -1    0
## [2,]   -1    2   -1
## [3,]    0   -1    2

Idempotent Matrices

Definition (4.3.14)

A square matrix \(\pmb{A}\) is called idempotent if \(\pmb{A}^2=\pmb{A}\)

Theorem (4.3.15)

The only nonsingular idempotent matrix is the identity.

proof

Say \(\pmb{A}\) is idempotent and nonsingular, so \(\pmb{A}^2=\pmb{A}\) and \(\pmb{A}^{-1}\) exists. Therefore

\[\pmb{A}=(\pmb{A}^{-1}\pmb{A})\pmb{A} =\pmb{A}^{-1}\pmb{A}^2=\pmb{A}^{-1}\pmb{A} = \pmb{I}\]

Theorem (4.3.16)

If \(\pmb{A}\) is symmetric and idempotent then \(\pmb{A}\) is positive semidefinite.

proof

\[\pmb{A} =\pmb{A}^2=\pmb{A}\pmb{A}=\pmb{A}'\pmb{A}\] \[\pmb{x'Ax} =(\pmb{Ax})'(\pmb{Ax})\ge 0\]

Theorem (4.3.17)

If \(\pmb{A}\) is symmetric and idempotent of rank r, then \(\pmb{A}\) has r eigenvalues equal to 1 and n-r eigenvalues equal to 0.

proof

\[ \begin{aligned} &\pmb{A}\pmb{x} =\lambda\pmb{x} \\ &\pmb{A}^2\pmb{x} =\lambda^2\pmb{x} \\ &\pmb{A}^2\pmb{x} =\pmb{A}\pmb{x} = \lambda\pmb{x} \\ &\lambda\pmb{x}=\lambda^2\pmb{x} \\ &(\lambda-\lambda^2)\pmb{x}=\lambda(1-\lambda)\pmb{x}=\pmb{0} \\ \end{aligned} \]

and so all eigenvalues are either 0 or 1.

\(\pmb{A}\) is positive semidefinite and therefore the rank of \(\pmb{A}\) is equal to the number of positive eigenvalues, that is the number of eigenvalues equal to 1.

Corollary (4.3.18)

If \(\pmb{A}\) is symmetric and idempotent of rank r, then \(tr(\pmb{A})=r\): by (4.3.11) the trace is the sum of the eigenvalues, which by (4.3.17) are r ones and n-r zeros.

Theorem (4.3.19)

If \(\pmb{A}\) is \(n\times n\) and idempotent, \(\pmb{P}\) is nonsingular, and \(\pmb{C}\) is orthogonal we have

  1. \(\pmb{I}-\pmb{A}\) is idempotent
  2. \(\pmb{A}(\pmb{I}-\pmb{A})=(\pmb{I}-\pmb{A})\pmb{A}=\pmb{O}\)
  3. \(\pmb{P}^{-1}\pmb{A}\pmb{P}\) is idempotent
  4. \(\pmb{C}'\pmb{A}\pmb{C}\) is idempotent

proof omitted

Example (4.3.19a)

Let

\[ \pmb{A} = \begin{pmatrix} 2 & -2 & -4 \\ -1 & 3 & 4\\ 1 & -2 & -3\\ \end{pmatrix} \]

A=matrix(c(2, -1, 1,-2,3,-2,-4,4,-3), 3, 3)
A%*%A
##      [,1] [,2] [,3]
## [1,]    2   -2   -4
## [2,]   -1    3    4
## [3,]    1   -2   -3
round(eigen(A)$value, 6)
## [1] 1 1 0
library(Matrix)
c(rankMatrix(A))
## [1] 2
sum(diag(A))
## [1] 2
diag(3)-A
##      [,1] [,2] [,3]
## [1,]   -1    2    4
## [2,]    1   -2   -4
## [3,]   -1    2    4
(diag(3)-A)%*%(diag(3)-A)
##      [,1] [,2] [,3]
## [1,]   -1    2    4
## [2,]    1   -2   -4
## [3,]   -1    2    4
A%*%(diag(3)-A)
##      [,1] [,2] [,3]
## [1,]    0    0    0
## [2,]    0    0    0
## [3,]    0    0    0

Vector and Matrix Calculus

There are a number of ways to extend the ideas of calculus to vectors and matrices. We will discuss a few:

Let \(u=f(\pmb{x} )\) be a function of variables \(\pmb{x} = (x_1,..,x_p)'\) and let \(\partial u/\partial x_i\) be the partial derivatives. We define the vector \(\partial u/\partial \pmb{x}\) as

\[ \frac{\partial u}{\partial \pmb{x}} = \begin{pmatrix} \partial u/\partial x_1 \\ \partial u/\partial x_2 \\ \vdots \\ \partial u/\partial x_p \end{pmatrix} \]

Theorem (4.3.20)

Let \(u=\pmb{a}'\pmb{x}\) where \(\pmb{a} = (a_1,..,a_p)'\). Then

\[\frac{\partial u}{\partial \pmb{x}} = \frac{\partial \pmb{a}'\pmb{x}}{\partial \pmb{x}} = \frac{\partial \pmb{x}'\pmb{a}}{\partial \pmb{x}} = \pmb{a}\]

proof obvious

Theorem (4.3.21)

Let \(u=\pmb{x}'\pmb{A}\pmb{x}\) where \(\pmb{A}\) is a symmetric matrix. Then

\[\frac{\partial u}{\partial \pmb{x}} = \frac{\partial \pmb{x}'\pmb{A}\pmb{x}}{\partial \pmb{x}} = 2\pmb{A}\pmb{x}\]

proof

\[ \begin{aligned} &(\pmb{A}\pmb{x})_k =\sum_{i=1}^p a_{ki}x_i \\ &\pmb{x}'\pmb{A}\pmb{x} =\sum_{j=1}^p x_j \left( \sum_{i=1}^p a_{ji}x_i \right) = \sum_{i,j=1}^p a_{ji}x_ix_j = \\ &\sum_{i=1}^p a_{ii}x_i^2 + \sum_{i\ne j} a_{ji}x_ix_j \\ &\frac{\partial \pmb{x}'\pmb{A}\pmb{x}}{\partial x_i} =2a_{ii}x_i + 2\sum_{j\ne i}a_{ij}x_j = 2\sum_{j}a_{ij}x_j =2(\pmb{A}\pmb{x})_i\\ \end{aligned} \]

where we used the symmetry \(a_{ij}=a_{ji}\), and so \(\frac{\partial \pmb{x}'\pmb{A}\pmb{x}}{\partial \pmb{x}} = 2\pmb{A}\pmb{x}\).


Example (4.3.22)

Say \(\pmb{A} = \begin{pmatrix} 1& 2 \\ 2 & -1 \end{pmatrix}\), then

\[ \begin{aligned} &\pmb{Ax} = \\ &\begin{pmatrix} 1& 2 \\ 2 & -1\end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}=\\ &\begin{pmatrix} x_1+2x_2 \\ 2x_1-x_2 \end{pmatrix}\\ &\frac{\partial \pmb{Ax}}{\partial x_1} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}=\pmb{a}_1\\ &\frac{\partial \pmb{Ax}}{\partial x_2} = \begin{pmatrix} 2 \\ -1 \end{pmatrix}=\pmb{a}_2 \end{aligned} \]

Also

\[ \begin{aligned} &\pmb{x'Ax} = \\ &\begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} x_1+2x_2 \\ 2x_1-x_2 \end{pmatrix}= \\ &x_1(x_1+2x_2)+x_2(2x_1-x_2)= \\ &x_1^2+4x_1x_2-x_2^2 \\ &\frac{\partial \pmb{x'Ax}}{\partial x_1} = 2x_1+4x_2\\ &\frac{\partial \pmb{x'Ax}}{\partial x_2} = 4x_1-2x_2 \end{aligned} \]

and indeed

\[ \begin{aligned} &2\pmb{Ax} = \\ &2\begin{pmatrix} 1& 2 \\ 2 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}=\\ &\begin{pmatrix} 2x_1+4x_2 \\ 4x_1-2x_2 \end{pmatrix} \end{aligned} \]

in agreement with theorem (4.3.21).
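A quick finite-difference check in R (a sketch; the evaluation point is arbitrary):

A=matrix(c(1, 2, 2, -1), 2, 2)    # the symmetric matrix from above
f=function(x) c(t(x)%*%A%*%x)
x0=c(1.5, -0.5); h=1e-6
g=c((f(x0+c(h, 0))-f(x0))/h, (f(x0+c(0, h))-f(x0))/h)
round(g, 4)    # numerical gradient
c(2*A%*%x0)    # 2Ax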


Another way to turn matrices into functions is as follows: let \(\pmb{X}=(x_{ij})\) be a \(p\times p\) matrix of variables and define the function \(u=f(\pmb{X})\). Define the matrix of partial derivatives

\[ \frac{\partial u}{\partial \pmb{X}} = \begin{pmatrix} \partial u/\partial x_{11} & ... & \partial u/\partial x_{1p} \\ \partial u/\partial x_{21} & ... & \partial u/\partial x_{2p} \\ \vdots & & \vdots \\ \partial u/\partial x_{p1} & ... & \partial u/\partial x_{pp} \\ \end{pmatrix} \]

Theorem (4.3.23)

Let \(u=tr(\pmb{XA})\) where \(\pmb{X}\) is a positive definite matrix and \(\pmb{A}\) is a matrix of constants. Then

\[\frac{\partial u}{\partial \pmb{X}} = \pmb{A}+\pmb{A'}-diag(\pmb{A})\]

proof

Note that we previously found

\[tr(\pmb{XA})=\sum_{i,j}^p x_{ij}a_{ji}=\sum_{i}^p x_{ii}a_{ii}+\sum_{i\ne j} x_{ij}a_{ji}\]

because \(\pmb{X}\) is positive definite and therefore symmetric. So \(\partial u/\partial x_{ii}=a_{ii}\) and \(\partial u/\partial x_{ij}=a_{ij}+a_{ji}\) if \(i\ne j\).

Also

\[\left[\pmb{A}+\pmb{A'}-diag(\pmb{A})\right]_{ii} = a_{ii}+a_{ii}-a_{ii}=a_{ii}\] and if \(i\ne j\)

\[\left[\pmb{A}+\pmb{A'}-diag(\pmb{A})\right]_{ij} = a_{ij}+a_{ji}\]

Example (4.3.24)

Let

\[ \pmb{A} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ \end{pmatrix} \]

\(\pmb{X}\) is supposed to be positive definite, which implies that it is symmetric and so

\[ \pmb{X}\pmb{A} = \begin{pmatrix} x_{11} & x_{12} \\ x_{12} & x_{22} \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \\ \begin{pmatrix} x_{11} a_{11} +x_{12}a_{21} & x_{11} a_{12} +x_{12}a_{22} \\ x_{12} a_{11} +x_{22}a_{21} & x_{12} a_{12} +x_{22}a_{22} \end{pmatrix} \] and so

\[tr(\pmb{X}\pmb{A})=x_{11}a_{11}+(a_{21}+a_{12})x_{12}+x_{22}a_{22}\]

and so we find

\[ \frac{\partial u}{\partial \pmb{X}} = \begin{pmatrix} a_{11} &a_{12}+a_{21}\\ a_{12}+a_{21} & a_{22} \\ \end{pmatrix} \]

but also

\[ \pmb{A}+\pmb{A'}-diag(\pmb{A}) =\\ \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ \end{pmatrix}+ \begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \\ \end{pmatrix}- \begin{pmatrix} a_{11} & 0 \\ 0 & a_{22} \\ \end{pmatrix}=\\ \begin{pmatrix} a_{11} & a_{12}+a_{21} \\ a_{12}+a_{21} & a_{22} \\ \end{pmatrix} \]
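We can also verify (4.3.23) numerically. This sketch assumes arbitrary values for \(\pmb{A}\) and a symmetric \(\pmb{X}\); because \(x_{ij}=x_{ji}\) is a single variable, both entries are moved together:

A=matrix(c(1, 3, -2, 4), 2, 2)
X=matrix(c(2, 1, 1, 3), 2, 2)     # symmetric
u=function(X) sum(diag(X%*%A))    # tr(XA)
h=1e-6; out=matrix(0, 2, 2)
for(i in 1:2) for(j in 1:2) {
   X1=X
   X1[i,j]=X1[i,j]+h
   if(i!=j) X1[j,i]=X1[j,i]+h     # x_ij and x_ji move together
   out[i,j]=(u(X1)-u(X))/h
}
round(out, 4)
A+t(A)-diag(diag(A))              # the formula from the theorem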

Theorem (4.3.25)

Let \(u=\log\vert \pmb{X}\vert\) where \(\pmb{X}\) is a positive-definite matrix. Then

\[\frac{\partial u}{\partial \pmb{X}} = 2\pmb{X}^{-1}-diag(\pmb{X}^{-1})\]

proof omitted

Example (4.3.26)

\[ \pmb{X} = \begin{pmatrix} x_{11} & x \\ x & x_{22} \end{pmatrix}\\ \vert \pmb{X} \vert = x_{11}x_{22}-x^2 \\ \log \vert \pmb{X} \vert = \log(x_{11}x_{22}-x^2) \\ \frac{\partial u}{\partial \pmb{X}} = \begin{pmatrix} x_{22} & -2x \\ -2x & x_{11} \end{pmatrix}/(x_{11}x_{22}-x^2) \]

also

\[ \begin{aligned} &\pmb{X}^{-1} =\frac1{x_{11}x_{22}-x^2} \begin{pmatrix} x_{22} & -x \\ -x & x_{11} \end{pmatrix} \\ &2\pmb{X}^{-1}-diag(\pmb{X}^{-1}) = \\ &\frac2{x_{11}x_{22}-x^2}\begin{pmatrix} x_{22} & -x \\ -x & x_{11} \end{pmatrix}- \frac1{x_{11}x_{22}-x^2}\begin{pmatrix} x_{22} & 0 \\ 0 & x_{11} \end{pmatrix} = \\ &\begin{pmatrix} x_{22} & -2x \\ -2x & x_{11} \end{pmatrix}/(x_{11}x_{22}-x^2) \end{aligned} \]

Example (4.3.26a)

\[ \pmb{X} = \begin{pmatrix} 2x_{11} & -x_{12} & 0\\ -x_{12} & 2x_{22} &-x_{23}\\ 0 & -x_{23} & 2x_{33} \end{pmatrix} \]

is positive-definite if all x’s are close to 1, so

X=matrix(c(2,-1, 0, -1, 2, -1, 0, -1, 2), 3, 3)
round(2*solve(X)-diag(diag(solve(X))), 1)
##      [,1] [,2] [,3]
## [1,]  0.8    1  0.5
## [2,]  1.0    1  1.0
## [3,]  0.5    1  0.7

but we can also find \(\frac{\partial \log \vert\pmb{X}\vert}{\partial \pmb{X}}\) numerically:

du=function(X, h=0.0001) {
   # numerical partial derivatives of log(det(X)) with respect to the entries of X
   f=function(X) log(det(X))
   n=ncol(X)
   out=matrix(0, n, n)
   for(i in 1:n) 
      for(j in 1:n) {
         X1=X
         X1[i,j]=X[i,j]+h
         out[i,j]=(f(X1)-f(X))*ifelse(i==j, 1, 2)
      }   
   out/h
}
round(du(X), 1)
##      [,1] [,2] [,3]
## [1,]  0.7    1  0.5
## [2,]  1.0    1  1.0
## [3,]  0.5    1  0.7

where the off-diagonal elements are multiplied by 2 because the matrix is symmetric.

Definition (4.3.27)

Let \(\pmb{A}\) be an \(n\times n\) matrix with elements \(a_{ij}\) that are functions of a scalar x. Then we define

\[\frac{\partial \pmb{A}}{\partial x} = \left(\frac{\partial a_{ij}}{\partial x} \right)\]

Theorem (4.3.28)

Let \(\pmb{A}\) be nonsingular of order n with derivative \(\frac{\partial \pmb{A}}{\partial x}\). Then

\[\frac{\partial \pmb{A}^{-1}}{\partial x} = -\pmb{A}^{-1}\frac{\partial \pmb{A}}{\partial x}\pmb{A}^{-1}\]

proof

Because \(\pmb{A}\) is nonsingular we have

\[\pmb{A}^{-1}\pmb{A}=\pmb{I}\]

Therefore

\[\frac{\partial \pmb{A}^{-1}\pmb{A}}{\partial x}=\frac{\partial \pmb{A}^{-1}}{\partial x}\pmb{A}+\pmb{A}^{-1}\frac{\partial \pmb{A}}{\partial x} = \pmb{O}\]

and so

\[\frac{\partial \pmb{A}^{-1}}{\partial x}\pmb{A}=-\pmb{A}^{-1}\frac{\partial \pmb{A}}{\partial x}\]

and

\[\frac{\partial \pmb{A}^{-1}}{\partial x}=-\pmb{A}^{-1}\frac{\partial \pmb{A}}{\partial x}\pmb{A}^{-1}\]

Example (4.3.29)

Let’s see what this means for a \(1\times 1\) matrix: \(\pmb{A}=[f(x)]\), so \(\pmb{A}^{-1}=[\frac1{f(x)}]\) and so

\[\frac{\partial \pmb{A}^{-1}}{\partial x}=\frac{\partial }{\partial x}[\frac1{f(x)}]=[-\frac{f'(x)}{f(x)^2}]\]

On the other hand

\[-\pmb{A}^{-1}\frac{\partial \pmb{A}}{\partial x}\pmb{A}^{-1} = -[\frac1{f(x)}][f'(x)][\frac1{f(x)}] =[-\frac{f'(x)}{f(x)^2}]\]

Note that of course \(\pmb{A}^{-1}=[1/f(x)]\ne[f^{-1}(x)]\); the inverse matrix is not the inverse function.

Consider

\[ \pmb{A} = \begin{pmatrix} x & x^2 \\ 2x & 1 \end{pmatrix} \]

Let’s check this with R:

A=function(x) rbind( c(x, x^2), c(2*x, 1) )
A.inv=function(x) solve(A(x))
x=1.5;h=0.001
A.prime = (A(x+h)-A(x))/h              # numerical derivative of A
A.inv.prime = (A.inv(x+h)-A.inv(x))/h  # numerical derivative of A^-1
round(A.inv.prime, 2)
##       [,1]  [,2]
## [1,]  0.45 -0.45
## [2,] -0.98  0.49
round(-solve(A(x))%*%A.prime%*%solve(A(x)), 2)
##       [,1]  [,2]
## [1,]  0.45 -0.45
## [2,] -0.98  0.49
x=2;h=0.001
A.prime = (A(x+h)-A(x))/h
A.inv.prime = (A.inv(x+h)-A.inv(x))/h
round(A.inv.prime, 2)
##       [,1]  [,2]
## [1,]  0.12 -0.18
## [2,] -0.33  0.16
round(-solve(A(x))%*%A.prime%*%solve(A(x)), 2)
##       [,1]  [,2]
## [1,]  0.12 -0.18
## [2,] -0.33  0.16

Theorem (4.3.30)

Let \(\pmb{A}\) be a positive definite matrix. Then

\[\frac{\partial \log \vert \pmb{A}\vert}{\partial x}= tr\left(\pmb{A}^{-1}\frac{\partial \pmb{A}}{\partial x}\right)\]

proof omitted

Example (4.3.31)

Consider

\[ \pmb{A} = \begin{pmatrix} x & x^2 \\ x^2 & 3 \end{pmatrix} \]

\[ \begin{aligned} &\frac{\partial \log \vert \pmb{A}\vert}{\partial x}= \\ &\frac{\partial \log \left( 3x-x^4 \right)}{\partial x}= \\ &\frac{3-4x^3}{3x-x^4} \\ \end{aligned} \]

and

\[ \begin{aligned} &\frac{\partial \pmb{A}}{\partial x}= \begin{pmatrix} 1 & 2x \\ 2x & 0 \end{pmatrix} \\ &\pmb{A}^{-1} = \frac1{3x-x^4} \begin{pmatrix} 3 & -x^2 \\ -x^2 & x \end{pmatrix}\\ &\pmb{A}^{-1}\frac{\partial \pmb{A}}{\partial x}= \frac1{3x-x^4} \begin{pmatrix} 3 & -x^2 \\ -x^2 & x \end{pmatrix} \begin{pmatrix} 1 & 2x \\ 2x & 0 \end{pmatrix}= \\ &\frac1{3x-x^4} \begin{pmatrix} 3-2x^3 & 6x \\ x^2 & -2x^3 \end{pmatrix} \\ &tr\left(\pmb{A}^{-1}\frac{\partial \pmb{A}}{\partial x}\right) =\\ &\frac1{3x-x^4}\left(3-2x^3 -2x^3\right) = \frac{3-4x^3}{3x-x^4} \end{aligned} \]
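Again a numerical check is straightforward (a sketch; x=0.8 keeps \(\pmb{A}\) positive definite):

A=function(x) rbind(c(x, x^2), c(x^2, 3))
x=0.8; h=1e-6
(log(det(A(x+h)))-log(det(A(x))))/h    # numerical derivative
(3-4*x^3)/(3*x-x^4)                    # closed form from above
A.prime=(A(x+h)-A(x))/h
sum(diag(solve(A(x))%*%A.prime))       # tr(A^-1 dA/dx)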

Optimization - Lagrange Multipliers

If we have a function \(u=f(\pmb{x})\) and we want to find a maximum or minimum, we may be able to do so by solving the equation \(\frac{\partial u}{\partial \pmb{x}}=\pmb{0}\). Often, however, we need to find a maximum or minimum under additional constraints. Let’s denote these by \(h_i(\pmb{x})=0\), \(i=1,..,q\). Using the method of Lagrange multipliers means finding an extremum of the function

\[v=u+\sum_{i=1}^q \lambda_i h_i(\pmb{x})\]

which means solving the system of equations

\[\frac{\partial u}{\partial \pmb{x}} +\sum_{i=1}^q \lambda_i\frac{\partial h_i}{\partial \pmb{x}}=\pmb{0}\] \[h_i(\pmb{x})=0;i=1,..,q\]

The \(\lambda_i\)'s are called the Lagrange multipliers.

Example (4.3.32)

Let \(f(x, y)=2x^2+3xy+y^2\). Find the minimum of f subject to the constraint \(x+2y=1\).

Here \(h(x,y)=x+2y-1\), so

\[ \begin{aligned} &\frac{\partial f}{\partial x} + \lambda \frac{\partial h}{\partial x}=4x+3y+\lambda =0 \text{ }(I)\\ &\frac{\partial f}{\partial y} + \lambda \frac{\partial h}{\partial y}=3x+2y+2\lambda =0 \text{ }(II)\\ &x+2y -1 =0 \text{ }(III)\\ &I-II:\ x+y-\lambda=0;\ \lambda=x+y\text{ }(IV)\\ &IV\rightarrow I:\ 5x+4y=0 \text{ }(V)\\ &V-2\cdot III:\ 3x+2=0;\ x=-2/3\\ &y=(1-x)/2=(1-(-2/3))/2=5/6 \end{aligned} \]
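As a sanity check we can substitute the constraint \(x=1-2y\) into f and minimize numerically (a sketch using base R):

f=function(y) { x=1-2*y; 2*x^2+3*x*y+y^2 }
optimize(f, c(-10, 10))$minimum   # should be close to y=5/6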