For any square matrix \(\pmb{A}\) a scalar \(\lambda\) and a nonzero vector \(\pmb{x}\) can be found such that
\[\pmb{A}\pmb{x}=\lambda \pmb{x}\]
\(\lambda\) is called an eigenvalue of \(\pmb{A}\) and \(\pmb{x}\) a corresponding eigenvector.
Note if \(\pmb{A}\pmb{x}=\lambda \pmb{x}\) we have \(\pmb{A}\pmb{x} -\lambda \pmb{x} = \pmb{0}\) or \(\left(\pmb{A} -\lambda \pmb{I}\right)\pmb{x} = \pmb{0}\). Because \(\pmb{x}\ne\pmb{0}\), \(\pmb{A} -\lambda \pmb{I}\) is a singular matrix and \(\vert \pmb{A} -\lambda \pmb{I}\vert=0\), which is called the characteristic equation.
\[ \begin{aligned} &\pmb{A} = \begin{pmatrix} 1 & 3 \\ -1 & 5 \end{pmatrix} \\ &\vert \pmb{A} -\lambda \pmb{I}\vert = \begin{vmatrix} 1-\lambda & 3 \\ -1 & 5-\lambda \end{vmatrix} =\\ &(1-\lambda)(5-\lambda)+3 = 0\\ &\lambda^2-6\lambda+8=(\lambda-2)(\lambda-4)=0 \end{aligned} \]
so we have eigenvalues \(\lambda_1=2\) and \(\lambda_2=4\). Now
\[ \begin{aligned} &\left(\pmb{A} -\lambda_1 \pmb{I}\right)\pmb{x} =\pmb{0} \\ &\begin{pmatrix} 1-2 & 3 \\ -1 & 5-2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}= \begin{pmatrix} 0 \\ 0 \end{pmatrix} \\ &-x_1+3x_2=0 \end{aligned} \]
set \(x_1=1\), then \(x_2=1/3\). Also
\[\sqrt{x_1^2+x_2^2} = \sqrt{1^2+(1/3)^2} = \sqrt{10}/3\]
so the normalized eigenvector is \((1\text{ }1/3)'/(\sqrt{10}/3) = (3\text{ }1)'/\sqrt{10}\).
And
\[ \begin{aligned} &\left(\pmb{A} -\lambda_2 \pmb{I}\right)\pmb{x}=\pmb{0} \\ &\begin{pmatrix} 1-4 & 3 \\ -1 & 5-4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \\ &-3x_1+3x_2=0 \end{aligned} \]
Again set \(x_1=1\); then \(x_2=1\). Also
\[\sqrt{x_1^2+x_2^2} = \sqrt{1^2+1^2}=\sqrt{2}\]
so the normalized eigenvector is \((1\text{ }1)'/\sqrt{2}\).
Let’s check:
A=rbind(c(1, 3), c(-1, 5))
A
## [,1] [,2]
## [1,] 1 3
## [2,] -1 5
x=cbind(c(3, 1)/sqrt(10))
cbind(A%*%x, 2*x)
## [,1] [,2]
## [1,] 1.8973666 1.8973666
## [2,] 0.6324555 0.6324555
x=cbind(c(1, 1)/sqrt(2))
cbind(A%*%x, 4*x)
## [,1] [,2]
## [1,] 2.828427 2.828427
## [2,] 2.828427 2.828427
In R we can find eigenvalues and eigenvectors with
eigen(A)
## eigen() decomposition
## $values
## [1] 4 2
##
## $vectors
## [,1] [,2]
## [1,] -0.7071068 -0.9486833
## [2,] -0.7071068 -0.3162278
Let \(\pmb{A}\) be a matrix with eigenvalue \(\lambda\) and eigenvector \(\pmb{x}\), and let c, k be scalars with \(c\ne0, k\ne0\). Then, with the same eigenvector \(\pmb{x}\) in each case:

i. \(c\lambda\) is an eigenvalue of \(c\pmb{A}\)

ii. \(\lambda+k\) is an eigenvalue of \(\pmb{A}+k\pmb{I}\)

iii. \(\lambda^n\) is an eigenvalue of \(\pmb{A}^n\) for any positive integer n

iv. if \(\pmb{A}\) is nonsingular, \(1/\lambda\) is an eigenvalue of \(\pmb{A}^{-1}\)
proof

\[ \begin{aligned} &(c\pmb{A})\pmb{x} = c(\pmb{A}\pmb{x}) = (c\lambda)\pmb{x} \\ &(\pmb{A}+k\pmb{I})\pmb{x} = \pmb{A}\pmb{x}+k\pmb{x} = (\lambda+k)\pmb{x} \\ &\pmb{A}^2\pmb{x} = \pmb{A}(\pmb{A}\pmb{x}) = \pmb{A}(\lambda\pmb{x}) = \lambda\pmb{A}\pmb{x} = \lambda^2\pmb{x} \end{aligned} \]

The statement with n follows by repeating this calculation n times. For the inverse note
\[ \begin{aligned} &\pmb{x} = \pmb{A}^{-1}\pmb{A}\pmb{x} =\pmb{A}^{-1}\lambda\pmb{x} \\ &\pmb{A}^{-1}\pmb{x} =\frac1{\lambda} \pmb{x} \end{aligned} \]
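We can check the statement about the inverse numerically, using the matrix from the beginning of the section:

A=rbind(c(1, 3), c(-1, 5))
sort(eigen(solve(A))$values) # eigenvalues of the inverse
sort(1/eigen(A)$values) # reciprocals of the eigenvalues of A - the same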
Note that if \(\lambda, \mu\) are eigenvalues of \(\pmb{A}\) and \(\pmb{B}\) respectively, in general \(\lambda+ \mu\) is NOT an eigenvalue of \(\pmb{A+B}\).
Let \(\pmb{A}\) be a matrix with eigenvalue \(\lambda\) and eigenvector \(\pmb{x}\). Let \(p(x)=\sum_{k=0}^n a_k x^k\) be a polynomial. Then
\[p(\pmb{A})\pmb{x}=p(\lambda)\pmb{x}\]
proof
\[ \begin{aligned} &p(\pmb{A})\pmb{x} =\left( \sum_{k=0}^n a_k \pmb{A}^k \right)\pmb{x} =\\ &\sum_{k=0}^n \left( a_k \pmb{A}^k \pmb{x} \right) = \\ &\sum_{k=0}^n a_k \left( \pmb{A}^k \pmb{x} \right) = \\ &\sum_{k=0}^n a_k \left( \lambda^k \pmb{x} \right) = \\ &\left( \sum_{k=0}^n a_k \lambda^k \right)\pmb{x} = \\ &p(\lambda)\pmb{x} \end{aligned} \]
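A quick numerical check, using the matrix from the beginning of the section and the (arbitrarily chosen) polynomial \(p(x)=x^2+2x+3\):

A=rbind(c(1, 3), c(-1, 5))
x=c(3, 1)/sqrt(10) # eigenvector for lambda=2
pA=A%*%A + 2*A + 3*diag(2) # p(A)
cbind(pA%*%x, (2^2+2*2+3)*x) # p(A)x and p(2)x agree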
If the respective series are convergent this can sometimes be extended to infinite series:
say \(\lambda\) is an eigenvalue of \(\pmb{A}\); then \(1-\lambda\) is an eigenvalue of \(\pmb{I}-\pmb{A}\) by (4.3.3). If \(\pmb{I}-\pmb{A}\) is nonsingular, \(\frac1{1-\lambda}\) is an eigenvalue of \((\pmb{I-A})^{-1}\), also by (4.3.3). If \(\vert \lambda\vert<1\), then
\[\frac1{1-\lambda} =\sum_{k=0}^\infty \lambda^k\]
and so
\[(\pmb{I-A})^{-1} =\sum_{k=0}^\infty \pmb{A}^k\]
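Here is a minimal numerical check, with a made-up matrix whose eigenvalues are all smaller than 1 in absolute value:

A=matrix(c(0.5, 0.2, 0.1, 0.3), 2, 2)
S=diag(2); Ak=diag(2) # running sum and current power of A
for(k in 1:50) { Ak=Ak%*%A; S=S+Ak }
round(S - solve(diag(2)-A), 6) # should be (essentially) zero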
i. The eigenvalues of \(\pmb{AB}\) are the same as the eigenvalues of \(\pmb{BA}\).

ii. If \(\pmb{P}\) is any nonsingular matrix, then \(\pmb{A}\) and \(\pmb{P^{-1}AP}\) have the same eigenvalues.

iii. If \(\pmb{C}\) is any orthogonal matrix, then \(\pmb{A}\) and \(\pmb{C'AC}\) have the same eigenvalues.
proof
say \(\lambda\) is an eigenvalue of \(\pmb{AB}\) with eigenvector \(\pmb{x}\). Then

\[\pmb{B}\pmb{A}(\pmb{B}\pmb{x})=\pmb{B}(\pmb{AB})\pmb{x} =\pmb{B}(\lambda\pmb{x}) = \lambda(\pmb{B}\pmb{x}) \]
so \(\lambda\) is an eigenvalue of \(\pmb{BA}\) with eigenvector \(\pmb{Bx}\).
by i. \(\pmb{P^{-1}AP}\) has the same eigenvalues as \(\pmb{AP}\pmb{P}^{-1}= \pmb{A}\)
iii. follows from ii., because an orthogonal matrix satisfies \(\pmb{C}'=\pmb{C}^{-1}\).
Let’s consider again the orthogonal matrix
\[ \pmb{C} = \begin{pmatrix} \cos x & \sin x \\ -\sin x & \cos x \end{pmatrix} \]
and let \(\pmb{A} =\begin{pmatrix} 1 & 2 \\ -1 & 3\end{pmatrix}\), \(\pmb{B} =\begin{pmatrix} 0 & -2 \\ 1 & 5\end{pmatrix}\), then
Cx=function(x) matrix(c(cos(x), -sin(x), sin(x), cos(x)), 2, 2)
A=matrix(c(1,-1,2,3), 2, 2)
B=matrix(c(0,1,-2,5), 2, 2)
eigen(A%*%B)$value
## [1] 18.4582364 0.5417636
eigen(B%*%A)$value
## [1] 18.4582364 0.5417636
eigen(A)$value
## [1] 2+1i 2-1i
eigen(t(Cx(1))%*%A%*%Cx(1))$value
## [1] 2+1i 2-1i
eigen(t(Cx(-0.5))%*%A%*%Cx(-0.5))$value
## [1] 2+1i 2-1i
Let \(\pmb{A}\) be an \(n\times n\) symmetric matrix. Then

i. all eigenvalues of \(\pmb{A}\) are real

ii. eigenvectors corresponding to different eigenvalues are orthogonal
proof
say \(\pmb{A}\pmb{x}=\lambda\pmb{x}\) and let \(\bar{\pmb{x}}\) denote the complex conjugate of \(\pmb{x}\). Then

\[ \begin{aligned} &\lambda\pmb{\bar{x}'x} = \\ &\pmb{\bar{x}'}(\lambda\pmb{x}) = \\ &\pmb{\bar{x}'Ax} = \\ &(\pmb{A'\bar{x}})'\pmb{x} = \\ &(\pmb{A\bar{x}})'\pmb{x} = \\ &(\bar{\lambda}\pmb{\bar{x}})'\pmb{x} = \\ &\bar{\lambda}\pmb{\bar{x}}'\pmb{x} \end{aligned} \] and so \(\lambda=\bar{\lambda}\), that is \(\lambda\) is real. Let's check ii. numerically:
A=matrix(c(1, -1, 3, -1, 0, 5, 3, 5, 1), 3, 3)
A
## [,1] [,2] [,3]
## [1,] 1 -1 3
## [2,] -1 0 5
## [3,] 3 5 1
eigen(A)$value
## [1] 6.078042 1.617631 -5.695672
t(eigen(A)$vector[, 1])%*%eigen(A)$vector[, 2]
## [,1]
## [1,] -4.597017e-17
t(eigen(A)$vector[, 1])%*%eigen(A)$vector[, 3]
## [,1]
## [1,] -1.110223e-16
t(eigen(A)$vector[, 2])%*%eigen(A)$vector[, 3]
## [,1]
## [1,] -1.569925e-16
Spectral Decomposition
Let \(\pmb{A}\) be an \(n\times n\) symmetric matrix. Let \(\lambda_1,..,\lambda_n\) and \(\pmb{x}_1,..,\pmb{x}_n\) be its eigenvalues and corresponding normalized eigenvectors. Let \(\pmb{D}=diag(\lambda_1,..,\lambda_n)\) and \(\pmb{C} = (\pmb{x}_1,..,\pmb{x}_n)\). Then
\[\pmb{A} = \pmb{C}\pmb{D}\pmb{C}' = \sum_{i=1}^n \lambda_i \pmb{x}_i\pmb{x}'_i\]
and
\[\pmb{C}'\pmb{A}\pmb{C}=\pmb{D}\]
proof
because the eigenvectors of a symmetric matrix can be chosen orthonormal, \(\pmb{C}\) is orthogonal. Therefore \(\pmb{I}=\pmb{C}\pmb{C}'\) and so \(\pmb{A}=\pmb{A}\pmb{C}\pmb{C}'\) and
\[ \begin{aligned} &\pmb{A} = \\ &\pmb{A}(\pmb{x}_1,..,\pmb{x}_n)\pmb{C}' = \\ &(\pmb{A}\pmb{x}_1,..,\pmb{A}\pmb{x}_n)\pmb{C}' = \\ &(\lambda_1\pmb{x}_1,..,\lambda_n\pmb{x}_n)\pmb{C}' = \\ &\pmb{C}\pmb{D}\pmb{C}' \end{aligned} \]
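Let's verify this numerically with the symmetric matrix from above:

A=matrix(c(1, -1, 3, -1, 0, 5, 3, 5, 1), 3, 3)
C=eigen(A)$vectors
D=diag(eigen(A)$values)
round(C%*%D%*%t(C) - A, 6) # CDC' recovers A
round(t(C)%*%A%*%C, 6) # C'AC is (up to rounding) the diagonal matrix D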
Spectral Decomposition II
Let \(\pmb{A}\) be an \(n\times n\) matrix with eigenvalues \(\lambda_1,..,\lambda_n\) and linearly independent eigenvectors \(\pmb{x}_1,..,\pmb{x}_n\). Let \(\pmb{D}=diag(\lambda_1,..,\lambda_n)\) and \(\pmb{P} = (\pmb{x}_1,..,\pmb{x}_n)\), so that \(\pmb{P}\) is nonsingular. Then
\[\pmb{A} = \pmb{P}\pmb{D}\pmb{P}^{-1}\]
The spectral decomposition can be used to define functions of matrices: say \(\pmb{A} = \pmb{P}\pmb{D}\pmb{P}^{-1}\), then
\[ \begin{aligned} &\pmb{A}^2 = \pmb{A}\pmb{A} = \\ &\pmb{P}\pmb{D}\pmb{P}^{-1}\pmb{P}\pmb{D}\pmb{P}^{-1} = \\ &\pmb{P}\pmb{D}\pmb{D}\pmb{P}^{-1} = \\ &\pmb{P}\pmb{D}^2\pmb{P}^{-1} \\ \end{aligned} \]
and the square of a diagonal matrix is easy to find. This immediately generalizes to
\[\pmb{A}^n = \pmb{P}\pmb{D}^n\pmb{P}^{-1}\]
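For example, with the matrix from the beginning of the section:

A=rbind(c(1, 3), c(-1, 5))
P=eigen(A)$vectors
D3=diag(eigen(A)$values^3) # the cube of D is found elementwise
P%*%D3%*%solve(P)
A%*%A%*%A # same result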
Let’s say we have a function and we know its power series expansion \(f(x)=\sum_{i=0}^\infty a_i x^i\), then
\[ \begin{aligned} &f(\pmb{A}) = \sum_{i=0}^\infty a_i \pmb{A}^i = \\ &\sum_{i=0}^\infty a_i \pmb{P}\pmb{D}^i\pmb{P}^{-1} = \\ &\pmb{P}\left[ \sum_{i=0}^\infty a_i \pmb{D}^i \right]\pmb{P}^{-1} \end{aligned} \]
Say

\[ \pmb{A} = \begin{pmatrix} 1 & 2 \\ 0 & 2 \end{pmatrix} \]
and we want to find \(\pmb{B}=\exp (\pmb{A})\).
Let’s find its eigenvalues and eigenvectors with R:
A=rbind( c(1, 2), c(0, 2) )
EA=eigen(A)
EA
## eigen() decomposition
## $values
## [1] 2 1
##
## $vectors
## [,1] [,2]
## [1,] 0.8944272 1
## [2,] 0.4472136 0
so \(\pmb{A}\) has eigenvalues 2 and 1 and eigenvectors \((0.894, 0.447)'\), \((1,0)'\). Recall that
\[\exp(x) = \sum_{i=0}^\infty \frac{x^i}{i!}\]
and so
\[\pmb{B}=\exp (\pmb{A}) = \pmb{P}\left[ \sum_{i=0}^\infty \pmb{D}^i/i! \right]\pmb{P}^{-1}\]
now
\[ \begin{aligned} &\sum_{i=0}^\infty \pmb{D}^i/i! = \\ &\sum_{i=0}^\infty \frac1{i!} \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}^i = \\ &\begin{pmatrix} \sum_{i=0}^\infty \frac{2^i}{i!} & 0 \\ 0 & \sum_{i=0}^\infty \frac{1^i}{i!} \end{pmatrix} = \\ &\begin{pmatrix} e^2 & 0 \\ 0 & e^1 \end{pmatrix} \end{aligned} \]
and so
P=EA$vectors
D1=diag(c(exp(2), exp(1)))
P%*%D1%*%solve(P)
## [,1] [,2]
## [1,] 2.718282 9.341549
## [2,] 0.000000 7.389056
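If the Matrix package is installed we can cross-check against its built-in matrix exponential:

library(Matrix)
expm(Matrix(A)) # should agree with P%*%D1%*%solve(P) above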
Let \(\pmb{A}\) be an \(n\times n\) matrix with eigenvalues \(\lambda_1,..,\lambda_n\). Then

i. \(\vert\pmb{A}\vert=\prod_{i=1}^n \lambda_i\)

ii. \(tr(\pmb{A})=\sum_{i=1}^n \lambda_i\)
proof
If \(\pmb{A}\) is a symmetric matrix we have
\[\vert\pmb{A}\vert = \vert\pmb{CDC'}\vert = \vert\pmb{C}\vert\vert\pmb{D}\vert\vert\pmb{C}'\vert=\vert\pmb{D}\vert=\prod_{i=1}^n \lambda_i\]
The general case can be found in any linear algebra textbook.
Say
\[ \pmb{A} = \begin{pmatrix} 1 & 3 \\ -1 & 5 \end{pmatrix} \]
we have previously found the eigenvalues to be 2 and 4. Now
\[det(\pmb{A}) = 1\cdot 5-3\cdot(-1)=8 = 2\cdot 4\]
and
\[tr(\pmb{A}) = 1+5 = 6 = 2+4\]
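Checking with R:

A=rbind(c(1, 3), c(-1, 5))
c(det(A), prod(eigen(A)$values)) # both equal 8
c(sum(diag(A)), sum(eigen(A)$values)) # both equal 6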
Let \(\pmb{A}\) be an \(n\times n\) symmetric matrix with eigenvalues \(\lambda_1,..,\lambda_n\). Then

i. if \(\pmb{A}\) is positive definite, all \(\lambda_i>0\) (and if \(\pmb{A}\) is positive semidefinite, all \(\lambda_i\ge 0\))

ii. the rank of \(\pmb{A}\) is equal to the number of nonzero eigenvalues
proof
\[ \begin{aligned} &\pmb{A}\pmb{x} =\lambda_i\pmb{x} \\ &\pmb{x}'\pmb{A}\pmb{x} =\pmb{x}'\lambda_i\pmb{x} = \lambda_i\pmb{x}'\pmb{x}\\ &\lambda_i = \frac{\pmb{x}'\pmb{A}\pmb{x}}{\pmb{x}'\pmb{x}} >0\\ \end{aligned} \]
proof of ii omitted
Say \(\pmb{A}\) is positive-definite. Let \(\pmb{A}=\pmb{C}\pmb{D}\pmb{C}'\) be the spectral decomposition of \(\pmb{A}\). Define \(\pmb{D}^{1/2} = diag(\sqrt{\lambda_1},..,\sqrt{\lambda_n})\) (which we can do because all eigenvalues are positive) and define \(\pmb{A}^{1/2}=\pmb{C}\pmb{D}^{1/2}\pmb{C}'\). Now
\[ \begin{aligned} &\pmb{A}^{1/2}\pmb{A}^{1/2} = \\ &\pmb{C}\pmb{D}^{1/2}\pmb{C}'\pmb{C}\pmb{D}^{1/2}\pmb{C}' = \\ &\pmb{C}\pmb{D}^{1/2}\pmb{D}^{1/2}\pmb{C}' = \\ &\pmb{C}\pmb{D}\pmb{C}'=\pmb{A} \end{aligned} \]
and so \(\pmb{A}^{1/2}\) is the square root of \(\pmb{A}\)!
Say
\[ \pmb{A} = \begin{pmatrix} 2 & -1 & 0\\ -1 & 2 &-1\\ 0 & -1 & 2 \end{pmatrix} \]
A=matrix(c(2,-1, 0, -1, 2, -1, 0, -1, 2), 3, 3)
A
## [,1] [,2] [,3]
## [1,] 2 -1 0
## [2,] -1 2 -1
## [3,] 0 -1 2
lambda=eigen(A)$value
C=eigen(A)$vector
sqA = C%*%diag(sqrt(lambda))%*%t(C)
round(sqA, 5)
## [,1] [,2] [,3]
## [1,] 1.36039 -0.38268 -0.05383
## [2,] -0.38268 1.30656 -0.38268
## [3,] -0.05383 -0.38268 1.36039
round(sqA%*%sqA, 5)
## [,1] [,2] [,3]
## [1,] 2 -1 0
## [2,] -1 2 -1
## [3,] 0 -1 2
A square matrix \(\pmb{A}\) is called idempotent if \(\pmb{A}^2=\pmb{A}\).
The only nonsingular idempotent matrix is the identity.
proof
Say \(\pmb{A}\) is idempotent and nonsingular, so \(\pmb{A}^2=\pmb{A}\) and \(\pmb{A}^{-1}\) exists. Therefore
\[\pmb{A}=(\pmb{A}^{-1}\pmb{A})\pmb{A} =\pmb{A}^{-1}\pmb{A}^2=\pmb{A}^{-1}\pmb{A} = \pmb{I}\]
If \(\pmb{A}\) is symmetric and idempotent then \(\pmb{A}\) is positive semidefinite.
proof
\[\pmb{A} =\pmb{A}^2=\pmb{A}\pmb{A}=\pmb{A}'\pmb{A}\] \[\pmb{x'Ax} =(\pmb{Ax})'(\pmb{Ax})\ge 0\]
If \(\pmb{A}\) is symmetric and idempotent of rank r, then \(\pmb{A}\) has r eigenvalues equal to 1 and n-r eigenvalues equal to 0.
proof
\[ \begin{aligned} &\pmb{A}\pmb{x} =\lambda\pmb{x} \\ &\pmb{A}^2\pmb{x} =\lambda^2\pmb{x} \\ &\pmb{A}^2\pmb{x} =\pmb{A}\pmb{x} = \lambda\pmb{x} \\ &\lambda\pmb{x}=\lambda^2\pmb{x} \\ &(\lambda-\lambda^2)\pmb{x}=\lambda(1-\lambda)\pmb{x}=\pmb{0} \\ \end{aligned} \]
and so all eigenvalues are either 0 or 1.
\(\pmb{A}\) is positive semidefinite and therefore the rank of \(\pmb{A}\) is equal to the number of positive eigenvalues, that is the number of eigenvalues equal to 1.
If \(\pmb{A}\) is symmetric and idempotent of rank r, then \(tr(\pmb{A})=r\): the trace equals the sum of the eigenvalues, which are r ones and n-r zeros.
If \(\pmb{A}\) is \(n\times n\) and idempotent, \(\pmb{P}\) is nonsingular, and \(\pmb{C}\) is orthogonal we have

i. \(\pmb{I}-\pmb{A}\) is idempotent

ii. \(\pmb{A}(\pmb{I}-\pmb{A})=(\pmb{I}-\pmb{A})\pmb{A}=\pmb{O}\)

iii. \(\pmb{P}^{-1}\pmb{A}\pmb{P}\) is idempotent

iv. \(\pmb{C}'\pmb{A}\pmb{C}\) is idempotent
proof omitted
Let
\[ \pmb{A} = \begin{pmatrix} 2 & -2 & -4 \\ -1 & 3 & 4\\ 1 & -2 & -3\\ \end{pmatrix} \]
A=matrix(c(2, -1, 1,-2,3,-2,-4,4,-3), 3, 3)
A%*%A
## [,1] [,2] [,3]
## [1,] 2 -2 -4
## [2,] -1 3 4
## [3,] 1 -2 -3
round(eigen(A)$value, 6)
## [1] 1 1 0
library(Matrix)
c(rankMatrix(A))
## [1] 2
sum(diag(A))
## [1] 2
diag(3)-A
## [,1] [,2] [,3]
## [1,] -1 2 4
## [2,] 1 -2 -4
## [3,] -1 2 4
(diag(3)-A)%*%(diag(3)-A)
## [,1] [,2] [,3]
## [1,] -1 2 4
## [2,] 1 -2 -4
## [3,] -1 2 4
A%*%(diag(3)-A)
## [,1] [,2] [,3]
## [1,] 0 0 0
## [2,] 0 0 0
## [3,] 0 0 0
There are a number of ways to extend the ideas of calculus to matrices. We will discuss two: derivatives of functions of a vector or matrix, and derivatives of a matrix whose elements are functions of a scalar.
Let \(u=f(\pmb{x} )\) be a function of variables \(\pmb{x} = (x_1,..,x_p)'\) and let \(\partial u/\partial x_i\) be the partial derivatives. We define the vector \(\partial u/\partial \pmb{x}\) as
\[ \frac{\partial u}{\partial \pmb{x}} = \begin{pmatrix} \partial u/\partial x_1 \\ \partial u/\partial x_2 \\ \vdots \\ \partial u/\partial x_p \end{pmatrix} \]
Let \(u=\pmb{a}'\pmb{x}\) where \(\pmb{a} = (a_1,..,a_p)'\). Then
\[\frac{\partial u}{\partial \pmb{x}} = \frac{\partial \pmb{a}'\pmb{x}}{\partial \pmb{x}} = \frac{\partial \pmb{x}'\pmb{a}}{\partial \pmb{x}} = \pmb{a}\]
proof obvious
Let \(u=\pmb{x}'\pmb{A}\pmb{x}\) where \(\pmb{A}\) is a symmetric matrix. Then
\[\frac{\partial u}{\partial \pmb{x}} = \frac{\partial \pmb{x}'\pmb{A}\pmb{x}}{\partial \pmb{x}} = 2\pmb{A}\pmb{x}\]
proof
\[ \begin{aligned} &(\pmb{A}\pmb{x})_k =\sum_{i=1}^p a_{ki}x_i \\ &\pmb{x}'\pmb{A}\pmb{x} =\sum_{j=1}^p x_j \left( \sum_{i=1}^p a_{ji}x_i \right) = \sum_{i,j=1}^p a_{ji}x_ix_j = \\ &\sum_{i=1}^p a_{ii}x_i^2 + \sum_{i\ne j} a_{ji}x_ix_j \\ &\frac{\partial \pmb{x}'\pmb{A}\pmb{x}}{\partial x_i} =2a_{ii}x_i + 2\sum_{j\ne i}a_{ij}x_j = 2\sum_{j}a_{ij}x_j =2(\pmb{A}\pmb{x})_i\\ \end{aligned} \]

where we used the symmetry of \(\pmb{A}\) (\(a_{ij}=a_{ji}\)); stacking these partial derivatives gives \(2\pmb{A}\pmb{x}\).
Say \(\pmb{A} = \begin{pmatrix} 1& 2 \\ 2 & -1 \end{pmatrix}\), then
\[ \begin{aligned} &\pmb{Ax} = \\ &\begin{pmatrix} 1& 2 \\ 2 & -1\end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}=\\ &\begin{pmatrix} x_1+2x_2 \\ 2x_1-x_2 \end{pmatrix}\\ &\frac{\partial \pmb{Ax}}{\partial x_1} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}=\pmb{a}_1\\ &\frac{\partial \pmb{Ax}}{\partial x_2} = \begin{pmatrix} 2 \\ -1 \end{pmatrix}=\pmb{a}_2 \end{aligned} \] Also
\[ \begin{aligned} &\pmb{x'Ax} = \\ &\begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} x_1+2x_2 \\ 2x_1-x_2 \end{pmatrix}= \\ &x_1(x_1+2x_2)+x_2(2x_1-x_2)= \\ &x_1^2+4x_1x_2-x_2^2 \\ &\frac{\partial \pmb{x'Ax}}{\partial x_1} = 2x_1+4x_2\\ &\frac{\partial \pmb{x'Ax}}{\partial x_2} = 4x_1-2x_2 \end{aligned} \] and indeed
\[ \begin{aligned} &2\pmb{Ax} = \\ &2\begin{pmatrix} 1& 2 \\ 2 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}=\\ &2\begin{pmatrix} x_1+2x_2 \\ 2x_1-x_2 \end{pmatrix}= \\ &\begin{pmatrix} 2x_1+4x_2 \\ 4x_1-2x_2 \end{pmatrix} \end{aligned} \]
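We can also verify the gradient formula numerically with forward differences (the evaluation point \((1.5, -0.5)'\) is arbitrary):

A=matrix(c(1, 2, 2, -1), 2, 2)
f=function(x) c(t(x)%*%A%*%x)
x=c(1.5, -0.5); h=1e-6
grad=c(f(x+c(h, 0))-f(x), f(x+c(0, h))-f(x))/h
round(cbind(grad, 2*c(A%*%x)), 4) # the two columns agree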
Another way to turn matrices into functions is as follows: let \(\pmb{X}=(x_{ij})\) be a \(p\times p\) matrix of variables and define the function \(u=f(\pmb{X})\). Define the matrix of partial derivatives
\[ \frac{\partial u}{\partial \pmb{X}} = \begin{pmatrix} \partial u/\partial x_{11} & ... & \partial u/\partial x_{1p} \\ \partial u/\partial x_{21} & ... & \partial u/\partial x_{2p} \\ \vdots & & \vdots \\ \partial u/\partial x_{p1} & ... & \partial u/\partial x_{pp} \\ \end{pmatrix} \]
Let \(u=tr(\pmb{XA})\) where \(\pmb{X}\) is a positive definite matrix and \(\pmb{A}\) is a matrix of constants. Then
\[\frac{\partial u}{\partial \pmb{X}} = \pmb{A}+\pmb{A'}-diag(\pmb{A})\]
proof
Note that we previously found
\[tr(\pmb{XA})=\sum_{i,j}^p x_{ij}a_{ji}=\sum_{i}^p x_{ii}a_{ii}+\sum_{i\ne j} x_{ij}a_{ji}\]
because \(\pmb{X}\) is positive definite and therefore symmetric. So \(\partial u/\partial x_{ii}=a_{ii}\) and \(\partial u/\partial x_{ij}=a_{ij}+a_{ji}\) if \(i\ne j\).
Also
\[\left[\pmb{A}+\pmb{A'}-diag(\pmb{A})\right]_{ii} = a_{ii}+a_{ii}-a_{ii}=a_{ii}\] and if \(i\ne j\)
\[\left[\pmb{A}+\pmb{A'}-diag(\pmb{A})\right]_{ij} = a_{ij}+a_{ji}\]
Let
\[ \pmb{A} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ \end{pmatrix} \]
\(\pmb{X}\) is supposed to be positive definite, which implies that it is symmetric and so
\[ \pmb{X}\pmb{A} = \begin{pmatrix} x_{11} & x_{12} \\ x_{12} & x_{22} \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \\ \begin{pmatrix} x_{11} a_{11} +x_{12}a_{21} & x_{11} a_{12} +x_{12}a_{22} \\ x_{12} a_{11} +x_{22}a_{21} & x_{12} a_{12} +x_{22}a_{22} \end{pmatrix} \] and so
\[tr(\pmb{X}\pmb{A})=x_{11}a_{11}+(a_{21}+a_{12})x_{12}+x_{22}a_{22}\]
and so we find
\[ \frac{\partial u}{\partial \pmb{X}} = \begin{pmatrix} a_{11} &a_{12}+a_{21}\\ a_{12}+a_{21} & a_{22} \\ \end{pmatrix} \]
but also
\[ \pmb{A}+\pmb{A'}-diag(\pmb{A}) =\\ \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ \end{pmatrix}+ \begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \\ \end{pmatrix}- \begin{pmatrix} a_{11} & 0 \\ 0 & a_{22} \\ \end{pmatrix}=\\ \begin{pmatrix} a_{11} & a_{12}+a_{21} \\ a_{12}+a_{21} & a_{22} \\ \end{pmatrix} \]
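A numerical check with made-up matrices; note that to stay inside the symmetric matrices we have to perturb \(x_{ij}\) and \(x_{ji}\) together:

A=matrix(c(1, 2, 3, 4), 2, 2)
X=matrix(c(2, 1, 1, 3), 2, 2) # symmetric (in fact positive definite)
u=function(X) sum(diag(X%*%A))
h=1e-6; out=matrix(0, 2, 2)
for(i in 1:2) for(j in 1:2) {
  X1=X
  X1[i, j]=X1[i, j]+h
  if(i!=j) X1[j, i]=X1[j, i]+h # keep X symmetric
  out[i, j]=(u(X1)-u(X))/h
}
round(out, 4)
A+t(A)-diag(diag(A)) # same matrix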
Let \(u=\log\vert \pmb{X}\vert\) where \(\pmb{X}\) is a positive-definite matrix. Then
\[\frac{\partial u}{\partial \pmb{X}} = 2\pmb{X}^{-1}-diag(\pmb{X}^{-1})\]
proof omitted
Say

\[ \begin{aligned} &\pmb{X} = \begin{pmatrix} x_{11} & x \\ x & x_{22} \end{pmatrix}\\ &\vert \pmb{X} \vert = x_{11}x_{22}-x^2 \\ &u=\log \vert \pmb{X} \vert = \log(x_{11}x_{22}-x^2) \\ &\frac{\partial u}{\partial \pmb{X}} = \begin{pmatrix} x_{22} & -2x \\ -2x & x_{11} \end{pmatrix}/(x_{11}x_{22}-x^2) \end{aligned} \]
also
\[ \begin{aligned} &\pmb{X}^{-1} =\frac1{x_{11}x_{22}-x^2} \begin{pmatrix} x_{22} & -x \\ -x & x_{11} \end{pmatrix} \\ &2\pmb{X}^{-1}-diag(\pmb{X}^{-1}) = \\ &\frac2{x_{11}x_{22}-x^2}\begin{pmatrix} x_{22} & -x \\ -x & x_{11} \end{pmatrix}- \frac1{x_{11}x_{22}-x^2}\begin{pmatrix} x_{22} & 0 \\ 0 & x_{11} \end{pmatrix} = \\ &\begin{pmatrix} x_{22} & -2x \\ -2x & x_{11} \end{pmatrix}/(x_{11}x_{22}-x^2) \end{aligned} \]
\[ \pmb{X} = \begin{pmatrix} 2x_{11} & -x_{12} & 0\\ -x_{12} & 2x_{22} &-x_{23}\\ 0 & -x_{23} & 2x_{33} \end{pmatrix} \]
is positive-definite if all the x’s are close to 1, so evaluating at \(x_{11}=x_{12}=x_{22}=x_{23}=x_{33}=1\):
X=matrix(c(2,-1, 0, -1, 2, -1, 0, -1, 2), 3, 3)
round(2*solve(X)-diag(diag(solve(X))), 1)
## [,1] [,2] [,3]
## [1,] 0.8 1 0.5
## [2,] 1.0 1 1.0
## [3,] 0.5 1 0.7
but we can also find \(\frac{\partial \log \vert\pmb{X}\vert}{\partial \pmb{X}}\) numerically:
du=function(X, h=0.0001) {
  f=function(X) log(det(X))
  n=ncol(X)
  out=matrix(0, n, n)
  for(i in 1:n)
    for(j in 1:n) {
      X1=X
      X1[i,j]=X[i,j]+h
      # off-diagonal elements count twice because X is symmetric
      out[i,j]=(f(X1)-f(X))*ifelse(i==j, 1, 2)
    }
  out/h
}
round(du(X), 1)
## [,1] [,2] [,3]
## [1,] 0.7 1 0.5
## [2,] 1.0 1 1.0
## [3,] 0.5 1 0.7
where the off-diagonal elements are multiplied by 2 because the matrix is symmetric.
Let \(\pmb{A}\) be an \(n\times n\) matrix with elements \(a_{ij}\) that are functions of a scalar x. Then we define
\[\frac{\partial \pmb{A}}{\partial x} = \left(\frac{\partial a_{ij}}{\partial x} \right)\]
Let \(\pmb{A}\) be nonsingular of order n with derivative \(\frac{\partial \pmb{A}}{\partial x}\). Then
\[\frac{\partial \pmb{A}^{-1}}{\partial x} = -\pmb{A}^{-1}\frac{\partial \pmb{A}}{\partial x}\pmb{A}^{-1}\]
proof
Because \(\pmb{A}\) is nonsingular we have
\[\pmb{A}^{-1}\pmb{A}=\pmb{I}\]
Therefore
\[\frac{\partial \pmb{A}^{-1}\pmb{A}}{\partial x}=\frac{\partial \pmb{A}^{-1}}{\partial x}\pmb{A}+\pmb{A}^{-1}\frac{\partial \pmb{A}}{\partial x} = \pmb{O}\]
and so
\[\frac{\partial \pmb{A}^{-1}}{\partial x}\pmb{A}=-\pmb{A}^{-1}\frac{\partial \pmb{A}}{\partial x}\]
and
\[\frac{\partial \pmb{A}^{-1}}{\partial x}=-\pmb{A}^{-1}\frac{\partial \pmb{A}}{\partial x}\pmb{A}^{-1}\]
Let’s see what this means for a \(1\times 1\) matrix: \(\pmb{A}=[f(x)]\), so \(\pmb{A}^{-1}=[\frac1{f(x)}]\) and so
\[\frac{\partial \pmb{A}^{-1}}{\partial x}=\frac{\partial }{\partial x}[\frac1{f(x)}]=[-\frac{f'(x)}{f(x)^2}]\]
On the other hand
\[-\pmb{A}^{-1}\frac{\partial \pmb{A}}{\partial x}\pmb{A}^{-1} = -[\frac1{f(x)}][f'(x)][\frac1{f(x)}] =[-\frac{f'(x)}{f(x)^2}]\]
Note that of course \(\pmb{A}^{-1}\ne[f^{-1}(x)]\).
Consider
\[ \pmb{A} = \begin{pmatrix} x & x^2 \\ 2x & 1 \end{pmatrix} \]
Let’s check this with R:
A=function(x) rbind( c(x, x^2), c(2*x, 1) )
A.inv=function(x) solve(A(x))
x=1.5; h=0.001
A.prime = (A(x+h)-A(x))/h # numerical derivative of A
A.inv.prime = (A.inv(x+h)-A.inv(x))/h # numerical derivative of A^{-1}
round(A.inv.prime, 2)
## [,1] [,2]
## [1,] 0.45 -0.45
## [2,] -0.98 0.49
round(-solve(A(x))%*%A.prime%*%solve(A(x)), 2)
## [,1] [,2]
## [1,] 0.45 -0.45
## [2,] -0.98 0.49
x=2; h=0.001
A.prime = (A(x+h)-A(x))/h
A.inv.prime = (A.inv(x+h)-A.inv(x))/h
round(A.inv.prime, 2)
## [,1] [,2]
## [1,] 0.12 -0.18
## [2,] -0.33 0.16
round(-solve(A(x))%*%A.prime%*%solve(A(x)), 2)
## [,1] [,2]
## [1,] 0.12 -0.18
## [2,] -0.33 0.16
Let \(\pmb{A}\) be a positive definite matrix. Then
\[\frac{\partial \log \vert \pmb{A}\vert}{\partial x}= tr\left(\pmb{A}^{-1}\frac{\partial \pmb{A}}{\partial x}\right)\]
proof omitted
Consider
\[ \pmb{A} = \begin{pmatrix} x & x^2 \\ x^2 & 3 \end{pmatrix} \]
\[ \begin{aligned} &\frac{\partial \log \vert \pmb{A}\vert}{\partial x}= \\ &\frac{\partial \log \left( 3x-x^4 \right)}{\partial x}= \\ &\frac{3-4x^3}{3x-x^4} \\ \end{aligned} \]
and
\[ \begin{aligned} &\frac{\partial \pmb{A}}{\partial x}= \begin{pmatrix} 1 & 2x \\ 2x & 0 \end{pmatrix} \\ &\pmb{A}^{-1} = \frac1{3x-x^4} \begin{pmatrix} 3 & -x^2 \\ -x^2 & x \end{pmatrix}\\ &\pmb{A}^{-1}\frac{\partial \pmb{A}}{\partial x}= \frac1{3x-x^4} \begin{pmatrix} 3 & -x^2 \\ -x^2 & x \end{pmatrix} \begin{pmatrix} 1 & 2x \\ 2x & 0 \end{pmatrix}= \\ &\frac1{3x-x^4} \begin{pmatrix} 3-2x^3 & 6x \\ x^2 & -2x^3 \end{pmatrix} \\ &tr\left(\pmb{A}^{-1}\frac{\partial \pmb{A}}{\partial x}\right) =\\ &\frac1{3x-x^4}\left(3-2x^3 -2x^3\right) = \frac{3-4x^3}{3x-x^4} \end{aligned} \]
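We can again check numerically, at the (arbitrary) point \(x=1.2\), where \(\pmb{A}\) is positive definite:

A=function(x) matrix(c(x, x^2, x^2, 3), 2, 2)
x=1.2; h=1e-6
(log(det(A(x+h)))-log(det(A(x))))/h # numerical derivative
sum(diag(solve(A(x))%*%(A(x+h)-A(x))/h)) # trace formula
(3-4*x^3)/(3*x-x^4) # exact value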
If we have a function \(u=f(\pmb{x})\) and we want to find its maxima or minima, we may be able to do so by solving the equation \(\frac{\partial u}{\partial \pmb{x}}=0\). Often, however, we need to find a maximum or minimum under additional constraints, which we denote by \(h_i(\pmb{x})=0\), \(i=1,..,q\). The method of Lagrange multipliers finds an extremum of the function
\[v=u+\sum_{i=1}^q \lambda_i h_i(\pmb{x})\]
which means solving the system of equations
\[\frac{\partial u}{\partial \pmb{x}} +\sum_{i=1}^q \lambda_i\frac{\partial h_i}{\partial \pmb{x}}=0\] \[h_i(\pmb{x})=0;\; i=1,..,q\]
The \(\lambda_i\)'s are called Lagrange multipliers.

Let \(f(x, y)=2x^2+3xy+y^2\). Find the minimum of f subject to the constraint \(x+2y=1\).
Here \(h(x,y)=x+2y-1\), so
\[ \begin{aligned} &\frac{\partial f}{\partial x} + \lambda \frac{\partial h}{\partial x}=4x+3y+\lambda =0 \text{ }(I)\\ &\frac{\partial f}{\partial y} + \lambda \frac{\partial h}{\partial y}=3x+2y+2\lambda =0 \text{ }(II)\\ &x+2y -1 =0 \text{ }(III)\\ &I-II:\; x+y-\lambda=0;\; \lambda=x+y \text{ }(IV)\\ &IV\rightarrow I:\; 5x+4y=0 \text{ }(V)\\ &V-2\cdot III:\; 3x+2=0;\; x=-2/3\\ &y=(1-x)/2=(1-(-2/3))/2=5/6 \end{aligned} \]
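As a sanity check we can minimize f along the constraint by brute force (the grid is an arbitrary choice):

f=function(x, y) 2*x^2+3*x*y+y^2
x=seq(-2, 2, length=1e5)
y=(1-x)/2 # enforce the constraint x+2y=1
x[which.min(f(x, y))] # close to -2/3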