Approximation Methods

Taylor Approximations

Say we have a r.v. X with density f and a function h, and we want to know var(h(X)). Of course, by definition we have

\[var(h(X)) = \int_{-\infty}^{\infty} h(x)^2f(x)dx-\left(\int_{-\infty}^{\infty} h(x)f(x)dx\right)^2\]

but sometimes these integrals (sums) are very difficult to evaluate. In this section we discuss some methods for approximating the variance.
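When the density is available in closed form, these integrals can of course also be done numerically. As a point of comparison for the approximations below, here is a minimal sketch using R's integrate function, with X ~ Exp(1) and h(x) = log(x+1) chosen purely for illustration:

f <- function(x) dexp(x, 1)                               # density of X
h <- function(x) log(x + 1)
Eh <- integrate(function(x) h(x)*f(x), 0, Inf)$value      # E[h(X)]
Eh2 <- integrate(function(x) h(x)^2*f(x), 0, Inf)$value   # E[h(X)^2]
Eh2 - Eh^2                                                # var(h(X))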

Recall: If a function h(x) has derivatives of order r, that is, if \(h^{(r)}(x)\) exists, then for any constant a the Taylor polynomial of order r is defined by

\[T_r(x)=\sum_{n=0}^r \frac{h^{(n)}(a)}{n!}(x-a)^n\]

One of the most famous theorems in mathematics, Taylor's theorem, states that the remainder of the approximation, \(h(x)-T_r(x)\), goes to 0 faster than the highest-order term:

Theorem (8.10.1)

Taylor’s theorem

\[\lim_{x\rightarrow a}\frac{h(x)-T_r(x)}{(x-a)^r}=0\]

There are various formulas for the remainder term, but we won’t need them here.

Example (8.10.2)

say \(h(x) = \log(x+1)\) and we want to approximate h around x=0. Then we have

\[ \begin{aligned} &h(0) = \log(1)=0\\ &\frac{dh}{dx}\vert_{x=0} = \frac1{x+1}\vert_{x=0}=1\\ &\frac{d^2h}{dx^2}\vert_{x=0} = -\frac1{(x+1)^2}\vert_{x=0}=-1\\ &\frac{d^3h}{dx^3}\vert_{x=0} = \frac2{(x+1)^3}\vert_{x=0}=2\\ &\frac{d^{(r)}h}{dx^r}\vert_{x=0} = \frac{(-1)^{r-1}(r-1)!}{(x+1)^r}\vert_{x=0}=(-1)^{r-1}(r-1)!\\ \end{aligned} \] and so

\[ \begin{aligned} &T_0(x) = h(0) =0\\ &T_1(x) = T_0(x)+\frac{dh}{dx}\vert_{x=0}\cdot(x-0)=x\\ &T_2(x) = T_1(x)+\frac1{2!}\frac{d^2h}{dx^2}\vert_{x=0}\cdot(x-0)^2=x-x^2/2\\ &T_3(x) = T_2(x)+\frac1{3!}\frac{d^3h}{dx^3}\vert_{x=0}\cdot(x-0)^3=x-x^2/2+x^3/3\\ &\text{ }\\ &T_r(x) = \sum_{n=1}^r \frac{(-1)^{n-1}(n-1)!}{n!} x^n=\sum_{n=1}^r \frac{(-1)^{n-1}x^n}{n} \end{aligned} \] The approximation is illustrated here:

a <- 0; r <- 3
x <- seq(-0.9, 0.9, length = 250)
# Taylor coefficients h^(n)(a)/n! of log(x+1) at a
h <- rep(0, r+1)
h[1] <- log(a+1)
for (n in 1:r) h[n+1] <- (-1)^(n+1)/n/(a+1)^n
# column k+1 of y holds the Taylor polynomial of order k evaluated at x
y <- matrix(0, 250, r+1)
y[, 1] <- rep(log(a+1), 250)
for (k in 1:r) y[ , k+1] <- y[, k] + h[k+1]*(x-a)^k
df <- data.frame(x=x, ly=log(x+1))
# plot log(x+1) together with the Taylor polynomials of order 0 to r
plt <- ggplot(df, aes(x, ly)) +
        geom_line(size=1.2) + ylab("")
  
for (k in 1:(r + 1)) 
  plt <- plt + 
    geom_line(data=data.frame(x=x, y=y[, k]), aes(x, y), color=k)
plt

One application of this is the

Delta Method

Theorem (8.10.3)

Let \(Y_n\) be a sequence of rv’s that satisfies

\[ \sqrt{n}(Y_n - \theta) \rightarrow N(0, \sigma) \]

in distribution. For a given function h and a specific value of \(\theta\), suppose that \(h'(\theta)\) exists and is not 0. Then

\[ \sqrt{n}\left(h(Y_n) - h(\theta)\right) \rightarrow N(0, \sigma \vert h'(\theta)\vert) \]

proof

the Taylor expansion of \(h(Y_n)\) around \(Y_n=\theta\) is

\[h(Y_n )=h( \theta )+h'( \theta )(Y_n - \theta )+R\]

where \(R \rightarrow 0\) as \(Y_n \rightarrow \theta\). Moreover, since \(\sqrt{n}(Y_n-\theta)\) converges in distribution it is bounded in probability, and so \(\sqrt{n}R \rightarrow 0\) in probability. Now

\[ \begin{aligned} &\sqrt{n}[h(Y_n)-h(\theta) ] = \\ &\sqrt{n}\left[h( \theta )+h'( \theta )(Y_n - \theta )+R-h( \theta) \right] = \\ &h'( \theta )\sqrt{n}(Y_n - \theta )+\sqrt{n}R \rightarrow h'( \theta )X \end{aligned} \]

where \(X \sim N(0,\sigma)\). Also note that \(var(h'( \theta )X)=[h'( \theta )]^2var(X)\).

Example (8.10.4)

say \(X_1, ..,X_n\) iid Exp(1), so E[X]=var(X)=1, then by the CLT

\[\sqrt{n} (\bar{X} -1) \rightarrow N(0,1)\]

Let \(h(x)=x^p\), so \(h'(x)=px^{p-1}\) and by the delta method

\[\sqrt{n}( \bar{X}^p-1) \rightarrow N(0,p)\]

n <- 100; p <- 2; B <- 10000
x <-  matrix(rexp(n*B, 1), ncol=n)   # B rows, each a sample of size n from Exp(1)
xbar <- apply(x, 1, mean)            # the B sample means
print(mean(xbar))
## [1] 1.001464
y <- sqrt(n)*(xbar^p-1)
bw <- diff(range(y))/50 
df <- data.frame(x=y)
ggplot(df, aes(x)) +
  geom_histogram(aes(y = ..density..),
    color = "black", 
    fill = "white", 
    binwidth = bw) + 
    labs(x = "x", y = "Density") +
  stat_function(fun = dnorm, 
                colour = "blue",
                args=list(mean=0, sd=p))


Say we have a sequence of iid rv’s \(X_1, ..,X_n\), each with mean \(\mu \ne 0\) and standard deviation \(\sigma\). We know from the central limit theorem that

\[\sqrt{n}(\bar{X} -\mu) \rightarrow N(0,\sigma)\]

Now let \(h(x)=1/x\), then \(h'(x)=-1/x^2\) and we get

\(\sqrt{n}(1/\bar{X} -1/\mu) \rightarrow N(0,\sigma/\mu^2)\)

say for example \(X_i\sim U[0,1]\), then \(\mu=1/2\) and \(\sigma=1/ \sqrt{12}\), so according to the delta method

\(\sqrt{n}(1/ \bar{X} -1/ \mu ) \rightarrow N(0, \sigma / \mu^2) = N(0,2/ \sqrt{3})\)

n <- 100; B <- 10000
x <-  matrix(runif(n*B), ncol=n)
xbar <- apply(x, 1, mean)
y <- sqrt(n)*(1/xbar - 1/0.5)
bw <- diff(range(y))/50 
df <- data.frame(x=y)
ggplot(df, aes(x)) +
  geom_histogram(aes(y = ..density..),
    color = "black", 
    fill = "white", 
    binwidth = bw) + 
    labs(x = "x", y = "Density") +
  stat_function(fun = dnorm, 
                colour = "blue",
                args=list(mean=0, sd=2/sqrt(3)))

Variance Approximations based on Taylor’s Theorem

For our purposes we will need only first-order approximations (that is, using the first derivative), but we will need a multivariate extension, as follows: say \(X_1, ..,X_n\) are r.v.'s with means \(\mu_1, .. , \mu_n\), and define \(\pmb{X}=(X_1, ..,X_n)\) and \(\pmb{\mu}=( \mu_1, .. , \mu_n)\). Suppose there is a differentiable function \(h(\pmb{X})\) for which we want an approximate estimate of the variance. Define

\[h'_i(\pmb{\mu})=\frac{\partial h(\pmb{t})}{\partial t_i}\vert_{\pmb{t=\mu}}\]

Then the first-order Taylor expansion of h about \(\pmb{\mu}\) is

\[h(\pmb{t})=h(\pmb{\mu})+\sum_{i=1}^n h'_i(\pmb{\mu})(t_i-\mu_i)+\text{Remainder}\] Forgetting about the remainder we have

\[ \begin{aligned} &E[h(\pmb{X})] \approx E[h(\pmb{\mu})+\sum_{i=1}^n h'_i(\pmb{\mu})(X_i-\mu_i)] = \\ &h(\pmb{\mu})+\sum_{i=1}^n h'_i(\pmb{\mu})(E[X_i]-\mu_i) = h(\pmb{\mu}) \end{aligned} \]

\[ \begin{aligned} &var(h(\pmb{X})) \approx E[(h(\pmb{X})-h(\pmb{\mu}))^2] \approx\\ &E\left[\left(\sum_{i=1}^n h'_i(\pmb{\mu})(X_i-\mu_i)\right)^2\right] = \\ &E\left[\sum_{i,j=1}^n h'_i(\pmb{\mu})(X_i-\mu_i)h'_j(\pmb{\mu})(X_j-\mu_j)\right] = \\ &E\left[ \sum_{i=1}^n (h'_i(\pmb{\mu}))^2(X_i-\mu_i)^2\right] +\\ &2E\left[\sum_{i<j} h'_i(\pmb{\mu})h'_j(\pmb{\mu})(X_i-\mu_i)(X_j-\mu_j)\right]=\\ &\sum_{i=1}^n (h'_i(\pmb{\mu}))^2E\left[(X_i-\mu_i)^2\right] +\\ &2\sum_{i<j} h'_i(\pmb{\mu})h'_j(\pmb{\mu})E\left[(X_i-\mu_i)(X_j-\mu_j)\right]=\\ &\sum_{i=1}^n (h'_i(\pmb{\mu}))^2var(X_i) +2\sum_{i<j} h'_i(\pmb{\mu})h'_j(\pmb{\mu})cov\left(X_i,X_j\right) \end{aligned} \]
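In matrix form this is just the quadratic form \(\nabla h(\pmb{\mu})^T \pmb{\Sigma}\, \nabla h(\pmb{\mu})\), where \(\pmb{\Sigma}\) is the variance-covariance matrix of \(\pmb{X}\). Here is a minimal R sketch of this approximation; the function var.approx and its arguments are my own names:

# first-order (delta method) variance approximation
# grad(mu) returns the vector of partial derivatives h'_i(mu),
# vc is the variance-covariance matrix of X
var.approx <- function(grad, mu, vc) {
  g <- grad(mu)
  as.numeric(t(g) %*% vc %*% g)
}
# one-dimensional check: h(x)=log(x), mu=10, var(X)=1 gives (1/10)^2*1 = 0.01
var.approx(function(m) 1/m, 10, matrix(1))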

Example (8.10.5)

Say we have just one rv X, then the formula simplifies to

\[var(h(X))\approx(h'(\mu))^2var(X)\]

say \(X\sim N( \mu ,1)\) with \(\mu\) large enough so that \(P(X>0)\approx 1\). We want to find \(var(\log(X))\). Set h(x)=log(x), then h'(x)=1/x and

\[var(\log(X))\approx(\frac1{\mu})^2\times 1 = \frac1{\mu^2}\]

check with

var(log(rnorm(10000, 10)))
## [1] 0.01023906

Example (8.10.6)

say we have a sample \(X_1, ..,X_n\) from a Bernoulli r.v. with success parameter p, that is P(X=1)=p=1-P(X=0). One popular measure of the probability of winning a game is the odds p/(1-p). For example, when you roll a fair die the odds of getting a six are (1/6)/(1-1/6) = 1/5, or 1:5.

An obvious estimator for p is \(\hat{p}\), the sample mean, or here the proportion of “successes” in the n trials. Then an obvious estimator for the odds is \(\hat{p}/(1-\hat{p})\). The question is, what is the variance of this estimator?

First note that

\[ \begin{aligned} &var(\hat{p}) = \\ &var(1/n \sum X_i) = \\ &\frac1{n^2} \sum var(X_i) = \\ &\frac1n var(X_1)=\\ &p(1-p)/n \end{aligned} \] Using the above approximation we get the following: let h(p)=p/(1-p), so \(h'(p)=1/(1-p)^2\) and

\[ \begin{aligned} &var(\frac{\hat{p}}{1-\hat{p}}) \approx (h'(p))^2var(\hat{p}) = \\ &\left[\frac1{(1-p)^2}\right]^2\frac{p(1-p)}{n} =\frac{p}{n(1-p)^3} \end{aligned} \]

p <- 0.25; n <- 25; B <- 10000
x <- matrix(rbinom(n*B, 1, p), ncol=n)   # B samples of n Bernoulli(p) trials
phat <- apply(x, 1, mean)                # sample proportions
odds <- phat/(1 - phat)                  # estimated odds
round(c(var(odds), p/n/(1 - p)^3), 4)
## [1] 0.0246 0.0237

Say we have two rv’s X and Y and \(X \perp Y\), then the formula simplifies to

\[var(h(X,Y))\approx h_x^2(\mu_x,\mu_y)var(X)+h^2_y(\mu_x,\mu_y)var(Y)\]

Example (8.10.7)

say X and Y are independent and have geometric distributions with parameters p and r, respectively. We want to approximate the variance of \(\sqrt{X^2+Y^2}\).

Now \(\mu_X=1/p, var(X)=(1-p)/p^2, \mu_Y=1/r, var(Y)=(1-r)/r^2\)

let \(h(x,y)= \sqrt{x^2+y^2}\), then

\[ \begin{aligned} &\frac{\partial h}{\partial x} = \frac{x}{\sqrt{x^2+y^2}}\\ &\frac{\partial h}{\partial y} = \frac{y}{\sqrt{x^2+y^2}}\\ &var(\sqrt{X^2+Y^2}) \approx\\ &\frac{\mu_x^2}{\mu_x^2+\mu_y^2}var(X)+\frac{\mu_y^2}{\mu_x^2+\mu_y^2}var(Y)=\\ &\frac{1/p^2}{1/p^2+1/r^2}\cdot\frac{1-p}{p^2}+\frac{1/r^2}{1/p^2+1/r^2}\cdot\frac{1-r}{r^2}=\\ &\frac{1-p}{p^2(1+(p/r)^2)}+\frac{1-r}{r^2(1+(r/p)^2)} \end{aligned} \]

p <- 0.2; r <- 0.3; B <- 10000
x <- rgeom(B, p)+1
y <- rgeom(B, r)+1
round(c(var(sqrt(x^2+y^2)),
  (1-p)/(p^2*(1+(p/r)^2))+(1-r)/(r^2*(1+(r/p)^2))), 3)
## [1] 19.606 16.239

Example (8.10.8)

let’s consider the random vector with joint density \(f(x,y) = 1\), \(0<x,y< 1\).

Say we want to find var(X/Y). Of course \(X,Y\sim U[0,1]\) and independent, so we know E[X]=E[Y]=1/2 and var(X)=var(Y)=1/12.

If we consider the function h(x,y) = x/y we have \(h_x(x,y)=1/y\) and \(h_y(x,y)=-x/y^2\) and so

\[ \begin{aligned} &var(X/Y)\approx h_x^2(\mu_x,\mu_y)var(X)+h^2_y(\mu_x,\mu_y)var(Y) = \\ &(\frac1{1/2})^2\frac1{12}+(-\frac{1/2}{(1/2)^2})^2\frac1{12} = \frac23 \end{aligned} \]

How good is this approximation?

var(runif(10000)/runif(10000))
## [1] 1649682

shows that it is actually very bad! The reason is that occasionally the denominator is very small, so the ratio is very big. In fact, \(E[1/Y^2]=\int_0^1 y^{-2}dy=\infty\), so the variance of X/Y does not even exist.

Let’s change the problem a little: now \(f(x,y) = 1\), \(1<x,y<2\).

that is \(X,Y\sim U[1,2]\), so E[X]=E[Y]=3/2, var(X)=var(Y)=1/12. Now

\[ \begin{aligned} &var(X/Y)\approx h_x^2(\mu_x,\mu_y)var(X)+h^2_y(\mu_x,\mu_y)var(Y) = \\ &(\frac1{3/2})^2\frac1{12}+(-\frac{3/2}{(3/2)^2})^2\frac1{12} = \frac2{27} \end{aligned} \]

and this is actually quite good:

round(c(2/27, var(runif(10000,1,2)/runif(10000,1,2))), 4)
## [1] 0.0741 0.0859

Ratios are often trouble!

Example (8.10.9)

let’s consider the random vector with joint density \(f(x,y) = 6x\), \(0<x<y< 1\).

Say we want to find var(X/Y).

First we have

\[ \begin{aligned} &f_X(x) =\int_{x}^1 6x dy = 6x(1-x);0<x<1 \\ &X\sim Beta(2,2) \\ &E[X] =1/2 \\ &var(X) = \frac{2\cdot 2}{(2+2)^2(2+2+1)}=\frac1{20}\\ &\text{ }\\ &f_Y(y) =\int_{0}^y 6x dx = 3y^2;0<y<1 \\ &Y\sim Beta(3,1) \\ &E[Y] =3/4 \\ &var(Y) = \frac{3\cdot 1}{(3+1)^2(3+1+1)}=\frac3{80}\\ &\text{ }\\ &E[XY] = \int_0^1 \int_{0}^y 6x^2y\, dxdy = \frac25\\ &cov(X,Y)=E[XY]-E[X]E[Y] = \frac25-\frac12\cdot\frac34=\frac1{40} \end{aligned} \]

so

\[ \begin{aligned} &var(X/Y)\approx \\ &h_x^2(\mu_x,\mu_y)var(X)+h^2_y(\mu_x,\mu_y)var(Y)+2h_x(\mu_x,\mu_y)h_y(\mu_x,\mu_y)cov(X,Y) = \\ &(\frac1{3/4})^2\frac1{20}+(-\frac{1/2}{(3/4)^2})^2\frac3{80}+2(\frac1{3/4})(-\frac{1/2}{(3/4)^2})\frac1{40} \approx 0.059 \end{aligned} \]

and this is quite good:

x <- rbeta(10000, 2, 2)   # X ~ Beta(2,2), the marginal density found above
y <- runif(10000, x, 1)   # Y|X=x is uniform on (x,1), since f(y|x)=6x/(6x(1-x))=1/(1-x)
round(var(x/y), 4)
## [1] 0.0561
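The same number can be computed with the var.approx sketch from above, using the means, variances and covariance derived in this example:

grad <- function(m) c(1/m[2], -m[1]/m[2]^2)    # partial derivatives of h(x,y)=x/y
vc <- matrix(c(1/20, 1/40, 1/40, 3/80), 2, 2)  # variance-covariance matrix of (X,Y)
var.approx(grad, c(1/2, 3/4), vc)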

Example (8.10.10)

Let \((X,Y,Z)\) be a multivariate normal with mean vector (1, 1, 1) and variance-covariance matrix

\[ \begin{bmatrix} 1 & 0.5 & 0.8\\ 0.5 & 1 & -0.2\\ 0.8& -0.2 & 1 \end{bmatrix} \]

Find an approximation to \(var(\sqrt{X^2+Y^2+Z^2})\)

\[ \begin{aligned} &h(x,y,z) =\sqrt{x^2+y^2+z^2} \\ &\frac{\partial h}{\partial x} =\frac{x}{\sqrt{x^2+y^2+z^2}}, \text{ so } \frac{\partial h}{\partial x}(1, 1, 1) = \frac1{\sqrt{3}} \end{aligned} \]

and by symmetry the same holds for the other derivatives. So

\[ \begin{aligned} &var(\sqrt{X^2+Y^2+Z^2}) \approx \\ &(\frac1{\sqrt 3})^2 \left(\sum var(X_i) +2 \sum_{i<j} cov(X_i, X_j)\right) =\\ &\frac13 \left(1+1+1+2(0.5+0.8-0.2) \right) =\frac{5.2}{3}\approx 1.73 \end{aligned} \]

library(mvtnorm)
mu <- rep(1, 3)
vc <- cbind(c(1, 0.5, 0.8),
            c(0.5, 1, -0.2),
            c(0.8, -0.2, 1))
x <- rmvnorm(1e4, mu, vc)
var(sqrt(x[, 1]^2+x[, 2]^2+x[, 3]^2))
## [1] 1.129622