A Longer Example - Estimation

Say \(X_1, \dots, X_n\) are iid F with

\[f(x|a)=ax^{a-1}\text{, }0<x<1 \text{, }a>0\] (or simply \(X\sim Beta(a,1)\)). Note that the cdf is given by \(F(x)=x^a;0<x<1\).
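As a quick sanity check, R’s dbeta and pbeta with shape parameters a and 1 reproduce this density and cdf (a minimal sketch; the value a = 2 is an arbitrary choice):

a <- 2
x <- seq(0.1, 0.9, 0.2)
round(dbeta(x, a, 1) - a*x^(a-1), 10)   # should be (numerically) 0
round(pbeta(x, a, 1) - x^a, 10)         # should be (numerically) 0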

First let’s find the method of moments estimator and the maximum likelihood estimator of a:

\[E[X]=\frac{a}{a+1}\] so solving \(\bar{x}=\frac{a}{a+1}\) for a gives the method of moments estimator \(\hat{a}_1=\frac{\bar{x}}{1-\bar{x}}\).

\[ \begin{aligned} &f(\pmb{x}\vert a) =\prod_{i=1}^n (ax_i^{a-1}) = a^n\prod_{i=1}^n (x_i^{a-1}) \\ &l(a\vert\pmb{x}) = n\log a +(a-1)\sum_{i=1}^n\log x_i\\ &\frac{dl}{da} = \frac{n}{a}+\sum_{i=1}^n\log x_i=0\\ &\hat{a}_2 = -\frac{n}{\sum_{i=1}^n\log x_i} = n/T \end{aligned} \]

where \(T=-\sum \log x_i\).
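Here is a minimal R sketch computing both estimators from one simulated sample (n = 20 and a0 = 2 are arbitrary choices):

n <- 20; a0 <- 2
x <- rbeta(n, a0, 1)          # one sample from Beta(a0, 1)
xbar <- mean(x)
ahat1 <- xbar/(1-xbar)        # method of moments
T <- -sum(log(x))
ahat2 <- n/T                  # mle
round(c(ahat1, ahat2), 3)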

Next we find the Bayes estimator of a if the prior is Exp(1).

\[ \begin{aligned} &f(\pmb{x}, a) = a^n\prod_{i=1}^n (x_i^{a-1})e^{-a} = \\ &a^n\exp \left\{(a-1)\sum_{i=1}^n\log x_i -a\right\} = \\ &a^n\exp \left\{(a-1)(-T) -a\right\} = \\ &a^n\exp \left\{T-(T+1)a\right\} \\ \end{aligned} \] the marginal is

\[ \begin{aligned} &m(\pmb{x}) =\int_0^\infty a^n\exp \left\{T-(T+1)a\right\} da=\\ &\frac{e^T}{(T+1)^{n+1}}\int_0^\infty [(T+1)a]^n\exp \left\{-(T+1)a\right\}[(T+1)da] = \\ &\frac{e^T}{(T+1)^{n+1}}\int_0^\infty x^n\exp \left\{-x\right\}dx =\\ &\frac{e^T}{(T+1)^{n+1}}\Gamma(n+1) \end{aligned} \] and so the posterior distribution is

\[f(a\vert\pmb{x})=\frac{a^n\exp \left\{T-(T+1)a\right\}}{\frac{e^T}{(T+1)^{n+1}}\Gamma(n+1)}=\frac{(T+1)^{n+1}}{\Gamma(n+1)}a^ne^{-(T+1)a}\] so \(a|\pmb{x}\sim Gamma(n+1,\frac1{T+1})\)

Finally we need to “extract” one number from the posterior density. We can again use any of

  • mean: \(\hat{a}=\frac{n+1}{T+1}\)

  • median: \(\hat{a}=qgamma(0.5, n+1, T+1)\) (shape \(n+1\), rate \(T+1\) in R)

  • mode: \(\hat{a}=\frac{n}{T+1}\)

Note that for this prior the estimators from the Bayesian method are essentially the same as the mle.
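Here is a minimal R sketch computing the three estimators (n = 20 and a0 = 2 are arbitrary choices; note that R’s qgamma uses the shape/rate parametrization, so the rate is T+1):

n <- 20; a0 <- 2
x <- rbeta(n, a0, 1)
T <- -sum(log(x))
post.mean <- (n+1)/(T+1)
post.median <- qgamma(0.5, n+1, T+1)    # shape n+1, rate T+1
post.mode <- n/(T+1)
round(c(post.mean, post.median, post.mode, n/T), 3)   # compare with the mle n/T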

Say instead we have some prior knowledge that \(a\sim N^{+}(1,1)\), where \(N^{+}\) is a normal distribution truncated at 0, that is, with cdf \(P(a\le x|a>0)\). It is easy to show that this has density \(\phi(a, 1, 1)/(1-\Phi(0,1,1))\). Now

\[m(\pmb{x}) =\int_0^\infty a^n\exp \left\{-(a-1)T\right\}\frac1{\sqrt{2\pi}}e^{-(a-1)^2/2}/(1-\Phi(0,1,1))\, da\]

and this integral seems difficult to evaluate. There is a solution that still works, though:

\[ \begin{aligned} &\frac{d}{da}\left[\log \frac{f(\pmb{x},a)}{m(\pmb{x})}\right] = \frac{d}{da}\log f(\pmb{x},a) =\\ &\frac{d}{da}\log\left[ a^n\exp \left\{-(a-1)T\right\}\frac1{\sqrt{2\pi}}e^{-(a-1)^2/2}/(1-\Phi(0,1,1))\right] = \\ &\frac{d}{da} \left[n\log a-(a-1)T-\frac12(a-1)^2+\text{const} \right] = \\ &\frac{n}{a}-T-(a-1)=0 \end{aligned} \]

This is a quadratic equation in a, \(a^2+(T-1)a-n=0\), with positive root \(\hat{a}=\frac{1-T+\sqrt{(T-1)^2+4n}}{2}\), so if we use the mode of the posterior density as our estimator we don’t need to find \(m(\pmb{x})\).
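The posterior mode can also be found numerically from the unnormalized log joint density alone; here is a minimal R sketch (n = 20, a0 = 2 and the search interval (0, 20) are arbitrary choices), dropping the constants that do not depend on a:

n <- 20; a0 <- 2
x <- rbeta(n, a0, 1)
T <- -sum(log(x))
log.joint <- function(a) n*log(a) - (a-1)*T - (a-1)^2/2   # log f(x,a) up to constants
optimize(log.joint, c(0, 20), maximum=TRUE)$maximum       # posterior mode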

What properties do these estimators have?

Unbiasedness

Is \(\bar{x}/(1-\bar{x})\) unbiased for a? To find out we would need the density of \(\bar{x}\), but there is no closed form for it in this generality. Instead we can run a simulation:

n <- 20; a0 <- 5; B <- 10000
x <- matrix(rbeta(n*B, a0, 1), ncol=n)   # B samples of size n from Beta(a0, 1)
xbar <- apply(x, 1, mean)
xhat <- xbar/(1-xbar)                    # method of moments estimate for each sample
round(mean(xhat), 2)
## [1] 5.22

This suggests that the method of moments estimator is biased; at least for this choice of n and a it overestimates a on average.

How about the mle? First note that

\[P(-\log X_i\le x)=P(X_i>e^{-x})=1-F(e^{-x})=1-e^{-ax}\] so \(-\log X_i\sim Exp(a)\), and therefore

\[T=-\sum_{i=1}^n \log X_i\sim Gamma(n,1/a)\] which means

\[ \begin{aligned} &E[\hat{a}_2] =E[n/T] = \\ &\int_0^\infty \frac{n}{t} \frac{a^n}{\Gamma(n)}t^{n-1}e^{-at}dt = \\ &\frac{na}{n-1}\int_0^\infty \frac{a^{n-1}}{\Gamma(n-1)}t^{(n-1)-1}e^{-at}dt = \frac{na}{n-1}\\ \end{aligned} \]

so the mle is almost unbiased; in fact \(\frac{n-1}{n}\hat{a}_2=\frac{n-1}{T}\) is exactly unbiased.
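A quick simulation check of \(E[\hat{a}_2]=na/(n-1)\), with the same settings as the simulation above:

n <- 20; a0 <- 5; B <- 10000
x <- matrix(rbeta(n*B, a0, 1), ncol=n)
ahat2 <- n/(-apply(log(x), 1, sum))       # mle for each simulated sample
round(c(mean(ahat2), n*a0/(n-1)), 2)      # simulated mean vs na/(n-1)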

Sufficiency:

\[f(\pmb{x}\vert a) = a^ne^{-(a-1)T}\]

so by the factorization theorem \(T\) is a sufficient statistic for a, and since the mle \(n/T\) is a one-to-one function of \(T\), it is sufficient as well.

Ancillary Statistic

\[P(T<x) = \int_0^x \frac{a^n}{\Gamma(n)}t^{n-1}e^{-at}dt=\int_0^{ax} \frac{1}{\Gamma(n)}z^{n-1}e^{-z}dz\]

using the change of variables z=at. Therefore

\[P(aT<x)=P(T<x/a)=\int_0^{x} \frac{1}{\Gamma(n)}z^{n-1}e^{-z}dz\]

So the distribution of \(aT\) does not depend on a; it is an ancillary statistic.
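A minimal simulation sketch (the values a = 0.5 and a = 5 are arbitrary choices): samples of \(aT\) generated under two quite different values of a can be compared, for example with a two-sample Kolmogorov-Smirnov test.

n <- 20; B <- 10000
T1 <- apply(-log(matrix(rbeta(n*B, 0.5, 1), ncol=n)), 1, sum)   # T for a = 0.5
T2 <- apply(-log(matrix(rbeta(n*B, 5, 1), ncol=n)), 1, sum)     # T for a = 5
ks.test(0.5*T1, 5*T2)   # compares the distributions of a*T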

Consistency:

From the WLLN we know that \(\bar{x}\rightarrow E[X_1]=a/(a+1)\), so

\[ \begin{aligned} &P\left(\vert\frac{\bar{X}}{1-\bar{X}}-a\vert<\epsilon \right) = \\ &P\left(a-\epsilon<\frac{\bar{X}}{1-\bar{X}}<a+\epsilon \right) = \\ &P\left(\frac{a-\epsilon}{1+a-\epsilon}<\bar{X}<\frac{a+\epsilon}{1+a+\epsilon} \right) = \\ &P\left(\frac{a-\epsilon}{1+a-\epsilon}-\frac{a}{a+1}<\bar{X}-\frac{a}{a+1}<\frac{a+\epsilon}{1+a+\epsilon}-\frac{a}{a+1} \right) =\\ &P\left(\frac{-\epsilon}{(1+a-\epsilon)(a+1)}<\bar{X}-\frac{a}{a+1}<\frac{\epsilon}{(1+a+\epsilon)(a+1)} \right) \ge\\ &P\left(-M_a\epsilon<\bar{X}-\frac{a}{a+1}<M_a\epsilon \right) =\\ &P\left(\vert \bar{X}-\frac{a}{a+1}\vert <M_a\epsilon \right) \rightarrow 1\\ \end{aligned} \] as \(n\rightarrow\infty\), where

\[M_a=\min\left\{\frac1{(1+a-\epsilon)(a+1)},\frac{1}{(1+a+\epsilon)(a+1)}\right\}\] so we see that the method of moments estimator is a consistent estimator of a.


Note: it is in general not true that if \(X_n\rightarrow x\) in probability then \(g(X_n)\rightarrow g(x)\) for an arbitrary function g; it does hold if g is continuous.


How about the mle? Again from the WLLN we have

\[-1/n\sum \log X_i\rightarrow E[-\log X_1] = 1/a\]

Now

\[ \begin{aligned} &P\left(\vert\hat{a}_2-a\vert<\epsilon\right) = \\ &P\left(a-\epsilon<\frac{-n}{\sum \log X_i}<a+\epsilon\right) = \\ &P\left(\frac1{a+\epsilon}<-\frac1n \sum \log X_i<\frac1{a-\epsilon}\right) = \\ &P\left(\frac1{a+\epsilon}-\frac1{a}<-\frac1n \sum \log X_i-\frac1{a}<\frac1{a-\epsilon}-\frac1{a}\right) = \\ &P\left(\frac{-\epsilon}{a(a+\epsilon)}<-\frac1n \sum \log X_i-\frac1{a}<\frac{\epsilon}{a(a-\epsilon)}\right) \ge \\ &P\left(-M_a\epsilon<-\frac1n \sum \log X_i-\frac1{a}<M_a\epsilon\right) = \\ &P\left(\vert-\frac1n \sum \log X_i-\frac1{a}\vert <M_a\epsilon\right) \rightarrow 1 \\ \end{aligned} \] as \(n\rightarrow\infty\), where

\[M_a=\min\left\{\frac1{a(a-\epsilon)},\frac{1}{a(a+\epsilon)}\right\}\] and so the mle is a consistent estimator of a as well.
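A small simulation sketch illustrating the consistency of both estimators (a0 = 2 and the cutoff 0.1 are arbitrary choices): as n grows, the proportion of simulated samples where the estimate misses a0 by more than 0.1 goes to 0.

a0 <- 2; B <- 2000
for(n in c(10, 100, 1000)) {
  x <- matrix(rbeta(n*B, a0, 1), ncol=n)
  xbar <- apply(x, 1, mean)
  ahat1 <- xbar/(1-xbar)                    # method of moments
  ahat2 <- n/(-apply(log(x), 1, sum))       # mle
  print(c(n=n, mom=mean(abs(ahat1-a0) > 0.1), mle=mean(abs(ahat2-a0) > 0.1)))
}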

Relative Efficiency:

The relative efficiency of the two estimators is the ratio of their variances. Unfortunately the variance of the method of moments estimator cannot be calculated in closed form, but for any specific n and a we can estimate it by simulation (see the sketch below, after the variance of the mle).

As for the variance of the MLE we find

\[E[(\hat{a}_2)^2]=\int_0^\infty (\frac{n}{t})^2 \frac{a^n}{\Gamma(n)}t^{n-1}e^{-at}dt =\frac{n^2a^2}{(n-1)(n-2)}\]

and so, for the unbiased estimator \(\frac{n-1}{n}\hat{a}_2=\frac{n-1}{T}\),

\[var\left(\frac{n-1}{n}\hat{a}_2\right)=\left(\frac{n-1}{n}\right)^2\left(E[\hat{a}_2^2]-E[\hat{a}_2]^2\right)=\frac{a^2}{n-2}\]
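A simulation sketch of the relative efficiency (again with n = 20 and a0 = 5 as in the simulation above): the variance of the method of moments estimator is estimated by simulation and compared with \(a^2/(n-2)\).

n <- 20; a0 <- 5; B <- 10000
x <- matrix(rbeta(n*B, a0, 1), ncol=n)
xbar <- apply(x, 1, mean)
var.mom <- var(xbar/(1-xbar))     # variance of the method of moments estimator, by simulation
var.mle <- a0^2/(n-2)             # variance of (n-1)/n times the mle, exact
round(c(var.mom, var.mle, var.mom/var.mle), 3)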

Rao-Cramér lower bound:

\[ \begin{aligned} &f(x|a)=ax^{a-1} \\ &\log f(x|a) =\log a+(a-1)\log x \\ &\frac{d \log f(x|a)}{da} = 1/a+\log x\\ &\frac{d^2 \log f(x|a)}{da^2} = -1/a^2\\ &-E\left[\frac{d^2 \log f(X|a)}{da^2}\right] = 1/a^2\\ \end{aligned} \]

and so for any unbiased estimator \(W\) we have \(var(W)\ge a^2/n\). The unbiased estimator \(\frac{n-1}{n}\hat{a}_2\) has variance \(\frac{a^2}{n-2}>\frac{a^2}{n}\), so it does not achieve the lower bound, although the MLE is asymptotically efficient.

Robustness: in this case the data are bounded, \(0<x<1\), so extreme outliers cannot occur and robustness is not an issue.