
Prediction Intervals

Frequentist Solution

There is a different kind of interval estimation problem, where the interest is not in a parameter but in a future observation. Intervals of this type are called prediction intervals.

Example (6.3.1)

In some experiment each day we collect an observation from a normal distribution with mean \(\mu\) and standard deviation \(\sigma\), where \(\sigma\) is assumed to be known. Say so far we have data \(x_1,..,x_{n}\). Tomorrow we will carry out this experiment again and collect observation \(Y\). Find a \((1-\alpha)100\%\) prediction interval for \(Y\), that is find \(L\) and \(U\) such that

\[P(L(\pmb{x})<Y<U(\pmb{x}))=1-\alpha\]

Let’s consider the random variable \(Z=\bar{X}-Y\). Because \(Y\) is independent of \(X_1,..,X_n\) and has the same distribution, Z has a normal distribution with mean 0 and

\[ \begin{aligned} &var(Z) = var(\bar{X}-Y) = \\ &var(\bar{X})+var(Y) =var(\frac1n \sum_{i=1}^n X_i)+var(Y) = \\ &\frac1{n^2 }\sum_{i=1}^n var(X_i)+var(Y) = \\ &\frac1{n^2 }n\sigma^2+\sigma^2=(1+1/n)\sigma^2 \end{aligned} \]

and so Z has standard deviation \(\sigma \sqrt{1+1/n}\). Therefore

\[ \begin{aligned} &1-\alpha=P(|Z/(\sigma \sqrt{1+1/n})|<z_{\alpha/2}) = \\ &P(|Z|< z_{\alpha/2}\sigma \sqrt{1+1/n}) = \\ &P(-z_{\alpha/2}\sigma \sqrt{1+1/n}<\bar{X}-Y< z_{\alpha/2}\sigma \sqrt{1+1/n}) = \\ &P(\bar{X}-z_{\alpha/2}\sigma \sqrt{1+1/n}<Y<\bar{X}+ z_{\alpha/2}\sigma \sqrt{1+1/n}) \\ \end{aligned} \]

and so a \((1-\alpha)100\%\) prediction interval for Y is given by

\[\left(\bar{x}-z_{\alpha/2}\sigma \sqrt{1+1/n}, \text{ }\bar{x}+z_{\alpha/2}\sigma \sqrt{1+1/n}\right)\]

As a numerical example consider:

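mu=10;sigma=1;n=20;alpha=0.05
x=rnorm(n, mu, sigma)  # simulated data from past experiments
xbar=mean(x)
# prediction interval: xbar -/+ z*sigma*sqrt(1+1/n)
round(xbar+c(-1,1)*qnorm(1-alpha/2)*sigma*sqrt(1+1/n), 2)

The data are random, so the endpoints change from run to run. The half-width, however, is always \(z_{0.025}\sqrt{1+1/20}\approx 2.01\).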
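One way to check the derivation is to simulate many such experiments and count how often tomorrow's observation falls inside the interval. The following is a minimal sketch; the seed, the number of runs `B` and the parameter values are illustrative assumptions.

```r
# Monte Carlo check of the coverage of the prediction interval (sigma known).
# Seed, B and all parameter values are illustrative assumptions.
set.seed(111)
mu=10;sigma=1;n=20;alpha=0.05
B=10000                                   # number of simulated experiments
half=qnorm(1-alpha/2)*sigma*sqrt(1+1/n)   # half-width of the interval
covered=logical(B)
for(i in seq_len(B)) {
  x=rnorm(n, mu, sigma)                   # data observed so far
  y=rnorm(1, mu, sigma)                   # tomorrow's observation
  covered[i]=abs(mean(x)-y)<half          # is y inside xbar -/+ half?
}
coverage=mean(covered)
coverage
```

The estimated coverage should be close to the nominal \(1-\alpha=0.95\), up to simulation error.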

Notice that this interval does not shrink to a point as n goes to infinity. This makes sense because Y is random and we can never expect to be able to predict it perfectly.
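To make this concrete, the half-width \(z_{\alpha/2}\sigma\sqrt{1+1/n}\) can be tabulated for growing \(n\). This is a small sketch with illustrative values of `sigma`, `alpha` and `n`; it shows the half-width decreasing towards \(z_{\alpha/2}\sigma\) rather than towards 0.

```r
# Half-width of the prediction interval as n grows (illustrative values).
sigma=1;alpha=0.05
n=c(10, 100, 1000, 1e6)
half_width=qnorm(1-alpha/2)*sigma*sqrt(1+1/n)
limit=qnorm(1-alpha/2)*sigma   # limiting half-width as n goes to infinity
round(cbind(n, half_width), 3)
round(limit, 3)
```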

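Notice that the derivation of the variance of Z did not depend on the observations having a normal distribution. It holds whenever \(\bar{X}\) is used as the estimator and \(X_1,..,X_n,Y\) are independent with common variance \(\sigma^2\). Normality was needed only to obtain the quantile \(z_{\alpha/2}\).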
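As a quick illustration of the variance formula outside the normal case, one can simulate \(Z=\bar{X}-Y\) for exponential data. The exponential distribution with rate 1 (which has variance 1), the seed and the number of runs are assumptions chosen only for this sketch.

```r
# Variance of Z = Xbar - Y for non-normal data (illustrative choice:
# exponential with rate 1, so sigma^2 = 1).
set.seed(222)
n=20;B=20000
z=replicate(B, mean(rexp(n, 1))-rexp(1, 1))
c(simulated=var(z), formula=(1+1/n)*1)
```

The two numbers should agree up to simulation error, although Z is clearly not normal here.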

Bayesian Solution

In a Bayesian setup one needs to find the posterior predictive distribution, which is defined as

\[f(y\vert \pmb{x})=\int p(y\vert\theta,\pmb{x})p(\theta\vert \pmb{x}) d\theta\]

Often one can assume that, given \(\theta\), the future observation is independent of the sample, and so this simplifies to

\[f(y\vert \pmb{x})=\int p(y\vert\theta)p(\theta\vert \pmb{x}) d\theta\]

Example (6.3.2)

Say \(X_1,..,X_n\sim N(\mu ,\sigma)\), \(\sigma\) known, and \(\pi(\mu)=1\). Then we know from (3.2.6) that \(\mu|\pmb{X=x}\sim N(\bar{x},\sigma/\sqrt n)\) and so

\[ \begin{aligned} &f(y\vert x)=\int p(y\vert\mu)p(\mu\vert \pmb{x}) d\mu = \\ &\int_{-\infty}^{\infty} \frac1{\sqrt{2\pi \sigma^2}}\exp\left\{-\frac1{2\sigma^2}(y-\mu)^2\right\} \frac1{\sqrt{2\pi \sigma^2/n}}\exp\left\{-\frac1{2\sigma^2/n}(\mu-\bar{x})^2\right\} d\mu = \\ &\int_{-\infty}^{\infty} \frac1{2\pi\sigma^2/\sqrt{n}}\exp\left\{-\frac{1}{2\sigma^2}\left[(y-\mu)^2+n(\mu-\bar{x})^2\right]\right\}d\mu \end{aligned} \]

Now in the brackets in the exponential we have

\[ \begin{aligned} &(\mu-y)^2+n(\mu-\bar{x})^2=\\ &\mu^2-2y\mu+y^2+n\mu^2-2n\bar{x}\mu+n\bar{x}^2=\\ &(n+1)\mu^2-2(y+n\bar{x})\mu+(y^2+n\bar{x}^2)=\\ &(n+1)\left(\mu^2-2\frac{y+n\bar{x}}{n+1}\mu\right)+(y^2+n\bar{x}^2)=\\ &(n+1)\left(\mu^2-2\frac{y+n\bar{x}}{n+1}\mu+\left(\frac{y+n\bar{x}}{n+1}\right)^2-\left(\frac{y+n\bar{x}}{n+1}\right)^2\right)+(y^2+n\bar{x}^2)=\\ &(n+1)\left(\mu^2-2\frac{y+n\bar{x}}{n+1}\mu+\left(\frac{y+n\bar{x}}{n+1}\right)^2\right)-(n+1)\left(\frac{y+n\bar{x}}{n+1}\right)^2+(y^2+n\bar{x}^2)=\\ &(n+1)\left(\mu-\frac{y+n\bar{x}}{n+1}\right)^2-\frac{(y+n\bar{x})^2}{n+1}+(y^2+n\bar{x}^2)=\\ &(n+1)\left(\mu-\frac{y+n\bar{x}}{n+1}\right)^2+\frac{n(y-\bar{x})^2}{n+1} \end{aligned} \] and so

\[ \begin{aligned} &f(y\vert x)=\int p(y\vert\mu)p(\mu\vert \pmb{x}) d\mu = \\ &\int_{-\infty}^{\infty} \frac1{2\pi\sigma^2/\sqrt{n}}\exp\left\{-\frac{1}{2\sigma^2}\left[(n+1)\left(\mu-\frac{y+n\bar{x}}{n+1}\right)^2+\frac{n(y-\bar{x})^2}{n+1}\right]\right\}d\mu=\\ &\frac1{2\pi\sigma^2/\sqrt{n}}\frac{\sqrt{2\pi\sigma^2}}{\sqrt{n+1}}\exp\left\{-\frac{1}{2\sigma^2}\left[\frac{n(y-\bar{x})^2}{n+1}\right]\right\}\times\\ &\int_{-\infty}^{\infty} \frac{\sqrt{n+1}}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{n+1}{2\sigma^2}\left[\left(\mu-\frac{y+n\bar{x}}{n+1}\right)^2\right]\right\}d\mu=\\ &\frac1{2\pi\sigma^2/\sqrt{n}}\frac{\sqrt{2\pi\sigma^2}}{\sqrt{n+1}}\exp\left\{-\frac{1}{2\sigma^2}\left[\frac{n(y-\bar{x})^2}{n+1}\right]\right\}=\\ &\frac1{\sqrt{2\pi\sigma^2(1+1/n)}}\exp\left\{-\frac{\left(y-\bar{x}\right)^2}{2\sigma^2(1+1/n)}\right\} \end{aligned} \] where the remaining integral is 1 because its integrand is a normal density in \(\mu\). So \(y|\pmb{X=x}\sim N(\bar{x},\sigma\sqrt{1+1/n})\), where as before the second argument is the standard deviation. It follows that credible intervals for \(Y\) based on the posterior predictive distribution are the same as the frequentist prediction intervals.
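The integral defining the posterior predictive distribution can also be approximated by simulation: first draw \(\mu\) from its posterior, then draw \(y\) from \(N(\mu,\sigma)\). The following sketch does this for the setup of this example; the data, the seed and the number of draws are illustrative assumptions.

```r
# Posterior predictive distribution by simulation (sigma known, flat prior).
# Data, seed and number of draws are illustrative assumptions.
set.seed(333)
sigma=1;n=20
x=rnorm(n, 10, sigma)                   # illustrative data
xbar=mean(x)
B=50000
mu_draws=rnorm(B, xbar, sigma/sqrt(n))  # draws from mu | x
y_draws=rnorm(B, mu_draws, sigma)       # one y for each drawn mu
c(mean=mean(y_draws), sd=sd(y_draws))   # simulated moments
c(mean=xbar, sd=sigma*sqrt(1+1/n))      # moments from the derivation
```

The simulated mean and standard deviation should be close to \(\bar{x}\) and \(\sigma\sqrt{1+1/n}\). The same two-step sampling scheme is useful in cases where the integral has no closed form.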