
Sufficient Statistics

[Read HC 7.2. The notation we will use for a statistic is $T({\bf X})=
T(X_1,X_2,\ldots,X_n)$, rather than $Y_1=u(X_1,X_2,\ldots,X_n)$ as they do.]

The idea of sufficiency is that if we observe a random variable $X$ (using a sample $X_1, \ldots, X_n$, or ${\bf X}$) whose distribution depends on $\theta$, often ${\bf X}$ can be reduced via a function without losing any information about $\theta$. For example,

\begin{displaymath}T(\mbox{\bf X})=T(X_1, \ldots, X_n)=
\sum_{i=1}^nX_i/n,\end{displaymath}

which is the sample mean, may in some cases contain all the relevant information about $\theta$, and in that case $T({\bf X})$ is called a sufficient statistic. That is, knowing the actual $n$ observations contributes no more to the inference about $\theta$ than knowing just the average of the $n$ observations. We can then base our inference about $\theta$ on $T({\bf X})$, which can be considerably simpler than ${\bf X}$ (involving a univariate distribution rather than an $n$-variate one).



Definition 7.1
A statistic $T=T({\bf X})$ is said to be sufficient for a family of distributions if and only if the conditional distribution of ${\bf X}$ given the value of $T$ is the same for all members of the family (that is, does not depend on $\theta$).


Equivalent definitions for the discrete and continuous cases are Definitions 7.2 and 7.3 below, respectively.



Definition 7.2
Let $f(x; \theta), \theta \in \Theta$ be a family of distributions of the discrete type. For a random sample $X_1, \ldots, X_n$ from $f(x;\theta)$, define $T=T(X_1, \ldots, X_n)$. Then $T$ is a sufficient statistic for $\theta$ if, for all $\theta$ and all possible sample points,

\begin{displaymath}
P(X_1=x_1, \ldots, X_n=x_n\vert T=t(x_1, \ldots, x_n))
\end{displaymath} (7.1)

does not involve $\theta$. [Note that the lack of dependence on $\theta$ includes not only the function, but the range space as well.]

Consider here the role of $\theta$. Its job is to represent all the stochastic information of the data. Other information, such as the scale of measurement, should not be random. So if $T$ is sufficient for $\theta$, then interpretation of the data conditional on $T=t(x_1, \ldots, x_n)$ should remove all the stochastic bits, leaving only the non-random bits.
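
As a contrasting illustration (a small example constructed here, not taken from HC, and assuming two independent Bernoulli($\theta$) observations $X_1, X_2$), take $T=X_1$. By independence,

\begin{displaymath}P(X_1=x_1, X_2=x_2\vert X_1=x_1)=P(X_2=x_2)=\theta^{x_2}(1-\theta)^{1-x_2}, \ x_2=0, 1,\end{displaymath}

which involves $\theta$. So $X_1$ alone is not sufficient for $\theta$: after conditioning on it, the sample still carries information about $\theta$.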



Definition 7.3
Let $X_1, \ldots, X_n$ be a random sample from a continuous distribution, $f(x; \theta), \theta \in \Theta$. Let $T=T(X_1, \ldots, X_n)$ be a statistic with pdf $f_T(t;\theta)$. Then $T$ is sufficient for $\theta$ if and only if

\begin{displaymath}
\frac{f(x_1;\theta) \times f(x_2;\theta) \times \cdots \times f(x_n;\theta)}{
f_T(t(x_1,x_2,\ldots,x_n);\theta)}
\end{displaymath} (7.2)

does not depend on $\theta$, for every fixed value of $t$.
Again, the range space of the $x_i$ must not depend on $\theta$ either.



Example 7.1
Given that $X_1, \ldots, X_n$ is a random sample from a binomial distribution with parameters $m, \theta$, show that $T=\sum_{i=1}^nX_i$ is a sufficient statistic for $\theta$.
Solution. From Definition 7.2, we need to consider

\begin{displaymath}P(X_1=x_1, X_2=x_2, \ldots, X_n=x_n\vert\sum X_i=t),\end{displaymath}

and note that the $X_i$ are independent and that $\sum_{i=1}^n X_i \sim \mbox{bin}(mn,\theta)$. So (7.1) becomes

\begin{displaymath}\frac{{m \choose x_1}\theta^{x_1}(1-\theta)^{m-x_1} \cdots {m \choose x_n}\theta^{x_n}(1-\theta)^{m-x_n}}
{{mn \choose t}\theta^t (1-\theta)^{mn-t}}, \ x_i=0, 1, \ldots, m, \ \sum
x_i=t,\end{displaymath}

which on simplification is

\begin{displaymath}\frac{{m \choose x_1}{m \choose x_2} \ldots {m \choose x_n}}
{{mn \choose t}}\end{displaymath}

which is free of $\theta$; the range space of the $x_i$ does not depend on $\theta$ either.
Hence $\sum_{i=1}^n X_i$ is sufficient for $\theta$.
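
The result of Example 7.1 can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names (`joint_pmf`, `conditional_given_sum`, `closed_form`) and the sample point are illustrative assumptions. It computes the conditional probability by brute force for several values of $\theta$ and compares it with the $\theta$-free expression above.

```python
from itertools import product
from math import comb, isclose

def joint_pmf(xs, m, theta):
    """P(X_1=x_1, ..., X_n=x_n) for independent bin(m, theta) observations."""
    p = 1.0
    for x in xs:
        p *= comb(m, x) * theta**x * (1 - theta)**(m - x)
    return p

def conditional_given_sum(xs, m, theta):
    """P(X = xs | sum X_i = t), by brute force over all sample points with sum t."""
    n, t = len(xs), sum(xs)
    # Marginal P(T = t): add the joint pmf over every sample point whose sum is t.
    p_t = sum(joint_pmf(ys, m, theta)
              for ys in product(range(m + 1), repeat=n) if sum(ys) == t)
    return joint_pmf(xs, m, theta) / p_t

def closed_form(xs, m):
    """Product of C(m, x_i) divided by C(mn, t), the simplified expression."""
    n, t = len(xs), sum(xs)
    num = 1
    for x in xs:
        num *= comb(m, x)
    return num / comb(m * n, t)

# Hypothetical sample point with m = 3 trials per observation and n = 3 observations.
xs, m = (2, 0, 3), 3
for theta in (0.2, 0.5, 0.9):
    # The two printed values should agree for every theta.
    print(theta, conditional_given_sum(xs, m, theta), closed_form(xs, m))
```

The brute-force marginal avoids assuming that $\sum X_i \sim \mbox{bin}(mn,\theta)$, so agreement with `closed_form` also serves as a check on that step of the solution.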



Example 7.2
Let $X_1, \ldots, X_n$ be a random sample from the truncated exponential distribution, where

\begin{displaymath}f_{X_i}(x_i)=e^{\theta-x_i}, \, x_i > \theta\end{displaymath}

or, using the indicator function notation,

\begin{displaymath}f_{X_i}(x_i)=e^{\theta - x_i}I_{(\theta, \infty)}(x_i).\end{displaymath}

Show that $Y_1=\min(X_i)$ is sufficient for $\theta$.


Solution.
In Definition 7.3, $T=T(X_1,\ldots,X_n)=Y_1$, and to examine (7.2) we need $f_T(t)$, the pdf of the smallest order statistic. Now for the pdf above,

\begin{displaymath}F(x)=\int_{\theta}^x e^{\theta-z}dz=e^{\theta}[e^{-\theta}-e^{-x}]=1-
e^{\theta-x}.\end{displaymath}

From Distribution Theory (4.6), the pdf of $Y_1$ is

\begin{displaymath}n[1-F(y_1)]^{n-1}f(y_1)=ne^{(\theta-y_1)(n-1)} \times e^{\theta-
y_1}=ne^{n(\theta-y_1)}, y_1 > \theta.\end{displaymath}
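
As a sanity check on this pdf, the sketch below (an illustration added here, with hypothetical function names and arbitrarily chosen values of $n$, $\theta$ and $y$) integrates $ne^{n(\theta-y_1)}$ numerically and compares the result with $1-[1-F(y)]^n$, the probability that at least one observation is at most $y$.

```python
from math import exp, isclose

def pdf_min(y, n, theta):
    """Claimed pdf of Y_1 = min(X_i): n e^{n(theta - y)} for y > theta, else 0."""
    return n * exp(n * (theta - y)) if y > theta else 0.0

def cdf_min_numeric(y, n, theta, steps=20000):
    """Midpoint-rule approximation to the integral of pdf_min from theta to y."""
    h = (y - theta) / steps
    return sum(pdf_min(theta + (k + 0.5) * h, n, theta) for k in range(steps)) * h

def cdf_min_exact(y, n, theta):
    """P(Y_1 <= y) = 1 - P(all X_i > y) = 1 - [1 - F(y)]^n = 1 - e^{n(theta - y)}."""
    return 1.0 - exp(n * (theta - y))

# Arbitrary illustrative values: n = 5 observations, theta = 1, evaluated at y = 2.
n, theta, y = 5, 1.0, 2.0
print(cdf_min_numeric(y, n, theta), cdf_min_exact(y, n, theta))
```

The two values should agree up to the discretisation error of the midpoint rule.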

So the conditional density of $X_1, \ldots, X_n$ given $T=t$ is

\begin{displaymath}\frac{e^{\theta-x_1}e^{\theta-x_2}\ldots e^{\theta-x_n}}
{ne^{n(\theta-y_1)}}=\frac{e^{-\sum x_i}}{ne^{-n y_1}}, \ x_i \geq y_1, \,
i=1, \ldots, n,\end{displaymath}

which is free of $\theta$ for each fixed $y_1=\mbox{min}(x_i)$. Note that since $x_i \geq y_1, i=1,\ldots, n$, neither the expression nor the range space depends on $\theta$, so the first order statistic, $Y_1$, is a sufficient statistic for $\theta$.
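
The same conclusion can be illustrated numerically. This is a rough sketch under stated assumptions: the function names and the sample values are hypothetical, and every trial value of $\theta$ is chosen below the sample minimum so that the densities are non-zero. It evaluates the ratio in (7.2) for several $\theta$ and compares it with $e^{-\sum x_i}/(ne^{-ny_1})$.

```python
from math import exp, isclose

def joint_density(xs, theta):
    """Product of e^{theta - x_i}; zero unless every x_i exceeds theta."""
    if min(xs) <= theta:
        return 0.0
    p = 1.0
    for x in xs:
        p *= exp(theta - x)
    return p

def min_density(y, n, theta):
    """pdf of Y_1 = min(X_i): n e^{n(theta - y)} for y > theta, else 0."""
    return n * exp(n * (theta - y)) if y > theta else 0.0

def ratio(xs, theta):
    """Joint density divided by the density of the minimum, as in (7.2)."""
    return joint_density(xs, theta) / min_density(min(xs), len(xs), theta)

def closed_form(xs):
    """The theta-free expression e^{-sum x_i} / (n e^{-n y_1})."""
    n, y1 = len(xs), min(xs)
    return exp(-sum(xs)) / (n * exp(-n * y1))

# Hypothetical sample; each theta below must be smaller than min(xs) = 1.7.
xs = (2.3, 1.7, 4.1, 1.9)
for theta in (0.0, 0.5, 1.5):
    # The two printed values should agree for every admissible theta.
    print(theta, ratio(xs, theta), closed_form(xs))
```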


In establishing that a particular statistic is sufficient, we do not usually use the above definitions directly. Instead, a factorization criterion is preferred, and this is described in the next section.


Bob Murison 2000-10-31