
Factorization Criterion

The theorem stated below is often referred to as the Fisher-Neyman factorization criterion.


Theorem 7.1
Let $X_1, \ldots, X_n$ (or ${\bf X}$) denote a random sample from a distribution with density function $f(x;\theta)$. Then the statistic $T=t({\bf X})$ is a sufficient statistic for $\theta$ if and only if we can find two non-negative functions $g$ and $h$ such that

\begin{displaymath}f({\bf x}; \theta)=g(t({\bf x});\theta)h({\bf x})\end{displaymath}

where, for every fixed value of $t({\bf x})$, $h({\bf x})$ does not depend on $\theta$.
(The range space of ${\bf x}$ for which $f({\bf x};\theta)>0$ must not depend on $\theta$.)

(An aside; heuristic explanation.)

Factorization is a way of separating the part of the density that involves $\theta$ from the part that does not. Only when $t({\bf x})$ is comprehensive enough that

\begin{displaymath}f({\bf x};\theta) = g\left( t({\bf x}); \theta\right) h({\bf x}),\end{displaymath}

does it carry enough information to pin down $\theta$. Conversely, when $t({\bf x})$ is sufficient, any further detail in the data (the extra ``enhancements'') is redundant for inference about $\theta$.

Proof. For the continuous case, a proof is given in HC 7.2, Theorem 1, where their $k_1$ is our $g$ and their $k_2$ is our $h$.


To use the factorization criterion, we examine the joint density function $f({\bf x};\theta)$ and see whether it factorizes in the required way for some function $t({\bf x})$. The criterion is usually not easy to use to show that a statistic $T$ is not sufficient, because that requires showing that no such factorization exists.
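As a rough numerical companion to this remark (a sketch, not part of the original notes), one can test a necessary condition implied by the factorization: if $t({\bf x})=t({\bf x}')$ and $T$ is sufficient, then $f({\bf x};\theta)/f({\bf x}';\theta)=h({\bf x})/h({\bf x}')$ for every $\theta$ at which both densities are positive. A ratio that changes with $\theta$ therefore shows that $T$ is not sufficient, while a constant ratio over a few values of $\theta$ proves nothing. The Python below assumes the normal model of Example 7.4 with $\sigma=1$; the helper names `normal_joint_density` and `ratio_varies` and the sample values are hypothetical.

```python
import math


# Hypothetical helpers for illustration; the model is N(theta, sigma^2) with sigma known.
def normal_joint_density(xs, theta, sigma=1.0):
    """Joint N(theta, sigma^2) density of an i.i.d. sample xs."""
    n = len(xs)
    c = (2 * math.pi * sigma ** 2) ** (-n / 2)
    return c * math.exp(-sum((x - theta) ** 2 for x in xs) / (2 * sigma ** 2))


def ratio_varies(density, x, x_prime, thetas, rel_tol=1e-9):
    """Return True if density(x, theta) / density(x_prime, theta) changes with theta.

    Assumes t(x) == t(x_prime) for the candidate statistic t and that both
    densities are positive at every theta in thetas. True shows t is not
    sufficient; False is only consistent with sufficiency, not a proof.
    """
    ratios = [density(x, th) / density(x_prime, th) for th in thetas]
    return any(not math.isclose(r, ratios[0], rel_tol=rel_tol) for r in ratios[1:])


thetas = [-1.0, 0.0, 0.5, 2.0]
# Same sample mean (1.0): the ratio stays at exp(-1) for every theta.
print(ratio_varies(normal_joint_density, [0.0, 1.0, 2.0], [1.0, 1.0, 1.0], thetas))
# Same sample median (1.0) but different means: the ratio changes with theta.
print(ratio_varies(normal_joint_density, [0.0, 1.0, 5.0], [1.0, 1.0, 1.0], thetas))
```

The first pair shares the sample mean and gives a constant ratio; the second pair shares the median but not the mean, and its ratio varies with $\theta$, so the sample median is not sufficient in this model.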

Note that the family of distributions may be indexed by a vector parameter $\theta$, in which case the statistic $T$ in the definition of sufficiency can be a vector function of the observations, for example $(\bar{X}, S^2)$ or $(X_{\min}, X_{\max})$.
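To illustrate the vector case with $(\bar{X}, S^2)$, here is a small Python check under an assumption not made in the text: a $N(\mu,\sigma^2)$ sample with both $\mu$ and $\sigma^2$ unknown, so $\theta=(\mu,\sigma^2)$. It relies on the identity $\sum(x_i-\mu)^2=(n-1)S^2+n(\bar{x}-\mu)^2$, with $S^2$ taken as the sample variance with divisor $n-1$, so the joint density has the form $g((\bar{x},S^2);\theta)h({\bf x})$ with $h\equiv 1$. The function names and the data are illustrative only.

```python
import math
import statistics


def loglik_direct(xs, mu, sigma2):
    """Normal log-likelihood of the full sample xs at (mu, sigma2)."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -n / 2 * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)


def loglik_from_stats(n, xbar, s2, mu, sigma2):
    """The same log-likelihood, computed from (n, xbar, S^2) alone.

    Uses sum (x_i - mu)^2 = (n - 1) * S^2 + n * (xbar - mu)^2,
    where S^2 is the sample variance with divisor n - 1.
    """
    ss = (n - 1) * s2 + n * (xbar - mu) ** 2
    return -n / 2 * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)


xs = [2.1, 3.4, 1.9, 4.2, 2.8]  # arbitrary illustrative data
n, xbar, s2 = len(xs), statistics.mean(xs), statistics.variance(xs)
for mu, sigma2 in [(0.0, 1.0), (3.0, 0.5), (2.5, 4.0)]:
    print(mu, sigma2, loglik_direct(xs, mu, sigma2), loglik_from_stats(n, xbar, s2, mu, sigma2))
```

The two columns agree for every $(\mu,\sigma^2)$, which is the numerical face of the statement that the likelihood depends on the data only through $(\bar{x}, S^2)$.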



Example 7.3
We consider Example 1.1 again, in which $f(x;\theta)=e^{-(x-\theta)}I_{(\theta,\infty)}(x)$, this time using the factorization criterion.
The joint probability density function of $X_1, \ldots, X_n$ is

\begin{eqnarray*}
f({\bf x}; \theta)&=& e^{-\sum(x_i-\theta)}\prod_{i=1}^nI_{(\theta,\infty)}(x_i)\\
&=& e^{-\sum x_i}\, e^{n\theta} I_{(\theta, \infty)}(t)\\
&=&h({\bf x})\, g(t({\bf x});\theta)
\end{eqnarray*}



where

\begin{displaymath}h({\bf x})=e^{-\sum x_i}, \quad t({\bf x})=\min x_i \quad \mbox{ and } \quad
g(t({\bf x});\theta)=e^{n \theta}I_{(\theta,\infty)}(t).\end{displaymath}

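Here we have used $\prod_{i=1}^n I_{(\theta,\infty)}(x_i)=I_{(\theta,\infty)}(\min x_i)$, since every $x_i$ exceeds $\theta$ exactly when the smallest one does. So, by Theorem 7.1, $\min(X_i)$ is a sufficient statistic for $\theta$.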
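The factorization in Example 7.3 can be checked numerically. The Python sketch below codes the joint density together with $h$ and $g$ as defined above, and compares $f({\bf x};\theta)$ with $h({\bf x})\,g(t({\bf x});\theta)$ over a grid of $\theta$ values, including values just below and just above $\min x_i$. The sample and the function names are illustrative, and $g$ is given the sample size $n$ explicitly because the factor $e^{n\theta}$ depends on it.

```python
import math


def joint_density(xs, theta):
    """Joint density of an i.i.d. sample from f(x; theta) = exp(-(x - theta)), x > theta."""
    if any(x <= theta for x in xs):
        return 0.0
    return math.exp(-sum(x - theta for x in xs))


def h(xs):
    # h(x) = exp(-sum x_i): free of theta
    return math.exp(-sum(xs))


def g(t, theta, n):
    # g(t; theta) = exp(n * theta) * I_(theta, inf)(t), with n the sample size
    return math.exp(n * theta) if t > theta else 0.0


xs = [1.3, 0.7, 2.4, 1.1]  # illustrative sample; min(xs) = 0.7
for theta in [-1.0, 0.0, 0.5, 0.69, 0.71, 1.0]:
    print(theta, joint_density(xs, theta), h(xs) * g(min(xs), theta, len(xs)))
```

For $\theta$ above $\min x_i$ both columns are zero, because the indicator in $g$ switches off exactly when some $x_i \leq \theta$.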



Example 7.4
(Example 4 in HC, 7.2, p319)
Let $X_1, \ldots, X_n$ be a random sample from a $N(\theta,
\sigma^2)$ distribution where $\sigma^2$ is known. Show that $\bar{X}$ is sufficient for $\theta$.
Now

\begin{displaymath}f(x_1;\theta) \ldots f(x_n;\theta)=ce^{-\sum_{i=1}^n(x_i-\theta)^2/2\sigma^2},\end{displaymath}

where $c=(2\pi\sigma^2)^{-n/2}$ does not depend on the data or on $\theta$.

Writing $x_i-\theta$ as $x_i-\bar{x}+(\bar{x}-\theta)$, we have,

\begin{displaymath}\sum(x_i-\theta)^2=\sum(x_i-\bar{x})^2 + n(\bar{x}-\theta)^2
+2(\overline{x}-\theta)\underbrace{\sum(x_i-\bar{x})}_{=0}. \end{displaymath}

So the RHS is

\begin{displaymath}ce^{-n(\bar{x}-\theta)^2/2\sigma^2}\cdot
e^{-\sum(x_i-\bar{x})^2/2\sigma^2}=g(\bar{x};\theta)\cdot h({\bf x}), \end{displaymath}

since the first factor depends on ${\bf x}$ only through $\bar{x}$ (or, equivalently, $\sum x_i$) and the second factor does not depend on $\theta$. Hence $\bar{X}$ is sufficient for $\theta$.
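A numerical check of Example 7.4, sketched in Python: it codes the joint density, $g(\bar{x};\theta)$ (with the constant $c=(2\pi\sigma^2)^{-n/2}$ placed in $g$, an arbitrary choice since it could equally go in $h$) and $h({\bf x})$, and confirms that the product reproduces the joint density for several values of $\theta$. The data, $\sigma=0.5$ and the function names are illustrative assumptions.

```python
import math
import statistics


def f_joint(xs, theta, sigma):
    """Joint N(theta, sigma^2) density of the sample xs."""
    n = len(xs)
    c = (2 * math.pi * sigma ** 2) ** (-n / 2)
    return c * math.exp(-sum((x - theta) ** 2 for x in xs) / (2 * sigma ** 2))


def g(xbar, theta, n, sigma):
    # Depends on the data only through xbar; the constant c is put here.
    c = (2 * math.pi * sigma ** 2) ** (-n / 2)
    return c * math.exp(-n * (xbar - theta) ** 2 / (2 * sigma ** 2))


def h(xs, sigma):
    # exp(-sum (x_i - xbar)^2 / (2 sigma^2)): free of theta
    xbar = statistics.mean(xs)
    return math.exp(-sum((x - xbar) ** 2 for x in xs) / (2 * sigma ** 2))


xs = [4.9, 5.6, 5.1, 4.4, 5.3, 5.0]  # illustrative data
sigma = 0.5  # treated as known
xbar = statistics.mean(xs)
for theta in [4.0, 5.0, 5.2, 6.0]:
    print(theta, f_joint(xs, theta, sigma), g(xbar, theta, len(xs), sigma) * h(xs, sigma))
```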


Read HC 7.2, Examples 5, 6.



Example 7.5
Consider a random sample of size $n$ from the uniform distribution, $f(x;\theta)=1/\theta, x \in (0,\theta]$. We will use the factorization criterion to find a sufficient statistic for $\theta$.
The joint density function is

\begin{displaymath}f({\bf x};\theta)=\left\{\begin{array}{ll}
\frac{1}{\theta^n}, & \mbox{ if } 0<x_i\leq\theta \mbox{ for all } i,\\
0, & \mbox{ if } x_i>\theta \mbox{ or } x_i\leq 0 \mbox{ for any }i.
\end{array}\right.\end{displaymath}

This can be written in the form

\begin{displaymath}f({\bf x};\theta)=g(y_n,\theta)h({\bf x}),\end{displaymath}

where

\begin{displaymath}g(y_n,\theta)=\left\{\begin{array}{ll}
\frac{1}{\theta^n},&\mbox{ if }\theta\geq y_n,\\
0,&\mbox{ if }\theta < y_n,\end{array}\right.\end{displaymath}

and

\begin{displaymath}h({\bf x})=\left\{\begin{array}{ll}1,& \mbox{ if }x_i>0 \mbox{ for all }i,\\
0, & \mbox{ if any }x_i \leq 0.\end{array}\right.\end{displaymath}

Here $y_n=\max_i x_i$ is the observed value of the largest order statistic $Y_n$. The factorization criterion is satisfied with $T=Y_n$, so this statistic is sufficient for $\theta$.

Comment. The joint pdf equals $1/\theta^n$, which is a function of $\theta$ alone, so it might appear that any statistic is sufficient. The fallacy in this argument is that the joint density function is not always $1/\theta^n$: it is zero whenever any $x_i \notin (0, \theta]$, so it is not just a function of $\theta$. However, we can put it into the required form by taking $T=Y_n$. [Note that if $Y_n \leq \theta$ then all the $X_i \leq \theta$.]
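The same kind of check for Example 7.5, as a Python sketch with an illustrative sample: the joint density is zero for every $\theta$ below $y_n=\max x_i$, which is why it is not a function of $\theta$ alone, and it agrees with $g(y_n,\theta)h({\bf x})$ at every $\theta$ tried. The code takes $g(y_n,\theta)=1/\theta^n$ when $\theta\geq y_n$ so that it matches the support $(0,\theta]$ exactly; the function names are hypothetical.

```python
def joint_density(xs, theta):
    """Joint density of an i.i.d. sample from the uniform density 1/theta on (0, theta]."""
    if all(0 < x <= theta for x in xs):
        return theta ** (-len(xs))
    return 0.0


def g(y_n, theta, n):
    # 1/theta^n when theta >= y_n (matching the support (0, theta]), else 0
    return theta ** (-n) if theta >= y_n else 0.0


def h(xs):
    # 1 if every x_i > 0, else 0: free of theta
    return 1.0 if all(x > 0 for x in xs) else 0.0


xs = [0.8, 2.9, 1.7]  # illustrative sample; y_n = max(xs) = 2.9
for theta in [1.0, 2.5, 3.0, 10.0]:
    print(theta, joint_density(xs, theta), g(max(xs), theta, len(xs)) * h(xs))
```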

Although the factorization criterion works here, and in other cases where the range space depends on the parameter, one has to be careful; it is often safer to find the conditional density of the sample given the statistic than to use the factorization criterion. This is done below.

The joint pdf of the ordered sample $Y_1<Y_2<\ldots<Y_n$ is

\begin{displaymath}\left\{ \begin{array}{ll}
\frac{n!}{\theta^n} & \mbox{ for } 0 \leq y_1 \leq \ldots \leq y_n \leq \theta,\\
0 & \mbox{ otherwise, } \end{array} \right.\end{displaymath}

and the density of $Y_n$ is

\begin{displaymath}\left\{ \begin{array}{ll}
n y_n^{n-1}/\theta^n & \mbox{ for } 0 \leq y_n \leq \theta,\\
0 & \mbox{ otherwise. } \end{array} \right.\end{displaymath}

Hence, for $y_n \leq \theta$, the conditional density of $Y_1, \ldots, Y_{n-1}$ given $Y_n=y_n$ is the ratio of these two densities,

\begin{displaymath}\left\{ \begin{array}{ll}
(n-1)!/y_n^{n-1} & \mbox{ for } 0 \leq y_1 \leq \ldots \leq y_{n-1} \leq y_n,\\
0 & \mbox{ otherwise, } \end{array} \right.\end{displaymath}

which does not depend on $\theta$.
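The same conclusion can be seen by simulation. The Python sketch below is an illustration under arbitrary choices ($n=3$, the seed, the number of replications, and conditioning on $Y_n$ falling in the narrow interval $[0.90, 0.95]$ as a stand-in for conditioning on $Y_n=y_n$). It draws samples from the uniform distribution with $\theta=1$ and with $\theta=2$ and compares $Y_1$ among the samples whose maximum lands in that interval. Because the conditional density is free of $\theta$, the two conditional means should agree up to Monte Carlo error; under that density the other $n-1$ observations are uniform on $(0,y_n)$, so both means should be close to $y_n/n\approx 0.925/3\approx 0.31$. The function name is hypothetical.

```python
import random
import statistics


def conditional_min_given_max(theta, n, lo, hi, reps, rng):
    """Y_1 values from Uniform(0, theta) samples of size n whose maximum Y_n lies in [lo, hi]."""
    kept = []
    for _ in range(reps):
        xs = [rng.uniform(0, theta) for _ in range(n)]
        if lo <= max(xs) <= hi:
            kept.append(min(xs))
    return kept


rng = random.Random(2000)
n, lo, hi = 3, 0.90, 0.95  # condition on Y_n near 0.925 (needs hi <= theta)
a = conditional_min_given_max(1.0, n, lo, hi, 200_000, rng)  # theta = 1
b = conditional_min_given_max(2.0, n, lo, hi, 200_000, rng)  # theta = 2
print(len(a), statistics.mean(a))
print(len(b), statistics.mean(b))
```

Far fewer samples are kept when $\theta=2$, since $Y_n$ rarely lands near $0.925$, but the kept values of $Y_1$ behave the same way in both cases.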



Bob Murison 2000-10-31