
Properties of Maximum Likelihood Estimates

Statistical inference should be consistent with the assumption that the best explanation of a set of data is provided by the value of $\theta$ (say $\hat{\theta}$) that maximizes the likelihood function. Estimators derived by the method of maximum likelihood have several desirable properties. They are stated below without proof.
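As a minimal sketch of this principle, the likelihood can be maximized numerically and the result compared with a closed-form mle. The sketch assumes i.i.d. exponential data with rate $\lambda$, chosen here only for illustration, for which $\log L(\lambda)=n\log\lambda-\lambda\sum x_i$ and $\hat{\lambda}=1/\bar{x}$. The helper names, the true rate 2.0, the sample size and the search interval are all hypothetical choices.

```python
import math
import random


def exp_log_likelihood(rate, data):
    """Log-likelihood of i.i.d. exponential data: n*log(rate) - rate*sum(x)."""
    return len(data) * math.log(rate) - rate * sum(data)


def golden_section_max(f, lo, hi, tol=1e-9):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2


random.seed(1)
# Hypothetical data: 500 draws with an assumed true rate of 2.0.
data = [random.expovariate(2.0) for _ in range(500)]

numeric_mle = golden_section_max(lambda r: exp_log_likelihood(r, data), 1e-6, 50.0)
closed_form_mle = len(data) / sum(data)  # 1 / x-bar
print(numeric_mle, closed_form_mle)
```

Because this log-likelihood is concave in $\lambda$, a simple bracketing search suffices; for other models a general-purpose optimizer would usually be used instead.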

  1. Sufficiency

    It was established in Section 7.6 that if a single sufficient statistic exists for $\theta$, the maximum likelihood estimate of $\theta$ must be a function of it. That is, the mle depends on the sample observations only through the value of a sufficient statistic.

  2. Invariance

    The maximum likelihood estimate is invariant under functional transformations. That is, if $T=t(X_1, \ldots,X_n)$ is the mle of $\theta$ and if $u(\theta)$ is a function of $\theta$, then $u(T)$ is the mle of $u(\theta)$. For example, if $\hat{\sigma}$ is the mle of $\sigma$, then $\hat{\sigma}^2$ is the mle of $\sigma^2$. That is, $\widehat{\sigma^2}=\hat{\sigma}^2$.

  3. Consistency

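    Under suitable regularity conditions, the maximum likelihood estimator is consistent; that is, $\hat{\theta}$ converges in probability to the true value of $\theta$ as $n\to\infty$.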

  4. Efficiency

    If an MVB estimator of $\theta$ exists, the method of maximum likelihood will produce it.

  5. Asymptotic Normality

    Under certain regularity conditions, a maximum likelihood estimator has an asymptotically normal distribution with mean $\theta$ and variance $1/I(\theta)$, where $I(\theta)$ is the information in the sample.

  6. MLE of vector parameters

    Define

    \begin{displaymath}
I_{ij}(\theta)=E\left(\frac{\partial \log L(\theta)}{\partial \theta_i}\cdot
\frac{\partial \log L(\theta)}{\partial \theta_j}\right)
\end{displaymath} (8.13)

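    Since each score $\partial \log L(\theta)/\partial \theta_i$ has expectation zero under the usual regularity conditions, the right-hand side of (8.13) can be expressed as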

    \begin{displaymath}\mbox{cov}\left(\frac{\partial
\log L(\theta)}{\partial\theta_i},\frac{\partial \log L(\theta)}{\partial
\theta_j}\right).\end{displaymath}

    As was the case with information on one parameter, $I_{\bf X}(\theta)$ [see (1.6) and (1.10)], there is an alternative formula for computing the terms of the information matrix:
    \begin{displaymath}
I_{ij}(\theta)=\mbox{}-E\left(\frac{\partial^2 \log L(\theta)}{\partial
\theta_i \partial \theta_j}\right),
\end{displaymath} (8.14)

    provided certain regularity conditions are satisfied. Let the information matrix $I(\theta)$ be the matrix with elements $I_{ij}(\theta)$. Then the mle's $\hat{\theta}_i$, found by solving the set of equations

    \begin{displaymath}\frac{\partial \log L(\theta)}{\partial \theta_i}=0, \quad i=1, 2, \ldots, k, \end{displaymath}

    where $k$ is the number of parameters, have an asymptotically normal distribution with means $\theta_i$ and covariance matrix $[I(\theta)]^{-1}$.
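The properties above are stated without proof, but consistency (property 3) and asymptotic normality with variance $1/I(\theta)$ (property 5) can be illustrated by simulation. This is a minimal sketch assuming i.i.d. exponential data with rate $\lambda$, an illustrative choice not taken from the text, for which $\hat{\lambda}=1/\bar{x}$ and $I(\lambda)=n/\lambda^2$. The true rate, sample sizes, seed and replication count are hypothetical.

```python
import math
import random
import statistics


def rate_mle(sample):
    """MLE of an exponential rate: n / sum(x), i.e. 1 / x-bar."""
    return len(sample) / sum(sample)


random.seed(42)
true_rate = 2.0  # assumed true value of the parameter

# Consistency: the estimate should settle near true_rate as n grows.
estimates = {}
for n in (10, 100, 10_000):
    sample = [random.expovariate(true_rate) for _ in range(n)]
    estimates[n] = rate_mle(sample)
    print(n, estimates[n])

# Asymptotic normality: standardize using the variance 1/I(rate) = rate**2 / n.
n, reps = 200, 2000
z = []
for _ in range(reps):
    sample = [random.expovariate(true_rate) for _ in range(n)]
    z.append((rate_mle(sample) - true_rate) / math.sqrt(true_rate**2 / n))

# For a standard normal, about 95% of values fall inside (-1.96, 1.96).
coverage = sum(abs(v) < 1.96 for v in z) / reps
print("mean", statistics.mean(z), "sd", statistics.stdev(z), "coverage", coverage)
```

For moderate $n$ the standardized values have mean near 0, standard deviation near 1 and roughly 95% coverage; the small remaining bias reflects that these are limiting, not exact, results.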



Example 8.8
For a sample of size $n$ from a $N(\mu, \sigma^2)$ distribution, the mle's of $\mu \ (=\theta_1)$ and $\sigma^2 \ (=\theta_2)$ are easily found to be $\overline{x}$ and $\sum(x_i-\overline{x})^2/n$. To find the information matrix, we note that

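\begin{eqnarray*}
\frac{\partial^2 \log L}{\partial \mu^2}&=&-n/\sigma^2\\
\frac{\partial^2 \log L}{\partial (\sigma^2)^2}&=&\frac{n}{2\sigma^4}-\frac{\sum(x_i-\mu)^2}{\sigma^6}\\
\frac{\partial}{\partial \mu}\left(\frac{\partial \log L}{\partial \sigma^2}\right) &=&\frac{-n(\bar{x}-\mu)}{\sigma^4}
\end{eqnarray*}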
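The second derivatives can be checked numerically by applying central finite differences to $\log L(\mu,\sigma^2)=-\frac{n}{2}\log(2\pi\sigma^2)-\sum(x_i-\mu)^2/(2\sigma^2)$. This is a minimal sketch; the simulated sample, the evaluation point $(\mu,\sigma^2)=(0.8, 2.0)$ and the step size are arbitrary, hypothetical choices, and the helper names are illustrative only.

```python
import math
import random


def log_lik(mu, v, xs):
    """Normal log-likelihood written in terms of mu and v = sigma^2."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * v) - ss / (2 * v)


def numeric_hessian(f, mu, v, h=1e-3):
    """Central finite-difference second derivatives of f(mu, v)."""
    d_mumu = (f(mu + h, v) - 2 * f(mu, v) + f(mu - h, v)) / h**2
    d_vv = (f(mu, v + h) - 2 * f(mu, v) + f(mu, v - h)) / h**2
    d_muv = (f(mu + h, v + h) - f(mu + h, v - h)
             - f(mu - h, v + h) + f(mu - h, v - h)) / (4 * h**2)
    return d_mumu, d_vv, d_muv


random.seed(0)
xs = [random.gauss(1.0, 1.5) for _ in range(50)]  # hypothetical sample
n, xbar = len(xs), sum(xs) / len(xs)
mu, v = 0.8, 2.0  # arbitrary evaluation point, not the mle

# Analytic formulas from the three lines above, with v = sigma^2.
analytic = (
    -n / v,
    n / (2 * v**2) - sum((x - mu) ** 2 for x in xs) / v**3,
    -n * (xbar - mu) / v**2,
)
numeric = numeric_hessian(lambda m, s2: log_lik(m, s2, xs), mu, v)
for a, b in zip(analytic, numeric):
    print(f"{a:.5f}  {b:.5f}")
```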



Taking expectations (and using (2.14)), we obtain the information matrix

\begin{displaymath}I(\theta)=\left[\begin{array}{cc} n/\sigma^2 & 0\\ 0 & n/(2 \sigma^4)
\end{array}\right] ,\end{displaymath}
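The step of taking expectations can be checked by Monte Carlo: averaging the negatives of the second derivatives, evaluated at the true parameter values, over many simulated samples should approximate the entries $n/\sigma^2$, $n/(2\sigma^4)$ and $0$. This is a sketch under assumed values $\mu=1$, $\sigma^2=2$, $n=20$ and 10000 replications, none of which come from the text.

```python
import random
import statistics

random.seed(7)
mu, sigma2, n, reps = 1.0, 2.0, 20, 10000  # assumed true values and settings
sigma = sigma2 ** 0.5

neg_h_vv, neg_h_muv = [], []
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    ss = sum((x - mu) ** 2 for x in xs)
    xbar = sum(xs) / n
    # Negatives of the second derivatives, evaluated at the true (mu, sigma^2).
    neg_h_vv.append(-(n / (2 * sigma2**2) - ss / sigma2**3))
    neg_h_muv.append(n * (xbar - mu) / sigma2**2)

info_mumu = n / sigma2  # -d2 log L / d mu2 does not depend on the data
info_vv = statistics.mean(neg_h_vv)  # should be close to n / (2 sigma^4)
info_muv = statistics.mean(neg_h_muv)  # should be close to 0
print(info_mumu, info_vv, n / (2 * sigma2**2), info_muv)
```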

and for the covariance matrix

\begin{displaymath}[I(\theta)]^{-1}=\left[\begin{array}{cc}
\sigma^2/n & 0\\ 0&2 \sigma^4/n \end{array}\right]. \end{displaymath}

Hence $\bar{X}$ and $\hat{\sigma}^2$ are asymptotically normally distributed and independent, with the variances given on the diagonal of $[I(\theta)]^{-1}$. The independence of $\bar{X}$ and $\hat{\sigma}^2$, and the normality and variance of $\bar{X}$, in fact hold exactly for every $n$. The normality of $\hat{\sigma}^2=\sum(X_i-\bar{X})^2/n$ and its variance $2\sigma^4/n$, however, are only limiting results.
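The contrast between exact and limiting results is easiest to see for small $n$. Since $n\hat{\sigma}^2/\sigma^2$ has a $\chi^2_{n-1}$ distribution, the exact variance of $\hat{\sigma}^2$ is $2(n-1)\sigma^4/n^2$, slightly below the asymptotic value $2\sigma^4/n$, whereas the variance of $\bar{X}$ is exactly $\sigma^2/n$. A minimal simulation sketch, with assumed values $\mu=0$, $\sigma^2=4$, $n=10$ and 20000 replications:

```python
import random
import statistics

random.seed(3)
mu, sigma2, n, reps = 0.0, 4.0, 10, 20000  # assumed values; small n on purpose
sigma = sigma2 ** 0.5

xbars, s2_hats = [], []
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    xbars.append(xbar)
    s2_hats.append(sum((x - xbar) ** 2 for x in xs) / n)  # the mle of sigma^2

print("var(xbar):", statistics.variance(xbars), "exact:", sigma2 / n)
print("var(s2_hat):", statistics.variance(s2_hats),
      "asymptotic:", 2 * sigma2**2 / n,
      "exact:", 2 * (n - 1) * sigma2**2 / n**2)
```

With $n=10$ the simulated variance of $\hat{\sigma}^2$ should sit near the exact value 2.88 rather than the asymptotic 3.2; the gap closes as $n$ grows.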


Bob Murison 2000-10-31