
Introduction

Parametric statistics allows us to reduce the data to a few parameters, which makes the data easier to interpret. Statistics such as the mean and variance describe the pattern of random events and allow us to evaluate the probability of events of interest. Under the assumption that the data follow a known parametric distribution, we estimate the parameters from the data. However, the usefulness of the parameters depends on the assumptions about the data being reliable, and this is not guaranteed.

One strategy for interpreting data without stringent assumptions is to use order statistics.

Read HC 4.6. In the following, we will use the notation of HC where the pdf of the random variable $X$ is denoted by $f(x)$, rather than $f_X(x)$.



Definition 5.1
Let $X_1, X_2, \ldots, X_n$ denote a random sample from a continuous distribution with pdf $f(x)$, $a<x<b$. Let $Y_1$ be the smallest of these, $Y_2$ the next $X_i$ in order of magnitude, etc. Then $Y_i$, $i=1, 2, \ldots, n$, is called the $i$th order statistic of the sample, and $(Y_1, Y_2, \ldots, Y_n)$ the vector of order statistics. Because the distribution is continuous, ties occur with probability zero, so we may write $Y_1<Y_2<\ldots<Y_n$. The following alternative notation is also common: $X_{(1)}<X_{(2)}<\ldots<X_{(n)}$.
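As a small illustration of the definition, the sketch below sorts a sample to obtain the vector of order statistics. The function name `order_statistics`, the sample size of 7, the seed and the choice of a standard normal sample are all assumptions made for the example, not part of the notes.

```python
import random

def order_statistics(sample):
    """Return the vector of order statistics (Y_1, ..., Y_n) of a sample."""
    return sorted(sample)

random.seed(0)  # arbitrary seed, only so the illustration is repeatable
x = [random.gauss(0.0, 1.0) for _ in range(7)]  # X_1, ..., X_7 (assumed normal)
y = order_statistics(x)                         # Y_1 < Y_2 < ... < Y_7

y_min = y[0]              # Y_1, the smallest observation
y_max = y[-1]             # Y_n, the largest observation
y_median = y[len(y) // 2] # Y_4, the sample median when n = 7
```

Note that Python lists are indexed from 0, so $Y_r$ is stored in `y[r - 1]`.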

Order statistics are non-parametric and rely only upon the weak assumption that the data are samples from a continuous distribution. We pick up information by ordering the data. If we know the underlying distribution, we can combine that knowledge with the rank of the order statistic of interest. For instance, if the underlying distribution is normal, $Y_{50}$ from a sample of size 101 will have a higher probability of being near the median than $Y_{10}$ or $Y_{90}$. But without ordering, the same could not be said for $X_{10}, X_{50}, X_{90}$. So the ordering gives us extra information, and we shall now explore the densities of order statistics, denoted $f_{Y_r}(y)$ etc.
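The claim about $Y_{10}$, $Y_{50}$ and $Y_{90}$ can be checked by simulation. The following is a rough sketch, assuming a standard normal distribution (median 0); the function name `near_median_fractions`, the tolerance of 0.25, the number of replications and the seed are arbitrary choices for illustration.

```python
import random

def near_median_fractions(n=101, ranks=(10, 50, 90), reps=2000, tol=0.25, seed=1):
    """Estimate P(|Y_r| < tol) and P(|X_r| < tol) for standard normal samples."""
    rng = random.Random(seed)
    hits_y = {r: 0 for r in ranks}
    hits_x = {r: 0 for r in ranks}
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # unordered sample
        y = sorted(x)                                # order statistics
        for r in ranks:
            hits_y[r] += abs(y[r - 1]) < tol  # Y_r within tol of the median 0
            hits_x[r] += abs(x[r - 1]) < tol  # X_r within tol of the median 0
    frac_y = {r: hits_y[r] / reps for r in ranks}
    frac_x = {r: hits_x[r] / reps for r in ranks}
    return frac_y, frac_x

frac_y, frac_x = near_median_fractions()
```

Under these assumptions, `frac_y[50]` should be much larger than `frac_y[10]` and `frac_y[90]`, while the three entries of `frac_x` should be roughly equal, since the label of an unordered observation carries no information.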

Example 1 Suppose you were required to assess the ability of a railway station to handle a crowd, with regard to stair width, staff, etc. The statistic of interest is the largest observation, $Y_n$.

Example 2 An oil product freezes at $\approx 10^\circ$C, and the company ponders whether it should market it in a cold climate. We would require the density of the minimum order statistic, $f_{Y_1}(y)$, to assess the risk of the product failing.

Other situations where an order statistic is of interest are listed below.

  1. Largest component $Y_n$: maximum temperature, highest rainfall, maximum storage capacity of a dam, etc.
  2. Smallest component $Y_1$: minimum temperature, minimum breaking strength of rope, etc.
  3. Median: median income, median examination mark, etc.

Order statistics are useful for summarizing data, but they may be of limited use for a detailed description of the process that was measured. Order statistics are also ingredients for higher-level statistical procedures.

Figure 5.1: The cdf of order statistics, steps of $\frac{1}{n+1}$.




Figure 5.1 shows the sample cdf as a step function increasing by $\frac{1}{n+1}$ at each order statistic. We can make statements about individual order statistics by borrowing information provided by the entire set. Remember that all we assumed about the original data was that they come from a continuous distribution; there were no assumptions about the form of that distribution. But now that the data are ordered, we can use the extra information provided by the ordering to derive density functions.
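One way to read the steps of $\frac{1}{n+1}$ in Figure 5.1 is through the fact that, for a continuous distribution with cdf $F$, the expected value of $F(Y_r)$ is $\frac{r}{n+1}$. The sketch below checks this by simulation for a uniform sample on $(0,1)$, where $F(y)=y$; the uniform choice, the function name `mean_cdf_at_order_stats`, $n=5$, the number of replications and the seed are assumptions made only for illustration.

```python
import random

def mean_cdf_at_order_stats(n=5, reps=20000, seed=2):
    """Average F(Y_r), r = 1..n, over many uniform(0,1) samples of size n."""
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(reps):
        # For uniform(0,1) data F(y) = y, so F(Y_r) is just Y_r itself.
        u = sorted(rng.random() for _ in range(n))
        for i, value in enumerate(u):
            totals[i] += value
    return [t / reps for t in totals]

n = 5
simulated = mean_cdf_at_order_stats(n)
expected = [r / (n + 1) for r in range(1, n + 1)]  # 1/6, 2/6, ..., 5/6
```

The entries of `simulated` should lie close to those of `expected`, which are evenly spaced $\frac{1}{n+1}$ apart.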

The data $X_1, \ldots, X_n$ might be independent, but the ordered data $Y_1, \ldots, Y_n$ are not: for example, knowing $Y_1$ tells us that every other $Y_i$ exceeds it.
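The dependence among order statistics can be seen in a quick simulation. The sketch below compares the sample correlation of two unordered observations with that of the minimum and maximum; the uniform$(0,1)$ distribution, the sample size $n=3$, the helper names `pearson` and `simulate_corr`, and the seed are all illustrative assumptions.

```python
import random

def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

def simulate_corr(n=3, reps=20000, seed=3):
    """Return (corr(X_1, X_n), corr(Y_1, Y_n)) estimated from uniform samples."""
    rng = random.Random(seed)
    x_first, x_last, y_min, y_max = [], [], [], []
    for _ in range(reps):
        x = [rng.random() for _ in range(n)]
        x_first.append(x[0])   # X_1, unordered
        x_last.append(x[-1])   # X_n, unordered
        y_min.append(min(x))   # Y_1
        y_max.append(max(x))   # Y_n
    return pearson(x_first, x_last), pearson(y_min, y_max)

corr_x, corr_y = simulate_corr()
```

Here `corr_x` should be near zero, as expected for independent observations, while `corr_y` should be clearly positive. Correlation is only one symptom of dependence, but it is enough to show that $Y_1$ and $Y_n$ are not independent.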

To begin our study of order statistics we first want to find the joint distribution of $Y_1, \ldots, Y_n$.


Bob Murison 2000-10-31