Suppose that $T$ is the test statistic and $C$ the critical region for a test
of a hypothesis $H_0$ concerning the value of a parameter $\theta$. Then the
power function of the test is the probability that the test rejects $H_0$
when the actual parameter value is $\theta$. That is,
$$\pi(\theta) = P_\theta(T \in C). \tag{9.1}$$
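As a concrete illustration of a power function, the sketch below evaluates the rejection probability for one particular test: a two-sided z-test on the mean of a normal sample with known standard deviation. The choice of test, the function name `power_two_sided_z`, and the parameters `theta0`, `sigma`, `n` and `alpha` are assumptions made for the example; they do not come from the notes.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def power_two_sided_z(theta, theta0, sigma, n, alpha=0.05):
    """Probability that a two-sided z-test rejects H0: theta = theta0.

    Assumed setting (hypothetical, for illustration only):
    T is the mean of n independent N(theta, sigma^2) observations and the
    critical region is C = {t : |t - theta0| > z * sigma / sqrt(n)},
    where z is the upper alpha/2 point of the standard normal.
    """
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Standardised distance between the true and the hypothesised value.
    delta = (theta - theta0) * sqrt(n) / sigma
    # P(Z > z - delta) + P(Z < -z - delta) for Z ~ N(0, 1).
    return (1 - STD_NORMAL.cdf(z - delta)) + STD_NORMAL.cdf(-z - delta)


if __name__ == "__main__":
    for theta in (0.0, 0.2, 0.5, 1.0):
        power = power_two_sided_z(theta, theta0=0.0, sigma=1.0, n=25)
        print(f"theta = {theta:.1f}  power = {power:.4f}")
```

At `theta == theta0` the function returns `alpha`, and it increases as `theta` moves away from `theta0` in either direction, which is the behaviour one would expect of a sensible two-sided test.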
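Suppose we want to test the simple null hypothesis $H_0: \theta = \theta_0$
against the composite alternative $H_1: \theta \neq \theta_0$. Ideally we
would like a test to detect a departure from $H_0$ with certainty; that is,
we would like the power function $\pi(\theta)$ to be $1$ for all $\theta$ in
$H_1$, and to be $0$ for $\theta$ in $H_0$. Since, for a fixed sample size,
$P(\text{rejecting } H_0 \mid H_0 \text{ is true})$ and
$P(\text{not rejecting } H_0 \mid H_0 \text{ is false})$ cannot both be made
arbitrarily small, the ideal test is not possible.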
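The trade-off between the two error probabilities at a fixed sample size can be seen numerically. The sketch below assumes, purely for illustration, a one-sided rule "reject when the standardised statistic exceeds a cutoff `c`", with the statistic distributed as N(0, 1) under the null hypothesis and as N(`delta`, 1) under one fixed alternative. The names `error_probabilities`, `c` and `delta` are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal


def error_probabilities(c, delta):
    """Return (type_1, type_2) for the rule 'reject H0 when the statistic > c'.

    Assumption for this sketch: the statistic is N(0, 1) when H0 is true
    and N(delta, 1) under the single alternative being considered.
    """
    type_1 = 1 - Z.cdf(c)       # P(rejecting H0 | H0 is true)
    type_2 = Z.cdf(c - delta)   # P(not rejecting H0 | the alternative is true)
    return type_1, type_2


if __name__ == "__main__":
    # Raising the cutoff lowers the Type I probability and raises the Type II one.
    for c in (1.0, 1.645, 2.326, 3.0):
        type_1, type_2 = error_probabilities(c, delta=2.0)
        print(f"c = {c:.3f}  type I = {type_1:.4f}  type II = {type_2:.4f}")
```

With `delta` fixed (that is, with the sample size fixed), moving the cutoff only exchanges one kind of error for the other; both can shrink together only if `delta` grows, for instance through a larger sample.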
So long as $H_0$ is simple, it is possible to define P(Type I error), denoted
by $\alpha$, as $P(\text{rejecting } H_0 \mid H_0 \text{ is true})$. But to
allow for $H_0$ to be composite, we need the following definitions.
Definition 9.1 The size of a test (or of a critical region) with power
function $\pi(\theta)$ is
$$\alpha = \sup_{\theta \in H_0} \pi(\theta). \tag{9.2}$$
Definition 9.2 The size of Type II error is
$$\beta = \sup_{\theta \in H_1} \bigl(1 - \pi(\theta)\bigr). \tag{9.3}$$
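To see how these definitions behave when the null hypothesis is composite, the sketch below takes them in their usual form (the size is the supremum of the rejection probability over the null values, and the size of Type II error is the supremum of the non-rejection probability over the alternative values) and approximates both on a finite grid. The setting is an assumption made for the example: a one-sided z-test of $\theta \le \theta_0$ against $\theta > \theta_0$ with known standard deviation. All function and parameter names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal


def power_one_sided(theta, theta0, sigma, n, alpha=0.05):
    """Rejection probability at theta for the rule
    'reject when sqrt(n) * (sample mean - theta0) / sigma > z_alpha'."""
    z = Z.inv_cdf(1 - alpha)
    return 1 - Z.cdf(z - (theta - theta0) * sqrt(n) / sigma)


def size_and_type2(theta0, sigma, n, alpha=0.05, width=3.0, steps=3000):
    """Grid approximation to the two suprema.

    H0 values: theta0 - width, ..., theta0 (theta0 itself is included).
    H1 values: theta0 + width/steps, ..., theta0 + width (theta0 is excluded).
    """
    null_values = [theta0 - width * i / steps for i in range(steps + 1)]
    alt_values = [theta0 + width * i / steps for i in range(1, steps + 1)]
    size = max(power_one_sided(t, theta0, sigma, n, alpha) for t in null_values)
    type2 = max(1 - power_one_sided(t, theta0, sigma, n, alpha) for t in alt_values)
    return size, type2


if __name__ == "__main__":
    size, type2 = size_and_type2(theta0=0.0, sigma=1.0, n=25)
    print(f"size = {size:.4f}, size of Type II error (grid) = {type2:.4f}")
```

In this setting the rejection probability increases with `theta`, so the size is attained at the boundary value `theta0` and equals `alpha`. The supremum for the Type II error is approached, but not attained, as `theta` decreases towards `theta0` from above, where it tends to `1 - alpha`; the grid value is therefore slightly below that limit.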
Some statisticians regard the formal approach above, of setting up a rejection
region, as not the most appropriate, and prefer to compute a P-value.
This involves the choice of a test statistic T, the extreme values of which
provide evidence against $H_0$. The statistic $T$ should be a good estimator
of $\theta$, and its distribution under $H_0$ should be known. After
experimentation, an observed value of $T$, $t$ say, is examined to see whether
it can be considered extreme in the sense of being unlikely to occur if $H_0$
were true. The computed P-value is the probability, calculated under $H_0$, of
observing $T = t$ or something more extreme. This is the ``$\alpha$'' at which
the observed value $t$ is just significant.
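A P-value calculation can be sketched for one simple case. The example assumes a normal sample with known standard deviation, uses the sample mean as the statistic, and treats "more extreme" as "at least as far from the hypothesised value in either direction"; these choices, and the name `two_sided_p_value`, are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(sample_mean, theta0, sigma, n):
    """P(|Z| >= |z_obs|) for Z ~ N(0, 1), computed under H0: theta = theta0.

    Assumes n independent normal observations with known standard
    deviation sigma (a hypothetical setting for this sketch).
    """
    z_obs = (sample_mean - theta0) * sqrt(n) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z_obs)))


if __name__ == "__main__":
    p = two_sided_p_value(sample_mean=0.45, theta0=0.0, sigma=1.0, n=25)
    # The test rejects at level alpha exactly when p <= alpha.
    for alpha in (0.10, 0.05, 0.01):
        print(f"alpha = {alpha:.2f}: reject H0 = {p <= alpha}")
```

The loop shows the sense in which the P-value is the level at which the observation is just significant: the same data lead to rejection at every level at or above the P-value and to non-rejection at every level below it.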