Measures of location
It is often impractical to list all the data or to draw a histogram, so it is
useful to have a single number which best represents a data set. Where the data
lie is often of interest, and a measure of location serves that purpose. There
are several measures of location, which we shall illustrate with the data sets
A={2, 9, 5, 3, 8}, B={1, 4, 7, 3, 9, 2}, and the
weights of students of a previous lesson.
The minimum is the smallest value in a data set. It is often useful to put data
in rank order when studying it, in which case A would be represented as
{2, 3, 5, 8, 9} and B as {1, 2, 3, 4, 7, 9}, and the
rank order of the weights was given before.
From these rank order listings, it is immediate that the minimum of A is 2, the
minimum of B is 1, and the minimum of the weights is 105.
The maximum is the largest datum in a data set. From the above rank order
listings, it is immediate that the maximum of A is 9, the maximum of B is 9,
and the maximum of the weights is 235.
The midrange is the middle value in the sense that it is halfway between
the maximum and minimum. It is computed as (maximum+minimum)/2. The midrange
for data set A is (9+2)/2=5.5, the midrange for data set B is (9+1)/2=5, the
midrange for the weights is (235+105)/2=170. The midrange is easy to calculate,
but because it depends only on the two extreme values, it may not be
representative of where most of the data lie.
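The minimum, maximum, and midrange above can be checked with a few lines of Python. This is only a minimal sketch: the function name midrange is our own choice, and only data sets A and B are used, since the weights are not listed in this lesson.

```python
# Data sets A and B as given in the text.
A = [2, 9, 5, 3, 8]
B = [1, 4, 7, 3, 9, 2]


def midrange(data):
    """Return the point halfway between the smallest and largest values."""
    return (max(data) + min(data)) / 2


for name, data in (("A", A), ("B", B)):
    # sorted() gives the rank order listing used in the text.
    print(name, sorted(data), "min:", min(data), "max:", max(data),
          "midrange:", midrange(data))
```

Changing a single extreme value (for instance, replacing the 9 in A by 90) moves the midrange a long way, which illustrates why it may not represent most of the data.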
The median is the middle value in the sense that half the data are above it and
half the data are below it. If there is an odd number of data points, the
median is the middle value, e.g., 5 for data set A. If there is an even number
of data points, the median is halfway between the two middle values, e.g.,
(3+4)/2=3.5 for data set B and (155+155)/2=155 for the weights. When finding
the median, make sure the data are in rank order and each value has been
listed as often as it occurs. The median is perhaps the best indicator of
where the data lie, being truly amid the data values. Some
comments on the median by Stephen Jay Gould may be of interest.
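The odd and even cases of the median can be sketched in Python as follows. The function below is our own illustrative version, not a prescribed method; Python's standard statistics.median should give the same results.

```python
def median(data):
    """Return the middle value of the data (odd count) or the point
    halfway between the two middle values (even count)."""
    ordered = sorted(data)      # rank order; repeated values are kept
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]     # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count


A = [2, 9, 5, 3, 8]
B = [1, 4, 7, 3, 9, 2]
print(median(A))  # 5
print(median(B))  # 3.5
```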
The mean (which is written as an x with a bar over it and pronounced x-bar)
is calculated by adding up all the data values and dividing by the
number of data (usually denoted by n). This formula can be concisely
represented using summation notation: x-bar = (1/n) Σ x_i, where the sum runs
over all n data values. For data set
A the mean is (2+3+5+8+9)/5=5.4, for data set B the mean is
(1+2+3+4+7+9)/6=4.33, and for the weights the mean is
(105+110+112+113+120+125+125+130+...+235)/30=153.43. The mean reflects all the
data, and it is widely used because it can be algebraically manipulated and
works well with other statistics.
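The calculation of the mean for data sets A and B can be sketched in Python as below. The function name mean is our own; the weights are omitted because the full list is not reproduced in this lesson.

```python
def mean(data):
    """Add up all the data values and divide by the number of data, n."""
    return sum(data) / len(data)


A = [2, 9, 5, 3, 8]
B = [1, 4, 7, 3, 9, 2]
print(mean(A))            # 5.4
print(round(mean(B), 2))  # 4.33
```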
If a data set is symmetric, the mean is equal to the median, which is equal
to the midrange.
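As a quick check of this statement, the sketch below compares the three statistics for a symmetric data set and for data set A. The set S is a hypothetical example of our own, not one from the lesson, and midrange is our own helper function.

```python
import statistics


def midrange(data):
    """Halfway between the maximum and the minimum."""
    return (max(data) + min(data)) / 2


S = [1, 3, 5, 7, 9]   # hypothetical symmetric data set (not from the text)
A = [2, 9, 5, 3, 8]   # data set A from the text, which is not symmetric

for name, data in (("S", S), ("A", A)):
    print(name, "mean:", statistics.mean(data),
          "median:", statistics.median(data),
          "midrange:", midrange(data))
```

For S all three values coincide at 5, while for A they are 5.4, 5, and 5.5 respectively.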
Exercise: When is the mean a better indication of where the data lie, and when
is the median? Would you expect the mean or median age in a community to be
larger? The mean or median income? The mean or median cost of a house?
Challenge: How would you calculate the average class size at UNI?
The mean is more correctly referred to as the arithmetic mean. It is worth
noting that there are other notions of average which are better suited to
specific problems. Two of these are the geometric mean and the harmonic mean.
The geometric mean is the nth root of the product of n positive numbers, or
equivalently the antilogarithm of the arithmetic mean of their logarithms.
Given the data 2, 5, 8, one can calculate the geometric mean of those numbers as:
(2×5×8)^(1/3)=4.31 or
e^((1/3)(ln(2) + ln(5) + ln(8))) = e^((1/3)(.69+1.61+2.08))=4.31
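Both routes to the geometric mean can be sketched in Python. The two function names are our own, and math.prod assumes Python 3.8 or later.

```python
import math


def geometric_mean_root(data):
    """nth root of the product of the n values."""
    return math.prod(data) ** (1 / len(data))


def geometric_mean_logs(data):
    """Antilogarithm of the arithmetic mean of the natural logarithms."""
    return math.exp(sum(math.log(x) for x in data) / len(data))


data = [2, 5, 8]
print(round(geometric_mean_root(data), 2))  # 4.31
print(round(geometric_mean_logs(data), 2))  # 4.31
```

The logarithm form is usually preferable for long lists of data, since the direct product can become too large or too small to represent accurately.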
The geometric mean is the appropriate concept of the mean for average interest
or inflation rates. If the inflation rates for three successive years are
3%, 12%, and 5%, then the cost of living will be multiplied by 1.03, 1.12, and
1.05 respectively, with a net result of multiplication by
1.03×1.12×1.05 = 1.21128 for the three-year period. The cube root of
1.21128 is 1.066, which is the geometric mean, i.e., the yearly multiplier
which, if constant for three years, would produce the same increase. (The
average inflation rate is therefore 6.6%, since the inflation rate refers to
the increase.)
The geometric mean of a set of positive numbers will always be
less than or equal to their arithmetic mean.
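The inflation example can be reproduced with the short Python sketch below. The variable names are our own, and math.prod assumes Python 3.8 or later.

```python
import math

rates = [0.03, 0.12, 0.05]            # yearly inflation rates from the text
factors = [1 + r for r in rates]      # yearly multipliers 1.03, 1.12, 1.05

net = math.prod(factors)              # multiplier for the whole period
geometric = net ** (1 / len(factors)) # constant yearly multiplier
arithmetic = sum(factors) / len(factors)

print(round(net, 5))                  # 1.21128
print(round(geometric, 3))            # 1.066
print(round((geometric - 1) * 100, 1), "% per year")
print(geometric <= arithmetic)        # True
```

Applying the geometric mean as a constant multiplier for three years recovers the net multiplier, whereas the slightly larger arithmetic mean of the multipliers would overstate it.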
The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals.
Thus the harmonic
mean of 2, 5, and 8 is:
1/((1/3)((1/2)+(1/5)+(1/8)))=3.64
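The same calculation can be sketched in Python. The function name harmonic_mean is our own, and the sketch assumes all the data are nonzero.

```python
def harmonic_mean(data):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    mean_of_reciprocals = sum(1 / x for x in data) / len(data)
    return 1 / mean_of_reciprocals


print(round(harmonic_mean([2, 5, 8]), 2))  # 3.64
```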
The canonical problem in which the harmonic mean is employed is the following:
if a car drives 50 miles at 30 mph, 50 miles at 50 mph, and 50 miles at 60 mph,
what is its average speed? The total distance is 150 miles, and the total time
is (50/30)+(50/50)+(50/60)=3.5 hours; hence the average speed is
150/3.5=42.86 mph. This can be concisely calculated as:
1/((1/3)((1/30)+(1/50)+(1/60)))=42.86
The harmonic mean is always less than or equal to the geometric mean.
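The car problem, together with the ordering of the three means, can be checked with the Python sketch below. The variable names are our own, and math.prod assumes Python 3.8 or later.

```python
import math

speeds = [30, 50, 60]   # mph on each leg, from the text
leg = 50                # miles driven at each speed

# Direct calculation: total distance divided by total time.
total_distance = leg * len(speeds)
total_time = sum(leg / s for s in speeds)        # hours
average_speed = total_distance / total_time

# The three means of the speeds.
n = len(speeds)
harmonic = n / sum(1 / s for s in speeds)
geometric = math.prod(speeds) ** (1 / n)
arithmetic = sum(speeds) / n

print(round(average_speed, 2))   # 42.86
print(round(harmonic, 2))        # 42.86
print(harmonic <= geometric <= arithmetic)   # True
```

The harmonic mean matches the true average speed here because each leg covers the same distance; if the legs were of equal duration instead, the arithmetic mean of the speeds would be the appropriate average.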
Competencies: For the data set {2, 5, 9, 4, 6, 7, 6, 8, 8}, calculate the mean,
median, midrange, maximum, minimum, geometric mean, and harmonic mean.
Reflection: For the above data set, which of the above statistics best
describes where the data is?
Challenge: When will the mean, median, and midrange be equal? When
will the maximum, minimum, and median be equal?
May 2003
campbell@math.uni.edu