Other Descriptive Statistics

There are measures of location and spread other than those discussed previously. A few which may be of interest are:

Mode

The mode is the data value which occurs most often. It is seldom of interest with quantitative data, but is the only notion of average which is possible for qualitative (categorical) data. If there are three red balloons, two green balloons and five yellow balloons, the mode color is yellow. We could not compute a mean, median, or midrange color.
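The balloon example can be checked with a short Python sketch. The list name `balloons` and the use of `collections.Counter` are illustrative choices, not part of the original notes.

```python
from collections import Counter

# Balloon colors from the example: three red, two green, five yellow.
balloons = ["red"] * 3 + ["green"] * 2 + ["yellow"] * 5

# Count how often each color occurs.
counts = Counter(balloons)

# most_common(1) returns a one-element list holding the (value, count)
# pair with the highest count, i.e. the mode and its frequency.
mode_color, mode_count = counts.most_common(1)[0]

print(mode_color, mode_count)  # yellow 5
```

Note that nothing here requires the values to be numbers, which is why the mode works for categorical data where a mean or median does not.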

Weighted Mean

There are at least two motivations for the weighted mean. One is concerned with averaging average values, which is best explained with an example: if I have a class of 30 students for whom the mean score on a test is 75, and another class of 50 students for whom the mean score is 80, then the sum of all the scores is (30)(75)+(50)(80), hence the overall mean is ((30)(75)+(50)(80))/(30+50) = 78.125. This is obtained by "weighting" the means by 30 and 50, respectively, and dividing by the sum of the weights. The result is called the weighted mean of the means. The weights employed could also be fractional. This is illustrated by the second motivation for weighted means.

Sometimes one feels that some data is more important or more accurate than other data. For example, if one wanted to know the average temperature in July, one might feel that temperatures from recent years are more important than temperatures from less recent years. If the mean temperatures in July were 72 (1996), 69 (1995), 75 (1994), 73 (1993), and 68 (1992), one could calculate a weighted mean:
(72x1 + 69x.8 + 75x.6 + 73x.4 + 68x.2)/(1+.8+.6+.4+.2)=71.67
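Both weighted-mean calculations follow the same pattern: multiply each value by its weight, add up, and divide by the sum of the weights. A minimal Python sketch, in which the helper name `weighted_mean` is hypothetical and not taken from any library:

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w). Illustrative helper, not a library function."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# First motivation: class means 75 and 80, weighted by class sizes 30 and 50.
class_mean = weighted_mean([75, 80], [30, 50])

# Second motivation: July temperatures for 1996 down to 1992,
# with more recent years given larger (fractional) weights.
july_mean = weighted_mean([72, 69, 75, 73, 68], [1, 0.8, 0.6, 0.4, 0.2])

print(class_mean)           # 78.125
print(round(july_mean, 2))  # 71.67
```

With all weights equal, the same function reduces to the ordinary arithmetic mean.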

Determining the average size of classes at a university is another problem where weighted means may be appropriate.

Geometric Mean

The geometric mean is the nth root of the product of n numbers, or equivalently the antilogarithm of the arithmetic mean of the logarithms. If one has the data 2, 5, 8, one can calculate the geometric mean of those numbers as:
(2x5x8)^(1/3)=4.31 or
e^((1/3)(ln(2) + ln(5) + ln(8))) = e^((1/3)(.69+1.61+2.08))=4.31
The geometric mean is the appropriate concept of the mean for average interest or inflation rates. If the inflation rates for three successive years are 3%, 12%, and 5%, the geometric mean of 1.03, 1.12, and 1.05, which is 1.066, gives the annual inflation rate, 6.6%, which if constant for three years would produce the same increase in prices.
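The two equivalent definitions, and the inflation example, can be verified numerically. This is a sketch assuming Python 3.8 or later (for `math.prod`); the function names are illustrative.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive numbers (illustrative helper)."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    return math.prod(values) ** (1 / len(values))


def geometric_mean_via_logs(values):
    """Antilogarithm of the arithmetic mean of the natural logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


data = [2, 5, 8]
gm_root = geometric_mean(data)
gm_logs = geometric_mean_via_logs(data)
print(round(gm_root, 2), round(gm_logs, 2))  # 4.31 4.31

# Inflation of 3%, 12%, and 5% corresponds to growth factors 1.03, 1.12, 1.05.
factors = [1.03, 1.12, 1.05]
growth = geometric_mean(factors)
print(round(growth, 3))  # 1.066

# Three years at the constant rate give the same total price increase.
print(math.isclose(growth ** 3, math.prod(factors)))  # True
```

The logarithm form is usually the safer one for long lists, since the raw product can overflow or underflow.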

For positive data, the geometric mean is always less than or equal to the arithmetic mean, with equality only when all the values are equal.

Harmonic Mean

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals. Thus the harmonic mean of 2, 5, and 8 is:
1/((1/3)((1/2)+(1/5)+(1/8)))=3.64
The canonical problem in which the harmonic mean is employed is this: if a car drives 50 miles at 30 mph, 50 miles at 50 mph, and 50 miles at 60 mph, what is its average speed? The total distance is 150 miles and the total time is (50/30)+(50/50)+(50/60)=3.5 hours; hence the average speed is 150/3.5=42.86 mph.
This can be concisely calculated as:
1/((1/3)((1/30)+(1/50)+(1/60)))=42.86

For positive data, the harmonic mean is always less than or equal to the geometric mean.
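The harmonic mean calculations, and the ordering of the three means for the data 2, 5, 8, can be checked with a short Python sketch. The helper names are illustrative, and `math.prod` assumes Python 3.8 or later.

```python
import math


def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals (illustrative helper)."""
    if any(v <= 0 for v in values):
        raise ValueError("harmonic mean requires positive values")
    return len(values) / sum(1 / v for v in values)


data = [2, 5, 8]
hm = harmonic_mean(data)
print(round(hm, 2))  # 3.64

# Car problem: three legs of 50 miles each at 30, 50, and 60 mph.
speeds = [30, 50, 60]
leg_miles = 50
total_hours = sum(leg_miles / s for s in speeds)
direct_average = (leg_miles * len(speeds)) / total_hours  # distance / time
shortcut_average = harmonic_mean(speeds)
print(round(direct_average, 2), round(shortcut_average, 2))  # 42.86 42.86

# Ordering of the means for positive data: harmonic <= geometric <= arithmetic.
gm = math.prod(data) ** (1 / len(data))
am = sum(data) / len(data)
print(hm <= gm <= am)  # True
```

The shortcut matches the direct calculation here because every leg covers the same distance; with unequal distances, a distance-weighted version would be needed.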

Coefficient of Variation

It is now appropriate to distinguish between the two types of quantitative data, interval and ratio. Interval data is data which can be represented with real numbers, such as temperature or altitude. Ratio data is data which is identified with positive real numbers, such as height and weight (temperatures and altitudes can be positive or negative; heights and weights cannot be negative). Ratio data is interval data, but the converse does not hold. The term ratio refers to the fact that you can characterize a stone as being twice as heavy as another, but you cannot refer to a day as being twice as hot as another.

A motivation for the coefficient of variation is the question of whether there is more variation in weight among men than among mice. Since the heaviest mouse weighs less than the standard deviation of the weights of men, the standard deviation of the weights of mice must be less. But the question can be re-posed as one of relative variation, which is what the coefficient of variation measures: the coefficient of variation is the ratio of the standard deviation to the mean. For one of my classes, the mean weight was 152 pounds, with a standard deviation of 31 pounds; the mean height was 69.3 inches, with a standard deviation of 3.86 inches. Hence the coefficients of variation were 31/152=.20 and 3.86/69.3=.056 for weight and height respectively. Note that the coefficient of variation is independent of what units the data were measured in (pounds or kilograms or stones; inches or feet or metres).
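The class figures, and the claim about units, can be illustrated in Python. The summary statistics below come from the text; the list `weights_lb` is made-up sample data used only to demonstrate unit independence, and the helper name is hypothetical.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (illustrative helper)."""
    return statistics.stdev(values) / statistics.mean(values)


# Summary statistics quoted in the text.
cv_weight = 31 / 152     # standard deviation / mean, weight in pounds
cv_height = 3.86 / 69.3  # standard deviation / mean, height in inches
print(f"{cv_weight:.2f} {cv_height:.3f}")  # 0.20 0.056

# Hypothetical weights in pounds (NOT the author's class data).
weights_lb = [120, 135, 150, 165, 190]
# The same weights converted to kilograms (1 lb = 0.45359237 kg).
weights_kg = [w * 0.45359237 for w in weights_lb]

cv_lb = coefficient_of_variation(weights_lb)
cv_kg = coefficient_of_variation(weights_kg)
# Rescaling multiplies the mean and the standard deviation by the same
# factor, so their ratio is unchanged up to floating-point rounding.
print(math.isclose(cv_lb, cv_kg))  # True
```

The unit independence relies on the conversion being a pure rescaling. A change of units that also shifts the zero point, such as Fahrenheit to Celsius, would change the ratio, which is one reason the coefficient of variation is reserved for ratio data.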
