# Choices of descriptive statistics

## Qualitative data

We can represent the counts of qualitative data with tables, bar graphs, or pie charts. Each has its advantages: tables explicitly list the numbers of items, bar graphs make it easier to compare the relative sizes of categories (it is easier to judge the height of a bar than the angle of a wedge), and pie charts emphasize relative frequency. Even when one has decided on the vehicle to display the data, there are decisions to be made. Should frequencies (raw counts) or relative frequencies be recorded? How should the rows of a table, the bars, or the wedges be ordered? Recall that tables and bar charts, unlike pie charts, can also be used for information that is not count data representing parts of a whole.
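As a minimal sketch of the frequency versus relative frequency choice, the following Python snippet builds both columns of a table from a list of category labels. The `colors` data are hypothetical (made up for illustration), and ordering the rows by descending count is just one of several reasonable choices.

```python
from collections import Counter

# Hypothetical qualitative data (made up for illustration): eye colors in a class.
colors = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "blue"]

counts = Counter(colors)      # frequency (raw count) of each category
total = sum(counts.values())  # number of individuals

# One row per category: (category, frequency, relative frequency),
# ordered from most to least common.
rows = [(category, count, count / total)
        for category, count in counts.most_common()]

for category, count, rel in rows:
    print(f"{category:<8}{count:>5}{rel:>8.1%}")
```

The relative frequencies sum to 1, which is the property a pie chart relies on.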

## Quantitative data

Quantitative data may be displayed with a stem-and-leaf plot, a histogram, or a boxplot. The stem-and-leaf plot retains all the original information. A histogram may present a better picture of the shape of the distribution because it is not constrained by the decimal structure of the data. A boxplot explicitly displays the maximum, the minimum, and the three quartiles (including the median), which may be of interest.
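To illustrate how a stem-and-leaf plot keeps every original value while tying its bins to the decimal structure of the data, here is a small Python sketch. The function name `stem_and_leaf` and the `scores` data are hypothetical, and the sketch assumes non-negative integers with the tens as stems and the units digits as leaves.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf plot for non-negative integers,
    using the tens as stems and the units digits as leaves."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems so that gaps in the data remain visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

# Hypothetical test scores (made up for illustration).
scores = [62, 67, 71, 74, 74, 78, 85, 89, 93]
plot = stem_and_leaf(scores)
print("\n".join(plot))
```

Each row acts like a histogram bar of width 10, but the individual values can still be read back from the leaves.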

## Summary statistics

The mean and median are the two most common measures of location. The mean is defined using the values of all the data, while the median ignores the degree to which extreme individuals are extreme. One can recover the total weight of a population from the mean if one knows the population size, but the median is less affected by misrecorded data or an aberrant individual. If one is using the mean as a measure of location, the standard deviation is generally used as a measure of spread because it is defined using the mean. If the median is used as a measure of location, the interquartile range (Q3 - Q1) is generally used as a measure of spread, since the median is Q2. Recall that z-scores are defined based on distance from the mean, while the 50th percentile is the median.
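The contrast between the mean and the median can be sketched with Python's standard `statistics` module. The weights below are hypothetical (made up for illustration), and the second list simulates a single misrecorded value. Note that textbooks and software use slightly different conventions for quartiles; the call to `statistics.quantiles` uses that function's default method, which may differ from the convention used in class.

```python
import statistics

# Hypothetical weights in pounds (made up for illustration).
weights = [150.0, 160.0, 170.0, 180.0, 190.0]

# The same data with the last value misrecorded (190.0 typed as 1900.0).
misrecorded = [150.0, 160.0, 170.0, 180.0, 1900.0]

mean_clean = statistics.mean(weights)
median_clean = statistics.median(weights)
mean_bad = statistics.mean(misrecorded)
median_bad = statistics.median(misrecorded)

# The total can be recovered from the mean and the number of individuals.
total = mean_clean * len(weights)

# Measures of spread that pair with each measure of location.
sd = statistics.stdev(weights)                 # sample standard deviation
q1, q2, q3 = statistics.quantiles(weights, n=4)  # quartiles (default method)
iqr = q3 - q1

print("mean:  ", mean_clean, "->", mean_bad)
print("median:", median_clean, "->", median_bad)
print("total: ", total)
print("sd:    ", round(sd, 2))
print("IQR:   ", iqr)
```

The single bad value drags the mean far from the bulk of the data while the median does not move.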

Exercise: What is the essence of the data set of weights in pounds: 188.5, 183.0, 194.0, 185.0, 214.0, 203.5, 186.0, 178.5, 109.0, 186.0, 184.5, 204.0, 184.5, 195.5, 202.5, 174.0, 183.0, 109.5? What graphical presentation or summary statistics would you use to convey that essence?

## Dependence on units

How do summary statistics (mean, median, range, standard deviation) change if you change from inches to feet? From degrees Fahrenheit to degrees Celsius? One can easily verify from the definitions that if you add a number to all the data (e.g., give everybody a \$100 bonus), the measures of location will change by that amount, but the measures of spread will not. However, if you multiply all the data by a positive constant (e.g., convert weights from pounds to kilograms), both the measures of location and the measures of spread will change by that factor (the variance will change by the square of that factor). There is a measure of spread that is unaffected by such a rescaling of units, called the coefficient of variation (or relative standard deviation). The coefficient of variation is defined as the standard deviation divided by the mean; since both the standard deviation and the mean change by the same factor when one converts from pounds to kilograms, the ratio remains the same.
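These claims can be checked numerically. The sketch below uses hypothetical weights (made up for illustration) and a hypothetical helper `summarize`; it applies a shift of 100 and a conversion from pounds to kilograms, then compares the summaries.

```python
import math
import statistics

LB_TO_KG = 0.45359237  # kilograms per pound

# Hypothetical weights in pounds (made up for illustration).
pounds = [150.0, 160.0, 170.0, 180.0, 190.0]

def summarize(values):
    """Return location and spread summaries for a list of numbers."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "sd": sd,
        "cv": sd / mean,  # coefficient of variation
    }

original = summarize(pounds)
shifted = summarize([x + 100.0 for x in pounds])    # add a constant
scaled = summarize([x * LB_TO_KG for x in pounds])  # change of units

for name, stats in [("original", original), ("shifted", shifted), ("scaled", scaled)]:
    print(name, {key: round(value, 3) for key, value in stats.items()})

# The coefficient of variation survives the change of units...
print(math.isclose(scaled["cv"], original["cv"]))
# ...but not the shift, because the mean changes while the spread does not.
print(math.isclose(shifted["cv"], original["cv"]))
```

In this sketch the shift moves the mean and median by 100 and leaves the range and standard deviation alone, while the unit change multiplies all four by the conversion factor.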

Competencies: Calculate the interquartile range and coefficient of variation for the class weights.

Reflection:

Challenge:

May 2002