Defining the Problem

Statistics are used when you want to know something about a collection or set of objects. This could be a group of people, the days in a month, or the set of times an athlete was at bat. For example, you might want to know the heights of a group of people. If the group is the students in one of my classes, I will ask them their heights. However, if I am interested in the heights of every person in the United States, I cannot personally ask each one their height. In fact, even the United States census, which is mandated to count everyone in the United States, cannot find everyone in the United States.

For this reason, two words are important for the science of statistics: population and sample. The population is the group you are interested in (e.g., all people in the U.S.). The sample is the group you can collect information about (e.g., the students in my class). A sample is always a subset of the population, i.e., every member of the sample is in the population. [Scientists do perform experiments on mice when they are interested in the effect of drugs on humans, but mice are not members of the human population, so such analogies are not statistics.]
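The subset requirement can be checked mechanically. The following is a minimal sketch in Python, assuming a tiny made-up population; the names and the helper `is_sample_of` are hypothetical and serve only to illustrate the definition.

```python
# Hypothetical population and sample; the names are invented for illustration.
population = {"Ana", "Ben", "Cara", "Dev", "Eli", "Fay"}
sample = {"Ben", "Dev", "Fay"}


def is_sample_of(candidate, group):
    """Return True when every member of candidate is also in group."""
    return candidate <= group  # set subset test


print(is_sample_of(sample, population))          # every member is in the population
print(is_sample_of({"Ben", "Zed"}, population))  # "Zed" is not in the population
```

Note that the population itself passes the test as well: a sample may be the whole population.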

When I can measure (or otherwise assess) every person or object I am interested in, the sample is the population; the science of describing such populations is called descriptive statistics. When the sample is a proper subset of the population, I can describe the sample, but not the population; the science of determining to what extent the description of the sample reflects the members of the entire population is called inferential statistics.
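To make the distinction concrete, here is a small sketch in Python. The heights are invented for illustration, and the mean is used as just one possible description. When every member is measured, the mean describes the population exactly; when only some members are measured, the mean describes the sample and may differ from the population value.

```python
from statistics import mean

# Hypothetical heights in centimeters for a six-person population.
heights_cm = {"Ana": 160, "Ben": 175, "Cara": 168, "Dev": 181, "Eli": 172, "Fay": 164}

# Descriptive statistics: everyone is measured, so the sample is the population.
population_mean = mean(heights_cm.values())

# A proper subset: only three members were measured.
sample_names = ["Ben", "Dev", "Eli"]
sample_mean = mean(heights_cm[name] for name in sample_names)

print(population_mean)  # 170
print(sample_mean)      # 176
```

How far a sample value such as 176 can be trusted as a statement about the whole population is the question inferential statistics tries to answer.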

Caveat: Indeed, a sample is just a subset as defined above, but in statistics it is important to have a random or representative sample. Weighing members of the football team will not provide good information about the weights of all UNI students. Breast cancer does occur in men, but the frequency in men is not a good measure of the overall frequency in the population. Experimental design is concerned with obtaining good samples; experimental design is very important, but we shall not study experimental design in this course.
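The football-team example can be imitated with a small simulation. This is only a sketch under loud assumptions: the weights are randomly generated, not real student data, and the "team" is crudely modeled as the 50 heaviest students, which is not how a real team is chosen.

```python
import random
from statistics import mean

rng = random.Random(2007)  # fixed seed so the run is repeatable

# Simulated weights (kg) for 1000 hypothetical students; not real data.
student_weights = [rng.gauss(70, 12) for _ in range(1000)]
population_mean = mean(student_weights)

# Assumption: stand in for the football team with the 50 heaviest students.
team_sample = sorted(student_weights)[-50:]

# A simple random sample of the same size.
random_sample = rng.sample(student_weights, 50)

team_error = abs(mean(team_sample) - population_mean)
random_error = abs(mean(random_sample) - population_mean)

print(f"population mean:          {population_mean:.1f}")
print(f"team sample is off by:    {team_error:.1f}")
print(f"random sample is off by:  {random_error:.1f}")
```

Both samples are legitimate subsets of the population, but only the random one gives a mean close to the population mean.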

Competency: Give three different samples for the population of high school students in the United States.
Give three different populations for which the UNI women's volleyball team is a sample.

Reflection: We have not discussed whether a sample is good or representative of the population. This is an important aspect of experimental design. But consider which of the above samples are representative of the associated population. Does it depend on what information about the population you are interested in?

Challenge:

July 2007