Measures of Central Tendency
Have you taken a statistics course? If you have been putting it off, don’t wait too long. Mastering that material helps in other classes and in life.
One of the key concepts in statistics is the measures of central tendency: mean, median, and mode. Each one tells us how the data for one variable or concept cluster together, although each is calculated differently.
The mean is the numerical average. You’ve probably already been calculating means (also known as averages): add up all the scores or values in your data, then divide by how many there are.
Your grade point average (GPA) is one example of a mean: add up the grade points for all your classes, then divide by how many classes you have taken. Thus, if your grades (and grade points) last semester were A (4), B (3), A (4), C (2), and A (4), add up those grade points (4 + 3 + 4 + 2 + 4 = 17), then divide by 5, since that’s how many grade points you have. Your grade point mean is 3.4, the equivalent of a B. (We add the grade points because we cannot add letters! Grades are measured at the ordinal level of measurement; grade points are measured at the interval/ratio level of measurement. But that’s a topic for another post.)
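For readers who want to check the arithmetic by computer, here is a minimal Python sketch of the same calculation. It assumes every class counts equally, as in the example above; the function name `mean` is just an illustrative choice (Python’s standard `statistics.mean` does the same job).

```python
# Grade points from the example: A=4, B=3, A=4, C=2, A=4
grade_points = [4, 3, 4, 2, 4]

def mean(values):
    """Add up the values, then divide by how many there are."""
    return sum(values) / len(values)

gpa = mean(grade_points)
print(gpa)  # 3.4
```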
The median is the value in the middle of the data distribution. Lay out, or list, all the scores or values in order, and the median is the one smack dab in the middle. Thus, if we put our grade points in order, we have 4, 4, 4, 3, 2. Which one is in the middle? A 4, which is equivalent to an A.
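The same ordering step can be sketched in Python. This is an illustrative helper, not anything from the original post, and it covers one case the example doesn’t need: with an even number of values there is no single middle value, so the usual convention (also used by Python’s `statistics.median`) is to average the two middle ones.

```python
def median(values):
    """Sort the values and return the middle one (or the mean of the two middle ones)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]  # odd count: a single middle value
    return (ordered[middle - 1] + ordered[middle]) / 2  # even count: average the two middle values

grade_points = [4, 3, 4, 2, 4]
print(sorted(grade_points, reverse=True))  # [4, 4, 4, 3, 2]
print(median(grade_points))  # 4
print(median([4, 3, 2, 4]))  # 3.5
```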
The mode is the score or value that occurs more often than any other. Whichever value has the most responses, that’s your mode. Thus, with the grade points we’ve been using, the 4s show up more than the 3 or the 2, so 4 is the mode. (There are three 4s, and only one each of the 3 and the 2.)
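Counting is all the mode takes, so a short Python sketch using the standard `collections.Counter` is enough. The helper name `modes` is hypothetical; it returns a list because a data set can have more than one most common value (a tie), which a single answer would hide.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often (there can be more than one)."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

grade_points = [4, 3, 4, 2, 4]
print(Counter(grade_points)[4])  # 3  (three 4s)
print(modes(grade_points))       # [4]
```

The standard library’s `statistics.multimode` (Python 3.8 and later) behaves the same way.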
So what? What’s the point? (Always good questions to ask when learning new material or research findings.)
Each measure of central tendency tells us how the data cluster, but each has its pros and cons. The mean is usually the best to use because it takes into account all the data. The median and mode each rest on a single value: the median only cares about the middlemost value, and the mode only about the most common one. However, if the mean and median are very different from each other (a sign of what we call skew), then the median is best to use, because the mean has been pulled by extreme values on one side or the other. The mode only tells you about the most common value, and that can obscure where most of the data really are.
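To see why the median is the safer choice under skew, here is a small Python experiment with made-up numbers (hypothetical values, not real data): adding one extreme value moves the mean a lot and the median only a little.

```python
from statistics import mean, median

# Hypothetical data: five made-up values, then the same values plus one extreme value
typical = [10, 12, 13, 15, 16]
with_outlier = typical + [100]

for label, data in [("without outlier", typical), ("with outlier", with_outlier)]:
    print(f"{label}: mean = {mean(data):.1f}, median = {median(data):.1f}")
# without outlier: mean = 13.2, median = 13.0
# with outlier: mean = 27.7, median = 14.0
```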
Yeah, so? Let me illustrate with some recent news items.
Slate recently published a great article about the website The Knot, a popular source of information for people planning weddings. The article delves into the statistics on wedding costs, and I recommend reading all of it. The Knot reports the mean wedding cost, not the median cost:
When I pressed TheKnot.com on why they don’t just publish both figures, they told me they didn’t want to confuse people. To their credit, they did disclose the figure to me when I asked, but this number gets very little attention. Are you ready? In 2012, when the average wedding cost was $27,427, the median was $18,086. In 2011, when the average was $27,021, the median was $16,886. In Manhattan, where the widely reported average is $76,687, the median is $55,104. And in Alaska, where the average is $15,504, the median is a mere $8,440. In all cases, the proportion of couples who spent the “average” or more was actually a minority. And remember, we’re still talking only about the subset of couples who sign up for wedding websites and respond to their online surveys. The actual median is probably even lower.
So, what do we see here? An awesome discussion in a popular online magazine about how reporting the mean instead of the median can mislead people! If this paragraph of information is daunting to decode, reformat it. Here it is as a table so we can compare the numbers more easily:
| | Mean cost | Median cost |
|---|---|---|
| 2012 | $27,427 | $18,086 |
| 2011 | $27,021 | $16,886 |
| Manhattan | $76,687 | $55,104 |
| Alaska | $15,504 | $8,440 |
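The comparison can also be done in a few lines of Python. This sketch uses only the figures quoted from the Slate article above and prints how far each mean sits above its median, in dollars and as a ratio.

```python
# Mean and median wedding costs as quoted from the Slate article (The Knot survey figures)
wedding_costs = {
    "2012": (27_427, 18_086),
    "2011": (27_021, 16_886),
    "Manhattan": (76_687, 55_104),
    "Alaska": (15_504, 8_440),
}

gaps = {}
for label, (mean_cost, median_cost) in wedding_costs.items():
    gaps[label] = mean_cost - median_cost  # dollars by which the mean exceeds the median
    ratio = mean_cost / median_cost        # how many times larger the mean is
    print(f"{label}: mean is ${gaps[label]:,} above the median ({ratio:.2f}x)")

# 2012: mean is $9,341 above the median (1.52x)
# 2011: mean is $10,135 above the median (1.60x)
# Manhattan: mean is $21,583 above the median (1.39x)
# Alaska: mean is $7,064 above the median (1.84x)
```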
In each case, how does the mean compare with the median? The mean is much higher, by roughly $7,000 to $21,600.
The Knot reports the mean, not the median. By doing this, it gives readers the impression that weddings cost a LOT of money. Yes, the median shows that weddings are expensive too, but the median is consistently lower than the mean.
[Photo: At the Seattle Bridal Show. Source: http://upload.wikimedia.org/wikipedia/commons/1/1a/At_the_Seattle_Bridal_Show2.jpg]
The mean and median are very different, which means there is skew: the mean is being pulled up by extreme values. When the mean is higher than the median, those extreme values are on the high side (positive, or right, skew). A few really expensive weddings are pulling the means well above the medians.
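One consequence of right skew is the point made in the Slate quote: only a minority of cases reach the mean. The Python sketch below shows this with a made-up list of wedding costs (hypothetical numbers chosen for illustration, not The Knot’s data), where two expensive weddings pull the mean far above the median.

```python
from statistics import mean, median

# Hypothetical wedding costs (made up for illustration, NOT The Knot's data):
# most are modest, and two are very expensive
costs = [8_000, 10_000, 12_000, 15_000, 18_000, 20_000, 25_000, 60_000, 120_000]

avg = mean(costs)
mid = median(costs)
share_at_or_above_mean = sum(c >= avg for c in costs) / len(costs)

print(f"mean = ${avg:,.0f}, median = ${mid:,.0f}")
print(f"share spending the mean or more: {share_at_or_above_mean:.0%}")
# mean = $32,000, median = $18,000
# share spending the mean or more: 22%
```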
Most of the time, when there is a lot of skew, the median is reported rather than the mean, and appropriately so. For example, U.S. government agencies typically report median income and median housing values, because a relatively small number of very high incomes and very expensive homes would pull the means up and make typical incomes and housing values look higher than they really are.
Why would The Knot report the mean instead of the median when its data have so much skew? The Knot is a marketing tool, not just an informational website for people interested in weddings. The companies that run ads on it, and indeed the entire wedding industry, benefit when people assume that the average wedding costs so much. If the median were reported, people might feel less pressure to bump up their wedding budgets, and companies would not make as much money from weddings.
Note also the last part of the author’s paragraph. Not every couple answers The Knot’s surveys about how much they paid for their wedding; the couples who do respond have selected themselves. Thus no random sampling has occurred, and we cannot generalize these figures to all weddings. But does a reader of The Knot know or think about that? Most likely not.
Where else have you seen the mean reported when it should have been the median? And what are the possible consequences of data that might be misleading?