MISLEADING STATISTICS Twisting information to your advantage…

Statistical thinking will one day be as necessary for efficient
citizenship as the ability to read and write. – H.G. Wells
Indeed, statistics may be one of our most effective and efficient vehicles for
communicating information. It is the natural inclination of people to trust numbers over
words, and statistics present numbers in an attractive format that even the most
innumerate man can follow. In addition, statistics can be presented in a wide variety of
forms, from line graphs to tables to pie charts. Each performs its own unique function and
offers information from a new perspective.
Yet with every benefit comes a setback. Many people do not realize that numbers in a
graph can be easily manipulated to reflect the author’s own wishes. The problem with
graphs is that even with missing information, incomplete figures, and vague captions, they
can still be presented with reasonable realism. People have grown so accustomed to
seeing graphs that they accept their information unquestioningly.
In the following presentation, we will show you two such misleading graphs, point out
their errors, and attempt to recreate the same graph using more accurate forms of
presentation. You will see how the same set of information can produce two completely
different graphs, and learn about the many ways in which statistics can deceive you.
This graph is misleading in many ways. Here are some examples of the most commonly
used graph-manipulation tactics.
First of all, there is the title to consider. While retail sales do go down in April 2000, the
title doesn’t accurately reflect what the rest of the graph shows. Yes, the sales do rise and
fall over a period of a year and a half, but in general, they have been steadily rising since
November 1998.
Second, notice that the y-axis does not begin at zero, but
at $225 billion. This has the unfortunate effect of making
the rising slope shown in the graph much steeper than it
actually is.
Third, the little white box that shows the rate of change
from previous months only includes the last three months in
the graph. This immediately biases the graph in favor of the
title, as it shows that sales have actually gone down since
February. A reader just looking at the box will not know that
sales also went down in May and September 1999, and
that these dips did not affect the rising trend in sales one bit.
Fourth, note that the year 1999 is written under June and
July, and not January. This may be a minor transgression,
but it will certainly lead some readers to believe that the
time period spans three whole, consecutive years and not
fragments of a year.
One final observation: Is it fair to compare retail sales of
all the months of a year with one another? Christmas in December,
for example, would prompt gift buying, but slower months
like February might not have any at all. Wouldn’t it be
much fairer to compare the same months and calculate how
much sales have grown over the year?
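The same-month comparison suggested above can be sketched in a few lines of Python. The sales figures below are hypothetical placeholders in billions of dollars, not values read off the original graph, and the helper name percent_change is our own.

```python
# Hypothetical retail sales in billions of dollars (NOT taken from the graph).
first_year = {"Nov": 240.0, "Dec": 290.0, "Jan": 225.0,
              "Feb": 228.0, "Mar": 250.0, "Apr": 248.0}
second_year = {"Nov": 260.0, "Dec": 315.0, "Jan": 243.0,
               "Feb": 250.0, "Mar": 272.0, "Apr": 268.0}


def percent_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


# Compare each month with the same month one year earlier.
year_over_year = {month: percent_change(first_year[month], second_year[month])
                  for month in first_year}

# Compare April with the month just before it, as the original title does.
march_to_april = percent_change(second_year["Mar"], second_year["Apr"])

for month, change in year_over_year.items():
    print(f"{month}: {change:+.1f}% versus the same month a year earlier")
print(f"March to April: {march_to_april:+.1f}%")
```

With these made-up numbers, April is down on March but up on the previous April, which is the kind of seasonal effect a month-to-month comparison can hide.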
On this second graph, the y-axis begins at zero, making the rising slope much less
dramatic. When presented like this, it is also harder to tell which bars are higher and lower. The
last two bars, for example – March and April – look almost exactly the same on this graph. If
the reader weren’t told that sales had actually gone down from March to April, he would
never know. The title, likewise, has been changed to something that can encompass all aspects
of this graph. In addition, instead of labeling a group of months with one year, we have given
each month its own year so that it’s easier to read.
[Figure: bar graph, “Retail Sales from November 1998 to April 2000”. x-axis: Month (Nov-98 to Apr-00). y-axis: Billions ($0.00 to $300.00).]
However, this graph still does not address the problem of
comparing all months together as equals. In our next graph, we
will show you what that might look like.
Comparing the months of consecutive years adds yet another perspective to the picture.
From this graph, it is easy to see that sales have steadily risen for each month, and by a
fairly predictable percentage at that. Nearly all the months are rising by the same margin
from one year to the next. Even April sales, which the original graph proclaimed were falling,
have risen compared with April sales of the previous year.
[Figure: grouped bar graph, “Retail Sales Rise”. x-axis: Month (November to April). y-axis: Billions ($0.00 to $300.00). Series: First Year, Second Year.]
Once again, while the original graph seems to be trying to convince us
that April sales have very obviously fallen, these two graphs tell us the
opposite. Appropriately, the title for this third graph has been changed
completely to convey the opposite message.
Of course, there are many different ways to lie with statistics, and now we’ll show you how
it can be done with a pictograph.
The most deceptive aspect of this graph is the way in which it was drawn. Firstly, the
perspective puts barrel 1979 at the forefront and barrel 1973 at the back. This effectively
draws the reader’s eye to the 1979 barrel first and then forces him to read the rest of the years in
descending order. Supporting this deceptive tactic is the fact that only the foremost barrels
have the complete year written on them. The rest are indicated with only the last two digits, as in ’76.
Obviously, the makers of the graph intend for the audience to read in reverse chronological
order, which has the effect of making oil prices seem to fall.
Secondly, the perspective makes it hard to judge the
numerical difference between each barrel. For example,
even though barrel 1975 appears to be over two thirds
the height of 1976, in reality, the difference between
them is only $0.95. Likewise, barrel 1973 seems less
than half the height of 1974, yet they differ by a
whopping $8.54!
A third misleading aspect is that this pictograph
doesn’t contain a scale or axes of any kind. Without them,
the reader’s attention might be directed to the area of
each barrel instead. Numerically, the smallest barrel
should only be about 1/5 of the largest barrel, but
in terms of area, the ratio is about 1/25. This makes the
difference between the two seem much larger than it actually
is.
Lastly, the way in which the barrels are labeled seems
somewhat awkward. Shouldn’t the prices be on the
barrels instead of the years? Prices written on the barrels will
clarify that it is the cost that is changing, not the years.
And with more space to indicate years, readers won’t be
forced to read in reverse.
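The area effect described in the third point can be checked with a little arithmetic. The sketch below assumes each barrel is a picture scaled by the same factor in width and height, so that its drawn area grows with the square of its height. The function name area_ratio is our own.

```python
from fractions import Fraction


def area_ratio(height_ratio):
    """Ratio of drawn areas when a picture is scaled uniformly,
    i.e. by the same factor in width and in height."""
    return height_ratio ** 2


# The text gives the smallest barrel as about 1/5 of the largest.
smallest_to_largest = Fraction(1, 5)
print(area_ratio(smallest_to_largest))  # prints 1/25
```

A price that is one fifth as large is thus drawn with about one twenty-fifth of the ink, which is why pictographs scaled in two dimensions overstate differences.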
[Figure: bar graph, “Price per barrel of crude oil leaving Saudi Arabia on Jan. 1”. x-axis: Year (1973 to 1979). y-axis: Price ($0.00 to $14.00).]
As soon as the information is transferred to a bar graph instead of a pictograph, most of the major
problems, such as perspective, are eliminated. This graph neatly depicts the steadily rising prices of
crude oil, and doesn’t hesitate to show sudden rises or drops. Each bar represents a number by its
height without using fancy images to distract the reader. The presence of the x- and y-axes also makes
it much more organized. While the original graph tended to overstate small differences and gloss
over wide gaps, this graph is much more honest. One can see that the largest rise occurs between
1973 and 1974, and that the price continues to rise steadily by smaller amounts over the next five years.
The years on the x-axis are all clearly marked in chronological order as well so that it is easy for
readers to understand.
For this type of information, using a line graph may be even more useful
than a bar graph. With a line to define the rise and fall of oil prices, the
shape of the changing rates is all the more obvious. This graph even
seems to accentuate the huge rise between 1973 and 1974. The biggest
benefit of using a line graph, however, lies in the fact that each point is
marked with a small, accurate dot. These are much easier to read than bars,
and the line between them outlines the contour of the rise.
[Figure: line graph, “Price per barrel of crude oil leaving Saudi Arabia on Jan. 1”. x-axis: Year (1973 to 1979). y-axis: Price ($0.00 to $16.00).]
What makes statistical information reliable and accurate?
To make sure statistics are accurate and reliable, one must keep a number of things in mind.
Here are some of the most important points to remember:
The first and most important is the collection of information. It’s alarmingly easy to make
graphs with missing figures, and this only produces inaccurate results. Before making any
graph, it is wise to make sure that the data is sufficient. This is especially true in surveys, where
the accuracy of the results improves as the number of people surveyed grows. Next to
quantity in importance is quality. There is little point in making a graph with inaccurate
information.
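How much a larger survey helps can be estimated with the standard margin-of-error formula for a proportion. The sketch below is our own illustration, not part of the original presentation. It assumes a simple random sample, a 95% confidence level (z = 1.96) and the worst-case proportion of 0.5.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a surveyed proportion,
    assuming a simple random sample (z = 1.96 gives about 95% confidence)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


for n in (100, 400, 1600):
    print(f"{n} people surveyed: about ±{margin_of_error(n) * 100:.1f} percentage points")
```

Under these assumptions, quadrupling the number of people surveyed only halves the margin of error, so accuracy improves with sample size but with diminishing returns.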
Even with accurate information, however, you must know which is the best way of using and
presenting it. Many perfectly accurate statistics become misleading when they are unfairly
compared. You would not, for example, compare the average grades of a small school to the
average grades of a large school without making allowances for the larger diversity of
students. Therefore, when presenting data, care must be taken to prevent this.
Although this graph is pleasing to look at, it
can also be confusing. The author meant for the
Number of Buyers to be calculated by the height
of each picture, but the reader’s attention will be
more focused on area. What makes it even
more biased is that each monitor on the graph is
a Macintosh.
Colors should also be used with care, and this applies to most graphs: they should enhance a good
presentation, not act as a crutch for a poor one. Likewise, be very careful if you are drawing in perspective.
Perspective tends to be hard to read and understand, and can easily confuse.
One of the most common ways of deceiving with graphs is to (a) cut off the y-axis, or
(b) have the numbers on it rise in an illogical way. Take the following two examples:
[Figures (a) and (b)]
As you can see, graph (a) begins the y-axis at 80 000, making the increase between the two years
seem much larger than it actually is. On graph (b), the numbers on the y-axis start at zero, but then
double with each consecutive value. This makes it seem as if the greatest increase occurred between
x-values 1 and 2, and not 3 and 4. These two mistakes should be avoided at all costs on graphs.
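Both distortions can be put into numbers. In the sketch below, only the 80 000 starting point comes from the text. The data values, the tick labels and the function names are hypothetical, chosen to show the effect.

```python
from bisect import bisect_right


def truncated_ratio(earlier, later, baseline=0.0):
    """Ratio of two bar heights as drawn when the y-axis starts at baseline."""
    if min(earlier, later) <= baseline:
        raise ValueError("both values must lie above the baseline")
    return (later - baseline) / (earlier - baseline)


def drawn_height(value, ticks):
    """Height, in tick spacings, of value on an axis whose tick labels are
    drawn at equal distances regardless of the numeric gaps between them."""
    if not ticks[0] <= value <= ticks[-1]:
        raise ValueError("value lies outside the axis range")
    if value == ticks[-1]:
        return float(len(ticks) - 1)
    i = bisect_right(ticks, value) - 1
    return i + (value - ticks[i]) / (ticks[i + 1] - ticks[i])


# (a) Hypothetical yearly totals on an axis that starts at 80 000.
year_1, year_2 = 90_000, 100_000
honest = truncated_ratio(year_1, year_2)            # axis starting at zero
as_drawn = truncated_ratio(year_1, year_2, 80_000)  # axis starting at 80 000

# (b) Hypothetical doubling tick labels and data for x-values 1 to 4.
ticks = [0, 10, 20, 40, 80]
data = [0, 20, 30, 60]
heights = [drawn_height(v, ticks) for v in data]
actual_steps = [b - a for a, b in zip(data, data[1:])]
drawn_steps = [b - a for a, b in zip(heights, heights[1:])]

print(f"(a) real ratio {honest:.2f}, ratio as drawn {as_drawn:.2f}")
print(f"(b) real increases {actual_steps}, increases as drawn {drawn_steps}")
```

With these made-up figures, an 11% rise in (a) is drawn as a doubling, and in (b) the largest real increase (between x-values 3 and 4) is drawn smaller than the increase between x-values 1 and 2.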
The last point to remember pertains to averages. Averages are a tricky business because many people
apply them in places where they should not be applied. That is where medians and modes come in. In
cases where a survey turns up many small numbers but one enormous figure, it is probably better to use
a median. In cases where there are many, many figures within a narrow range, it might be wiser to
use a mode. All three of these tools (mean, median, and mode) should be used only when appropriate.
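The difference between the three measures is easy to see on a small data set. The survey answers below are hypothetical, chosen to match the case of many small numbers and one enormous figure. The functions come from Python’s standard statistics module.

```python
from statistics import mean, median, mode

# Hypothetical survey answers: many small numbers and one enormous figure.
responses = [2, 3, 3, 4, 5, 100]

print("mean:  ", mean(responses))    # 19.5, pulled upward by the single 100
print("median:", median(responses))  # 3.5, the middle of the sorted answers
print("mode:  ", mode(responses))    # 3, the most frequent answer
```

Here the mean suggests a typical answer near 20, although five of the six answers are 5 or less. The median and the mode describe the bulk of the data far better.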
All of these are factors that make some statistical information accurate and reliable. Put together, they
can make a powerful tool in presenting information. In the end, however, whether or not a graph is
accurate depends on the maker, and whether he or she wishes the graph to be honest or misleading.
That is why we must be true to the data when dealing with statistics, and always remember what makes
the difference between an accurate graph and an inaccurate one.
Bibliography
“Graphing Quantitative Variables.” The Connexions Project. Modification Date: 27 June 2003.
Rice University. Access Date: 01 December 2003.
<http://cnx.rice.edu/content/m10927/latest/>
“Misleading Graphs.” Math and the Media: Deconstructing Graphs and Numbers. Modification
Date: N/A. Reich College of Education. Access Date: 01 December 2003.
<http://www.ced.appstate.edu/~goodmanj/workshops/ABS04/graphs/graphs.html>
“Misleading Graphs.” Maths Data Handling: Foundation/Intermediate. Modification Date: N/A.
BBCi. Access Date: 01 December 2003.
<http://www.bbc.co.uk/schools/gcsebitesize/maths/datahandlingfi/representingdatarev5.shtml>
“Statistical Manipulation.” Effective Meetings. Modification Date: N/A. SMART Technologies
Inc. Access Date: 01 December 2003.
<http://www.effectivemeetings.com/productivity/communication/statmanipulation.asp>