Original Article
Developing Consistency in the Terminology
and Display of Bar Graphs and Histograms
Patricia B. Humphrey, Sharon Taylor and
Kathleen Cage Mittag
Department of Mathematical Sciences, Georgia Southern University, USA and
Department of Mathematics, University of Texas at San Antonio, USA
e-mail: [email protected]
Summary
Students often are confused about the differences between bar graphs and
histograms. The authors discuss some reasons behind this confusion and offer
suggestions that help clarify thinking.
Keywords:
Teaching; Bar graph; Bar chart; Histogram.
THE PROBLEM
How many times have these questions come up in
your classroom: ‘Should I use a histogram or a
bar graph?’ ‘Is this a histogram or a bar graph?’
‘Does it matter whether I connect the bars or
not?’ We have been hearing these questions and
others along the same lines (and seeing the
results of not asking the questions) for years.
When two of us began teaching statistics in the
1980s, this confusion was not a big surprise since statistics
had previously not been an important part of the
primary and secondary curriculum. In fact, many
textbooks had statistics as a last chapter that
many teachers never covered. Exploratory data
analysis was just coming into favour in
classrooms with the publication of the Quantitative Literacy series (Gnanadesikan et al. 1986;
Landwehr 1986; Landwehr and Watkins 1986;
Newman et al. 1986). These publications paved
the way for the inclusion of data analysis as a
strand in the National Council of Teachers of Mathematics (NCTM) Standards (NCTM 1989, 2000)
as well as in many state standards.
The question then arises: Why are we still hearing the same questions about these two types of
graphs? Research studies have indicated that some
students in primary, secondary and college-level introductory statistics courses have difficulties with
statistical graphs. Chance et al. (2004) found
undergraduate students demonstrated problems
understanding variability and shapes of distributions. Other studies have shown that students
have trouble with distributions and graphical
© 2013 The Authors
Teaching Statistics © 2013 Teaching Statistics Trust ●●, ●●, pp ●●–●●
representations (Bakker and Gravemeijer 2004;
delMas et al. 2007; Hammerman and Rubin
2004; Konold and Higgins 2003; McClain et al.
2000). Cooper and Shore (2010) wrote, ‘it simply
makes good sense to include rich discussions
connecting an assortment of graphical displays to
their corresponding data sets and methods to
judge center and spread’ (p. 13).
We initially sought answers to our questions by
looking into the history of bar graphs and histograms. Although the histories were interesting,
they did not lead to any clarification. This led us to
look for any useful information on the graphs. Then
our search became interesting. Some of the
websites we visited had conflicting information
about the graphs, appropriate data types, definitions and proper use. Examination copies of
several textbooks yielded still more confusion. As
we looked at various texts, aspects of the problems
students have with these graphs became clearer.
We discuss below some possible reasons for the
confusion and suggest points to emphasize to students.
THE CAUSES
A lack of consistency in textbooks and websites
seems to be one reason some students might have
problems with bar graphs and histograms. Generally speaking, the inconsistencies lie in three major
areas: definitions of histograms and bar graphs,
the type of data that can be used for histograms
and bar graphs (which often comes from failing to
consider how the data were collected) and labelling
the x-axis in a histogram. Yet another conceptual
problem is confusion between how changes in bar
(bin) width in a histogram change the appearance
of the distribution and how changes in the ordering
of categories in a bar graph change the ‘shape’. A
description of each of these follows.
Definitions
In a review of middle grade mathematics
textbooks, Hillman (2009) found that although
textbook authors tended to be consistent with
the elements they visually displayed in bar
graphs, ‘definitions and descriptions of bar graphs
were quite varied across grade levels, from short
and vague … to detailed descriptions explicitly mentioning multiple features of bar graphs’ (p. 148).
Definitions of bar graphs range from ‘A bar graph
is a graph that compares different amounts using
bars’ (iCoachMath.com) to a more correct ‘Bar
graphs represent each category as a bar. The bar
heights show the category counts or percents’
(Starnes et al. 2012, p. 10). One version of the
popular Triola series (Elementary Statistics Using
the Graphing Calculator: For the TI-83/84 Plus,
2005) did not even discuss bar charts except as
Pareto charts (a bar chart with categories ordered
from most to least-often occurring). Although most
authors display the bars as separated, only a few
(such as Brase and Brase (2012), p. 55) go on to
add, in a highlighted box on features, that ‘Bars are of
uniform width and uniformly spaced’. A well-constructed bar graph is shown in figure 1. An
example of a bar chart with bars connected is
shown in figure 2, which can lead confused
students to describe the shape of that bar chart as
‘approximately symmetric’ or even Normal! For this
reason, students should be discouraged from
connecting the bars in a bar graph.
Fig. 2. A bar graph without spacing. Many students
will think this is a histogram
We found that many textbook authors do not
provide a definition of a histogram; they simply
start creating histograms with little or no explanation except for mechanics. Some textbook authors
provide an informal definition that a histogram is a
way to sort and organize data. Although this is an
accurate statement, the same is also true for stem
and leaf plots or dot plots. We also found textbooks
and websites that defined a histogram as a
connected bar graph. Although the bars are
connected, this definition only tends to perpetuate
the problem of distinguishing the need for a
histogram versus a bar graph. This can be seen on
the MathisFun website where the definition
includes “The data is grouped into ranges (such as
‘40 to 49’) and then plotted as bars. Similar to a
Bar Graph, but each bar represents a range of
data.” A well-constructed histogram is shown in
figure 3.
Fig. 1. A correct bar graph. Bars are of uniform width
and have uniform spacing. [Figure: ‘Most Popular Car Colors Worldwide 2012’; x-axis: Color (White, Black, Silver, Gray, Red, Blue, Brown/Beige, Yellow/Gold, Green); y-axis: Percent]
Fig. 3. An example of a good histogram. [Figure: ‘Miles from Home to Georgia Southern University’; x-axis: Miles (0 to 240); y-axis: Frequency]
Type of data
The most common distinction between bar graphs
and histograms is that bar graphs are for categorical data and histograms are for numerical data. As
with many statements associated with the two
types of graphs, this is technically correct.
However, a distinction is seldom drawn between
nominal, ordinal, interval and ratio data (all of
which can, at least to most students, appear numeric).
Nominal data can be names such as ice cream
flavours or eye colour. Nominal data also arise
when numbers are used in place of category
names. These arise frequently as UPC (bar) codes
on merchandise, US postal zip codes, telephone
area codes and so on. UPC codes are numeric representations of a particular product. Zip (and area)
codes represent locations within the USA. Ordinal
data can be ordered by their position. In many
instances, these numerical values that indicate a
position can be thought of from a categorical point
of view. For example, a bar graph can be created
to show how many students rank a professor as a
1, 2, 3, 4 or 5 on their students’ evaluations. If
asked to rank a professor from 1 to 5, the student
would not rank the professor as a 1.5; therefore,
the data would be ordinal. One way to distinguish
these in the minds of students from interval and
ratio data is to have them ask ‘Would computation
of a mean and standard deviation make sense
here?’ Although an average zip code might give a
general sense of location (since the first digits
progress generally east to west), such an average
(certainly with the several decimal places most
students would attach) serves no real purpose.
The same holds true of a ranking for a single
category (although a median would make sense).
Arithmetic operations make sense for interval
and ratio data, although interval data have no real
zero. Having no real zero means ratios of these
data have no meaning. For example, you cannot
say that 100° is twice as hot as 50°. However,
ratio data do have this quality. It is possible to say
that 100 pounds is twice as heavy as 50 pounds.
Nominal and ordinal data should be represented
with bar graphs. Interval and ratio data should be
represented with histograms.
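The rule of thumb above can be written down as a small lookup. The following is only a minimal sketch: the function names and the example variables are hypothetical, chosen for illustration, and the mapping simply restates the recommendation in the preceding paragraphs.

```python
# Illustrative sketch only; names are hypothetical, not from any textbook or package.
BAR_GRAPH = "bar graph"
HISTOGRAM = "histogram"

# Recommendation stated in the text:
# nominal and ordinal -> bar graph; interval and ratio -> histogram.
GRAPH_FOR_LEVEL = {
    "nominal": BAR_GRAPH,
    "ordinal": BAR_GRAPH,
    "interval": HISTOGRAM,
    "ratio": HISTOGRAM,
}


def recommended_graph(level):
    """Return the graph type recommended for a level of measurement."""
    try:
        return GRAPH_FOR_LEVEL[level.lower()]
    except KeyError:
        raise ValueError(f"unknown level of measurement: {level!r}") from None


def mean_makes_sense(level):
    """The classroom check: would a mean and standard deviation make sense here?"""
    return level.lower() in ("interval", "ratio")


# Example variables drawn from the discussion above.
examples = {
    "zip code": "nominal",
    "professor rating (1 to 5)": "ordinal",
    "temperature in degrees": "interval",
    "weight in pounds": "ratio",
}

for variable, level in examples.items():
    print(f"{variable}: {level} -> {recommended_graph(level)}, "
          f"mean meaningful: {mean_makes_sense(level)}")
```

A student who can answer the "mean and standard deviation" question for a variable has, in effect, already chosen between the two graphs.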
How were the data collected?
Further complicating the above discussion on
number types is a consideration (or lack thereof)
about how the data were collected. A person’s age
in years is on a ratio scale (a person 20 years old
has lived twice as long as a 10-year old). Consider
the following two examples from surveys received
by one of the authors. Both questions ask for the
age of the respondent, but the answers in the first
question are in the form of categories. With the
second, an accurate value for age can be derived,
presuming of course the respondent answers truthfully. A graph of data from the question in figure 4
should properly be a bar chart; a graph of data from
the question in figure 5 should be a histogram.
Fig. 4. A survey question where Age becomes a categorical variable
Fig. 5. A survey question where Age becomes a quantitative variable
The problem is not that there are different types
of data. The problem that leads to some of the confusion is the lack of discussion of data types. This
confusion is further complicated when some
examples of bar graphs show no spaces between
the bars. For example, Hillman (2009) found an example of a middle school textbook that provided a
bar graph with no spaces between bars along with
another bar graph that had equal spaces between
bars. If teaching tools broke numerical data down
into sub-categories and considered how the data
were collected, much of the confusion about bar
graphs and histograms could be clarified.
Axis labelling
The collection of quantitative data into interval
bins and resultant labelling of the x-axis is
another area of concern we encountered in our
quest to find out why students might be having
problems with bar graphs and histograms. The
most common labelling problem stems from
using intervals such as 0–5, 6–10 and 11–16
when collecting data by hand into frequency
charts from which to make the histogram; the
endpoints of the intervals were typically defined
by the (rounded) accuracy of the data. We found
examples of this in such texts as Navidi and Monk
(2013) and Triola (2014) (as shown in figure 6b).
This labelling gives students the impression that
the bars do not have to be connected and that a histogram is not for continuous data. Where does the
observation 54.5 go with this labelling system? A
better approach (as shown in figure 6a) is given
by Peck and Devore (2012); there is no ambiguity
about where any value might go, and for these
continuous data, no gaps between bars are indicated. Even when data values must be integers
(IQ scores, for example), intervals should be
constructed so that there is no ambiguity about the
connectedness of the bars.
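The labelling problem can be made concrete with a short sketch. The gapped labels below are the ones quoted in the text; the contiguous edges (0, 6, 11, 17) and the test value 5.5 are assumptions chosen only for illustration, and the function names are hypothetical.

```python
def find_labelled_interval(value, intervals):
    """Return the closed interval (low, high) that contains value, or None."""
    for low, high in intervals:
        if low <= value <= high:
            return (low, high)
    return None


def find_contiguous_bin(value, edges):
    """Return the half-open bin [left, right) that contains value, or None."""
    for left, right in zip(edges, edges[1:]):
        if left <= value < right:
            return (left, right)
    return None


# Interval labels of the kind criticised in the text: 0-5, 6-10, 11-16.
gapped = [(0, 5), (6, 10), (11, 16)]

# Assumed contiguous edges: 0 to <6, 6 to <11, 11 to <17.
edges = [0, 6, 11, 17]

value = 5.5  # hypothetical observation that is not a whole number

print(find_labelled_interval(value, gapped))  # falls in the gap between 5 and 6
print(find_contiguous_bin(value, edges))      # belongs to exactly one bin
```

With the gapped labels, 5.5 has no home; with contiguous edges every value in the range belongs to exactly one bar, which is why the bars of a histogram touch.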
Even when texts or websites emphasize the continuous nature of data, there is still confusion over the
endpoints of the interval. Where does one bar end
and the next begin? The height of each bar is
definitely affected by the endpoints of the bar,
yet different sources offer different approaches.
The answer is not obvious, and the confusion
is compounded by the fact that different statistical software packages use different placement
(for example, Minitab includes the left end of
the interval in the bar, whereas SPSS includes
the right end). Our advice is
not to focus too much on the issue; as long as the
graph is consistent, whether the left (or right) end
of the interval is included is up to the individual
(and his or her software).
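To show students why the endpoint convention changes bar heights, the two conventions can be compared on the same data. This is a sketch under stated assumptions: the data and bin edges are invented for illustration, the function name is hypothetical, and the code does not reproduce the behaviour of any particular package; it only implements the "left end included" and "right end included" rules described above.

```python
from bisect import bisect_left, bisect_right


def bin_counts(data, edges, closed="left"):
    """Count observations per bin.

    closed="left":  bin i holds values with edges[i] <= x <  edges[i + 1]
    closed="right": bin i holds values with edges[i] <  x <= edges[i + 1]
    Values that fall outside every bin are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for x in data:
        if closed == "left":
            i = bisect_right(edges, x) - 1
        elif closed == "right":
            i = bisect_left(edges, x) - 1
        else:
            raise ValueError("closed must be 'left' or 'right'")
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


# Hypothetical data with several values sitting exactly on a bin boundary.
data = [12, 20, 20, 25, 30, 35]
edges = [10, 20, 30, 40]

print("left end included: ", bin_counts(data, edges, closed="left"))
print("right end included:", bin_counts(data, edges, closed="right"))
```

The two sets of counts differ only because of the observations that sit on a boundary; either rule is acceptable provided it is applied consistently across the whole graph.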
Because a bar graph provides a visual representation of categorical data, the placement of the
label is not really an issue. Most teaching tools label
the bar in the middle. However, for histograms,
references label the bar either in the middle or
at the beginning (software varies as well); the
number used to label the bar also varies on the
basis of the location of that label. All of this leads
to student confusion. Further exacerbating this is
the ‘TI-83 effect’. All too often, students copy these
histograms onto paper by using bar labels exactly
as shown on the calculator, instead of converting
to a number line. This conversion of the number
line into categories is also shown in such texts as
Mann (2010: p. 40).
We must constantly remind students that histograms are for numeric data, that there is an inherent
ordering reflected in the familiar number line
used in other graphs (such as scatterplots) and
that the bars really represent intervals of observation values and not merely categories.
Shape
One reason histograms are constructed is to look
at the distribution of the data. On the other hand,
bar graphs are constructed to provide a visual
display of the counts of nominal or ordinal data.
Because of the nature of the data, the bars on a
bar graph can be placed in any order. When
constructing a bar graph, the person creating the
graph can change the shape by changing the
placement of the categories. This can cause students to have misconceptions that shape is a characteristic of a bar graph or that rearrangement of
bars in a histogram is possible. For example, in the
case of figure 1, some students might say the bar
graph is skewed right or that figure 2 is symmetric.
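A quick way to demonstrate that the ‘shape’ of a bar graph is an artefact of ordering is to draw the same counts twice. The sketch below uses invented counts and hypothetical names; the second ordering is the Pareto arrangement (most to least often occurring) mentioned earlier.

```python
# Hypothetical category counts, for illustration only.
counts = {"Red": 3, "White": 9, "Blue": 5, "Black": 8, "Green": 1}


def text_bar_graph(pairs):
    """Draw a crude horizontal bar graph, one row of '#' marks per category."""
    return "\n".join(f"{label:<6} {'#' * n}" for label, n in pairs)


# Same data, two equally valid orderings of the categories.
alphabetical = sorted(counts.items())
pareto = sorted(counts.items(), key=lambda item: item[1], reverse=True)

print("Alphabetical order:")
print(text_bar_graph(alphabetical))
print()
print("Pareto order (most to least frequent):")
print(text_bar_graph(pareto))
```

The Pareto version looks ‘skewed’ and the alphabetical version does not, yet the data are identical. No such reordering is possible in a histogram, because its bars sit on a number line.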
Since histograms are constructed to look at the
numeric distributions of data, it is important that
the distribution be reasonably portrayed. If the
bin width equals the range (maximum minus minimum),
there is only one bar and no valuable information
is obtained. If the bins are so narrow that each holds a single observation, again no relevant information is gathered.
Fig. 6. A comparison of creating frequency distributions. The table on the left is from Peck and Devore (2012),
Statistics: The Exploration and Analysis of Data (7th ed.), p. 117. The table on the right is from Triola (2014),
Elementary Statistics (12th ed.), p. 51
Bin width and the shape of the histogram are
inextricably linked. Students must realize the
importance of bin width in determining the shape
of the histogram.
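The two extremes described above, and a more useful middle ground, can be shown with a few lines of code. This is a minimal sketch: the distances are invented (they are not the data behind figure 3), the function name is hypothetical, and the last bin is treated as closed on the right so that the maximum is counted.

```python
import math


def histogram_counts(data, bin_width):
    """Count observations in equal-width bins starting at the minimum.

    Bins are [start, start + width), except that the last bin also
    includes its right endpoint so the maximum value is counted.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    start = min(data)
    data_range = max(data) - start
    n_bins = max(1, math.ceil(data_range / bin_width))
    counts = [0] * n_bins
    for x in data:
        i = min(int((x - start) // bin_width), n_bins - 1)
        counts[i] += 1
    return counts


# Hypothetical distances in miles, for illustration only.
miles = [2, 5, 8, 12, 15, 15, 20, 35, 40, 60, 95, 130, 180, 230]

full_range = max(miles) - min(miles)

print(histogram_counts(miles, full_range))  # one bar: no shape at all
print(histogram_counts(miles, 60))          # a few bars: right skew is visible
print(max(histogram_counts(miles, 1)))      # width 1: no bar is taller than 2
```

Having students vary the width themselves, and describe the shape each time, makes the link between bin width and apparent distribution hard to miss.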
WHAT CAN WE DO?
One of the main problems is the lack of consistency in the terminology and displays of bar
graphs and histograms in current resources. The
best way to alleviate this problem is through
established standards and clear definitions. The
Guidelines for Assessment and Instruction in
Statistics Education report (Franklin et al. 2007)
established standards for the K–12 curriculum. Essentially, the same recommendations were stated
in the Teaching Statistics in British Secondary
Schools report by Davies et al. (2012), which
included one telling quote from a post-graduate
student (teacher): ‘I’m clear on the curriculum,
but not how to teach it’. If statisticians can assemble and reach a consensus on curriculum issues,
they should consider the importance of providing
guidance on teaching bar graphs and histograms
(and other topics) so that teachers in the field are
clear on how to teach them. This will help students
reach competence in stated goals of being able to
critically evaluate newspaper and magazine
accounts of statistics, and graphs of data.
Until authors and publishers incorporate those
ideas and standards into textbooks and web
resources, the problems students have in
understanding bar graphs, histograms and the
differences between them will continue to plague
those of us trying to teach statistics. What we
can do as teachers of statistics is keep the
standards in mind when selecting course materials, continually emphasize to students the nature
of the data we deal with and promote wise choices
in the construction of these graphs.
REFERENCES
Bakker, A. and Gravemeijer, K. (2004). Learning to
reason about distribution. In: D. Ben-Zvi and J.
Garfield (eds.) The Challenge of Developing
Statistical Literacy, Reasoning, and Thinking, pp.
147–168. Dordrecht, The Netherlands: Kluwer.
Brase, C. H. and Brase, C.P. (2012). Understandable Statistics: Concepts and Methods (10th
edn). Boston: Brooks/Cole Cengage Learning.
Chance, B., delMas, R. and Garfield, J. (2004).
Reasoning about sampling distributions. In: D. Ben-Zvi and J.
Garfield (eds.) The Challenge of Developing Statistical Literacy,
Reasoning, and Thinking, pp. 295–323. Dordrecht, The
Netherlands: Kluwer.
Cooper, L. L. and Shore, F.S. (2010). The effects
of data and graph type on concepts and visualizations of variability. Journal of Statistics
Education, 18(2). Retrieved September 17,
2010, from http://www.amstat.org/publications/jse/v18n2/kader.html
Davies, N., Marriott, J., Gadsden, R. and Bidgood, P.
(2012). Teaching Statistics in British Secondary
Schools: A Research Report for the Teaching
Statistics Trust, from https://www.rss.org.uk/
uploadedfiles/userfiles/files/RSSCSE-Teaching%
20Statistics%20Trust%20-%20Teaching%20
Stats%20in%20British%20Secondary%20Schools
%20report.pdf
delMas, R., Garfield, J., Ooms, A. and Chance, B.
(2007). Assessing students’ conceptual understanding after a first year course in statistics. Statistics Education Research Journal, 6(2), 28–58.
Franklin, C., Kader, G., Mewborn, D., Moreno, J.,
Peck, R., Perry, M. and Scheaffer, R. (2007).
Guidelines for Assessment and Instruction in
Statistics Education Report. Alexandria, VA:
American Statistical Association.
Gnanadesikan, M., Scheaffer, R.L. and Swift, J.
(1986). Art and Techniques of Simulation. Palo
Alto, CA: Dale Seymour Publications.
Hammerman, J.K. and Rubin, A. (2004). Strategies
for managing statistical complexity with new
software tools. Statistics Education Research
Journal, 3(2), 17–41.
Hillman, S. (2009). Exploring the confusions: Bar
graphs. Paper presented at the 3rd International
Conference to Review Research on Science, Technology and Mathematics Education, Mumbai, India.
iCoachMath.com. Retrieved May 28, 2013 from
http://www.icoachmath.com/math_dictionary/
bar_graph.html
Konold, C. and Higgins, T. (2003). Reasoning
about data. In: J. Kilpatrick, W. G. Martin and
D. Schifter (eds.) A Research Companion to
Principles and Standards for School Mathematics, pp. 193–215. Reston, VA: National Council
of Teachers of Mathematics.
Landwehr, J.M. (1986). Exploring Surveys and
Information from Samples. Palo Alto, CA: Dale
Seymour Publications.
Landwehr, J.M. and Watkins, A.E. (1986). Exploring
Data. Palo Alto, CA: Dale Seymour Publications.
Mann, P. S. (2010). Introductory Statistics (7th
edn). Hoboken, NJ: John Wiley & Sons, Inc.
Math is Fun.com. Retrieved May 28, 2013 from
http://www.mathsisfun.com/definitions/histogram.html
McClain, K., Cobb, P. and Gravemeijer, K. (2000).
Supporting students’ ways of reasoning about
data. In: M. Burke and F. Curcio (eds.) Learning
Mathematics for a New Century, 2000 Yearbook.
Reston, VA: National Council of Teachers of
Mathematics.
National Council of Teachers of Mathematics.
(1989). Curriculum and Evaluation Standards
for School Mathematics. Reston, VA: Author.
National Council of Teachers of Mathematics.
(2000). Principles and Standards for School
Mathematics. Reston, VA: Author.
Navidi, W. and Monk, B. (2013). Elementary
Statistics. New York: McGraw-Hill.
Newman, C.M., Obremski, T.E. and Scheaffer, R.L.
(1986). Exploring Probability. Palo Alto, CA:
Dale Seymour Publications.
Peck, R. and Devore, J.L. (2012). Statistics: The
Exploration and Analysis of Data (7th edn).
Boston: Brooks/Cole, Cengage Learning.
Starnes, D.S., Yates, D.S. and Moore, D.S. (2012).
The Practice of Statistics (4th edn). New York:
W. H. Freeman and Co.
Triola, M. (2005). Elementary Statistics Using the
Graphing Calculator: For the TI-83/84 Plus.
Boston: Pearson.
Triola, M. (2014). Elementary Statistics (12th
edn). Boston: Pearson Education, Inc.