Original Article

Developing Consistency in the Terminology and Display of Bar Graphs and Histograms

Patricia B. Humphrey, Sharon Taylor and Kathleen Cage Mittag
Department of Mathematical Sciences, Georgia Southern University, USA and Department of Mathematics, University of Texas at San Antonio, USA
e-mail: [email protected]

Summary
Students often are confused about the differences between bar graphs and histograms. The authors discuss some reasons behind this confusion and offer suggestions that help clarify thinking.

Keywords: Teaching; Bar graph; Bar chart; Histogram.

THE PROBLEM

How many times have these questions come up in your classroom: ‘Should I use a histogram or a bar graph?’ ‘Is this a histogram or a bar graph?’ ‘Does it matter whether I connect the bars or not?’ We have been hearing these questions and others along the same lines (and seeing the results of not asking the questions) for years. When two of us began teaching statistics in the 1980s, it was not a big surprise since statistics had previously not been an important part of the primary and secondary curriculum. In fact, many textbooks had statistics as a last chapter that many teachers never covered. Exploratory data analysis was just coming into favour in classrooms with the publication of the Quantitative Literacy series (Gnanadesikan et al. 1986; Landwehr 1986; Landwehr and Watkins 1986; Newman et al. 1986). These publications paved the way for the inclusion of data analysis as a strand in the National Council of Teachers of Mathematics (NCTM) Standards (NCTM 1989, 2000) as well as in many state standards. The question then arises: Why are we still hearing the same questions about these two types of graphs?

Research studies have indicated that some students in primary, secondary and college-level introductory statistics courses have difficulties with statistical graphs. Chance et al.
(2004) found that undergraduate students demonstrated problems understanding variability and shapes of distributions. Other studies have shown that students have trouble with distributions and graphical representations (Bakker and Gravemeijer 2004; delMas et al. 2007; Hammerman and Rubin 2004; Konold and Higgins 2003; McClain et al. 2000). Cooper and Shore (2010) wrote, ‘it simply makes good sense to include rich discussions connecting an assortment of graphical displays to their corresponding data sets and methods to judge center and spread’ (p. 13).

© 2013 The Authors Teaching Statistics © 2013 Teaching Statistics Trust ●●, ●●, pp ●●–●●

We initially sought answers to our questions by looking into the history of bar graphs and histograms. Although the histories were interesting, they did not lead to any clarification. This led us to look for any useful information on the graphs. Then our search became interesting. Some of the websites we visited had conflicting information about the graphs, appropriate data types, definitions and proper use. Examination copies of several textbooks yielded still more confusion. As we looked at various texts, aspects of the problems students have with these graphs became clearer. We discuss below some possible reasons for the confusion and offer possible solutions in the form of points to emphasize to students.

THE CAUSES

A lack of consistency in textbooks and websites seems to be one reason some students might have problems with bar graphs and histograms. Generally speaking, the inconsistencies lie in three major areas: definitions of histograms and bar graphs, the type of data that can be used for histograms and bar graphs (which often comes from failing to consider how the data were collected) and labelling the x-axis in a histogram.
Yet another conceptual problem is confusion between two effects: changing the bar (bin) width in a histogram changes the appearance of the distribution, whereas changing the ordering of categories in a bar graph changes only its apparent ‘shape’. A description of each of these follows.

Definitions

In a review of middle grade mathematics textbooks, Hillman (2009) found that although textbook authors tended to be consistent with the elements they visually displayed in bar graphs, ‘definitions and descriptions of bar graphs were quite varied across grade levels, from short and vague … to detailed descriptions explicitly mentioning multiple features of bar graphs’ (p. 148). Definitions of bar graphs range from ‘A bar graph is a graph that compares different amounts using bars’ (iCoachMath.com) to a more correct ‘Bar graphs represent each category as a bar. The bar heights show the category counts or percents’ (Starnes et al. 2012, p. 10). One version of the popular Triola series (Elementary Statistics Using the Graphing Calculator: For the TI-83/84 Plus, 2005) did not even discuss bar charts except as Pareto charts (bar charts with categories ordered from most to least often occurring). Although most authors display the bars as separated, only a few (such as Brase and Brase 2012, p. 55) go on to add, in a highlighted box on features, ‘Bars are of uniform width and uniformly spaced’. A well-constructed bar graph is shown in figure 1. An example of a bar chart with bars connected is shown in figure 2; it can lead confused students to describe the shape of that bar chart as ‘approximately symmetric’ or even Normal! For this reason, students should be discouraged from connecting the bars in a bar graph.

Fig. 2. A bar graph without spacing. Many students will think this is a histogram

We found that many textbook authors do not provide a definition of a histogram; they simply start creating histograms with little or no explanation except for mechanics.
Some textbook authors provide an informal definition that a histogram is a way to sort and organize data. Although this is an accurate statement, the same is also true for stem and leaf plots or dot plots. We also found textbooks and websites that defined a histogram as a connected bar graph. Although the bars are connected, this definition only tends to perpetuate the problem of deciding when a histogram is needed rather than a bar graph. This can be seen on the MathisFun website where the definition includes “The data is grouped into ranges (such as ‘40 to 49’) and then plotted as bars. Similar to a Bar Graph, but each bar represents a range of data.” A well-constructed histogram is shown in figure 3.

Fig. 1. A correct bar graph. Bars are of uniform width and have uniform spacing [Figure: ‘Most Popular Car Colors Worldwide 2012’; x-axis: Color; y-axis: Percent]

Fig. 3. An example of a good histogram [Figure: ‘Miles from Home to Georgia Southern University’; x-axis: Miles; y-axis: Frequency]

Type of data

The most common distinction between bar graphs and histograms is that bar graphs are for categorical data and histograms are for numerical data. As with many statements associated with the two types of graphs, this is technically correct. However, sources seldom distinguish between nominal, ordinal, interval and ratio data (all of which, at least to most students, appear numeric).

Nominal data can be names such as ice cream flavours or eye colour. Nominal data also arise when numbers are used in place of category names. These arise frequently as UPC (bar) codes on merchandise, US postal zip codes, telephone area codes and so on. UPC codes are numeric representations of a particular product. Zip (and area) codes represent locations within the USA.
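As a minimal sketch of treating numeric-looking codes as categories, the following Python snippet tallies a made-up list of zip codes, which is the tally a bar graph would display. The sample values and the helper name `category_counts` are illustrative assumptions, not data or methods from the article.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey responses: zip codes kept as strings because they are labels
zip_codes = ["30458", "30460", "30458", "78249", "30458", "78249"]


def category_counts(values):
    """Tally how often each category label occurs (the bar heights of a bar graph)."""
    return Counter(values)


counts = category_counts(zip_codes)
for code, n in counts.most_common():
    print(code, n)

# Forcing the labels into arithmetic produces a number, but not a category
# that any respondent actually gave.
numeric_mean = mean(int(z) for z in zip_codes)
print(numeric_mean)
```

Storing the codes as strings is a deliberate choice in this sketch: it makes counting natural and makes accidental arithmetic on the labels harder.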
Ordinal data can be ordered by their position. In many instances, these numerical values that indicate a position can be thought of from a categorical point of view. For example, a bar graph can be created to show how many students rank a professor as a 1, 2, 3, 4 or 5 on student evaluations. If asked to rank a professor from 1 to 5, the student would not rank the professor as a 1.5; therefore, the data would be ordinal.

One way to distinguish these in the minds of students from interval and ratio data is to have them ask ‘Would computation of a mean and standard deviation make sense here?’ Although an average zip code might give a general sense of location (since the first digits progress generally east to west), such an average (certainly with the several decimal places most students would attach) serves no real purpose. The same holds true of a ranking for a single category (although a median would make sense).

Arithmetic operations make sense for interval and ratio data, although interval data have no real zero. Having no real zero means ratios of these data have no meaning. For example, you cannot say that 100° is twice as hot as 50°. However, ratio data do have this quality. It is possible to say that 100 pounds is twice as heavy as 50 pounds. Nominal and ordinal data should be represented with bar graphs. Interval and ratio data should be represented with histograms.

How were the data collected?

Further complicating the above discussion on number types is a consideration (or lack thereof) about how the data were collected. A person’s age in years is on a ratio scale (a person 20 years old has lived twice as long as a 10-year old). Consider the following two examples from surveys received by one of the authors. Both questions ask for the age of the respondent, but the answers in the first question are in the form of categories. With the second, an accurate value for age can be derived, presuming of course the respondent answers truthfully. A graph of data from the question in figure 4 should properly be a bar chart; a graph of data from the question in figure 5 should be a histogram.

Fig. 4. A survey question where Age becomes a categorical variable

Fig. 5. A survey question where Age becomes a quantitative variable

The problem is not that there are different types of data. The problem that leads to some of the confusion is the lack of discussion of data types. This confusion is further complicated when some examples of bar graphs show no spaces between the bars. For example, Hillman (2009) found an example of a middle school textbook that provided a bar graph with no spaces between bars along with another bar graph that had equal spaces between bars. If teaching tools broke numerical data down into sub-categories and considered how the data were collected, much of the confusion about bar graphs and histograms could be clarified.

Axis labelling

The collection of quantitative data into interval bins and resultant labelling of the x-axis is another area of concern we encountered in our quest to find out why students might be having problems with bar graphs and histograms. The most common labelling problem stems from using intervals such as 0–5, 6–10 and 11–16 when collecting data by hand into frequency charts from which to make the histogram; the endpoints of the intervals were typically defined by the (rounded) accuracy of the data. We found examples of this in such texts as Navidi and Monk (2013) and Triola (2014) (as shown in figure 6b). This labelling gives students the impression that the bars do not have to be connected and that a histogram is not for continuous data. Where does the observation 54.5 go with this labelling system?
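To make the question about 54.5 concrete, here is a small Python sketch. The class limits 50–54, 55–59 and 60–64 are hypothetical stand-ins for limits defined by rounded data, and both function names are assumptions introduced for illustration only.

```python
def find_class_closed(value, classes):
    """Return the first (low, high) class with low <= value <= high, else None."""
    for low, high in classes:
        if low <= value <= high:
            return (low, high)
    return None


def find_class_half_open(value, edges):
    """Return the [left, right) interval that contains value, else None."""
    for left, right in zip(edges, edges[1:]):
        if left <= value < right:
            return (left, right)
    return None


# Hypothetical class limits based on data rounded to whole numbers
rounded_classes = [(50, 54), (55, 59), (60, 64)]
# Hypothetical contiguous bin edges covering the same range
edges = [50, 55, 60, 65]

print(find_class_closed(54.5, rounded_classes))   # None: 54.5 falls in the gap
print(find_class_half_open(54.5, edges))          # (50, 55)
```

With the rounded class limits, the value lands in the gap between 54 and 55; with contiguous intervals, every value in range has exactly one home.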
A better approach (as shown in figure 6a) is given by Peck and Devore (2012); there is no ambiguity about where any value might go, and for these continuous data, no gaps between bars are indicated. Even when data values must be integers (IQ scores, for example), intervals should be constructed so that there is no ambiguity about the connectedness of the bars.

When texts or websites emphasize the continuous nature of data, there is still confusion over the endpoints of the interval. Where does one bar end and the other begin? The height of each bar is definitely affected by the endpoints of the bar, yet different sources offer different approaches. The answer is not obvious, and the problem is exacerbated by the fact that different statistical software packages use different placement (for example, Minitab includes the left end of the interval in the bar, whereas SPSS includes the right end). Our advice is not to focus too much on the issue; as long as the graph is consistent, whether the left (or right) end of the interval is included is up to the individual (and his or her software).

Because a bar graph provides a visual representation of categorical data, the placement of the label is not really an issue. Most teaching tools label the bar in the middle. However, for histograms, references label the bar either in the middle or at the beginning (software varies as well); the number used to label the bar also varies on the basis of the location of that label. All of this adds to students’ confusion. Further exacerbating this is the ‘TI-83 effect’. All too often, students copy these histograms onto paper by using bar labels exactly as shown on the calculator, instead of converting to a number line. This conversion of the number line into categories is also shown in such texts as Mann (2010: p. 40).
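The endpoint question discussed above (whether a bar includes its left or its right end) can be illustrated with a short Python sketch. The function `bin_counts`, its `closed` parameter and the scores are all hypothetical; the code does not reproduce the behaviour of any particular software package, and it simply drops values that fall on the excluded outermost edge.

```python
def bin_counts(data, edges, closed="left"):
    """Count observations per bin.

    closed="left" uses intervals [a, b); closed="right" uses (a, b].
    Simplification: a value on the excluded outer edge is not counted at all.
    """
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i, (a, b) in enumerate(zip(edges, edges[1:])):
            inside = (a <= x < b) if closed == "left" else (a < x <= b)
            if inside:
                counts[i] += 1
                break
    return counts


# Hypothetical integer scores, several of which sit exactly on a bin edge
scores = [60, 65, 70, 70, 75, 80, 80, 80, 85, 90]
edges = [60, 70, 80, 90]

left_counts = bin_counts(scores, edges, closed="left")
right_counts = bin_counts(scores, edges, closed="right")
print(left_counts)   # [2, 3, 4]
print(right_counts)  # [3, 4, 2]
```

The same data give visibly different bar heights under the two conventions because so many values sit on an edge, which is why a consistent choice within one graph matters more than which choice is made.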
We must constantly remind students that histograms are for numeric data, that there is an inherent ordering reflected in the familiar number line used in other graphs (such as scatterplots), and that the bars really represent intervals of observed values and not merely categories.

Shape

One reason histograms are constructed is to look at the distribution of the data. On the other hand, bar graphs are constructed to provide a visual display of the counts of nominal or ordinal data. Because of the nature of the data, the bars on a bar graph can be placed in any order. When constructing a bar graph, the person creating the graph can change the shape by changing the placement of the categories. This can cause students to have misconceptions that shape is a characteristic of a bar graph or that rearrangement of bars in a histogram is possible. For example, in the case of figure 1, some students might say the bar graph is skewed right or that figure 2 is symmetric.

Since histograms are constructed to look at the numeric distributions of data, it is important that the distribution be reasonably portrayed. If the bin width is simply the range (maximum minus minimum), there is only one bar and no valuable information is obtained. If the bins are so narrow that each observation gets its own bar, again no relevant information is gathered.

Fig. 6. A comparison of creating frequency distributions. The table on the left is from Peck and Devore (2012), Statistics: The Exploration and Analysis of Data (7th ed.), p. 117. The table on the right is from Triola (2014), Elementary Statistics (12th ed.), p. 51

Bin width and the shape of the histogram are inextricably linked. Students must realize the importance of bin width in determining the shape of the histogram.

WHAT CAN WE DO?
One of the main problems is the lack of consistency in the terminology and displays of bar graphs and histograms in current resources. The best way to alleviate this problem is through established standards and clear definitions. The Guidelines for Assessment and Instruction in Statistics Education report (Franklin et al. 2007) established standards for the K–12 curriculum. Essentially, the same recommendations were stated in the Teaching Statistics in British Secondary Schools report by Davies et al. (2012), which included one telling quote from a post-graduate student (teacher): ‘I’m clear on the curriculum, but not how to teach it’. If statisticians can assemble and reach a consensus on curriculum issues, they should consider the importance of providing guidance on teaching bar graphs and histograms (and other topics) so that teachers in the field are clear on how to teach them. This will help students reach competence in the stated goals of being able to critically evaluate newspaper and magazine accounts of statistics, and graphs of data.

Until authors and publishers incorporate those ideas and standards into textbooks and web resources, the problems students have in understanding bar graphs, histograms and the differences between them will continue to plague those of us trying to teach statistics. What we can do as teachers of statistics is keep the standards in mind when selecting course materials, continually emphasize to students the nature of the data we deal with and promote wise choices in the construction of these graphs.

REFERENCES

Bakker, A. and Gravemeijer, K. (2004). Learning to reason about distribution. In: D. Ben-Zvi and J. Garfield (eds.) The Challenge of Developing Statistical Literacy, Reasoning, and Thinking, pp. 147–168. Dordrecht, The Netherlands: Kluwer.
Brase, C.H. and Brase, C.P. (2012). Understandable Statistics: Concepts and Methods (10th edn). Boston: Brooks/Cole Cengage Learning.
Chance, B., delMas, R. and Garfield, J. (2004). Reasoning about sampling distributions. In: D. Ben-Zvi and J. Garfield (eds.) The Challenge of Developing Statistical Literacy, Reasoning, and Thinking, pp. 295–323. Dordrecht, The Netherlands: Kluwer.
Cooper, L.L. and Shore, F.S. (2010). The effects of data and graph type on concepts and visualizations of variability. Journal of Statistics Education, 18(2). Retrieved September 17, 2010, from http://www.amstat.org/publications/jse/v18n2/kader.html
Davies, N., Marriott, J., Gadsden, R. and Bidgood, P. (2012). Teaching Statistics in British Secondary Schools: a Research Report for the Teaching Statistics Trust, from https://www.rss.org.uk/uploadedfiles/userfiles/files/RSSCSE-Teaching%20Statistics%20Trust%20-%20Teaching%20Stats%20in%20British%20Secondary%20Schools%20report.pdf
delMas, R., Garfield, J., Ooms, A. and Chance, B. (2007). Assessing students’ conceptual understanding after a first year course in statistics. Statistics Education Research Journal, 6(2), 28–58.
Franklin, C., Kader, G., Mewborn, D., Moreno, J., Peck, R., Perry, M. and Scheaffer, R. (2007). Guidelines for Assessment and Instruction in Statistics Education Report. Alexandria, VA: American Statistical Association.
Gnanadesikan, M., Scheaffer, R.L. and Swift, J. (1986). Art and Techniques of Simulation. Palo Alto, CA: Dale Seymour Publications.
Hammerman, J.K. and Rubin, A. (2004). Strategies for managing statistical complexity with new software tools. Statistics Education Research Journal, 3(2), 17–41.
Hillman, S. (2009). Exploring the confusions: Bar graphs. Paper presented at the 3rd International Conference to Review Research on Science, Technology and Mathematics Education, Mumbai, India.
iCoachMath.com. Retrieved May 28, 2013 from http://www.icoachmath.com/math_dictionary/bar_graph.html
Konold, C. and Higgins, T. (2003). Reasoning about data. In: J. Kilpatrick, W.G. Martin and D. Schifter (eds.) A Research Companion to Principles and Standards for School Mathematics, pp. 193–215. Reston, VA: National Council of Teachers of Mathematics.
Landwehr, J.M. (1986). Exploring Surveys and Information from Samples. Palo Alto, CA: Dale Seymour Publications.
Landwehr, J.M. and Watkins, A.E. (1986). Exploring Data. Palo Alto, CA: Dale Seymour Publications.
Mann, P.S. (2010). Introductory Statistics (7th edn). Hoboken, NJ: John Wiley & Sons, Inc.
Math is Fun.com. Retrieved May 28, 2013 from http://www.mathsisfun.com/definitions/histogram.html
McClain, K., Cobb, P. and Gravemeijer, K. (2000). Supporting students’ ways of reasoning about data. In: M. Burke and F. Curcio (eds.) Learning Mathematics for a New Century, 2000 Yearbook. Reston, VA: National Council of Teachers of Mathematics.
National Council of Teachers of Mathematics. (1989). Curriculum and Evaluation Standards for School Mathematics. Reston, VA: Author.
National Council of Teachers of Mathematics. (2000). Principles and Standards for School Mathematics. Reston, VA: Author.
Navidi, W. and Monk, B. (2013). Elementary Statistics. New York: McGraw-Hill.
Newman, C.M., Obremski, T.E. and Scheaffer, R.L. (1986). Exploring Probability. Palo Alto, CA: Dale Seymour Publications.
Peck, R. and Devore, J.L. (2012). Statistics: The Exploration and Analysis of Data (7th edn). Boston: Brooks/Cole, Cengage Learning.
Starnes, D.S., Yates, D.S. and Moore, D.S. (2012). The Practice of Statistics (4th edn). New York: W. H. Freeman and Co.
Triola, M. (2005). Elementary Statistics Using the Graphing Calculator: For the TI-83/84 Plus. Boston: Pearson.
Triola, M. (2014). Elementary Statistics (12th edn). Boston: Pearson Education, Inc.