Commentariesare informative essays dealingwithviewpointsof statisticalpractice,statisticaleducation, and othertopicsconsideredto be of generalinterestto the board readershipof The AmericanStatistician.Commentariesare similarin spiritto Lettersto theEditor,but theyinvolvelongerdiscussionsof background,issues, and perspectives. All commentarieswill be refereedfor their meritand compatibilitywiththese criteria. How to Display Data Badly HOWARD WAINER* Methods for displayingdata badly have been developing formanyyears,and a wide varietyof interesting and inventiveschemeshave emerged.Presentedhere is a synthesisyieldingthe 12 most powerfultechniques thatseem to underliemanyof the realizationsfoundin practice.These 12 (the dirtydozen) are identifiedand illustrated. KEY WORDS: Graphics; Data display;Data density; Data-ink ratio. categorized. This article is the beginningof such a compendium. The aim of good data graphicsis to displaydata accuratelyand clearly.Let us use thisdefinitionas a starting pointforcategorizingmethodsof bad data display.The definitionhas threeparts. These are (a) showingdata, (b) showing data accurately, and (c) showing data clearly.Thus, ifwe wishto displaydata badly,we have three avenues to follow. Let us examine them in sequence, parse themintosome oftheircomponentparts, and see if we can identifymeans for measuringthe success of each strategy. 2. SHOWING DATA 1. INTRODUCTION The displayof data is a topic of substantialcontemporaryinterestand one thathas occupied the thoughts of manyscholarsforalmost200 years.Duringthistime there have been a numberof attemptsto codifystandards of good practice (e.g., ASME Standards 1915; Cox 1978; Ehrenberg 1977) as well as a number of books that have illustrated them (i.e., Bertin 1973,1977,1981; Schmid 1954; Schmid and Schmid 1979; Tufte 1983). The last decade or so has seen a tremendousincreasein the developmentof new display techniquesand tools thathave been reviewedrecently (Macdonald-Ross 1977; Fienberg 1979; Cox 1978; Wainer and Thissen 1981). We wish to concentrateon methodsof data displaythatleave the viewersas uninformedas theywerebeforeseeingthedisplayor, worse, those thatinduce confusion.Althoughsuch techniques are broadlypracticed,to my knowledgetheyhave not as yet been gatheredinto a single source or carefully Obviously,if the aim of a good displayis to convey information, the less information carriedin the display, Change in Science Achievement of 9-, 13-,and 17-Year-Olds,by Type of Exercise: 1969-1977 Changeinpercentcorrect 1, O1 A ,,i,,,, biologicalscience _. Physical science 9-YEAR-OLDS s ss llll.||| ________ -2 -3 _ 13-YEAR-OLDS 0I -2 -3 _ -4 _5 *Howard Waineris Senior Research Scientist,Educational Testing Service,Princeton,NJ08541. This is the textof an invitedaddressto the AmericanStatisticalAssociation. It was supportedin partby the ProgramStatisticsResearch Project of the Educational TestingService. The authorwould like to expresshis gratitudeto the numerous friendsand colleagues who read or heard this article and offered valuable suggestionsfor its improvement.Especially helpfulwere David Andrews, Paul Holland, Bruce Kaplan, James 0. Ramsay, Edward Tufte, the participantsin the StanfordWorkshopon Advanced Graphical Presentation,two anonymousreferees,the longassociate editor,and Gary Koch. suffering -6 -2 -h -4 1969 Is_ _ 1970 17-YEAR-OLDS = _ _ _ _ _ _ _ __ _ _ _ _ a ............... ...1 1_ _ _ _ _ __ _ _ _ _ _ _ _ _ _ _ _ 1973 1977 Figure 1. An example ofa low densitygraph (fromS13 [ddi = .3]). C) The AmericanStatistician,May 1984, Vol. 38, No. 2 137 Schools PublicandPrivate Elementary Selected Years1929-1970 m Public -Prjvale d0oi Schools Thousan 0.8 F 300 2 i_ 0. - 0 t.4 0.0 0.6 LOCATION DIFFERENCE: 1929-30 1.0 0.Z the worse it is. Tufte(1983) has devised a scheme for in displays,called measuringthe amountof information the data densityindex (ddi), whichis "the numberof numbersplotted per square inch." This easily calcuIn popular informative. lated indexis oftensurprisingly and technicalmedia we have founda range from.1 to 362. This provides us with the firstrule of bad data display. jyy US.vsJapan _~~~~~~~~~~~~~. pe r mon-ho urmQanus in 15- 14 - 13C,, Is * 12C/ 70% 62.3%/ ur ngfn cia reta-oU 1930 1940 1950 1960 1970 10 _ upt 44%/ 9 0" Post)with graph(? 1978,TheWashington Figure3. Alowdensity tofillinthespace (ddi = .2). chart-junk 138 1969-70 THENUMBER SCHOOLS OFPRIVATE ELEMENTARY FROM1930-1970 What does a data graphicwitha ddi of .3 look like? Shown in Figure 1 is a graphicfromthe book Social IndicatorsIII (S13), originallydone in fourcolors (original size 7" by9") thatcontains18 numbers(18/63= .3). The median data graphin S13 has a data densityof .6 thisone is notan unusualchoice. Shownin numbers/in2; Figure 2 is a plot fromthe article by Friedman and Rafsky(1981) witha ddi of .5 (it shows4 numbersin 8 100%-moutpu 1959-60 in2).Thisis unusualforJASA,wherethemediandata ofthis graphhasa ddiof27. In defenseoftheproducers plot,thepointofthegraphis to showthata methodof analysissuggestedby a criticof theirpaperwas not I suspectthatprosewouldhaveworkedpretty fruitful. wellalso. canbe madethathighdatadenarguments Although willbe good,norone sitydoes notimplythata graphic of on theefficiency bad,itdoesreflect withlowdensity ifwe hold ofinformation. Obviously, thetransmission and accuracyconstant, moreinformation is betclarity Rule 1-Show as Few Data as Possible (Minimize the Data Density) .00 1949-50 SchoolYear Figure 4. Hidingthe data in the scale (fromS13). 3 JNIT Figure 2. A low densitygraph (fromFriedmanand Rafsky1981 [ddi = .5]). Labor 1939-40 1930 1940 1950 1960 9.275 10.000 10.375 13.574 14.372 1910 Figure5. Expandingthescale and showingthedata inFigure4 (from S13). (? The AmericanStatistician, May 1984, Vol. 38, No. 2 A New Set ofProjectins forthe U.S. Supplyof Energy Compared are two proctlons ot United State *rtrgy upply In th, y.r 2000 made by the Pftedynt s Council of EnvirnonottalOuallty and th ectual 1977 supply Alltigurasar * inquads a uunitaootm"aurmomntthat reprnt millionbillin-on quadrilion- Britlsh thettal unlts (8T U a), a standard masure otfergy . T0~~i_ tal 777 1977 M~~~~~~S 5 5_ / 2000 - AEr,ph syes,egy c_e,to I U 42 -14 -1 a t Nca, Solad '40 Total 1a - ,l&U^,* (inmillions ofU S dollars) 3,000 6,000 U.S. exports to China 4,000 U.S. exports to Taiwan U.S. imports fromChina 1 9 2000 1,000 * l ____ U.S. imports fromTaiwan 2,000 19 2000-8 Erphas.zes.rc,eased 37d erergyoduct.on 1.: Cmnlions ofU.S dollars) Coal ' Tutal 05 7 7 7) O.la,daa* .N (1979 The New York Times Figure 6. Ignoringthe visual metaphor (? 1978, The New York Times). ter than less. One of the great assets of graphical techniques is that theycan convey large amounts of information in a small space. We note that when a graph contains littleor no information the plot can look quite empty (Figure 2) and thus raise suspicions in the viewer that there is nothing to be communicated. A way to avoid these suspicions is to fill up the plot with nondata figurations-what Tufte has termed "chartjunk." Figure 3 shows a plot of the labor productivity of Japan relative to that of the United States. It contains one number for each of three years. Obviously, a graph of such sparse information would have a lot of blank space, so filling the space hides the paucity of information from the reader. A convenient measure of the extent to which this practice is in use is Tufte's "data-ink ratio." This measure is the ratio of the amount of ink used in graphing the data to the total amount of ink in the graph. The closer to zero this ratio gets, the worse the graph. The notion of the data-ink ratio brings us to the second principle of bad data display. Rule 2-Hide WhatData You Do Show (MinimizetheData-Ink Ratio) One can hide data in a variety of ways. One method that occurs with some regularityis hiding the data in the grid. The grid is useful for plotting the points, but only rarely afterwards.Thus to display data badly, use a fine grid and plot the points dimly (see Tufte 1983, pp. 94-95 for one repeated version of this). A second way to hide the data is in the scale. This corresponds to blowing up the scale (i.e., looking at the data from far away) so that any variation in the data is obscured by the magnitude of the scale. One can justify this practice by appealing to "honesty requires that we start the scale at zero," or other sorts of sophistry. In Figure 4 is a plot that (from S13) effectivelyhides the growth of private schools in the scale. A redrawing of the number of private schools on a differentscale conveys the growththat took place duringthe mid1950's (Figure5). The relationshipbetweenthisriseand Brownvs. TopekaSchoolBoardbecomes an immediate question. To conclude thissection,we have seen thatwe can displaydata badlyeitherbynotincludingthem(Rule 1) 1972 1974 1976 1978 1980 1970 1972 1974 1976 1978 1980 Source DpartmentofCommerce Figure 7. Reversing the metaphorin mid-graphwhilechanging scales on both axes (? June 14, 1981, The New YorkTimes). or by hidingthem(Rule 2). We can measuretheextent to whichwe are successfulin excludingthedata through the data density;we can sometimesconvince viewers that we have included the data throughthe incorporationof chartjunk.Hidingthe data can be done either by usingan overabundanceof chartjunkor by cleverly choosingthe scale so thatthe data disappear. A measure of the success we have achieved in hidingthe data is throughthe data-inkratio. 3. SHOWING DATA ACCURATELY The essence of a graphicdisplayis thata set of numbers havingboth magnitudesand an order are represented by an appropriatevisual metaphor-the magnitude and order of the metaphoricalrepresentation matchthenumbers.We can displaydata badlybyignoring or distortingthisconcept. Rule 3-Ignore the Visual MetaphorAltogether If the data are orderedand ifthevisualmetaphorhas a naturalorder,a bad displaywill surelyemergeifyou shufflethe relationship.In Figure 6 note thatthe bar labeled 14.1 is longerthanthe bar labeled 18. Another methodis to changethemeaningofthemetaphorin the middleof the plot. In Figure7 the dark shadingrepresentsimportson one side and exportson theother.This is but one of the problemsof thisgraph; more serious stillis the change of scale. There is also a differencein the timescale, but thatis minor.A commonthemein Playfair's(1786) work was the differencebetweenimports and exports. In Figure 8, a 200-year-oldgraph tellsthe storyclearly.Two such plots would have illustratedthe storysurroundingthis graph quite clearly. Rule 4-Only Order Matters One frequenttrickis to use lengthas thevisualmetaphorwhenarea is whatis perceived.Thiswas used quite effectively by The WashingtonPost in Figure 9. Note thatthisgraphalso has a low data density(.1), and its data-inkratio is close to zero. We can also calculate Tufte's (1983) measure of perceptual distortion(PD) forthisgraph.The PD in thisinstanceis the perceived ?) The AmericanStatistician,May 1984, Vol. 38, No. 2 139 C IIA ItTT & 1511'01TS8I E-xPORT.i rLA_A E-.N; C_ ~ ......... . - ~ ~ ~ ~ ~ 5tf . eariler(from 1786). Playfair Figure8. A ploton thesame topicdone welltwocenturies to changein thevalueof thedollarfromEisenhower Carterdividedbytheactualchange.I readandmeasure thus: Til E tIXITEI, S1A'M1IS, (WFAMIEICiA.A E 1430I3632 Measured Actual U5 5 1.00- .44 =44 1.27 5 PD = 9.68/1.27= 7.62 1958- ESENHOWER: $1. TiE FAtV ' butby no This distortion of over700% is substantial meansa record. A less distorted viewof thesedata is providedin Figure10. In addition,the spacingsuggestedby the 0 94c 1963 - KENNEDY: pw~~~~~~~~~~~ E I SENHOWER KENNE DT JOHNSON t 4 362tX 0.8 1968- JOHNSON:53Uc IN: 22.00- 2.06 96 2.06 I 1TEDISTA\TE:SM'A3.11:11 . _ ~0. 4 =0.2 CC ofthel lXnlshllng r4 ffiN^>: 0.2 Dollar rc:LuborDportment 0. O.I 44CAR 1978-CTER: (August) Figure 9. An example of how to goose up the effectby squaring the eyeball (? 1978, The WashingtonPost). 140 I 1958 1963 1968 YERR 1973 I 1978 Figure10. Thedata inFigure9 as an unadornedlinechart(from Wainer,1980). ? The AmericanStatistician, May 1984, Vol. 38, No. 2 presidentialfaces is made expliciton the time scale. Rule 5-Graph Data Out of Context Often we can modifythe perceptionof the graph (particularlyfortimeseriesdata) by choosingcarefully theintervaldisplayed.A precipitousdropcan disappear if we choose a startingdate just afterthe drop. Similarly,we can turnslightmeandersintosharpchangesby focusingon a singlemeanderand expandingthe scale. but can have proOftenthe choice of scale is arbitrary of foundeffectson theperception thedisplay.Figure11 shows a famous example in which PresidentReagan viewof theeffectsof histax cut. givesan out-of-context The Times'alternativeprovidesthecontextfora deeper understanding.Simultaneouslyomittingthe contextas well as any quantitativescale is the keyto the practice of Ordinal Graphics(see also Rule 4). Automaticrules do not always work, and wisdom is always required. In Section3 we discussedthreerulesforthe accurate displayofdata. One can compromiseaccuracybyignoringvisual metaphors(Rule 3), byonlypayingattention to the order of the numbersand not theirmagnitude (Rule 4), or by showingdata out of context(Rule 5). We advocated the use of Tufte'smeasureof perceptual distortionas a wayof measuringtheextentto whichthe accuracyof the data has been compromisedby the disthatwouldallow it play. One can thinkof modifications to be applied in other situations,but we leave such expansion to otheraccounts. 4. SHOWING DATA CLEARLY In this section we discuss methods for badly displayingdata that do not seem as serious as those deYORK THE NEW TIMES, Paymentsunderthe Ways and Means Committeeplan $2500 2000 1 ??? w $ NEWS 1,829,000 ; 1,636,000 2000 % 1,500,000(- INCOME - ... 1986 $ 1,555,000 - S20.000 1982 11W0 m The soaraway Post the daily paper New Yorkers trust 1 700,000 AVERAGE FAMILY by SOfl4Nfl~ York Post compared to the plummetingDaily News circulation.The reason given is that New Yorkers "trust"the Post. It takesa carefullook to note the 700,000jump thatthe scale makesbetweenthe two lines. In Figure13 is a plot of physicians' incomesover time.It appearsto be linear,witha slighttapering off inrecentyears.A carefullookat thescaleshowsthatit starts outplotting everyeightyearsandendsupplotting yearly.A moreregularscale(in Figure14) tellsquitea different story. 1,800000 $2500 YOUR TAXES Tawpld Thisis a powerful techniquethatcan makelargediflooksmallandmakeexponential ferences changeslook linear. In Figure12 is a graphthatsupportstheassociated storyabout the skyrocketing circulation of The New __ Paymentsundr the Prdentfs proposW 1500 Rule 6-Change Scales in Mid-Axis 1,900,000. 2, 1981 SUNDAY, AUGUST thatis, thedata are displayed, scribedpreviously; and theymight evenbe accurateintheirportrayal. Yetsubtle(and notso subtle)techniques can be usedto effectivelyobscurethe mostmeaningful or interesting aspectsofthedata.Itismoredifficult toprovideobjective measuresof presentational clarity, butwe relyon the readerto judgefromtheexamplespresented. bu,000 - 1,491,000 - THEIR ~~~~~~~BILL 500l OUR ... BILL k00t0 - E~~~17 .:_ Fiur 12. Chngn 1982 1983 1984 1985 .scl -_ 197 __ 198 ~ a., :a1.... 198 1982_-- .-E inmid-ai tomak lag differences. 198 scale norcontext Figure11. The WhiteHouse showingneither withpermission). (? 1981,The NewYorkTimes,reprinted May 1984, Vol. 38, No. 2 ? The AmericanStatistician, 141 ofDoctors hIcomes Vs.Other Profesionals (MEDIANNETINCOMES) Median Income of Year-Round, Full-Time Workers 25 to 34 Years Old, by Sex and Educational Attainment:1961977 Constant 1977 dollars $20,000 $18000 SOURCE:Council onWageandPrice Stability 54,14 5,4 1i 50,823 _ _, ~~, I~-- ,_~_ _s r - $8,000 - $4,000 - $2,000 25,050 years or 6to16 M ~ ... $14,001 A $ l0 1951 1955 1963 1965 1967 1970 1972 1973 1974 1975 1976 Figure 13. Changingscale inmid-axistomake exponentialgrowth linear (? The WashingtonPost). Rule 7-Emphasize the Trivial(Ignore theImportant) Sometimesthe data thatare to be displayedhave one importantaspect and othersthatare trivial.The graph can be made worse by emphasizingthe trivialpart. In Figure 15 we have a page fromS13 that compares the incomelevels of men and womenby educationallevels. It reveals the not surprisingresultthatbettereducated individualsare paid betterthan more poorlyeducated ones and thatchangesacrosstimeexpressedin constant dollars are reasonably constant. The comparison of greatestinterestand currentconcern, comparingsalaries between sexes withineducation level, must be made clumsilyby verticallytransposingfromone graph to another.It seems clear that Rule 7 musthave been operatinghere,forit would have been easy to place the graphsside byside and allow the comparisonofinterest to be made more directly.Looking at theproblemfrom a strictly data-analyticpointof view, we note thatthere are two large main effects(education and sex) and a small time effect.This would have implied a plot that INCOMES OF DOCTORS VS. -__- -ale' - -|-- effct ~a own, larges _ _ _ - clal - pla a Feor $4,000 - _ _ $2,0 0 0 _ _ _ _ _ _ __ _ _ _ _ _ $0 1968 - _ _ _ _ _ _ - - an $1,000 9 - _ _ lees 1970 1972 1974 _ _ _ _ __ _ _ _ _ 1976 1978 S13). showedthe large effectsclearlyand placed the smallish timetrendinto the background(Figure 16). 110 12_1`1F ,IA,A 1 '1 20- Males 16 C" 12 ~MalesV - Females 4 1964 1969 -maximum -median (uveriime) STRRTE0 n010. 1974 YEAR Figure14. Data fromFigure13 redonewithlinearscale (from Wainer1980). C) The AmericanStatistician,May 1984, Vol. 38, No. 2 ~~~~~~~~~~~~~~~Female -' Legend z30/ 1959 e MEDIAN INCOME OFYEAR-ROUND FULL TIME WORKERS 25-34YEARS OLDBYSEXANDEDUCATIONAL ATTAINMENT: 1968-1977 1977DOLLARS) (INCONSTANT 8 1954 1980 differencesin income throughthe verticalplacement ofplots (from PROFESSIONALS 1949 y ars _ thetrivial: ofsex Hidingthemaineffect Figure15. Emphasizing DOCTORS OTHER 19414 - ic 20 OTHER PROFESSIONRLS H~~~~~~~~~~~EOICRRE d- eard thessals o _ _ I-- 142 _ - SW_L71V - showed~~ -60 cso 1939 ein meo ino 2 1 year _ _ 710 z ore 13 ro 1,5yeror -- 8,744 z20 2 = 16,107 $3,262 n50 = --LIL16eaorm s__ -_ >_ __ _ $1200 34,740 = _| ~~~ ____ ~ -n = $10,0000 46,780 1939 1947 s_ ~~~~ ~~ mm L A__ $12,000 43,100 13,150 ? = $16000 _ 62.799 - . $14,000 OFFICED-BASED NONSALARIEDPHYSICIANS MALE - BIL. LB. 2.5 LAMB MUTTON ........ . ... 1.5 ....... ...... Canada, 1970-1972 --- - 1.0-...: - AN . .;; ' : .: : :: : ',:: BEEF -:... o~~~~~~~.'.:;::-: 0. .;.... VEAL ....: .. ::.;. Finland,1974 .: ,- :. ' : : : : : : :; ..,...,...; 'X'-* ....,,.;.-France,1972 1960 Female HIM Austria,1974 1975 ' , ,\ : : :, : Female AND GOATMEAT 2.0 1 N. 0 ' N": m Male LifeExpectancy at Birth,by Sex, Selected Countres, Most Recent Available Year: 1970-1IMi U.S. IMPORTS OF RED MEATS 1963 i90 1969 1972 1975 1978 Germany(Fed Rep , ^~~~~~~~~~~~~~~~~~~~~~~~~~a 1973-1975 FA *eAftcA WG, r EOUIVALENT 1 '.:,: ill i -:-@:-:-:9o Figure 17. Jigglingthebaseline makes comparisonsmore difficult 4 | R l Japan, 1974 Charts). (fromHandbookofAgricultural U S S R., 1971-1972) i S S Rule 8-Jiggle theBaseline Making comparisonsis alwaysaided when the quantitiesbeingcomparedstartfroma commonbase. Thus we can alwaysmake the graphworse by startingfrom differentbases. Such schemes as the hangingor suspended rootogramand the residual plot are meant to facilitatecomparisons.In Figure 17 is a plot of U.S. importsof red meat takenfromthe Handbook of AgriculturalChartspublishedby the U.S. Departmentof Agriculture.Shadingbeneatheach line is a convention thatindicatessummation,tellingus thatthe amountof each kind of meat is added to the amounts below it. Because of the dominance of and the fluctuationsin importationof beef and veal, it is hard to see whatthe changesare in theotherkindsof meat-Is the importation of porkincreasing?Decreasing? Stayingconstant? The onlypurposeforstackingis to indicategraphically the total summation.This is easily done throughthe addition of another line for TOTAL. Note that a TOTAL willalwaysbe clear and willneverintersectthe otherlineson theplot. A versionof thesedata is shown U.S. IMPORTS OF RED rMEATS* BIL. LB.- 2.5 e-_____~~_ POkK 101960 Source: Chart 1963 1966 1969 Charts , Handbook of Agri_ultural 1976, p. 93. Agriculture, Source: 1972 U.S . 1978 1975 Department of Origzinal line versionofFigure17 witha straight Figure18. Analternative used as thebasis ofcomparison. Sweden, 1971-1975 UnitedKingdom,1970-1972 UnitedStates, 1975 0 50 60 70 80 90 Years of lifeexpectancy Figure 19. AustriaFirst!Obscuring the data structureby alphabetizingthe plot (fromS13). in Figure18 withtheseparate amountsof each meat,as well as a summationline, shown clearly. Note how easily one can see the structureof importof each kind of meat now that the standard of comparison is a straightline (the time axis) and no longerthe import amountof those meats withgreatervolume. Rule 9-Austria First! Ordering graphs and tables alphabeticallycan obin thedata thatwouldhave been obvious scurestructure had the display been ordered by some aspect of the data. One can defend oneself against criticismsby pointingout thatalphabetizing"aids in findingentries of interest."Of course,withlistsof modestlengthsuch aids are unnecessary;with longer lists the indexing schemescommonin 19thcenturystatisticalatlases provide easy lookup capability. Figure 19 is anothergraphfromSf3 showinglifeexpectancies,dividedby sex, in 10 industrializednations. The order of presentationis alphabetical (with the USSR positionedas Russia). The messagewe getis that thereis littlevariationand thatwomenlive longerthan men. Redone as a stem-and-leafdiagram(Figure 20 is simplya reorderingof the data with spacing proportional to the numericaldifferences),the magnitudeof leaps out at us. We also note thatthe the sex difference USSR is an outlierformen. Rule JO-Label (a) Illegibly,(b) Incompletely, and (d) Ambiguously (c) Incorrectly, There are manyinstancesof labels thateitherdo not C) The AmericanStatistician,May 1984, Vol. 38, No. 2 143 LIFE EXPECTANCYAT BIRTH, BY SEX, MOST RECENTAVAILABLEYEAR YEARS WOMEN SWEDEN FRANCE, US, JAPAN, CANADA FINLAND, AUSTRIA, UK USSR,GERMANY 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 673 62 r3Ef Commtssion Payqrents to Travel Agents 15o m L 1 20- L JUNITE I SWEDEN JAPAN 0 N CANADA,UK,US, FRANCE GERMANY,AUSTRIA FINLAND 9 0- TWA 5 E A S TERN 0 F 60D ELTA D 0 USSR L 30- L I Figure 20. Orderingand spacing the data fromFigure 19 as a stem-and-leaf diagram provides insights previously difficultto extract(fromS13). A R 5 0 1976 tell the whole story,tell the wrongstory,tell two or more stories,or are so small thatone cannotfigureout whatstorytheyare telling.One ofmyfavoriteexamples of small labels is fromThe New York Times (August 1977 1978 (ea t I ma t e d Y EAR Figure 22. Figure 21 redrawn with 1978 data placed on a comparable basis (fromWainer1980). 1978), in whichthearticlecomplainsthatfarecutslower commissionpaymentsto travelagents.The graph(Figure 21) supportsthisviewuntilone noticesthetinylabel indicatingthatthe small bar showingthe decline is for just the firsthalfof 1978. This omitssuch heavytravel periodsas Labor Day, Thanksgiving, Christmas,and so on, so thatmerelydoublingthe first-half data is probablynot enough. Nevertheless,whenthisbar is doubled (Figure 22), we see thatthe agentsare doingverywell indeed comparedto earlieryears. ToTravel Agents In lkosoldofdlAer $57~~~~~~~~~~~~~~~'6 Rule 11-More Is Murkier:(a) More Decimal Places and (b) More Dimensions O web ~~E,ASTIEEN discount of d and airlines' telephone unITE=D s areras fars (ravel agents'overhead,offsetting revenuegainsfromhighervolume. Figure21. Mixing a changedmetaphor witha tiny labelreverses themeaningofthedata (? 1978,The NewYorkTimes). 144 We oftensee tables in whichthe numberof decimal places presentedis farbeyondthe numberthatcan be perceived by a reader. They are also commonly presentedto show more accuracythan is justified.A displaycan be made clearerbypresentingless. In Table 1 is a sectionof a table fromDhariyal and Dudewicz's (1981) JASA paper. The table entriesare presentedto five decimal places! In Table 2 is a heavilyrounded versionthatshowswhatthe authorsintendedclearly.It also shows thatthe variouscolumnsmighthave a substantialredundancyin them (the maximumexpected gain withb/c = 10 is about 1/10ththatof b/c = 100 and 1/100th thatof b/c = 1,000). If theydo, theentiretable could have been reduced substantially. Justas increasingthe numberof decimal places can make a table harderto understand,so can increasing the number of dimensionsmake a graph more con- (C The AmericanStatistician,May 1984, Vol. 38, No. 2 Table 1. OptimalSelection Froma Finite Sequence WithSampling Cost b/c = 10.0 N r* 3 4 5 6 7 8 9 10 2 2 2 3 3 3 3 4 100.0 (GN(r*) - a)/c .20000 .26333 .32333 .38267 .44600 .50743 .56743 .62948 1,000.0 r* (GN(r*) - a)/c r* (GN(r*) - a)/c 2 2 3 3 3 4 4 4 2.22500 2.88833 3.54167 4.23767 4.90100 5.57650 6.26025 6.92358 2 2 3 3 3 4 4 4 22.47499 29.13832 35.79166 42.78764 49.45097 56.33005 63.20129 69.86462 NOTE: g(Xs + r - 1) = bR(Xs + r - 1) + a, ifS =s, and g(Xs +r - 1)= 0, otherwise. Source: Dhariyaland Dudewicz (1981). fusing.We have alreadyseen how extradimensionscan cause ambiguity(Is it lengthor area or volume?). In addition, human perceptionof areas is inconsistent. Justwhatis confusingand whatis notis sometimesonly a conjecture,yet a hintthat a particularconfiguration willbe confusingis obtainedifthe displayconfusedthe grapher.Shownin Figure23 is a plot of per shareearningsand dividendsovera six-yearperiod. We note (with some amusement)that 1975 is the side of a bar-the third dimension of this bar (rectangular parallelopiped?) charthas confusedtheartist!I suspectthat1975 is reallywhatis labeled 1976, and the unlabeled bar at the end is probably1977. A simpleline chartwiththis interpretation is shownin Figure 24. In Section4 we illustratesix morerulesfordisplaying data badly. These rulesfall broadlyunderthe heading of how to obscurethe data. The techniquesmentioned were to change the scale in mid-axis,emphasize the trivial,jiggle the baseline, orderthe chartby a characteristicunrelatedto the data, label poorly,and include more dimensionsor decimalplaces thanare justifiedor needed. These methodswillworkseparatelyor in combinationwith othersto produce graphs and tables of littleuse. Their commoneffectwill usuallybe to leave the reader uninformedabout the points of interestin the data, althoughsometimestheywill misinformus; the physicians'incomeplot in Figure 13 is a primeexample of misinformation. Finally,the availabilityof color usually means that there are additionalparametersthat can be misused. The U.S. Census' two-variablecolor map is a wonderful example of how using color in a graph can seduce us Table 2. OptimalSelection Froma FiniteSequence WithSampling Cost (revised) b/c = 10 b/c = 100 b/c = 1,000 N r* G r* G r* G 3 2 .2 2 2.2 2 22 4 2 .3 2 2.9 2 29 5 6 7 8 9 10 2 3 3 3 3 4 .3 .4 .4 .5 .6 .6 3 3 3 4 4 4 3.5 4.2 4.9 5.6 6.3 6.9 3 3 3 4 4 4 NOTE:g(Xs + r- 1) =bR(Xs + r - 1) + a, ifS = s, andg(Xs + r - 1) intothinkingthatwe are communicating more thanwe are (see Fienberg 1979; Wainer and Francolini1980; Wainer 1981). This leads us to the last rule. Rule 12-If It Has Been Done Wellin thePast, Thinkof AnotherWayto Do It The two-variablecolor map was done ratherwell by Mayr(1874), 100 yearsbeforethe U.S. Census version. He used bars of varyingwidthand frequencyto accomplish gracefullywhat the U.S. Census used varying saturationsto do clumsily. A particularlyenlighteningexperience is to look carefullythroughthe six books of graphsthatWilliam Playfairpublished duringthe period 1786-1822. One discoversclear, accurate, and data-laden graphs containingmanyideas thatare usefuland too rarelyapplied today. In the course of preparingthis article,I spent manyhours lookingat a varietyof attemptsto display Earns g~~~ PerSha LAnd DMndenldxs (Dollars) 1.71 111. 1.21.70 1.63 1.53 S I ,.S L....I < m . Earning 1~~~.....'.. ',. -:- ... '' .. .....1 D......d. Figure,"'.''. 23'.'''. An extra .. . .::: , :.:.: .: ::::::.:;:;: '' :..:.:. . . ... :: . . :. ::::::..: : ::: . . ::.. .:::: .::.: :. .: :::::: .: . . .cnueevnt . . ~::" ~.:::. '~ ~dimensn.. : :.: :, .: : :"',.. .: .. .;;:. . . ( : 1979,::::The Washington Post)::.:::::: :: : . . :.'. :: .:- . :.. . : '. . ' .::: :::::::: :::::::::;:::~ : .' .: . .::::: ~~~~~~~~~~.,. :'.::'.. :. . :.. ' .,:.. . . ':~~~~~~~~~~~ .. . . . . . . '.,...::~ . . . . . . . . . . . . . . . . .::::: . .. .::: .... .:::: ..... . .:: :: :: .: .:::::::: : .: .::: ::: Fiur .t .99 .::::::::::::::: .: .: . ; . : .: :: :: ::.:.. ..:: .:: :.. . :. : . ,'' ... .': . ". '. .: : ::: .:: .::::::. :: ::: .: . .. . .. . . . .::.. .' . . :.':: . . . . . . .. . . . . . . . .'' .:' . ~~ .:: ~~~~~~ ": . .': .,., ,,,., ...,., .::::::'' X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 73 972~~~~~~~~~~~~. ... .6.77 7S ...... l . .Dvded 1 .l gr .. ;::::::: . . . . :.: . .: ::: .: .. . . :.' ,. ,. ,.::" '::" ~~~~~~~~~~~~ .: .. .: : . . :'. .: ......... .4. ,. . ,..................... :':' .. ... .. . Eanig An.h xr 2. . .ieso Wahngo .h .Post).... cofue .grapher .eve .... .. .... ... 36 43 49 56 63 70 0, otherwise. ?D The AmericanStatistician,May 1984, Vol. 38, No. 2 145 2. 00 Beginning at the left on the Polish-Russian border near the Nieman River,the thickband showsthe size of the army(422,000 men) as it invaded Russia in June 1812. The widthof the band indicatesthe size of the armyat each place on the map. In September,the armyreached Moscow, whichwas by thensacked and deserted,with100,000men. The path of Napoleon's retreatfrom Moscow is depictedby the darker,lowerband, whichis linkedto a temperaturescale and dates at the bottomof the chart.It was a bitterlycold winter,and manyfrozeon the marchout of Russia. As the graphicshows, the crossingof the Berezina River was a disaster,and the armyfinallystruggledback to Poland withonly 10,000menremaining.Also shownare themovementsof auxiliary troops, as theysought to protectthe rear and flankof the advancingarmy.Minard'sgraphictellsa rich,coherentstorywithits thanjust a singlenumber multivariate data, farmore enlightening bouncingalong over time.Six variablesare plotted:the size of the surface,directionof the army,its location on a two-dimensional army'smovement,and temperatureon various dates duringthe retreatfromMoscow. It may well be the best statisticalgraphicever drawn. 1. 75 EornL n. I . 50 cca -J M 1.25 Di vXdend. 1.00 1972 1976 1974 Y E RR Figure 24. Data fromFigure 23 redrawn simply (fromWainer 1980). 5. SUMMING UP data. Some of the horrorsthat I have presentedwere the fruitsof thatsearch. In addition,jewels sometimes emerged. I saved the best for last, and will conclude withone of those jewels-my nominee forthe titleof "World's Champion Graph." It was produced by Minard in 1861 and portraysthe devastatinglosses sufferedby the Frencharmyduringthe course of Napoleon's ill-fatedRussian campaign of 1812. This graph (originallyin color) appears in Figure 25 and is reproducedfromTufte'sbook (1983, p. 40). His narrative follows. Althoughthe tone of thispresentationtended to be lightand pointed in the wrong direction,the aim is serious. There are manypathsthatone can followthat willcause deteriorating qualityof our data displays;the 12 rules that we described were only the beginning. Nevertheless,theypointclearlytowardan outlookthat providesmanyhintsforgood display.The measuresof display described are interlocking.The data density cannotbe highifthe graphis clutteredwithchartjunk; the data-inkratio growswiththe amount of data displayed; perceptualdistortionmanifestsitselfmost fre- dans la campagnede Russ\e1812-1813. CARTE FIGURATIVE des pertes successivesen hommesde l Armnee Franr&ise Ceneralt de. Ponts et Chausseesen retra'cte. Dresscepar M.Mtnard. Inspecteur C,~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 0 0t'r'' = 6I)7ollzhtr e~~~~~~~~~ =I Awer G.&'&- j 0 - 2 0 1. Z 8 9 1 " -tie-, - 146 Mnar'sTABLE - - I )cc:iibtr - ofthis0initsoriginalcolor-p.176). q_ superb_ reproduction Tufte_ 1983_ Fiue 5 for_inr' (81)gah fte rnc ry' l-ftd oa it Tufe183for~ upeb eprducio ofths i is oigialcolr-. 16) Figue 2. gaUho tRAhe Urenh Army'sil-ftued Z~~~~~~~~~~~~~~~*~~~~~~~~*~~~~~~~ 9b~~~~~~~~~~~~~~~. 5$~~~~~~~~~~~~C -- ~~~~~~~~~~~~~~~~~ XIr = , N'~~~ciubcr usi-Acnidt deRedusieoa nt C The AmericanStatistician, May 1984, Vol. 38, No. 2 - - _ _ _ _ _ _ _ _ _ _ _ _r._ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ carmn~rdiae - orte il o Wrl' 10~~~~~~ October CaponGah"(e foa~r the ditesouf deWorldsCapo rp"(e quentlywhenadditionaldimensionsor worthlessmetaphorsare included.Thus, therulesforgood displayare quite simple. Examine the data carefullyenough to know what theyhave to say, and then let them say it witha minimumof adornment.Do thiswhilefollowing reasonableregularity practicesin thedepictionof scale, and label clearlyand fully.Last, and perhapsmostimportant,spend some time looking at the work of the mastersof the craft.An hour spent with Playfairor Minard will not only benefityour graphicalexpertise but will also be enjoyable. Tukey (1977) offers236 graphsand littlechartjunk.The workofFrancisWalker (1894) concerningstatisticalmaps is clear and concise, and it is trulya mysterythattheircurrentcounterparts do not make betteruse of the schemadeveloped a centuryand more ago. [ReceivedSeptember1982. RevisedSeptember1983.] REFERENCES BERTIN, J. (1973), Semiologie Graphique (2nd ed.), The Hague: Mouton-Gautier. (1977), La Graphique et le TraitementGraphique de l'Information,France: Flammarion. (1981), Graphicsand theGraphicalAnalysisof Data, translation, W. Berg, tech. ed., H. Wainer,Berlin: DeGruyter. COX, D.R. (1978), "Some Remarks on the Role in Statisticsof Graphical Methods," Applied Statistics,27, 4-9. DHARIYAL, I.D., and DUDEWICZ, E.J. (1981), "Optimal Selection From a FiniteSequence WithSamplingCost," Journalof the AmericanStatisticalAssociation,76, 952-959. EHRENBERG, A.S.C. (1977), "Rudimentsof Numeracy,"Journal of theRoyal StatisticalSociety,Ser. A, 140, 277-297. FIENBERG, S.E. (1979), Graphical Methods in Statistics, The AmericanStatistician, 33, 165-178. FRIEDMAN, J.H., and RAFSKY, L.C. (1981), "Graphics forthe MultivariateTwo-SampleProblem,"Journalof theAmericanStatisticalAssociation,76, 277-287. JOINT COMMITTEE ON STANDARDS FOR GRAPHIC PRESENTATION, PRELIMINARY REPORT (1915), Journal of theAmericanStatisticalAssociation, 14, 790-797. MACDONALD-ROSS, M. (1977), "How NumbersAre Shown: A Review of Research on the Presentationof QuantitativeData in Texts," Audiovisual CommunicationsReview,25, 359-409. MAYR, G. VON (1874), "Gutachen Uber die Anwendungder Graphischenund Geographischen,"Methodin der Statistik,Munich. MINARD, C.J. (1845-1869), Tableaus Graphiqueset CartesFigurativesde M. Minard,Bibliothequede l'Ecole Nationale des Pontset Chaussees, Paris. PLAYFAIR, W. (1786), The Commercialand PoliticalAtlas, London: Corry. SCHMID, C.F. (1954), Handbook of Graphic Presentation,New York: Ronald Press. SCHMID, C.F., and SCHMID, S.E. (1979), Handbook of Graphic Presentation(2nd ed.), New York: JohnWiley. TUFTE, E.R. (1977), "ImprovingData Display," University of Chicago, Dept. of Statistics. (1983), The Visual Display of QuantitativeInformation, Cheshire,Conn.: Graphics Press. TUKEY, J.W. (1977), ExploratoryData Analysis,Reading, Mass: Addison-Wesley. WAINER, H. (1980), "Making Newspaper Graphs Fit to Prinit,"in Processingof Visible Language, Vol. 2, eds. H. Bouma, P.A. Kolers, and M.E. Wrolsted,New York: Plenum, 125-142. , "Reply" to Meyerand Abt (1981), TheAmericanStatistician, 57. (1983), "How Are We Doing? A Review of Social Indicators III," Journalof theAmericanStatisticalAssociation,78, 492-496. WAINER, H., and FRANCOLINI, C. (1980), "An Empirical InquiryConcerningHuman Understandingof Two-VariableColor Maps," The AmericanStatistician,34, 81-93. WAINER, H., and THISSEN, D. (1981), "Graphical Data Analysis,"Annual Reviewof Psychology,32, 191-241. WALKER, F.A. (1894), Statistical Atlas of theUnitedStatesBased on theResultsoftheNinthCensus,Washington,D.C.: U.S. Bureau of the Census. ?) The AmericanStatistician,May 1984, Vol. 38, No. 2 147
© Copyright 2024