How to Avoid Statistical Skullduggery
The Power of Data to Clarify or Confuse
Gordon Bell, President, LucidView
May 6, 2009

The Power of Data
“The secret language of statistics, so appealing in a fact-minded culture, is employed to sensationalize, inflate, confuse, and oversimplify… Many a statistic is false on its face. It gets by only because the magic of numbers brings about a suspension of common sense.”
– Darrell Huff, How to Lie with Statistics (1954)

Statistical Skullduggery - Key Tactics
• Benefits of precise inaccuracy (useful information for 97.6% of you)
• There’s no room for uncertainty (I think)
• The less you know, the more you can see (do you want answers, or do you want the truth?)
• Some number is always going up (get the right answer by measuring the wrong thing)
• And if this isn’t enough… 6 sure-fire ways to make your numbers look good

#1: Benefits of Precise Inaccuracy
• Power
• Objectivity (fact)

Precise Numbers = Power
“About a third of all Americans love marketers.”
“30.0% of all Americans love marketers.” (in a survey of 10 of my friends)

Precise Numbers = Fact
“Our conversion rate is 12.40%”
Ignoring the fact that it’s from…
• A holiday promotion
• To 1,000 of our best customers
• Offering a best-selling product at a 20% discount with free shipping and a free gift

“Objective” numbers are often subjective
• “Objective” numbers mean little…
  – Out of context
  – Without some measure of data uncertainty
  – When they’re only a piece of the picture
  – Presented in a biased way

#2: There’s no room for uncertainty
[Cartoon from The Cartoon Guide to Statistics (Gonick and Smith)]
• Never let a statistician present to senior management (just kidding)
• You want clear, actionable answers
  – But you should understand the difference between “I think” and “I know”
  – What is the chance (and risk) of a bad call?
  – The science of statistics helps you quantify uncertainty

Understanding Uncertainty
• Specific numbers too often ignore the natural variation in the marketplace
• Like a fog, variation hides small effects and distant relationships
• Calculating variation allows you to separate clear insights from the unknowable

2 Key Statistics
1. Average
2. Standard deviation σ (and variance σ²)
• σ measures the average spread of the data
• Roughly, how far individual values typically are from the overall average

Uncertainty (σ)
• Everyone understands “average”
• The standard deviation helps you to see how far off the answer may be from the truth
• 2σ usually gives good “confidence limits”
• Generally, 95% of all values will be within the range: Average ± 2σ

Response Rate
Variation is based on
1. Response rate (R)
2. Sample size (n)

\[ \sigma_{avg} = \sqrt{\frac{R\,(1-R)}{n}} \]

With 1,000 responses from a mailing of 10,000: Response = 10% ± 0.6% (that is, ± 2σ with σ = 0.3%)
But beware: this simple equation may underestimate the true market variation

Response Rate – a different measure of σ
• A simple way to measure variation:
  1. Split your control into 5 (or more) equal and random groups
  2. Mail the same catalog to each group, but with a different source code
  3. Combine all five together to calculate the average
  4. Calculate the standard deviation among the five groups as a measure of real-world variation in the key metrics (response rate, AOV, etc.)

Example: variation in response
[Chart: Response Rate among 5 Controls, Groups 1 to 5 (n = 40,000 each, except Group 3 with n = 20,000)]
• Average = 9.26%, σ = 0.77%
• 9.26% + 2σ = 10.8%
• 9.26% − 2σ = 7.7%

Sales
• Variation in sales must be calculated
  – Calculate σ of individual orders
  – After removing outliers (usually $0 and very large orders)
But look into these outliers: they may provide valuable insights into your exceptional customers (or problems)

Sales example: Are these two significantly different?
[Chart: AOV: Cover 1 versus 2 (average); y-axis $0 to $150; the two averages shown are $42.12 and $70.59]

Are these two significantly different?
[Charts: AOV: Cover 1 versus 2 (all orders) beside AOV: Cover 1 versus 2 (average); two off-scale orders are labeled $841.70 and $239.84]

Step 1: Remove outliers
Step 2: Calculate σ of all individual orders
Step 3: Compare adjusted data, Avg ± 2σ
[Chart: AOV: Cover 1 versus 2 (all orders), with the $841.70 and $239.84 orders labeled]
• Cover 1 = $31.77 ± $4.43 (upper limit = $36.20)
• Cover 2 = $54.85 ± $8.96 (lower limit = $45.90)
Cover 1’s upper limit is below Cover 2’s lower limit, so the two ranges do not overlap.
Yes! Cover 2 wins!

#3: The less you know, the more you can see
“Jessep: You want answers?”
“Lieutenant Kaffee: I want the truth!”
“Jessep: You can’t handle the truth!”
– Jack Nicholson and Tom Cruise in the court scene from “A Few Good Men”

Sample Size, Statistical Significance, Truth
• Sample size, by providing statistical significance, draws the line between answers and the truth
  – “Directional data” seems scientific enough but risks a wrong conclusion with long-term impact
• Sample size is one way to ensure your numbers have meaning
  – More data = greater confidence
  – Greater confidence = better discrimination between close numbers

Example: Test versus Control
• Mail 1,000 packages with the “control” envelope and get a 1.0% response rate
• Mail 1,000 packages with a new envelope and get a 1.5% response
Do we have a winner?
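One way to answer that question is to put ± 2σ limits around each response rate and see whether the ranges overlap, as the deck does for the two covers. The Python sketch below assumes the simple binomial formula σ = sqrt(R·(1−R)/n) from the response-rate slide. The function names `response_limits` and `ranges_overlap` are illustrative and not from the deck, and the overlap check is a rule of thumb rather than a formal significance test.

```python
import math


def response_limits(responses, mailed):
    """Return (rate, low, high), where low/high are rate -/+ 2 sigma."""
    rate = responses / mailed
    # Simple binomial estimate; the deck warns it may understate real-world variation.
    sigma = math.sqrt(rate * (1 - rate) / mailed)
    return rate, rate - 2 * sigma, rate + 2 * sigma


def ranges_overlap(a, b):
    """True when two (rate, low, high) ranges share any values."""
    return a[1] <= b[2] and b[1] <= a[2]


control = response_limits(10, 1_000)  # 1.0% response on 1,000 mailed
test = response_limits(15, 1_000)     # 1.5% response on 1,000 mailed

for name, (rate, low, high) in (("control", control), ("test", test)):
    print(f"{name}: {rate:.2%} (range {low:.2%} to {high:.2%})")
print("ranges overlap:", ranges_overlap(control, test))
```

If the ranges overlap, the difference could plausibly be noise at this sample size.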
Not necessarily

Example: sample size of 1,000
[Chart: Response Rates with Uncertainty, control (0.010) versus test (0.015), mailing 1,000 control and 1,000 test packages]

Example: Increase sample size to 50,000
[Chart: Response Rates with Uncertainty, control (0.010) versus test (0.015), mailing 50,000 control and 50,000 test packages]

Sample Size Equation
\[ N = \left( \frac{\text{two standard deviations}}{\text{size of effect you want to see}} \right)^2 \]

Sample Size
\[ N = \frac{4\,(t_{\alpha/2} + t_{\beta})^2 \, R\,(1-R)}{(\text{smallest change})^2} \]

Sample Size Equation*
• N: total sample size (1 test cell = ½ × N)
• t_{α/2}: about 2 standard deviations for 95% confidence
• t_β: extra room for error to ensure the “effect” is seen
• R × (1 − R): response rate (R) times 1 − R equals one standard deviation, squared (σ²)
• Smallest change: the change you want to be statistically significant
* Available in an Excel worksheet

Sample Size Equation (simplified)
\[ N = \frac{31.38 \, R\,(1-R)}{(\text{smallest change})^2} \]
• Sample size must be larger with
  – Lower response rate (when the change you want to see is a fixed percentage of the response rate)
  – Smaller change you want to see
• N = sample size of
  – One test cell plus the control (1 cell alone = ½N)
  – All test recipes in a multivariable test (divide N by number of recipes)

Example: sample size
• If your catalog response rate = 3.0% and you want to see if a new cover increases response by 10% or more
• Then: R = 0.03, (1 − R) = 0.97, (smallest change) = 0.003, (smallest change)² = 0.000009
\[ N = \frac{31.38 \, R\,(1-R)}{(\text{smallest change})^2} = \frac{31.38 \times 0.03 \times 0.97}{(0.003)^2} = 101{,}462 \]
• So, you need to
  – Mail more than 50,000 of the test cover (and at least that many of the control) to see a significant difference of ±10%

#4: Some number is always going up
• With so much data, there’s almost always some metric that looks good
• Strong, statistically-valid results are possible from weak, off-target metrics
  – Survey vs. sales
  – Clicks vs. conversion
  – Forecast vs. actual
  – Response from a single mailing vs.
    full campaign

Clear Key Metrics
• If you want the “truth” then…
  – Focus on reliable, meaningful, accurate data
  – % response, $ sales, and $ profit (the final outcome of the marketing process)
• If you can’t measure it, you can’t improve
  – Impact of direct mail on retail sales
    • But… you can test DM by region, or include a coupon
  – Impact of long-term branding
    • Hmm… tough to test
  – “Testing” everything at once (without a multivariable test design)
    • What really helped, hurt, or made no difference?

Example: e-mail metrics
Open rate, clickthrough
• Exciting, vague subject line
• General offer
• Short copy
• Less information
Visit length, conversion rate ($ sales)
• Targeted subject line
• Specific offer
• Long copy
• More information

6 Sure-fire ways to make your numbers look good

1. Use Less Data
After 4 hours, a new landing page increased conversion 200%!
• 15 people ordered through the new page and only 5 from the control page
• By the end of the week, the numbers were about even

2. Lump everything together
[Chart: Average retail sales among 12 markets (year-over-year % change), markets 1 to 12, average of weeks 1-3]
[Chart: Average retail sales among 12 markets (year-over-year % change), markets 1 to 12, shown separately for week 1, week 2, and week 3]

Lump all segments together
[Chart: Monthly New Subscribers, Nov to Oct, total only]
[Chart: Monthly New Subscribers, Nov to Oct, showing total, house list, and prospects]

3. Define arbitrary segments
• Segmentation is extremely valuable
  – If different segments respond or act consistently differently
• Arbitrary (or too many) segments waste resources and can lead to “version creep”

Example: segmentation
Original segmentation based on monthly purchases:
• Very low = 1-3
• Low = 4-6
• Medium = 7-17
• High = 18+
Segments 4 months later (average monthly purchases):
• Very low = 17
• Low = 19
• Medium = 26
• High = 42

4. Make clear (il)logical assumptions
• You can’t test everything; sometimes you need to make an educated guess
• Which assumptions are more valid than others?

Example: interpolate vs. extrapolate
[Chart: Price vs. Conversion rate; conversion of 5% at a product price of $9.99 and 3% at $19.99]
Can we say?…
• We can expect 4% conversion at $14.99
• We can expect 2% conversion at $24.99

5. Find a relationship to prove your point
• Tests prove cause-and-effect
• But regression, data-mining, and other advanced statistical techniques can uncover some interesting correlations
• Correlations = proof, right?

Correlations to “prove” relationships
1. Direct mail causes global warming
   • Fact 1: DM revenue increased over $190B from 1995 to 2000
   • Fact 2: following the same slope, average global temperatures increased consistently over the same time period
   [Chart: Direct Mail Sales Revenue by year, 1995 to 2000; y-axis $250 to $550]
2. Storks carry babies
   [Figure from “Statistics for Experimenters,” Box, Hunter, Hunter]

6. Don’t test what you don’t want to know
• No statistical analysis can uncover the impact of variables that were never changed
• The best way to keep long-held beliefs is to never test them

Statistical Skullduggery and Deceptive Data
• Simple data can be surprisingly deceptive
• Like a good urban legend, deceptive data is usually based on some grain of truth
• If you want the truth, you should:
  – Understand the full story behind those precise numbers
  – Assess the statistical uncertainty (confidence limits) around important numbers
  – Calculate the best sample size for each test
  – Find the right metrics for actionable decisions
• Watch out for those 6 sure-fire ways to make data look good…
  1. Be sure you have enough data for a good decision
  2. Assess granular data, but not too large or too small
  3. Define a few stable, logical segments
  4. Apply insights to future campaigns without extrapolating beyond what you know
  5. Separate correlations (“I think”) from causation (“I know from testing”)
  6. The more you test, the more you’ll learn

Thank you! Questions?
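The deck's sample-size example (3.0% response rate, looking for a 10% lift) can be reproduced with a short Python sketch. It uses the simplified equation N = 31.38·R·(1−R)/(smallest change)². The constant 31.38 is taken from the deck as given. The names `DECK_MULTIPLIER` and `total_sample_size`, and the choice to express the change as a relative lift, are assumptions made for illustration.

```python
# Simplified constant from the deck, standing in for 4 * (t_alpha/2 + t_beta)^2.
DECK_MULTIPLIER = 31.38


def total_sample_size(response_rate, relative_lift, multiplier=DECK_MULTIPLIER):
    """Total N (one test cell plus the control) needed to detect a relative lift."""
    # Smallest change in absolute terms, e.g. 10% of a 3.0% response rate = 0.003.
    smallest_change = response_rate * relative_lift
    return multiplier * response_rate * (1 - response_rate) / smallest_change ** 2


n_total = total_sample_size(response_rate=0.03, relative_lift=0.10)
print(round(n_total))      # total N; the deck's example gives 101,462
print(round(n_total / 2))  # size of one cell (test or control)
```

Holding the relative lift fixed, a lower response rate gives a larger N, which matches the deck's note that sample size must grow as response rate falls.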