
How to Avoid Statistical Skullduggery
The Power of Data to Clarify or Confuse
Gordon Bell
President, LucidView
May 6, 2009
The Power of Data
“The secret language of statistics, so appealing in a fact-minded culture, is employed to sensationalize, inflate, confuse, and oversimplify… Many a statistic is false on its face. It gets by only because the magic of numbers brings about a suspension of common sense.”
– Darrell Huff, How to Lie with Statistics (1954)
Statistical Skullduggery - Key Tactics
• Benefits of precise inaccuracy (useful information for 97.6% of you)
• There’s no room for uncertainty (I think)
• The less you know, the more you can see (do you want answers, or do you want the truth?)
• Some number is always going up (get the right answer by measuring the wrong thing)
• And if this isn’t enough… 6 sure-fire ways to make your numbers look good
#1: Benefits of Precise Inaccuracy
• Power
• Objectivity (fact)
Precise Numbers = Power
“About a third of all Americans love marketers.”
“30.0% of all Americans love marketers.”
(in a survey of 10 of my friends)
Precise Numbers = Fact
“Our conversion rate is 12.40%”
Ignoring the fact that it’s from…
• A holiday promotion
• To 1,000 of our best customers
• Offering a best-selling product at a 20% discount with free shipping and a free gift
“Objective” numbers are often subjective
• “Objective” numbers mean little…
– Out of context
– Without some measure of data uncertainty
– When they’re only a piece of the picture
– When presented in a biased way
#2: There’s no room for uncertainty
[Cartoon from The Cartoon Guide to Statistics (Gonick and Smith)]
#2: There’s no room for uncertainty
• Never let a statistician present to senior management (just kidding)
• You want clear, actionable answers
– But you should understand the difference between “I think” and “I know”
– What is the chance (and risk) of a bad call?
– The science of statistics helps you quantify uncertainty
Understanding Uncertainty
• Specific numbers too often ignore the natural variation in the marketplace
• Like a fog, variation hides small effects and distant relationships
• Calculating variation allows you to separate clear insights from the unknowable
2 Key Statistics
1. Average
2. Standard deviation, σ (and variance, σ²)
• σ measures the average spread of the data
• Roughly, the typical distance of individual values from the overall average (strictly, the square root of the average squared distance)
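A minimal Python sketch of the two key statistics, using the standard-library `statistics` module. The order values are hypothetical and serve only as an illustration. `pstdev` is the population standard deviation (divide by n). `stdev` would give the sample version (divide by n - 1).

```python
import statistics

# Hypothetical values, for illustration only
values = [20.0, 30.0, 40.0, 50.0, 60.0]

average = statistics.mean(values)   # the overall average
sigma = statistics.pstdev(values)   # population standard deviation
variance = sigma ** 2               # the variance is sigma squared

print(f"average = {average:.2f}, sigma = {sigma:.2f}, variance = {variance:.2f}")
```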
Uncertainty (σ)
• Everyone understands “average”
• The standard deviation helps you see how far off the answer may be from the truth
• 2σ usually gives good “confidence limits”
• Generally (for roughly bell-shaped data), 95% of all values will be within the range: Average ± 2σ
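The "95% within ± 2σ" rule of thumb can be checked directly. This sketch assumes the data are roughly normally distributed. `confidence_limits` and `normal_coverage` are hypothetical helper names, not part of any library.

```python
import math

def confidence_limits(average, sigma, k=2.0):
    """Return (lower, upper) = average -/+ k standard deviations."""
    return average - k * sigma, average + k * sigma

def normal_coverage(k):
    """Share of a normal distribution lying within +/- k sigma of its mean."""
    return math.erf(k / math.sqrt(2))

print(f"{normal_coverage(2):.4f}")  # 0.9545, i.e. "generally, 95%"
```

For data that are skewed (order values often are), the actual coverage of ± 2σ can differ from 95%.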
Response Rate
Variation is based on
1. Response rate (R)
2. Sample size (n)
$$\sigma_{avg} = \sqrt{\frac{R\,(1-R)}{n}}$$
But beware: this simple equation may underestimate the true market variation
With 1,000 responses from a mailing of 10,000 (R = 0.10, so σ = 0.3%):
Response = 10% ± 0.6% (± 2σ)
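The response-rate formula and the 10,000-piece example can be reproduced in a few lines of Python. `response_sigma` is a hypothetical helper name. As the slide warns, this binomial formula may understate real market variation.

```python
import math

def response_sigma(responses, mailed):
    """Response rate R and its standard deviation sqrt(R(1-R)/n)."""
    r = responses / mailed
    return r, math.sqrt(r * (1 - r) / mailed)

r, sigma = response_sigma(1_000, 10_000)
print(f"Response = {r:.1%} +/- {2 * sigma:.1%} (2 sigma)")  # Response = 10.0% +/- 0.6% (2 sigma)
```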
Response Rate – a different measure of σ
A simple way to measure variation:
1. Split your control into 5 (or more) equal and random groups
2. Mail the same catalog to each group, but with a different source code
3. Combine all five together to calculate the average
4. Calculate the standard deviation among the five groups as a measure of real-world variation in the key metrics (response rate, AOV, etc.)
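The split-control steps might be sketched as follows. The customer ids and per-group response rates are hypothetical. Using the sample standard deviation (`statistics.stdev`) across groups is an assumption, since the slide does not say which version it uses. With equal-sized groups, the mean of the group rates equals the combined rate from step 3.

```python
import random
import statistics

def split_into_groups(customer_ids, k=5, seed=None):
    """Randomly split a control list into k groups of (nearly) equal size."""
    ids = list(customer_ids)
    random.Random(seed).shuffle(ids)      # random assignment
    return [ids[i::k] for i in range(k)]  # deal shuffled ids round-robin

def variation_across_groups(metric_by_group):
    """Average and standard deviation of a metric measured in each group."""
    return statistics.mean(metric_by_group), statistics.stdev(metric_by_group)

# Hypothetical control of 200,000 customer ids split into 5 groups
groups = split_into_groups(range(200_000), k=5, seed=42)

# Hypothetical response rates observed for each group's source code
rates = [0.10, 0.11, 0.09, 0.10, 0.10]
average, sigma = variation_across_groups(rates)
print(f"average = {average:.2%}, sigma = {sigma:.2%}")
```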
Example: variation in response
[Chart: Response Rate among 5 Controls, Groups 1 to 5 (n = 40,000, 40,000, 20,000, 40,000, 40,000). Average = 9.26%, σ = 0.77%; Average + 2σ = 10.8%; Average - 2σ = 7.7%]
Sales
• Variation in sales must be calculated from the order data (unlike response rate, there is no simple formula)
– Calculate σ of individual orders
– After removing outliers (usually $0 and very large orders)
But look into these outliers: they may provide valuable insights into your exceptional customers (or problems)
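A sketch of the sales procedure, assuming "very large" means above a cutoff you choose. `max_order` is a hypothetical parameter, and the order values are made up. The function returns the outliers as well, so they can be reviewed rather than silently discarded.

```python
import statistics

def aov_and_sigma(order_values, max_order):
    """AOV and standard deviation of individual orders, outliers removed.

    Outliers are taken to be $0 orders and orders above max_order; the
    cutoff is a judgment call (an assumption), not a fixed rule.
    """
    kept = [v for v in order_values if 0 < v <= max_order]
    outliers = [v for v in order_values if not 0 < v <= max_order]
    return statistics.mean(kept), statistics.stdev(kept), outliers

# Hypothetical orders, for illustration only
orders = [0, 25, 30, 35, 40, 45, 900]
aov, sigma, outliers = aov_and_sigma(orders, max_order=500)
print(f"AOV = ${aov:.2f}, sigma = ${sigma:.2f}, outliers = {outliers}")
```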
Sales example:
Are these two significantly different?
[Chart: AOV, Cover 1 versus 2 (average); the two averages shown are $70.59 and $42.12]
[Chart: AOV, Cover 1 versus 2 (all orders); the largest individual orders are $841.70 and $239.84]
Step 1: Remove outliers
Step 2: Calculate σ of all individual orders
Step 3: Compare adjusted data (Avg ± 2σ)
Cover 1 = $31.77 ± $4.43 (upper limit = $36.20)
Cover 2 = $54.85 ± $8.96 (lower limit = $45.90)
Cover 1’s upper limit is below Cover 2’s lower limit, so the ranges do not overlap.
Yes! Cover 2 wins!
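The final comparison reduces to checking whether two ranges overlap. This is a small sketch using the slide's numbers; `limits` and `ranges_overlap` are hypothetical helper names. The computed Cover 2 lower limit is $45.89 rather than $45.90, presumably because the slide's inputs are rounded. Non-overlapping ± 2σ ranges are a conservative sign of a real difference. Overlapping ranges do not, by themselves, prove there is none.

```python
def limits(average, half_width):
    """(lower, upper) limits for average +/- half_width."""
    return average - half_width, average + half_width

def ranges_overlap(a, b):
    """True if the (lower, upper) ranges a and b share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

cover1 = limits(31.77, 4.43)  # Cover 1 = $31.77 +/- $4.43 (from the slide)
cover2 = limits(54.85, 8.96)  # Cover 2 = $54.85 +/- $8.96 (from the slide)

print(f"Cover 1 upper limit = ${cover1[1]:.2f}")  # $36.20
print(f"Cover 2 lower limit = ${cover2[0]:.2f}")  # $45.89
print("overlap" if ranges_overlap(cover1, cover2) else "no overlap")  # no overlap
```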
#3: The less you know, the more you can see
“Jessep: You want answers?”
“Lieutenant Kaffee: I want the truth!”
“Jessep: You can't handle the truth!”
– Jack Nicholson and Tom Cruise in the court scene
from “A Few Good Men”
Sample Size
Statistical Significance
Truth
• Sample size, by providing statistical significance, draws the line between answers and the truth
– “Directional data” seems scientific enough, but risks a wrong conclusion with long-term impact
• Sample size is one way to ensure your numbers have meaning
– More data = greater confidence
– Greater confidence = better discrimination between close numbers
Example: Test versus Control
• Mail 1,000 packages with the “control” envelope and get a 1.0% response rate
• Mail 1,000 packages with a new envelope and get a 1.5% response rate
Do we have a winner?
Not necessarily
Example: sample size of 1,000
[Chart: Response Rates with Uncertainty; control (0.010) versus test (0.015), mailing 1,000 control and 1,000 test packages]
Example: Increase sample size to 50,000
[Chart: Response Rates with Uncertainty; control (0.010) versus test (0.015), mailing 50,000 control and 50,000 test packages]
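The two charts can be approximated by applying the simple σ = √(R(1-R)/n) response-rate formula with ± 2σ limits. This sketch assumes the same 1.0% and 1.5% rates at both mailing sizes, as the charts do. `rate_limits` and `ranges_overlap` are hypothetical helper names.

```python
import math

def rate_limits(rate, n, k=2.0):
    """(lower, upper) = rate -/+ k * sqrt(R(1-R)/n)."""
    sigma = math.sqrt(rate * (1 - rate) / n)
    return rate - k * sigma, rate + k * sigma

def ranges_overlap(a, b):
    """True if the (lower, upper) ranges a and b share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

for n in (1_000, 50_000):
    control = rate_limits(0.010, n)  # control envelope: 1.0% response
    test = rate_limits(0.015, n)     # new envelope: 1.5% response
    verdict = "overlap" if ranges_overlap(control, test) else "no overlap"
    print(f"n = {n:,} per cell: control {control[0]:.4f}-{control[1]:.4f}, "
          f"test {test[0]:.4f}-{test[1]:.4f} -> {verdict}")
```

At 1,000 pieces per cell the ranges overlap, so the 1.5% could be noise. At 50,000 per cell they separate.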
Sample Size Equation
$$N = \left(\frac{\text{two standard deviations}}{\text{size of the effect you want to see}}\right)^2$$
Sample Size Equation*
$$N = \frac{4\,(t_{\alpha/2} + t_{\beta})^2 \cdot R\,(1-R)}{(\text{smallest change})^2}$$
• $4\,(t_{\alpha/2} + t_{\beta})^2$: about 2 standard deviations for 95% confidence ($t_{\alpha/2}$), with extra room for error ($t_{\beta}$) to ensure the “effect” is seen
• $R\,(1-R)$: response rate (R) × (1-R), the variance of a single response (one standard deviation squared, σ²)
• (smallest change)²: the change you want to be statistically significant
• N: total sample size (1 test cell = ½ × N)
* Available in an Excel worksheet
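A sketch of the full equation in Python. It uses normal (z) quantiles from `statistics.NormalDist` in place of the t values. The default `power=0.80` is an assumption, since the slide does not state the power behind its "extra room for error." With 95% confidence and 80% power, the constant 4(t_α/2 + t_β)² comes out near 31.4. That is close to the 31.38 used in the simplified equation, though the match is an inference, not something the slide states.

```python
from statistics import NormalDist

def sample_size(r, smallest_change, confidence=0.95, power=0.80):
    """Total N (test + control) = 4(t_a/2 + t_b)^2 R(1-R) / change^2.

    Normal quantiles stand in for t values; power=0.80 is an assumption.
    Returns (N, constant) so the constant can be inspected.
    """
    z = NormalDist()
    t_half_alpha = z.inv_cdf(1 - (1 - confidence) / 2)  # about 1.96
    t_beta = z.inv_cdf(power)                            # about 0.84
    constant = 4 * (t_half_alpha + t_beta) ** 2
    return constant * r * (1 - r) / smallest_change ** 2, constant

n_total, constant = sample_size(0.03, 0.003)
print(f"constant = {constant:.2f}, N = {n_total:,.0f}")
```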
Sample Size Equation (simplified)
$$N = \frac{31.38 \cdot R\,(1-R)}{(\text{smallest change})^2}$$
• Sample size must be larger with
– Lower response rate (for the same percentage lift)
– Smaller change you want to see
• N = sample size of
– One test cell plus the control (1 cell alone = ½N)
– All test recipes in a multivariable test (divide N by number of recipes)
Example: sample size
• If your catalog response rate = 3.0% and you want to see if a new cover increases response by 10% or more
• Then: R = 0.03, (1-R) = 0.97, (smallest change) = 0.003 (10% of 0.03), (s.c.)² = 0.000009
$$N = \frac{31.38 \cdot R\,(1-R)}{(\text{smallest change})^2} = \frac{31.38 \cdot 0.03 \cdot 0.97}{(0.003)^2} = 101{,}462$$
• So, you need to
– Mail more than 50,000 of the test cover (101,462 ÷ 2 ≈ 50,731), and at least that much of the control, to see a significant difference of ± 10%
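The worked example can be reproduced with the simplified constant. `simplified_sample_size` is a hypothetical helper name. The result is the total for the test cell plus the control, so each cell gets half.

```python
def simplified_sample_size(r, smallest_change, constant=31.38):
    """Total sample size N (test cell + control) = 31.38 R(1-R) / change^2."""
    return constant * r * (1 - r) / smallest_change ** 2

r = 0.03                 # catalog response rate: 3.0%
smallest_change = 0.003  # a 10% lift on 3.0% = 0.3 points
n_total = simplified_sample_size(r, smallest_change)
per_cell = n_total / 2   # one test cell = half of N

print(f"N = {n_total:,.0f}, per cell = {per_cell:,.0f}")  # N = 101,462, per cell = 50,731
```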
#4: Some number is always going up
• With so much data, there’s almost always some metric that looks good
• Strong, statistically valid results are possible from weak, off-target metrics
– Survey vs. sales
– Clicks vs. conversion
– Forecast vs. actual
– Response from a single mailing vs. full campaign
Clear Key Metrics
• If you want the “truth” then…
– Focus on reliable, meaningful, accurate data
– % response, $ sales, and $ profit (the final outcome of the marketing process)
• If you can’t measure it, you can’t improve it
– Impact of direct mail on retail sales
• But… you can test DM by region, or include a coupon
– Impact of long-term branding
• Hmm… tough to test
– “Testing” everything at once (without a multivariable test design)
• What really helped, hurt, or made no difference?
Example: e-mail metrics
Open rate and clickthrough:
• Exciting, vague subject line
• General offer
• Short copy
• Less information
Visit length and conversion rate ($ sales):
• Targeted subject line
• Specific offer
• Long copy
• More information
6 Sure-fire ways to make your numbers look good
1. Use Less Data
After 4 hours, a new landing page increased conversion 200%!
• 15 people ordered through the new page and only 5 from the control page
• By the end of the week, the numbers were about even
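A rough check of the 4-hour result, under two assumptions the slide does not state. First, both pages received about equal traffic. Second, each order count behaves like a Poisson count, so σ ≈ √count. `lift` and `count_limits` are hypothetical helper names. Under these assumptions the ± 2σ ranges overlap, so the 200% lift is not yet distinguishable from noise.

```python
import math

def lift(test, control):
    """Relative increase of test over control."""
    return (test - control) / control

def count_limits(count, k=2.0):
    """+/- k sigma for an order count, assuming Poisson (sigma = sqrt(count))."""
    sigma = math.sqrt(count)
    return count - k * sigma, count + k * sigma

new_page, control_page = 15, 5
print(f"lift = {lift(new_page, control_page):.0%}")  # lift = 200%

new_lower, _ = count_limits(new_page)         # about 7.3
_, control_upper = count_limits(control_page)  # about 9.5
print("ranges overlap" if new_lower <= control_upper else "ranges separate")
```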
2. Lump everything together
[Chart: Average retail sales among 12 markets (year-over-year % change); markets 1 to 12; series: average of weeks 1-3]
[Chart: Average retail sales among 12 markets (year-over-year % change); markets 1 to 12; series: week 1, week 2, week 3]
Lump all segments together
[Chart: Monthly New Subscribers, Nov to Oct; series: Total]
[Chart: Monthly New Subscribers, Nov to Oct; series: Total, House list, Prospects]
3. Define arbitrary segments
• Segmentation is extremely valuable
– If different segments consistently respond or act differently
• Arbitrary (or too many) segments waste resources and can lead to “version creep”
Example: segmentation
Original Segmentation (based on monthly purchases)
Very low = 1-3
Low = 4-6
Medium = 7-17
High = 18+
Segments 4 months later (average monthly purchases)
Very low = 17
Low = 19
Medium = 26
High = 42
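One way to see how far the segments drifted is to map each segment's later average back onto the original cutoffs. Applying per-customer cutoffs to segment averages is a rough check, not an exact reclassification. `original_segment` is a hypothetical helper name.

```python
import bisect

# Original cutoffs (monthly purchases): 1-3, 4-6, 7-17, 18+
LOWER_BOUNDS = [1, 4, 7, 18]
NAMES = ["Very low", "Low", "Medium", "High"]

def original_segment(monthly_purchases):
    """Map a monthly purchase count to its original segment name."""
    i = bisect.bisect_right(LOWER_BOUNDS, monthly_purchases) - 1
    if i < 0:
        raise ValueError("below the lowest segment (0 purchases)")
    return NAMES[i]

later_averages = {"Very low": 17, "Low": 19, "Medium": 26, "High": 42}
for old_name, avg in later_averages.items():
    print(f"{old_name:8} -> average {avg} now falls in {original_segment(avg)}")
```

By the original definitions, the "Very low" segment now averages in the Medium range and the other three all fall in High.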
4. Make clear (il)logical assumptions
• You can’t test everything; sometimes you need to make an educated guess
• Which assumptions are more valid than others?
Example: interpolate vs. extrapolate
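Can we say?…
[Chart: Price vs. Conversion rate; x-axis: Product Price; conversion is 5% at $9.99 and 3% at $19.99]
We can expect 4% conversion at $14.99 (interpolation: between the tested prices)
We can expect 2% conversion at $24.99 (extrapolation: beyond the tested prices)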
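Both statements come from the same straight line through the two tested points (5% at $9.99 and 3% at $19.99, read from the chart). This sketch computes both estimates and flags which one lies outside the tested range. `linear_estimate` is a hypothetical helper name. The arithmetic is identical for both numbers. Only the interpolated one stays within what was actually tested, and even it assumes the relationship is a straight line.

```python
def linear_estimate(x, x0, y0, x1, y1):
    """Straight-line estimate through (x0, y0) and (x1, y1).

    Returns (estimate, is_extrapolation).
    """
    slope = (y1 - y0) / (x1 - x0)
    estimate = y0 + slope * (x - x0)
    return estimate, not (min(x0, x1) <= x <= max(x0, x1))

# Tested points read from the chart: 5% at $9.99, 3% at $19.99
for price in (14.99, 24.99):
    rate, extrapolated = linear_estimate(price, 9.99, 0.05, 19.99, 0.03)
    kind = "extrapolation (outside tested prices)" if extrapolated else "interpolation"
    print(f"${price}: {rate:.1%} by {kind}")
```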
5. Find a relationship to prove your point
• Tests prove cause-and-effect
• But regression, data-mining, and other advanced statistical techniques can uncover some interesting correlations
• Correlations = proof, right?
Correlations to “prove” relationships
1. Direct mail causes global warming
• Fact 1: DM revenue increased over $190B from 1995 to 2000
• Fact 2: following the same slope, average global temperatures increased consistently over the same time period
[Chart: Direct Mail Sales Revenue by year, 1995 to 2000]
2. Storks carry babies
(From “Statistics for Experimenters,” Box, Hunter, Hunter)
6. Don’t test what you don’t want to know
• No statistical analysis can uncover the impact of variables that were never changed
• The best way to keep long-held beliefs is to never test them
Statistical Skullduggery and Deceptive Data
• Simple data can be surprisingly deceptive
• Like a good urban legend, deceptive data is usually based on some grain of truth
• If you want the truth, you should:
– Understand the full story behind those precise numbers
– Assess the statistical uncertainty (confidence limits) around important numbers
– Calculate the best sample size for each test
– Find the right metrics for actionable decisions
Statistical Skullduggery and Deceptive Data
• Watch out for those 6 sure-fire ways to make data look good…
1. Be sure you have enough data for a good decision
2. Assess granular data, at a level of detail neither too coarse nor too fine
3. Define a few stable, logical segments
4. Apply insights to future campaigns without extrapolating beyond what you know
5. Separate correlations (“I think”) from causation (“I know from testing”)
6. The more you test, the more you’ll learn
Thank you!
Questions?