Lecture 8: Statistical Decision Making

Statistical decision
Frequentist statistics
Frequency interpretation of probability: any given
experiment can be considered as one of an infinite
sequence of possible repetitions of the same
experiment, each capable of producing statistically
independent results.
The frequentist approach to drawing conclusions from
data is effectively to require that the correct
conclusion be drawn with a given (high) probability
among this notional set of repetitions.
Frequentist view: quantify measures of central tendency
Central tendency refers to the tendency of data to cluster
around its mean or expected value.
As more data is averaged, the tendency toward the
mean gets stronger (think of the mean as a magnet and
the size of the data being averaged as the size of the
magnet).
The question of whether to believe the coin is fair is
entirely a question of how many times the coin was flipped.
If a fair coin is flipped sufficiently many times, then
the inference that it is a fair coin is likely to be drawn.
How many is enough?
If an unfair coin is flipped sufficiently many times,
then the inference that it is an unfair coin is likely to be
drawn. How many is enough?
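Sample mean and population mean
X1, X2, …, Xn: independent observations of a random variable X
m = (X1 + X2 + … + Xn)/n: the sample mean
μ: the true expected value of X
The law of large numbers implies that the sample mean converges
to the true mean as n grows; the central limit theorem describes
how the sample mean fluctuates around the true mean.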
If n is large then with high probability, the sample mean is close to
the true mean.
How large is large? How close is close?
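A minimal simulation sketch of "how large is large": it flips a simulated coin n times and reports how far the sample mean m lands from the true mean μ = 0.5. The function name `sample_mean_of_flips`, the fixed seed, and the chosen values of n are illustrative assumptions, not part of the lecture.

```python
import random

def sample_mean_of_flips(n, p_heads=0.5, seed=0):
    """Return the sample mean m of n simulated coin flips (1 = heads, 0 = tails)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable (an assumption)
    flips = [1 if rng.random() < p_heads else 0 for _ in range(n)]
    return sum(flips) / n

# The gap |m - mu| tends to shrink as n grows, roughly like 1/sqrt(n).
for n in (10, 100, 1_000, 10_000, 100_000):
    m = sample_mean_of_flips(n)
    print(f"n = {n:>7}: m = {m:.4f}, |m - mu| = {abs(m - 0.5):.4f}")
```

Individual runs can buck the trend for small n; only the typical size of the gap is guaranteed to shrink.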
Central limit theorem
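A sum of many independent, identically distributed random
variables with finite variance is approximately normally distributed.
Normal distribution: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$, with mean $\mu$ and standard deviation $\sigma$.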
Some normal distributions
Probability that variable takes
value between a and b is the
area under the graph
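As a sketch of the "area under the graph" statement, the probability that a normal variable falls between a and b can be computed from the cumulative distribution function, which the Python standard library exposes through `math.erf`. The helper names `normal_cdf` and `prob_between` are illustrative.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal variable with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_between(a, b, mu=0.0, sigma=1.0):
    """Area under the normal density between a and b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

print(f"within 1 standard deviation:  {prob_between(-1, 1):.4f}")
print(f"within 2 standard deviations: {prob_between(-2, 2):.4f}")
```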
Confidence interval
One would like a relationship between N and the
probability that |m − μ| is smaller than a given fixed value.
Error: how precise do you need to be?
Probability of error: what risk are you willing to take
that you are incorrect?
Confidence interval example
You want to know whether a coin is fair. You flip it 100
times. You observe that it comes up heads 60 times.
Your question: what is the probability that it would
come up heads 60 times (or more) if the coin is fair?
Plot of the probabilities of a given number of heads out of
100 flips of a fair coin: the 100th row of Pascal’s triangle,
divided by 2^100
The probability of 60 or more heads
from 100 flips of a fair coin is about 3 percent (0.028).
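The tail probability for the coin example can be checked exactly by summing binomial probabilities, which is the same as summing entries of the 100th row of Pascal's triangle and dividing by 2^100. This is a small sketch; the function name `binomial_tail` is illustrative.

```python
from math import comb

def binomial_tail(n, k_min, p=0.5):
    """P(at least k_min successes in n independent trials with success probability p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

p_tail = binomial_tail(100, 60)
print(f"P(at least 60 heads in 100 fair flips) = {p_tail:.4f}")
```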
Confidence intervals
Hypothesis: the true value h, the proportion of trials on which the
coin lands on heads in the long run, is within a certain error of
the sample average, with high probability.
E: the experiment of repeating the coin flip N times.
H: the observed number of heads.
Desired: if E is repeated infinitely often, then the sample mean m = H/N will be
within err of the true mean h a high proportion P of the time.
We are then 100P percent confident that the true mean lies in the interval
(H/N − err, H/N + err).
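One common way to obtain err is the normal approximation to the binomial, sketched below for H = 60 and N = 100. The choice of this particular approximation, the 95% level, and the function name `normal_approx_interval` are assumptions made for illustration; the lecture does not commit to a specific formula.

```python
from math import sqrt
from statistics import NormalDist

def normal_approx_interval(heads, n, confidence=0.95):
    """Return (H/N - err, H/N + err) using the normal approximation to the binomial."""
    m = heads / n                                     # sample mean H/N
    z = NormalDist().inv_cdf(0.5 + confidence / 2)    # about 1.96 for 95%
    err = z * sqrt(m * (1 - m) / n)                   # half-width of the interval
    return m - err, m + err

low, high = normal_approx_interval(60, 100)
print(f"95% interval for h: ({low:.3f}, {high:.3f})")
```

Raising the confidence level P widens the interval, and raising N narrows it, which is the trade-off between precision and risk.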
Measures of central tendency
Coin flips: one can compute the binomial distribution
explicitly, along with the probabilities associated with the
various outcomes.
The confidence interval derives from adding the
probabilities of the various outcomes corresponding to
that interval and excluding the remaining probabilities.
The precise statement is a subtle reflection of how well
the binomial distribution is approximated by the Gaussian curve.
Bayesian reasoning: looking for the most likely explanation
Fair coin example
Example: suppose that a coin has an unknown probability r of landing on heads.
Bayesian approach:
Posterior probability: the conditional probability of the causes, given the
observed effects.
Example: probability that a coin is fair, given that it has landed on heads
on some observed proportion of tosses.
Prior: the distribution of an uncertain quantity, before any measurements are made.
Coin flip: p(H) doesn’t change (new shiny penny vs old grimy penny)
Example: what is the probability that a coin is fair if it landed on
heads H times out of N tosses?
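P(H | r, N) denotes the likelihood: the probability that a coin would land
on heads H times out of N tosses if it were already known that the coin has
probability r of landing on heads. For independent tosses this is the binomial
probability $P(H \mid r, N) = \binom{N}{H}\, r^{H} (1-r)^{N-H}$.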
Posterior probability: P(r | H, N), the distribution of r given that the coin
landed on heads H times out of N tosses.
Bayes' theorem
Bayes' theorem is a way to compute a posterior probability if a
prior probability and a likelihood are known.
In our example,
Problem: we do not know
But our best guess is
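A minimal sketch of the Bayesian coin computation, assuming a uniform prior on r (every value of r between 0 and 1 equally plausible before any tosses). The uniform prior, the grid resolution, and the function name `posterior_on_grid` are assumptions made here for illustration; other priors are possible and would change the result.

```python
def posterior_on_grid(heads, n, grid_size=1001):
    """Approximate the posterior of r on an evenly spaced grid, assuming a uniform prior."""
    rs = [i / (grid_size - 1) for i in range(grid_size)]
    # Likelihood r^H (1 - r)^(N - H); the binomial coefficient cancels on normalisation.
    # With a uniform prior, the unnormalised posterior is just the likelihood.
    weights = [r**heads * (1 - r) ** (n - heads) for r in rs]
    total = sum(weights)
    return rs, [w / total for w in weights]

rs, post = posterior_on_grid(heads=60, n=100)
mean_r = sum(r * p for r, p in zip(rs, post))
prob_near_fair = sum(p for r, p in zip(rs, post) if 0.45 <= r <= 0.55)
print(f"posterior mean of r: {mean_r:.3f}")
print(f"posterior probability that 0.45 <= r <= 0.55: {prob_near_fair:.3f}")
```

Under this assumed prior the posterior mean is (H + 1)/(N + 2), so 60 heads in 100 tosses gives about 0.598.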
A chilling example
Table columns: Current age | 10 years | 20 years | 30 years
†Source: SEER Cancer Statistics Review, 1975–2007, National
Cancer Institute. Bethesda, MD, 2010.
The mammogram question
In 2009, the U.S. Preventive Services Task Force (USPSTF) — a group
of health experts that reviews published research and makes
recommendations about preventive health care — issued revised
mammogram guidelines. Those guidelines included the following:
Screening mammograms should be done every two years beginning at
age 50 for women at average risk of breast cancer.
Screening mammograms before age 50 should not be done routinely
and should be based on a woman's values regarding the risks and
benefits of mammography.
Doctors should not teach women to do breast self-exams.
The mammogram question (cont)
These guidelines differ from those of the American Cancer Society (ACS). The
ACS mammogram guidelines call for yearly mammogram screening beginning at
age 40 for women at average risk of breast cancer. Meantime, the ACS says the
breast self-exam is optional in breast cancer screening.
According to the USPSTF, women who have screening mammograms die of
breast cancer less frequently than do women who don't get mammograms.
However, the USPSTF says the benefits of screening mammograms don't
outweigh the harms for women ages 40 to 49. Potential harms may include
false-positive results that lead to unneeded breast biopsies and
accompanying anxiety and distress.
A statistical question
The rate of incidence of new cancer in women aged 40 is about 1
Of existing tumors, about 80 percent show up in mammograms.
9.6% of women who do not have breast cancer will have a false
positive mammogram
Suppose a woman aged 40 has a positive mammogram. What is the
probability that the woman actually has breast cancer?
According to Gigerenzer and Hoffrage (1995) and other
studies, only about 15% of doctors can compute this
probability correctly.
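The mammogram question is a direct application of Bayes' theorem. The sketch below takes the 80 percent detection rate and the 9.6 percent false positive rate from the question, and assumes that about 1 percent of women aged 40 have breast cancer; that prevalence figure is an assumption made here for illustration. The function name `prob_disease_given_positive` is also illustrative.

```python
def prob_disease_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive test) by Bayes' theorem."""
    true_pos = sensitivity * prevalence                   # has disease and tests positive
    false_pos = false_positive_rate * (1 - prevalence)    # no disease but tests positive
    return true_pos / (true_pos + false_pos)

p = prob_disease_given_positive(
    prevalence=0.01,            # ASSUMED value, not confirmed by the text above
    sensitivity=0.80,           # 80 percent of existing tumors show up
    false_positive_rate=0.096,  # 9.6 percent false positives
)
print(f"P(cancer | positive mammogram) = {p:.3f}")
```

With these inputs the answer is under 8 percent, far lower than the 80 percent detection rate might suggest, because most positive results come from the much larger group of women without cancer.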
False positives in a medical test
False positives: a medical test for a disease may return a positive
result indicating that patient could have disease even if the patient
does not have the disease.
Bayes' formula: probability that a positive result is a false positive.
The majority of positive results for a rare disease may be false
positives, even if the test is accurate.
Hypothetical Example
A test correctly identifies a patient who has a particular disease 99% of the time, or with
probability 0.99
The same test incorrectly identifies a patient who does not have the disease 5% of the
time, or with probability 0.05.
Is it true that only 5% of positive test results are false?
Suppose that only 0.1% of the population has that disease: a randomly selected patient
has a 0.001 prior probability of having the disease.
A: the condition in which the patient has the disease
B: evidence of a positive test result.
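Written out with Bayes' theorem and the numbers of this hypothetical example, the probability that a patient who tests positive has the disease works out as follows. The symbol $\neg A$ for "the patient does not have the disease" is notation introduced here.

```latex
\[
P(A \mid B)
  = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)}
  = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.05 \times 0.999}
  = \frac{0.00099}{0.05094}
  \approx 0.019 .
\]
```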
The probability that a positive result is a false positive is about 1 − 0.019 = 0.981,
or about 98%.
The vast majority of patients who test positive do not have the disease. Nonetheless, the
fraction of patients who test positive who do have the disease (0.019) is 19 times the
fraction of people who have not yet taken the test who have the disease (0.001), so the
test is informative. Retesting may improve the reliability of the result.
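To reduce false positives, a test should be very accurate in reporting a negative result when
the patient does not have the disease. If the test reported a negative result in patients
without the disease with probability 0.999, then the probability that a positive result is a
true positive would rise to 0.99 × 0.001/(0.99 × 0.001 + 0.001 × 0.999) ≈ 0.498, so only
about half of the positive results would be false positives.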
False negatives: a medical test for a disease may return a negative
result, indicating that the patient does not have the disease, even though
the patient actually has it.
Bayes' formula for negations: $P(A \mid \neg B) = \frac{P(\neg B \mid A)\,P(A)}{P(\neg B \mid A)\,P(A) + P(\neg B \mid \neg A)\,P(\neg A)}$
In our example, P(A | not B) = 0.01 × 0.001/(0.01 × 0.001 + 0.95 × 0.999) ≈ 0.0000105, or
about 0.001 percent. When a disease is rare, false negatives will not
be a major problem with the test.
If 60% of the population had the disease, false negatives would be
more prevalent, happening about 1.55 percent of the time.
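The two false-negative figures can be reproduced with a few lines of code. This is a sketch using the hypothetical test from the example (99% detection rate, 5% false positive rate, so 95% of healthy patients test negative); the function name `prob_disease_given_negative` is illustrative.

```python
def prob_disease_given_negative(prevalence, sensitivity, specificity):
    """P(disease | negative test) by Bayes' theorem."""
    missed = (1 - sensitivity) * prevalence       # has disease but tests negative
    true_neg = specificity * (1 - prevalence)     # no disease and tests negative
    return missed / (missed + true_neg)

rare = prob_disease_given_negative(prevalence=0.001, sensitivity=0.99, specificity=0.95)
common = prob_disease_given_negative(prevalence=0.60, sensitivity=0.99, specificity=0.95)
print(f"rare disease (0.1% prevalence):  {rare:.7f}")
print(f"common disease (60% prevalence): {common:.4f}")
```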
Clicker question
On a certain island, 1% of the population has a certain disease. A certain test
for the disease is successful in detecting the disease, if it is present, 80% of the
time. The rate of positive test results in the population is 4%.
What is the probability that someone who tests positive actually has the disease?
A) 1%
B) 2%
C) 4%
D) 8%
Prosecutor's fallacy
The context in which the accused has been brought to court is
falsely assumed to be irrelevant to judging how
confident a jury can be in evidence against them that carries a
statistical measure of doubt.
This fallacy usually results in assuming that the prior
probability that a piece of evidence would implicate a
randomly chosen member of the population is equal to the
probability that it would implicate the defendant.
Defendant’s fallacy
Comes from not grouping the evidence together.
In a city of ten million, a one in a million DNA
characteristic gives any one person who has it a 1 in 10
chance of being guilty, or a 90% chance of being innocent.
Factoring in another piece of incriminating evidence would give
a much smaller probability of innocence.
OJ Simpson
In the courtroom
Bayesian inference can be used by an individual juror to see whether the evidence
meets his or her personal threshold for "beyond a reasonable doubt".
G: the event that the defendant is guilty.
E: the event that the defendant's DNA matches DNA found at the crime scene.
P(E | G): probability of observing E if the defendant is guilty.
P(G | E): probability of guilt assuming the DNA match (event E).
P(G): juror's “personal estimate” of the probability that the defendant is guilty,
based on the evidence other than the DNA match.
Bayesian inference: P(G | E) = P(E | G) P(G) / P(E)
On the basis of other evidence, a juror decides that there is a 30% chance that the defendant
is guilty. Forensic testimony suggests that a person chosen at random would have a 1 in a
million, or 10^-6, chance of having a DNA match to the crime scene.
E can occur in two ways: the defendant is guilty (with prior probability 0.3) so his DNA is
present with probability 1, or he is innocent (with prior probability 0.7) and he is unlucky
enough to be one of the 1 in a million matching people.
P(G | E) = (0.3 × 1.0)/(0.3 × 1.0 + 0.7 × 10^-6) ≈ 0.99999766667
The approach can be applied successively to all the pieces of evidence presented in court, with
the posterior from one stage becoming the prior for the next.
How should P(G) be chosen? For a crime known to have been committed by an adult male living
in a town containing 50,000 adult males, the appropriate initial prior probability might be 1/50,000.
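The juror's calculation, and the idea of feeding each posterior back in as the next prior, can be sketched as a small update function. The function name `update` and the list-of-evidence layout are illustrative assumptions; only the DNA evidence from the example is included, and further pieces of evidence would be appended with their own probabilities.

```python
def update(prior, p_e_given_guilty, p_e_given_innocent):
    """Return P(G | E) from P(G), P(E | G) and P(E | not G) by Bayes' theorem."""
    numerator = p_e_given_guilty * prior
    return numerator / (numerator + p_e_given_innocent * (1 - prior))

# Each entry is (P(E | guilty), P(E | innocent)); here only the DNA match.
evidence = [(1.0, 1e-6)]

for label, prior in (("juror's prior 0.3", 0.3), ("town prior 1/50,000", 1 / 50_000)):
    p = prior
    for p_guilty, p_innocent in evidence:
        p = update(p, p_guilty, p_innocent)  # posterior becomes the next prior
    print(f"{label}: P(G | E) = {p:.8f}")
```

Starting from the 1/50,000 prior, the same DNA match gives a posterior of roughly 0.95 rather than 0.999998, which shows how strongly the conclusion depends on the prior.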
Nicole Brown was murdered at her home in Los Angeles on the night of June
12, 1994. The prime suspect was her husband O.J. Simpson, at the time a
well-known celebrity famous both as a TV actor and as a retired professional football
star. This murder led to one of the most heavily publicized murder trials in the U.S.
during the last century. The fact that the murder suspect had previously physically
abused his wife played an important role in the trial. The famous defense lawyer
Alan Dershowitz, a member of the team of lawyers defending the accused, tried to
belittle the relevance of this fact by stating that only 0.1% of the men who
physically abuse their wives actually end up murdering them.
Question: Was the fact that O.J. Simpson had previously physically abused his wife
irrelevant to the case?
E = all the evidence, that Nicole Brown was murdered and was
previously physically abused by her husband.
G = O.J. Simpson is guilty
What about
Posterior odds = prior odds × Bayes factor
In the example above, the juror who has a prior probability of 0.3 for the
defendant being guilty would now express that in the form of odds of 3:7 in
favor of the defendant being guilty. The Bayes factor is one million, and the
resulting posterior odds are 3 million to 7, or about 429,000 to one in favor of guilt.
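The odds form can be checked with exact fractions, as in the sketch below. The function name `posterior_odds` is illustrative, and the Bayes factor of one million is the ratio P(E | G)/P(E | not G) = 1/10^-6 from the DNA example.

```python
from fractions import Fraction

def posterior_odds(prior_prob, bayes_factor):
    """Posterior odds = prior odds x Bayes factor."""
    prior_odds = prior_prob / (1 - prior_prob)   # 0.3 becomes odds of 3:7
    return prior_odds * bayes_factor

odds = posterior_odds(Fraction(3, 10), 1_000_000)
prob = odds / (1 + odds)                         # convert odds back to a probability
print(f"posterior odds: {odds} (about {float(odds):,.0f} to one)")
print(f"posterior probability of guilt: {float(prob):.11f}")
```

Converting the posterior odds back to a probability recovers the same value as the direct application of Bayes' theorem, so the two forms are equivalent.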
In the UK, Bayes' theorem was explained to the jury in the odds form by a
statistician expert witness in the rape case of Regina versus Denis John Adams.
The Court of Appeal upheld the conviction, but it also gave its opinion that
"To introduce Bayes' Theorem, or any similar method, into a criminal trial
plunges the jury into inappropriate and unnecessary realms of theory and
complexity, deflecting them from their proper task."
Bayesian assessment of forensic DNA data remains controversial.
Gardner-Medwin: the criterion is not the probability of guilt, but rather the
probability of the evidence, given that the defendant is innocent (akin to a frequentist
p-value).
If the posterior probability of guilt is to be computed by Bayes' theorem, the
prior probability of guilt must be known.
A: The known facts and testimony could have arisen if the defendant is guilty.
B: The known facts and testimony could have arisen if the defendant is innocent.
C: The defendant is guilty.
Gardner-Medwin: the jury should believe both A and not-B in order to
convict. A and not-B implies the truth of C, but B and C could both be true.
Lindley's paradox.
Other court cases in which probabilistic arguments played some role: the
Howland will forgery trial, the Sally Clark case, and the Lucia de Berk case.