Chapter 1

Know how to create a graphical representation of data with two variables (data in a two way table), (10, 6a; 09, 1a)
Know how to describe differences in categorical data? (10, 6b; 09, 1b)
Know how to compare distributions (Shape, center, spread)? Make sure you use comparative words (larger, wider, etc),
(10b, 1a; 08, 1a; 08b, 1a; 07, 1b; 06, 1a; 05, 1a;05b, 1a; 04b, 5a)
Know how to make and interpret a stemplot or back to back stemplot, including labels and a legend, (10b, 1b; 07b, 1a;
M7, 7)
What is a gap, and what are clusters? Remember to state where they are in the distribution and what that means in
context of a problem, (10b, 1c;07b, 1c)
Know how skewness and symmetry affect the relationship between mean and median, (09, 6b; 09, 1c; 05, 1d; 05b, 1b)
Know how to create (including labels) and interpret a boxplot, including how to find the range, interquartile range,
25th/75th percentiles (first and third quartiles), and how to determine which way, if at all, it is skewed (09b, 1a; 04, 1ab; 04b, 5a; 03, 1ab; M7, 29; M2, 14)
Know what effect adding a constant has on the center and spread of a set of data. Know what effect multiplying by a constant
has on the center and spread of a set of data, (09b, 1b;09, 1b; M2, 7)
Know how to create a dotplot, (08b, 1a)
Know how to interpret standard deviation in the context of a problem, (07, 1a)
Know how to describe a distribution (shape, center, spread) when looking at a stemplot, boxplot, dotplot, and other
types of graphical representations of quantitative data, (07b, 1b)
Know what the standard deviation or overall spread of a distribution will look like for a set of data that is more
consistent than a different set of data, (06, 1b)
Know how to interpret center in the context of a problem, (06, 1c)
Know how to read and interpret a cumulative frequency or cumulative relative frequency plot and interpret points and
slope in context, (06b, 1; M2, 27)
Know what a median is, and what it represents in the context of a problem, (M7, 1)
Know how to compare standard deviations when looking at two or more graphical displays (If you have two histograms, know
which one has a larger standard deviation), (M7, 15)
2010B #1
2009 #1
2008 #1
2007B #1
2006 #1
2006B #1
2005B #1
Slide 1
Chapter 1
Exploring Data
Slide 2
Section 1.1
Displaying Distributions
with Graphs
Slide 3
Definitions: Individuals and variables
• Individuals are the objects described by a set
of data. Individuals can be people, animals, or
things.
• A variable is any characteristic of an individual
– Gender, Height, Weight, Race, Religion, etc.
Slide 4
Categorical versus Quantitative
Variables

A categorical variable (qualitative variable) records
which of several groups or categories an individual
belongs to.

Gender, Race, City, Zip Code, Area Code, Religion, Color,
Age Group (21-25)
• A quantitative variable takes numerical values
for which it makes sense to do arithmetic
– Age, Height, Weight, Number of RBCs, Score
Slide 5
More on Quantitative Variables
• Units are what are used to describe the
numbers.
• Quantitative data can be categorized
– Discrete is where every possible value can be
listed
• AP scores:1, 2, 3, 4, 5
• Test Score: 0-100
– Continuous is where there is an infinite number of
possibilities
• Weight: 20, 19, 18.8, 18.81, 18.806
Slide 6
Displaying Categorical Data
• Pie Chart
– Make sure it’s labeled
– You could not represent
quantitative data with a
pie chart
• Bar Chart
– The bars do not touch
– Make sure each bar is
labeled
– What 2 things is this bar
chart missing?
Slide 7
More Bar Charts
• The only time the bars would touch is if you
were displaying data on more than one variable
– Here we are looking at scores for each grade
– We are still missing a couple of labels
Slide 8
Stemplots, Splitstems, and
Back-to-Back Stemplots
• Stemplots are used when you have a smaller
amount of data, because they display all of the
individual values of data.
• You would split the stems if you had a small range
of data
– For example if I had collected everybody’s height in
inches, I wouldn’t want to only have 6’s and 7’s
• Back to back stemplots are used when you collect
similar data from different samples
– For example, if I collected heights from 1st period and I
wanted to compare them to 3rd period
Slide 9
Creating A Stemplot
• Here is data collected on the weight of
packages delivered to Ford
103, 118, 131, 134, 134, 168, 191, 222, 232, 242, 268, 280, 280, 290, 301, 361, 381, 401, 431, 431, 441,
481
1|0, 2, 3, 3, 3, 7, 9
2|2, 3, 4, 7, 8, 8, 9
3|0, 6, 8
4|0, 3, 3, 4, 8
4|3 = 430 pounds
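The stemplot above can be rebuilt with a short script. This is only a sketch: the function name `stemplot` and the choice to round each weight to the nearest ten pounds before splitting it into stem and leaf are assumptions made to match the leaves shown on the slide.

```python
from collections import defaultdict

weights = [103, 118, 131, 134, 134, 168, 191, 222, 232, 242, 268,
           280, 280, 290, 301, 361, 381, 401, 431, 431, 441, 481]

def stemplot(values, leaf_unit=10):
    """Return stemplot rows, rounding each value to the nearest leaf_unit."""
    rows = defaultdict(list)
    for value in sorted(values):
        # Round half up to the nearest leaf unit, e.g. 118 -> 12 tens.
        rounded = (value + leaf_unit // 2) // leaf_unit
        stem, leaf = divmod(rounded, 10)
        rows[stem].append(leaf)
    return [f"{stem}|{', '.join(str(leaf) for leaf in leaves)}"
            for stem, leaves in sorted(rows.items())]

for row in stemplot(weights):
    print(row)
print("4|3 = 430 pounds")  # the key is part of the plot
```

Printing the key last is a reminder that a stemplot without a legend is incomplete.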
Slide 10
Splitstem
• Here is the same data but represented with a
splitstem
103, 118, 131, 134, 134, 168, 191, 222, 232, 242, 268, 280, 280, 290, 301, 361, 381, 401, 431, 431, 441,
481
Weight of Packages delivered to Ford
1|0, 2, 3, 3, 3
1|7, 9
2|2, 3, 4
2|7, 8, 8, 9
3|0
3|6, 8
4|0, 3, 3, 4
4|8
Slide 11
Back to Back Stemplot
• Here is data from two different days
Tuesday | Stem | Wednesday
2, 4, 6, 6 | 1 | 3, 4
3, 3, 3 | 2 | 4, 5, 6
1 | 3 | 5, 6, 6
1|4 = 140 lbs
Slide 12
Know how to create a dotplot
• Dotplots are used when you have a small amount of data, and, like a
stemplot, can display every piece of data collected
– This shows every piece of data, but we might not know every possible
value
Slide 13
Histograms
• Histograms are generally used when you have
a large amount of data and it wouldn’t be
reasonable to display every one collected
• Histograms can be confusing if the scales
aren’t described
– Let’s take a look at the histogram on page 50
Slide 14
Cumulative Frequency
• A cumulative frequency chart is a histogram
that, instead of showing how many
items are in each interval, shows how many
have occurred up to that point
– It’s usually a percentage, but it does not have to
be
http://stattrek.com/AP-Statistics-1/CumulativeFrequency-Plot.aspx?Tutorial=AP
Slide 15
Timeplots
• A timeplot is a way to display quantitative data measured against time
– You are collecting the same data, sometimes from the same individuals,
and trying to discover a pattern over time
Slide 16
Describing Data
Shape, Center, and Spread
Slide 17
Describing data: shape, center, and spread
• DESCRIBE is one of the major tipoff words on the AP exam
– Whenever you see that word it automatically tells you to talk
about these three items
• Shape
– Symmetric, skewed, uniform
– Shape can be unimodal, bimodal, multimodal.
– Shape can have clusters and gaps.
• Center
– We start by just guessing where the middle is
– We will get more in depth in the next section
• Spread
– We start by talking about spread in terms of the range of the
data.
– We will get more in depth in the next section
Slide 18
Shape
Slide 19
Skewed
Slide 20
Uniform
Slide 21
Unimodal
Slide 22
Bimodal
Slide 23
Gaps and Clusters
1|1, 1, 4, 7, 9
2|5, 6, 7
3|
4|1, 2
5|0
In this skewed right stemplot there is a gap in
the 30’s and you could say that there are two
clusters (one in the 10s and 20s and the other
is in the 40’s and 50s)
Slide 24
Making lists and histograms with the
calculator
• STAT/Edit… allows you to create a list
• STAT PLOT (on the function keys) allows you to
create a histogram with L1
• WINDOW (on the function keys) allows you to
set up the criteria for your window
• Talk about the idea of range of data
• GRAPH (on the function keys) allows you to
see the histogram
– You can then trace the data using the Trace
button.
Slide 25
Section 1.2
Describing Distributions With
Numbers
Slide 26
Basics of Central Tendency
• We use these three measurements to find the
center of the data.
– Mean=average
– Median=middle
– Mode=Most
• Which one is better? Which one is least
useful? Why?
Slide 27
Finding and Using Mean
• Add all the numbers up and divide by how many
numbers there are
• We can use our calculator to find the mean as
well
– Create our list
– STAT gives us our screen, then go over to CALC/1-Var
Stats
• Mean is nonresistant because
outliers can affect it
• For the most part, mean is only an appropriate
measure of center when you are looking at a symmetric
distribution of data
Slide 28
Finding and Using Median
• Find the middle number when you put them in order.
– If there is an even number of data values, we have to
take the average of the middle two
• Let’s look at a couple
– 1, 4, 6, 7, 8 (median = 6)
– 1, 4, 6, 8 (median = 5)
• You can also use your calculator.
• Median is resistant to outliers
• Median represents the point in your data where 50% of the values are
above and 50% are below
– It is usually not halfway between your min and
your max
Slide 29
Finding and Using Quartiles
• The quartiles are the medians of the two halves of the data
on either side of the median
• They would be used as a measure of spread
• We can use our calculator as well
– Create our list
– STAT gives us our screen, then go over to CALC/1-Var
Stats
– After choosing this, choose the list you want to look
at
– Scroll down to Q1 and Q3
• Quartiles are useful in telling us where the
middle 50% are, the top 25%, and the bottom
25%
Slide 30
Outliers
• The Interquartile Range (IQR) is the difference of
the two quartiles (Q3 – Q1)
• We can use the IQR to find outliers
– Multiply the IQR by 1.5 and that tells how far to
move down from Q1 and up from Q3
– If data are past these limits, then they are outliers.
• 2, 4, 8, 10, 11
– With Q1 = 4 and Q3 = 10, the IQR = 6
– 1.5*6=9
– So any number bigger than 10 + 9 = 19 (or smaller than 4 – 9 = –5) is an
outlier
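The 1.5 × IQR rule can be sketched in a few lines. One assumption to flag loudly: quartile conventions differ, and the "inclusive" method is used here only because it reproduces the slide's Q1 = 4 and Q3 = 10. A calculator or textbook that leaves the median out of each half would report different quartiles for the same list. The function name `outlier_fences` is made up for this example.

```python
import statistics

def outlier_fences(data):
    """Return (lower, upper) limits from the 1.5 * IQR rule."""
    # Assumption: 'inclusive' quartiles, chosen to match the slide's numbers.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [2, 4, 8, 10, 11]
low, high = outlier_fences(data)
outliers = [x for x in data if x < low or x > high]

print("limits:", low, high)
print("outliers:", outliers)
```

For this list the limits come out to -5 and 19, so none of the five values is an outlier.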
Slide 31
Know how to create and interpret a
boxplot…
• The five number summary shows the…
– Min, Q1, Med, Q3, Max
• A boxplot is a visual representation of a five number summary, and can be used to
show/determine skewness
• Outliers are marked with stars
This distribution is slightly skewed
left. It has a median near 82 and
an interquartile range near 20.
There appears to be an outlier in
the 20’s. The first quartile (25th percentile) is at 70
and the third quartile (75th percentile) is at 90. That
means that the middle 50% of the
data is between 70 and 90. The
bottom 50% is below 82 and the
upper 50% is above 82.
Slide 32
Skewed Left
Slide 33
Skewed Right
Slide 34
Calculating Standard Deviation
• If we use quartiles to measure spread with the median,
what do we use to measure spread with the average?
– The standard deviation
• It is measured in the same units
as the data
– For example, if I were taking an average age, the
standard deviation would be measured in years
Slide 35
Calculating Standard Deviation By
Hand
• 1, 2, 4, 6, 12
• Calculating standard deviation
– Start by calculating the average (here it is 5).
– Now look at the differences of all the numbers with the average.
• -4, -3, -1, 1, 7
– What do you get when you add these up?
• 0
– What we do instead is look at the squares of these differences.
• 16, 9, 1, 1, 49
– Add them up (76) and that gives you a number that has little meaning.
– Divide that number by one less than the total number of data (76/4 = 19).
– That number is your variance.
– The square root of the variance is the standard deviation, s = 4.36.
This is the number that is used to measure spread when the average is used.
• It is really only used with symmetric data.
• Make sure you get the right number when you use 1-var stats.
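The by-hand steps can be checked with a short script. This is a sketch that follows the slide's steps on its own data; the variable names are made up, and `statistics.stdev` is used only as a cross-check, in the same spirit as comparing against 1-Var Stats.

```python
import math
import statistics

data = [1, 2, 4, 6, 12]

mean = sum(data) / len(data)                     # step 1: the average
deviations = [x - mean for x in data]            # step 2: differences from the average
squared = [d ** 2 for d in deviations]           # step 3: square each difference
variance = sum(squared) / (len(data) - 1)        # step 4: divide by one less than n
sd = math.sqrt(variance)                         # step 5: square root of the variance

print("deviations add to", sum(deviations))
print("variance =", variance)
print("s =", round(sd, 2))
print("library check:", round(statistics.stdev(data), 2))
```

The deviations always add to 0, which is why they are squared before being added.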
Slide 36
What does standard deviation mean?
• What does it mean if something has a small
standard deviation?
– The numbers are close together
– 1, 1, 1, 1, 1, 1
• In this case s = 0 because there is no spread
• Which one has the largest standard deviation?
– 80, 80, 80, 80
– 1, 2, 3, 4
– 2, 4, 6, 8
Slide 37
Know how to interpret standard
deviation in the context of a problem
• In layman’s terms, standard deviation is the
average or typical distance away from the
mean
– Again, if I were looking at a collection of data
involving age, I would say that the average age is
17.5 years, and the standard deviation is 2 years.
• Interpretation
– The typical person in this study is 2 years older or
younger than 17.5 years.
– The average distance from 17.5 years is 2 years
Slide 38
Know what effect adding/multiplying
a set of data by a constant can have
• What if I take a group of data with the five
number summary: 1, 5, 10, 12, 20 and I…
– Add three to every number?
• The summary becomes 4, 8, 13, 15, 23
– Multiply every number by 3?
• The summary becomes 3, 15, 30, 36, 60
• What if I take a group of data with avg = 10
and s = 2 and I…
– Add three to every number
• Avg = 13 and s = 2 (spread did not change)
– Multiply every number by 3
• Avg = 30 and s = 6 (center and spread are both multiplied by 3)
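The effect of adding or multiplying by a constant can be checked numerically. The slide gives only the summary values (avg = 10, s = 2), so the list `[8, 10, 12]` below is a hypothetical data set chosen to have exactly that mean and standard deviation; the helper name `center_and_spread` is also made up.

```python
import statistics

def center_and_spread(values):
    """Return (mean, sample standard deviation) of a list."""
    return statistics.mean(values), statistics.stdev(values)

data = [8, 10, 12]                    # hypothetical data with avg = 10 and s = 2
shifted = [x + 3 for x in data]       # add three to every number
scaled = [x * 3 for x in data]        # multiply every number by 3

print("original:", center_and_spread(data))
print("add 3:   ", center_and_spread(shifted))
print("times 3: ", center_and_spread(scaled))
```

Adding 3 moves the mean to 13 and leaves s at 2, while multiplying by 3 moves the mean to 30 and s to 6.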
Slide 39
Things I might have missed
Slide 40
Know how to compare distributions
• Compare means to talk about shape, center,
and spread
• Compare also means to use comparative
words
The distribution of the raw group
appears to be skewed to the right, while
the smoothed distribution appears to be
approximately symmetric. They both
appear to have the same median near
25, but the raw group appears to have a
larger spread, as is seen by a larger
range and interquartile range.
Slide 41
Know how skewness and symmetry
affect the relationship between mean
and median
Since this distribution is skewed
right (toward the higher numbers)
the mean would be larger than the
median, because the mean is
nonresistant to the higher numbers,
while the median is resistant. If it were
skewed left (or toward the lower
numbers) the mean would be
lower than the median.
Since this distribution is
symmetric, the mean and median
should be (if not the same) very
close. This is true of any
symmetric distribution, even if it
has a bizarre shape.
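A quick numeric check of the mean and median relationship. The three lists below are hypothetical data made up for this illustration; they are not the distributions pictured on the slide.

```python
import statistics

# Hypothetical data sets, one for each shape.
skewed_right = [1, 2, 2, 3, 3, 3, 4, 20]       # long tail toward high values
skewed_left = [1, 17, 18, 18, 18, 19, 19, 20]  # long tail toward low values
symmetric = [1, 2, 3, 4, 5]

for name, values in [("skewed right", skewed_right),
                     ("skewed left", skewed_left),
                     ("symmetric", symmetric)]:
    print(name, "mean =", statistics.mean(values),
          "median =", statistics.median(values))
```

In each skewed list the single extreme value pulls the mean toward the tail while the median barely moves.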
Slide 42
Know how to compare standard
deviations when looking at two
graphical displays
Since the raw group has an overall
larger spread than the smoothed group,
the standard deviation of the raw would
be larger than the smoothed group,
because standard deviation is a
measure of spread
Slide 43
Know how to create a graphical
representation of data with two
variables
• Here is a distribution of M+M’s that were left
in a bowl at the end of a party last weekend
        Plain   Peanut
Red     10      5
Yellow  12      8
Blue    15      10
• You would get a distribution that looks like…
[Bar chart: counts (0 to 20) of Plain and Peanut M+M’s for each color: Red, Yellow, Blue]
Slide 44
Know how to describe differences in
categorical data
[Bar chart: counts (0 to 16) of Red, Yellow, and Blue M+M’s for each type: Plain, Peanut]
• There were fewer peanut M+M’s in each category of color and
there were fewer Red M+M’s in each type of M+M.
Assuming that the bowl started with the same number of
peanut and plain and the same number of each color, this
would make me think that people preferred peanut over
plain and red over the other two. If there was no preference,
we would expect the bars to all have the same height
Chapter 2
Know what a percentile is and how to find a value at a certain percentile (What is the 60th percentile?), (09, 2a; 06b, 3c;
04, 6c; M7, 3)
Know how to find the probability of an event occurring or the percentage of time that an event will occur over a long
period of time (
or
), (08b, 5b; 06, 3a; 06b, 3a; 05b, 6b; 04b, 3a, 03, 3ab)
Know how to solve for a population average if you know a probability of an event happening and its standard deviation
(What does the average have to be for the class to get a 90 1% of the time?), (08b, 5c)
Know how to use the symmetry of a normal distribution to find a probability (P(z>1)=P(z<-1)), (M7, 8)
Know how to use z-scores to compare two individuals from two different groups, (M7, 22; M2, 3)
Know how to find the middle 20% (or other percentage) of a normal distribution, (M2, 10)
Normal Distribution
• If there is reason to believe that a distribution is normal, you must state that it is normal and state the average (µ) and the
standard deviation (σ)
o This can be done by simply writing the shorthand version: N(µ, σ)
• You must draw a normal curve picture with the problem’s numbers referenced in it, and it would also be good to reference
the formula for a standardized score (z-score)
o z = (x – µ)/σ or z = (x̄ – µ)/(σ/√n)
• Calculate your probability, and verify with a calculator
• State your probability and its meaning in the context of the question
Example 1
A box of candy is known to have an average weight of 50 oz. If it is known that the amount of packaged candy is normally
distributed with a standard deviation of 5 oz, is it likely to get a box that weighs 62 oz or more?
z = (62 – 50)/5 = 2.4, and P(z > 2.4) = 0.0082
There is a 0.82% chance that a box would weigh 62 oz or more. So this is very unlikely.
Example 2
Using the information above, find the middle 20% of the data.
Solution: Since we want the middle 20%, that means that there will be 40% in the top tail and 40% in the bottom tail. So
the question has really become, what is the 40th percentile and what is the 60th percentile?
The 40th percentile occurs when there is a z-score of about -0.25. So…
-0.25 = (x – 50)/5, which gives x = 48.75
Since the distribution is symmetric, the 60th percentile will be at 51.25.
So, the middle 20% of the data is between 48.75 and 51.25 oz
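Both candy-box examples can be verified with Python's standard library. This is a sketch rather than what a grader expects to see on paper; `NormalDist` stands in for the table or calculator, and the variable names are made up. Because it uses an unrounded z-score (about -0.2533 instead of -0.25), the endpoints come out slightly different from the table-based answers.

```python
from statistics import NormalDist

candy = NormalDist(mu=50, sigma=5)     # N(50, 5) from Example 1

# Example 1: chance that a box weighs 62 oz or more.
p_heavy = 1 - candy.cdf(62)

# Example 2: the middle 20% runs from the 40th to the 60th percentile.
low = candy.inv_cdf(0.40)
high = candy.inv_cdf(0.60)

print("P(X >= 62) =", round(p_heavy, 4))
print("middle 20%: from", round(low, 2), "to", round(high, 2), "oz")
```

The interval is symmetric about 50 oz, which matches the symmetry argument in the solution.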
Slide 1
Chapter 2
The Normal Distributions
Slide 2
Z-Scores and Density Curves
Slide 3
A Question
• Last year, Eunice had Mr. Allen for math and
received an 87% in the class, while Irene had Mr.
Merlo for the same math class and received an
80%. It has been mathematically proven that Mr.
Merlo is a much harder teacher. In fact, his class
average was 15 points lower than Mr. Allen’s last year.
Who is smarter?
• Why can you argue that Irene is smarter?
• What extra piece of information might prove that
Eunice is actually smarter?
Slide 4
The Standardized Value
• The Standardized Value (z-score) is a measure of
the number of standard deviations a piece of
data is away from the mean in a normal
distribution.
• If a test or other measure has been standardized,
z-scores can be used to determine which
individual did better relative to his or her own group.
Slide 5
A More Detailed Question
• Last year, Eunice had Mr. Allen for math and
received an 87% in the class, while Irene had Mr.
Merlo for the same math class and received an
80%. It has been mathematically proven that Mr.
Merlo is a much harder teacher. In fact, Mr.
Allen’s class average was 15 points higher than
Mr. Merlo’s 70% average. If we know that Mr.
Allen’s class had a standard deviation of 2% and
Mr. Merlo’s class had a standard deviation of
10%, who is smarter?
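The comparison comes down to two z-scores. A minimal sketch, using the numbers from the question (Mr. Allen's average is 70 + 15 = 85); the function name `z_score` is made up for the example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

eunice = z_score(87, mean=85, sd=2)    # Mr. Allen's class
irene = z_score(80, mean=70, sd=10)    # Mr. Merlo's class

print("Eunice z =", eunice)
print("Irene z =", irene)
```

Both students land exactly one standard deviation above their own class average, so relative to their classes neither one comes out ahead.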
Slide 6
Density Curves
• A density curve is what you get when you
collect a lot of data and you get a fluid shaped
graph.
• It has an area of exactly 1 underneath it.
– That’s because it represents 100% of your data.
– The median cuts the area in half.
– The mean is the balance point.
Slide 7
Two Different Density Curves
Slide 8
What Is the Most Common Density
Curve?
Slide 9
Normal Distribution
• This is the standard bell-shaped curve.
• The mean and median are always the same in
a normal distribution.
• Although different normal distributions are
similar, they might have different shapes.
– Some are “taller” or “wider” than others.
– What determines how “tall” or “wide” a normal
distribution is?
• The standard deviation.
Slide 10
One Up, One Down
• Although the shape may change, the
proportion of the data between the two
standard deviations remains the same.
– 68% of the outcomes are between one standard
deviation above and below the average.
– Notice one standard deviation away is at the
inflection point.
Slide 11
The Empirical Rule
• The Empirical Rule (68-95-99.7 Rule) tells you
the proportion of the data that is in the middle
when you move 1-2-3 standard deviations away
from the mean.
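The 68-95-99.7 figures are rounded values, and they can be reproduced from the standard normal curve. A short sketch using Python's `NormalDist`; the variable names are made up.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

# Proportion of the data within k standard deviations of the mean.
shares = [standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)]

for k, share in zip((1, 2, 3), shares):
    print(f"within {k} standard deviation(s): {share:.4f}")
```

Because z-scores standardize any normal distribution, the same proportions apply no matter what the mean and standard deviation are.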
Slide 12
Standard Normal Calculations
And what the AP graders are looking
for
Slide 13
Finding a Probability
• If a population is known to have a normal
distribution of ages with an average of 16 and
a standard deviation of 1.2, what is the
probability that a randomly chosen individual
will be older than 18?
• N(µ, σ) → N(16, 1.2)
P(x>18)
= P(z>(18-16)/1.2)
= P(z>1.67)
= 1-.9525
= .0475
Slide 14
Know how to find the probability
of an event occurring
• Using the same information from the previous
slide, what proportion of the population is
between the ages of 16 and 17?
N(16, 1.2)
P(16 < X < 17)
= P((16-16)/1.2 < Z < (17-16)/1.2)
= P(0 < Z < 0.83)
= 0.7967 – 0.5
=0.2967
Slide 15
Know what a percentile is and how
to find a value at a certain percentile
• Using the same information from the previous
two slides, what age does an individual have
to be in order to be above the 35th percentile?
N(16, 1.2)
P(Z < -0.39) = 0.35
-0.39 = (x – 16)/1.2
-0.47 = x – 16
x = 15.53
Slide 16
Calculator
• Normalcdf(lowerbound, upperbound, avg, s.d.)
Example
Find P(x>18) = normalcdf(18, 999999999, 16, 1.2)
• InvNorm(percent behind, avg, s.d.)
Example
P(x < ___) = 0.35 → InvNorm(0.35, 16, 1.2)
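For checking work away from the calculator, the two commands can be imitated in Python. This is only a sketch: `normalcdf` and `inv_norm` below are made-up wrapper functions around the standard library's `NormalDist`, not calculator software. The answers differ slightly from the table-based ones on the earlier slides because the z-scores are not rounded to two decimals.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean, sd):
    """Area under N(mean, sd) between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area_below, mean, sd):
    """Value of x that has the given area below it under N(mean, sd)."""
    return NormalDist(mean, sd).inv_cdf(area_below)

p_over_18 = normalcdf(18, 999999999, 16, 1.2)   # P(x > 18)
p_16_to_17 = normalcdf(16, 17, 16, 1.2)         # P(16 < x < 17)
age_35th = inv_norm(0.35, 16, 1.2)              # 35th percentile

print(round(p_over_18, 4), round(p_16_to_17, 4), round(age_35th, 2))
```

The huge upper bound plays the same role as 999999999 on the calculator: it stands in for infinity.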
Slide 17
Know how to find a population average if
you know the probability of an event and s.d.
• In a certain baseball league 20% of the individuals
have more than 60 RBIs. If the standard deviation of
all the players’ RBIs is 15 and the distribution is
known to be approximately normal, what is the
average number of RBIs in this league?
– 20% above 60 means 60 is the 80th percentile, where z is about 0.84
– 0.84 = (60 – µ)/15, so µ = 60 – 12.6 = 47.4
• The league average is 47.4 RBIs
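The RBI problem works backward from a probability to the mean. A minimal sketch of that reasoning, assuming Python's `NormalDist` in place of the z-table; the variable names are made up.

```python
from statistics import NormalDist

sd = 15            # standard deviation of RBIs
cutoff = 60        # 20% of players have more than this many RBIs
area_below = 0.80  # so the cutoff sits at the 80th percentile

# z-score with 80% of the standard normal curve below it (about 0.84).
z = NormalDist().inv_cdf(area_below)

# Solve z = (cutoff - mean) / sd for the mean.
league_mean = cutoff - z * sd

print("z =", round(z, 2))
print("league average =", round(league_mean, 1), "RBIs")
```

The same rearrangement works for any problem that gives a cutoff, a tail probability, and the standard deviation.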
Slide 18
Know how to find the middle n% of
a normal distribution
• Looking at a distribution that is N(10, 2), what
interval contains the 20% of the population
with the largest number of individuals?
Solution
In any normal distribution, the n% with the largest
number of individuals will be in the middle, because that is where
your largest percent of data is. So, this
question is really just, “where is the middle
20%?”
Slide 19
Solution Continued
• Since we’re looking for the middle 20% of a
N(10, 2), we will look for the z-scores that have
40% below and 40% above.
• The z-score with 40% below it is z = –0.25, so
–0.25 = (x – 10)/2 gives an x value of 9.5.
• A similar method using z = 0.25 would give us
an x value of 10.5. So, the smallest interval
containing 20% of the data is between 9.5 and
10.5.
___________________________________
___________________________________
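The middle-20% interval can be checked numerically. This is a sketch under the assumption that `statistics.NormalDist` is an acceptable substitute for the z-table; with unrounded z-scores the endpoints land near 9.49 and 10.51 instead of the rounded 9.5 and 10.5.

```python
from statistics import NormalDist

dist = NormalDist(mu=10, sigma=2)   # N(10, 2)
middle = 0.20                       # we want the middle 20%

# The middle 20% leaves 40% in each tail
tail = (1 - middle) / 2
lower = dist.inv_cdf(tail)
upper = dist.inv_cdf(1 - tail)

# Confirm the interval really holds 20% of the distribution
area_inside = dist.cdf(upper) - dist.cdf(lower)

print(round(lower, 2), round(upper, 2), round(area_inside, 2))
```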
Chapter 3
Can you plot a scatter plot? What goes on the x-axis? Y axis? Don’t forget to label. (10, 1b;08, 4a)
Know how to get the equation of a LSRL from a Minitab printout and how to make it context specific, (10b, 6a; 08, 6b;
06, 2a; 05b, 5a; 05b, 5b)
Know how to interpret the slope and y-intercept of a LSRL in the context of a problem, (10b, 6a; 08, 6b; 07, 6abe; M2,
31)
Know what a residual is, how to calculate it given data points and a LSRL, and know how to interpret it contextually, (10b,
6b; 07b, 4b; M2, 17)
Know how to describe a scatterplot (direction, strength, linear/nonlinear), (08, 4b; 08b, 6b; 04b, 1a)
Know how to graph a least squares regression line on an x-y plane, (07, 6d; 07b, 4a)
Know what happens to the slope of a LSRL if new data points are added, (07b, 4c; 03b, 1)
Know what happens to the correlation of a set of data if new data points are added, (07b, 4c; 03b, 1)
Know that the s in the bottom left of a Minitab is the standard deviation of the residuals, or typical distance each
observation is from the LSRL, and know how to interpret that in the context of a problem, (06, 2b)
Know that you need a scattered residual plot to prove that something is linear (correlation does not prove linearity), (05,
3a; 04b, 1c; M7, 40)
Know how to find an expected number for a LSRL if you are given an x-value (plug it in), (05, 3b)
Know what r-sq is and how to interpret it in the context of the problem and how to find it if you know the correlation,
(05, 3c; M2, 34)
Know what extrapolation is and when it is and is not appropriate, (05, 3d)
Know what happens to correlation if you change the units of measurement (Change weight from lbs to kgs), (M7, 10;
M2, 6)
Know how correlation relates to the slope of a LSRL, (M7, 19)
2010 #1 (If you do Chapter 5 before)
2010B # 6 (If you do Chapter 10 before)
2008 #4 (If you do Chapter 7 before)
2008B #6 (If you do Chapter 13 before)
2007B #4
2006 #2
2005 #3
2003B #1
Slide 1
___________________________________
___________________________________
Chapter 3
Examining Relationships
___________________________________
___________________________________
___________________________________
Slide 2
___________________________________
___________________________________
Scatterplots and Correlation
___________________________________
___________________________________
___________________________________
Slide 3
Variables Review
An explanatory variable is the variable that
we believe is causing the change
If we were testing a new blood pressure drug, the
explanatory variable would be the level of the
dosage of the drug
A response variable is the variable that we
believe is changing due to the explanatory
variable
In the blood pressure example, it is the blood
pressure
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 4
Scatterplots
A scatterplot shows the relationship between two
quantitative variables measured on the same
individuals
If there is an explanatory variable, it goes on the
x-axis
If there is a response variable, it goes on the y-axis
If there does not appear to be a clear explanatory
or response variable, it does not matter which
variable goes where
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 5
Example
Here is a scatterplot comparing the age and
height of plants
We know that there are
22 plants
We believe that age is
the explanatory variable
We should have a top
label as well as a unit
label for variables
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 6
Describe a scatter plot
• When describing a scatter plot, one must talk
about strength, association, and form
• Strength
– We use general terms like strong, weak, slightly
strong, slightly weak, very strong, etc.
• Association
– It is either positive, negative, or neither
• Form
– It is linear or non-linear
___________________________________
___________________________________
Slide 7
Describe a scatterplot
• This appears to have a strong, positive, linear
relationship
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 8
Describe another scatterplot
• This scatterplot has a very strong, negative,
non-linear relationship
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 9
Correlation
___________________________________
• Correlation is a number that we use to
measure HOW linear a relationship is
• It is a number between -1 and 1
___________________________________
– Negative association = negative correlation
– Positive association = positive correlation
– No linear association = correlation near 0
___________________________________
• The closer to 1 or -1, the more linear a
relationship is
___________________________________
___________________________________
Slide 10
Correlation
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 11
Notes about correlation
• Never switch correlation and association
when describing a relationship
– You would never say there is a strong, positive,
linear correlation
• Correlation is a number
– You would never say something has a strong
correlation
– That’s like saying it has a strong 0.76
___________________________________
___________________________________
___________________________________
Slide 12
Notes about correlation
• Just because something has a correlation very
close to 1 or -1, does not mean it necessarily is
linear
– This graph is non linear, but would have a very
high correlation
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 13
Know what happens to correlation if
you change units
Changing the units used by the variables will
not change the correlation
For example, in our plant problem, if we
changed all the age measurements to days
(instead of years) and all the height
measurements to cm (instead of inches), we
would get a very different looking scatterplot
But it would not change the correlation, because
changing the units does not change the overall
relationship between age and height
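A quick numerical check of the unit-change claim. The plant data below are invented for illustration (the slides do not list the actual values), and `pearson_r` is a hand-written helper, not a calculator or library function.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical plant data: age in years, height in inches
ages_years = [1, 2, 3, 4, 5, 6]
heights_in = [1.5, 3.0, 3.4, 5.0, 5.6, 6.9]

# Same plants, different units: days and centimeters
ages_days = [a * 365 for a in ages_years]
heights_cm = [h * 2.54 for h in heights_in]

r_original = pearson_r(ages_years, heights_in)
r_converted = pearson_r(ages_days, heights_cm)

print(round(r_original, 4), round(r_converted, 4))
```

Both printed values agree, since multiplying every value by a positive constant rescales the deviations on top and bottom of the formula by the same factor.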
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 14
Know how correlation relates to the
slope of a LSRL
___________________________________
• One way to find the slope of a LSRL is using the
equation b = r(s_y / s_x), where s_x and s_y are the
standard deviations of x and y
• So, for the same s_x and s_y, as correlation gets larger, so does the slope
• You can also use this to see how data affects
slope
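The link between correlation and slope can be verified on any data set using the standard relation b = r · s_y / s_x. The sketch below uses made-up data and compares that slope with the one obtained directly from the least-squares formula; all names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical (x, y) data
xs = [1, 2, 3, 4, 5, 6]
ys = [1.5, 3.0, 3.4, 5.0, 5.6, 6.9]

mean_x, mean_y = mean(xs), mean(ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

r = sxy / sqrt(sxx * syy)

# Slope from the correlation and the two sample standard deviations
slope_from_r = r * stdev(ys) / stdev(xs)

# Slope straight from the least-squares formula
slope_least_squares = sxy / sxx

# The LSRL always passes through (mean_x, mean_y)
intercept = mean_y - slope_from_r * mean_x

print(round(slope_from_r, 4), round(slope_least_squares, 4), round(intercept, 4))
```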
___________________________________
___________________________________
___________________________________
Slide 15
___________________________________
___________________________________
Least-Squares Regression
___________________________________
___________________________________
___________________________________
Slide 16
Least Squares Regression Line (LSRL)
• A LSRL is a line of best fit for a linear
association
• It is usually written in the form ŷ = a + bx
– This is just another way to write y = mx + b
– We use ŷ because it is a predicted value for y, not
the actual y value that will occur
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 17
LSRL
•This scatterplot has a LSRL of ŷ =
0.83 + 0.96x
•So a plant that is 4 months old
should be close to 4.67 inches
tall
•Can we make a prediction that at 20
months old the plant will be 20.03
inches?
•No. This is called extrapolation
and is very dangerous.
•You cannot make a prediction
about data outside of the domain
of the data that you collected
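One way to keep yourself from extrapolating is to build the data range into the prediction. The sketch below uses the plant LSRL ŷ = 0.83 + 0.96x; the age range of 1 to 8 months is an assumption made for illustration, since the slides do not state the range of the collected data.

```python
def predict_height(age_months, min_age=1.0, max_age=8.0):
    """Predicted height (inches) from the LSRL y-hat = 0.83 + 0.96x.

    min_age and max_age are hypothetical bounds of the observed ages;
    predictions outside them would be extrapolation.
    """
    if not (min_age <= age_months <= max_age):
        raise ValueError(
            f"age {age_months} is outside the observed range "
            f"[{min_age}, {max_age}]; this would be extrapolation"
        )
    return 0.83 + 0.96 * age_months

print(round(predict_height(4), 2))

try:
    predict_height(20)
except ValueError as err:
    print("Refused:", err)
```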
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 18
Interpreting a LSRL
Let’s look at ŷ = 0.83 + 0.96x comparing age and
height of plants
• Interpret the slope
– In general, we would say for every increase of 1 in x, y increases by an average
of “b”.
– In this case, for every increase of one month in the age of a plant, the
height increases by an average of 0.96 inches
• Interpret the y-intercept
– In general, we would say that when x is 0, the predicted value of y is
“a”
– In this case, a plant that is 0 months old should be 0.83 inches tall
• This obviously does not make sense, which does occur sometimes
when you interpret the y-intercept.
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 19
Residuals
• A residual is the vertical distance between an
observed point in a scatterplot and the
predicted point from a LSRL
– Residual = observed – expected = y – ŷ
• For the three points at age 4,
there are 3 residuals
– 7.5 – 4.67 = 2.83
– 5 – 4.67 = 0.33
– 4 – 4.67 = –0.67
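The residual arithmetic for the three plants at age 4 can be reproduced in a few lines. This is a small sketch using the LSRL ŷ = 0.83 + 0.96x from the earlier slide; the function and variable names are illustrative.

```python
def predicted_height(age_months):
    """LSRL from the plant example: y-hat = 0.83 + 0.96x."""
    return 0.83 + 0.96 * age_months

age = 4
observed_heights = [7.5, 5, 4]   # the three observed heights at age 4

# Residual = observed - predicted; positive means the point is above the line
residuals = [y - predicted_height(age) for y in observed_heights]

for y, res in zip(observed_heights, residuals):
    print(f"observed {y}: residual {res:.2f}")
```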
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 20
Residual Plot
• A residual plot is used to determine if
something is linear
– Note: correlation does not determine if something
is linear, it determines how linear
– Here is an example of a residual plot of a linear
association
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 21
Residual Plot
• You know that it is a linear relationship,
because the residuals are scattered
• This residual plot is plotted against ŷ
– You can do this, but I usually plot it against the x
variable
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 22
More on residual plots
___________________________________
• Here is a residual plot
for a non-linear
association.
– We know that it is
non-linear because
the residual plot is
NOT SCATTERED
___________________________________
___________________________________
___________________________________
Slide 23
Coefficient of determination
(R-Squared)
___________________________________
The coefficient of determination (or the square of
the correlation) is a number that represents the
“percent of the variation in y that can be
explained by x”
Let’s say that the correlation between age and
amount of hair is -0.7 (because you lose hair as
you age)
In this case r squared would be 0.49
So, 49% of the variation in people’s amount of hair can be
explained by their age. The remaining 51% of the
variation is due to other factors.
___________________________________
___________________________________
___________________________________
Slide 24
Reading a Minitab Printout
• Here is a Minitab printout of amount of tile versus cost of laying it in a house
• You can see the LSRL on the top left
• R-squared is 0.81 (we never use adj)
– So, we could conclude that the correlation is 0.9, because that is the square
root of 0.81 and we know that the correlation is positive, because the slope
is positive
• The s = 9.282 is the standard deviation of the residuals
– So, the typical point on the scatterplot will be about 9.282 dollars above or below
the predicted value
___________________________________
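Going from R-squared back to the correlation takes two steps: take the square root, then give it the sign of the slope. A minimal sketch follows; the slope values passed in are placeholders, because only their sign matters and the printout's actual slope is not reproduced in these notes.

```python
from math import copysign, sqrt

def correlation_from_r_squared(r_squared, slope):
    """Square root of R-squared, carrying the sign of the LSRL slope."""
    return copysign(sqrt(r_squared), slope)

# Tile example: R-squared = 0.81 with a positive slope (placeholder value 1.0)
r_tile = correlation_from_r_squared(0.81, 1.0)

# Hair example: R-squared = 0.49 with a negative slope (placeholder value -1.0)
r_hair = correlation_from_r_squared(0.49, -1.0)

print(round(r_tile, 2), round(r_hair, 2))
```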
___________________________________
___________________________________
___________________________________
___________________________________
Chapter 4
Know how to use the inverse function to solve for ln y, (09b, 6c)
Know how to interpret r squared in terms of transformed data, (04b, 6b)
Know how to look at a residual plot with a relationship between x and log y to determine if something is linear or non
linear, (M2, 28)
2009B #6
2004B #1
Chapter 5
Know what a treatment is and how to list them based on the description of an experiment with more than one factor?
(10, 1a; 06, 5a; 06b, 5a)
What is an experimental unit? (10, 1a; 06b, 5a)
What is a response variable? (10, 1a; 06b, 5a)
Know what a stratified random sample is, how to describe it so a reasonable person could do it, and how to implement
one where you need to represent different proportions in a population (10% Hispanic, 20% Native American, etc), (10,
4c; 10b, 2b; 05, 5c; M7, 20; M2, 15)
Know how to take a simple random sample from a large population, and how to describe it so a reasonable person could
do it, (10b, 2a; 08, 2c; 04b, 2a)
Why is a stratified sample sometimes better than a SRS? Better than a cluster? (10b, 2c)
Why wouldn’t it be appropriate to assign people by flipping a coin at times? (09, 3a)
Why is it important to assign individuals in an experiment as opposed to letting them pick? (09, 3b; 03, 4a)
Know what a block design is and how to randomly assign experimental units within that design, (09b, 4a; 07b, 3b; 04,
2ab; M2, 16)
Know what it means for an experiment to be double blind and how it can be implemented, (09b, 6a)
Know what nonresponse bias is and how it can affect results of an observational study, (08, 2a; M7, 9)
Know how to create a completely randomized design experiment, including how to randomly assign your experimental
units so that a reasonable statistician could do it, (09, 3a; 08b, 4a; 07, 2b; 06, 5b; 06b, 5b; 05b, 3a; 03b, 4a; M7, 35; M2,
25)
Know what a control group is, why we control experiments, and how to describe its benefits in the context of a problem,
(07, 2a; 03, 4b; 03b, 4b)
Know why it is beneficial to do a block design at times, and why it is important to create homogeneous groups, (07, 2c;
07b, 3a; 04, 2c; M7, 14; M7, 31)
Know the difference between an experiment and an observational study, (07, 5a; 03b, 3a; M2, 1)
Know that you can only make conclusions about the population that your sample is taken from (If I choose five people
from 4th period I can make conclusions about 4th period. If I take 5 people from CV, I can make conclusions about CV),
(06, 5cd; 05, 1b; 04, 3c; 04, 5b; 03, 4c; 03b, 4d, M7, 16)
Know how replication is used to improve a study/experiment and how it is implemented correctly, (06b, 5c; 05, 1c)
Know what confounding is and how it can affect the results of an experiment, (06b, 5d)
Know what bias is and be able to explain in context how a bias can directly affect the results (the proportion would be
higher if the sample was truly random), (05, 5a; 04b, 2a)
Know how to create a matched pairs design, including how you randomly assign treatments, (05b, 3b; 04b, 4b)
Know what wording bias is, how to fix it, and how it affects an outcome in the context of the problem, (04b, 2b)
Know what a census is, (M7, 2)
2010B #2
2008 #2
2007 #2
2007B #3
2006 #5
2006B #5
2005 #1
2004 #2
2004B #2
2003 #4
Slide 1
___________________________________
___________________________________
Chapter 5
Producing Data: Sample and
Experiments
___________________________________
___________________________________
___________________________________
Slide 2
___________________________________
DESIGNING SAMPLES
___________________________________
___________________________________
___________________________________
___________________________________
Slide 3
Does my mommy really love me?
• An advice columnist, Ann Landers, once asked
her readers, “If you had it to do over again,
would you have children?” A few weeks later,
her column was headlined, “70% OF PARENTS
SAY KIDS NOT WORTH IT.” Indeed, 70 % of the
10,000 respondents said they would not have
children.
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 4
Designing Samples
• Population
– This is who we are trying to study.
___________________________________
___________________________________
• We usually can’t get everyone, though.
• Sample
– A part of the population that represents the
whole.
– What is a true sample?
• Is our class a sample that represents the school?
___________________________________
___________________________________
___________________________________
Slide 5
Types of Samples
• Census
– When you can survey/test everyone in the population.
• Voluntary Response (Self-Selected) Sample
– When people choose whether or not to respond.
• American Idol
• Mail home survey
• Convenience Sample
– When you survey/test those easiest to reach.
• Taking a survey in the quad at lunch.
• Quota Sample
– When you hand pick a group that seems to match your population
• Probability Sample
– Each member of the population has a known probability of being in the
sample.
___________________________________
___________________________________
Slide 6
Probability Sampling
• Simple Random Sample (SRS)
– A sample of size n so that every set of n individuals is equally likely to be
chosen.
– This is the “best” type of sampling.
• Systematic Sample
– Picking every nth individual
• Every third person that comes through the door will win a prize.
– This is random, but it is not an SRS.
• The first two people through the door can’t both win.
• Stratified Random Sample
– Subgroups (strata) are picked that are similar in some way and then individuals
are randomly chosen out of each group.
• They can be split up by proportion
– If 55% of the population is female, then I will make sure that my sample is 55% female.
– This is beneficial if you want to represent certain groups of a population, or you need to
make sure a certain group is represented
• For example, in a large group you might have a 2% population of Native Americans, but you
might not get many of them if you took an SRS. You would want to make sure you get them in
your sample by doing a stratified sample.
___________________________________
___________________________________
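A proportional stratified random sample can be sketched as an SRS taken separately inside each stratum. The population below (550 females and 450 males, matching the 55% example) and the function name are hypothetical, and `random.sample` plays the role of pulling names from a hat.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Take an SRS from each stratum, sized in proportion to the stratum.

    strata maps a stratum name to the list of its members. Rounding means
    the pieces may not always add up to exactly sample_size.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, k)   # SRS within the stratum
    return sample

# Hypothetical population that is 55% female
population = {
    "female": [f"F{i}" for i in range(550)],
    "male": [f"M{i}" for i in range(450)],
}

chosen = stratified_sample(population, 100, seed=1)
print({name: len(members) for name, members in chosen.items()})
```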
Slide 7
Other Sampling Methods
• Cluster (Area) Sampling
– The population is split into clusters and only
certain clusters are studied to get a feel for the
population.
• If I want to get a feel for town governments, an SRS will
cause us to have to do too much travelling.
• So, we randomly choose five counties (these are your
clusters) and then study every town government in
those counties.
• This saves us travelling time, but still gives us a random
sample of the population.
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 8
More Sampling Methods
• Multistage Sampling
– This is when you use a sampling method or combination of
sampling methods more than once to get a sample of
the population.
• If I want to interview California residents, I might do a cluster sample
to pick different counties, and then an SRS to pick individuals.
– This is a two-stage sample
• If I want to study US seniors, we might do a stratified random
sample to get districts of certain demographics, then do an
SRS to get a smaller number of schools, then do an SRS of
seniors in those schools.
– This is a three-stage sample
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 9
Sampling Bias
• Bias
– A design is biased if it systematically favors a certain outcome.
• Undercoverage
– This is when certain groups are left out of the sample.
• A telephone survey can have undercoverage, because people without phones aren’t included.
– How is a systematic sample biased?
• Most samples, no matter how good, suffer from some undercoverage
• Nonresponse
– An individual can’t be contacted or refuses to participate
• This occurs if I randomly call 100 houses, but only 50 are reached or only 50 agree to participate
• Response Bias
– Occurs if a respondent gives false answers, can’t understand the question, wants to
please the interviewer, or the ordering of the questions favors an answer.
• Wording Bias
– The wording of a question affects the outcome.
• “Don’t you think the driving age should be raised to 18 since teenagers are so reckless?”
___________________________________
___________________________________
Slide 10
Using a Table of Random Digits
• When you pick a SRS, you need to be
random.
• We can use a table of random digits (Table B)
– Assign a numerical label to every individual.
– Make sure that every individual has the same number of
digits.
• Don’t do 0001-1000 because then you have to use four-digit
numbers.
• Instead use 000-999.
– Use Table B to select at random.
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 11
Using Your Calculator
• Go to MATH/PRB
– Choose randInt
___________________________________
___________________________________
• randInt(1, 100, 23)
– You will randomly pick 23 numbers between 1 and
100.
• randInt(0, 99, 45)
– You will randomly pick 45 numbers between 0 and
99
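For comparison, here is roughly what the two approaches look like in Python's standard `random` module. This is only an analogy to the calculator and Table B, with a seed chosen arbitrarily: `randint` behaves like randInt and can repeat a number, while `random.sample` never repeats a label, which is what an SRS needs.

```python
import random

rng = random.Random(2024)   # arbitrary seed so the run can be repeated

# Like randInt(1, 100, 23): 23 integers from 1 to 100, repeats possible
with_repeats = [rng.randint(1, 100) for _ in range(23)]

# SRS of 23 individuals from 1000, using equal-length labels 000-999
labels = [f"{i:03d}" for i in range(1000)]
srs = rng.sample(labels, 23)   # no individual can be picked twice

print(with_repeats)
print(srs)
```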
___________________________________
___________________________________
Slide 12
___________________________________
___________________________________
Designing Experiments
___________________________________
___________________________________
___________________________________
Slide 13
Observational Study vs. Experiment
• An observational study observes and records
behavior but does not impose a treatment.
– I’m going to take a survey to see how many
students drink energy drinks.
• An experiment is a study in which the
researcher imposes some sort of treatment.
– I want to determine the effects of energy drinks
on hours of sleep. So, I’m going to give some
students energy drinks and the others aren’t
allowed to drink energy drinks.
• The difference is that an experiment is
Slide 14
Experimental units and treatments
• An experimental unit is the individual on which a treatment is being imposed.
– An experimental unit is called a subject if it is a person.
• A treatment is a specific experimental condition applied to the experimental units.
– Two different individuals in an experiment might get two different treatments.
• One might get an energy drink, another might not.
• To find the number of treatments when there is more than one variable
you use the multiplication principle
– For example, I am testing based on energy drinks and number of
classes
» So, there are two levels in energy drinks (yes or no) and 3
in number of classes (4, 5, 6), which means that there are six
total treatments
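The multiplication principle amounts to listing every combination of factor levels. A short sketch using `itertools.product` for the energy drink and class-load example (the variable names are illustrative):

```python
from itertools import product

energy_drink_levels = ["yes", "no"]   # factor 1: two levels
class_levels = [4, 5, 6]              # factor 2: three levels

# Every pairing of one level from each factor is one treatment
treatments = list(product(energy_drink_levels, class_levels))

for drink, classes in treatments:
    print(f"{drink}/{classes}")

print(len(treatments), "treatments")
```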
Slide 15
Explanatory and Response Variables
• An explanatory variable is what is being
implemented.
– This is the amount of caffeine given or dosage of blood
pressure medicine.
– Each explanatory variable is referred to as a factor.
– A factor can have different levels.
• In our drink and classes experiment there are two
factors
– Energy drink and number of classes
• Two levels in one factor (yes/no) and three in the other (4, 5,
6) creates six different treatments.
• A response variable is what is being measured
– This would be blood pressure or the number of hours
of sleep.
– An experiment usually is trying to determine if or how
Slide 16
Principles of Experimental Design
• There are three principles of experimental
design:
1.Control
2.Randomization
3.Replication
Slide 17
1. Control
• The biggest aspect of the actual experiment is whether
or not you are controlling the lurking variables and
confounding
– Is it the treatment that is affecting the response variable or
is it something else?
• Lurking variables are those that are not among the explanatory
and response variables but can influence results
• Many experiments are controlled with a placebo
– Half of the class will get the love potion while the other half gets
sugar water.
– This way we know if it’s the love potion or just a newfound
confidence.
• Controlling experiments reduces the chances of
confounding
– Confounding occurs when you cannot distinguish if the
explanatory variable is causing an effect or if another
Slide 18
Controlling Bias
• You can avoid some personal bias by blinding
experiments.
• All experiments should at least be single blind
– The subjects should not be aware that their treatment
is different than someone else’s.
• You don’t tell the subject her dosage is higher.
• In order to avoid bias from the person
implementing the experiment, it can be made
double blind.
– In this case the implementer and the subjects are not
aware of the differences in the treatments.
• The doctor does not know if he is giving medicine or a
placebo.
Slide 19
2. Randomization
• How are you picking your units/subjects?
• You want to equalize groups so that lurking
variables will be equal among the different
groups.
• We want to make the groups as equal as
possible except for difference in treatments.
– If I were to study heart medicine I wouldn’t put all
the people who have had heart attacks in one
group. I would want them to be in both groups.
• You can use the different methods of sampling
in order to create randomization.
Slide 20
3. Replication
• The more units/subjects I have the better.
• The bigger the number, the more likely you
are to have a representation of the
population.
• This reduces bias or systematic favoritism.
• I don’t have to run the experiment more than
once.
– I just need to have a lot of experimental units.
Slide 21
___________________________________
___________________________________
Types of Experiments
___________________________________
___________________________________
___________________________________
Slide 22
Completely Randomized Design
• A completely randomized design randomly assigns the
experimental units in the study to the
treatments.
– The random assignment works like a SRS.
• In a completely randomized design each treatment is
unique and independent from the other
Example
I want to test the effects of energy drinks and number
of classes on sleep. I have created six treatment
groups based on the two factors. I put the names of
the 300 high school students that have volunteered in
a hat. The first fifty names pulled will be in the yes/4
group, the next 50 in the yes/5 group, and so on. We
will measure every individual’s sleeping patterns for a
month and then compare.
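The names-in-a-hat procedure can be imitated by shuffling the list of volunteers and cutting it into six groups of 50. The sketch below assumes placeholder names such as student_1 and an arbitrary seed; shuffling stands in for drawing names one at a time.

```python
import random
from itertools import product

# Six treatments: energy drink (yes/no) crossed with number of classes (4, 5, 6)
treatments = [f"{drink}/{classes}"
              for drink, classes in product(["yes", "no"], [4, 5, 6])]

# Placeholder names for the 300 volunteers
volunteers = [f"student_{i}" for i in range(1, 301)]

rng = random.Random(7)      # arbitrary seed so the run can be repeated
shuffled = volunteers[:]
rng.shuffle(shuffled)       # same effect as mixing the names in the hat

# First 50 drawn go to yes/4, the next 50 to yes/5, and so on
group_size = len(shuffled) // len(treatments)
groups = {
    treatment: shuffled[i * group_size:(i + 1) * group_size]
    for i, treatment in enumerate(treatments)
}

for treatment, members in groups.items():
    print(treatment, len(members))
```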
Slide 23
Block Design
• A block design separates the subjects into blocks and runs the experiment within each block.
– This is the same idea as a stratified random sample.
– We could create gender blocks of men and women.
• Each block receives the exact same treatments.
• Although it is nice, blocks do not have to be the same size.
– We can have 55 men and 45 women.
Example
Using the same information on energy drinks from the previous slide, I
will split up the 300 volunteers into two groups based on gender. I will
then take all the men and randomly put them into six groups (one for
each treatment) using a SRS and run the experiment as before. I will then
take the women and put them into six groups (one for each treatment)
using a SRS and run the experiment. I will collect data for a month and
then compare the results.
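In code, a block design is the same random assignment repeated separately inside each block. The sketch below assumes a hypothetical split of 150 men and 150 women among the 300 volunteers (the slides do not give the actual split) and deals the shuffled names out to the six treatments in turn.

```python
import random
from itertools import product

def assign_within_block(units, treatments, rng):
    """Shuffle one block, then deal its units out to the treatments in turn."""
    shuffled = units[:]
    rng.shuffle(shuffled)
    groups = {treatment: [] for treatment in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

treatments = [f"{drink}/{classes}"
              for drink, classes in product(["yes", "no"], [4, 5, 6])]

# Hypothetical blocks; the real numbers of men and women are not given
blocks = {
    "men": [f"man_{i}" for i in range(1, 151)],
    "women": [f"woman_{i}" for i in range(1, 151)],
}

rng = random.Random(11)   # arbitrary seed
design = {name: assign_within_block(units, treatments, rng)
          for name, units in blocks.items()}

for name, groups in design.items():
    print(name, {t: len(members) for t, members in groups.items()})
```

Dealing the names out in turn keeps the group sizes within one of each other even when a block does not divide evenly by six.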
Slide 24
Matched Pairs
• A matched pairs design is a type of block design that compares only two
treatments.
– I will have several pairs of fish tanks in different parts of the room. One gets
one fish food, one gets the other.
• In this case the different parts of the room are the blocks.
• You can also have one subject get both treatments.
– Which is better, Dr. Pepper or Diet Dr. Pepper?
• In this case, each individual is the block.
Example
I want to determine if a new type of bicycle tire will last longer than the
other. I have found 100 bicyclists and asked them to take one new tire
and one old tire. 50 of them will put the new tire on the front and old on
the back, and the other 50 will do the opposite. We will measure each
tire on a 10 point scale and find the difference between the new and old
(n – o), and review our results.
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 25
What the data looks like
• Completely Randomized and Block designs
– You will have at least two lists of data, one for each treatment group
– In our example, the group that had the energy drink and four classes should
have 50 pieces of data measuring each individual’s average hours of sleep
during that month
• y/4: {7.1, 8.0, 6.8, …}
• y/5: {7.0, 8.0, 6.6, …}
• y/6: {6.8, 7.1, 7.2, …}
• …
• Matched Pairs
– Since we are comparing two treatments in individual blocks, we will be looking
at one list of data, usually representing a difference
• In our example with the tires, we would have 100 numbers representing
the difference (New – Old) from each biker’s tires
– Difference: {1.0, 0.5, -0.2, 0.0, 2.0, …}
Slide 26
Simulating Experiments
Slide 27
Simulations
• You can run simulations the same way that you
take an SRS.
• I want to run a simulation of picking ten
people where 53% are men and 47% are
women.
– 00-52 represent men; 53-99 represent women
– Or: 01-53 represent men; 54-99 and 00 represent
women
• I can use Table B or randInt on my calculator.
• How many women were picked in this
simulation?
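Here is a minimal Python sketch of that simulation, using the first labeling (00-52 men, 53-99 women). Python's `random.randint` stands in for the calculator's randInt; the function name and the seed are made up for illustration.

```python
import random

def simulate_pick(n_people=10, seed=None):
    """Pick n_people using two-digit labels: 00-52 are men, 53-99 are women."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 99) for _ in range(n_people)]  # like randInt(0, 99)
    women = sum(1 for label in labels if label >= 53)
    return labels, women

labels, women = simulate_pick(seed=2024)
print(labels, "->", women, "women")
```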
Slide 28
What We Missed
Slide 29
Know how to take an SRS from a
large population
• Observational study
– Put a name in a hat for every individual in the population and choose n individuals
– Assign every individual in the population a number and use an RNG or a table of random
digits to pick n people
• Experiment
– Put all the experimental units' names/assigned #s in a hat; the first n/2 you pull go into
one group, and the remaining units go in the other group
– For every individual, flip a coin; if it’s heads they go into one group, if it’s tails they go
in the other group.
• Once one group fills to n/2, the remaining individuals go in the other group
• You have to make sure that the individuals are taken in a random order. You
would not want to go through students in order of grade in a class, because the last
students would all be put into the same group, and they are all the students with the lowest
grades
– For every individual, roll a die. If it’s a 1 or 2 they go into one group…
• As with the coin, you have to make sure that individuals are taken in a random
order
Slide 30
Know who you can draw
conclusions about
• You can only draw conclusions about the group from which you
drew your sample
– If I took 100 random students from CV, I could only draw
conclusions about CV students, not students in general
– If I took 100 students from California, I could only draw
conclusions about students from California, not the nation
• It also does not matter how many you take as long as it’s random
– If I randomly chose 5 students from CV, I could make a
conclusion about students from CV as long as it’s random
– It does not matter how large the sample is
– We will talk about the drawbacks of small samples second
semester
Chapter 6
Know how to find a probability, a union probability (A or B), and a conditional probability using a two way table, (10b,
5ab; 03b, 2ab; M7, 18)
Know the two ways to check for independence (10b, 5c; 03b, 2c)
Know how and when to find a conditional probability, (09b, 2a)
Know how to find union probabilities of disjoint events, (09b, 2bc;08, 3cd; 04, 4a)
Know how to find a joint probability (A and B) of two or more independent events, (08, 3b; 04, 3b; 04, 4a; 03b, 5a; M7,
38; M2, 36)
Know how to interpret a probability in the context of a problem and deem if it is likely to occur or not, (04, 3c)
Know what it means for events to be mutually exclusive and how that affects their joint probability and their union
probability, (M7, 36; M2, 23)
Know how to set up a simulation and perform it using a table of random digits, (M2, 4)
2009B #2
2003B #2
Slide 1
Chapter 6: PROBABILITY
The Study of Randomness
Slide 2
Simulation
Slide 3
Simulating Randomness
• Simulation is the imitation of chance behavior, based on a
model that accurately reflects the phenomenon under
consideration
Example
A statistician wants to simulate pulling ten people at random from the US
population. Describe a simulation to estimate how many women there
will be.
Solution
The statistician will assume that women and men are split 50-50. So, he will flip
a coin ten times, and every head he gets will represent choosing a woman from the
population. After ten flips he will record the number of women. He will run this
simulation 100 times, recording the number of women every time, and then will
average his 100 numbers to estimate the number of women that he
“should” get.
Slide 4
Another Simulation
Example
Gary is a pretty decent free throw shooter, converting 81% of his
free throws last season. He figures this year that he will take about
50 free throws. Run a simulation to establish how many of them he
should convert.
Solution
Using the numbers 1-100, we will assign 1-81 as a make and 82-100
as a miss. Using the random number generator on our
calculator we will simulate 50 free throws: RandInt(1, 100, 50). We
will record the number of makes. Then we will run this simulation
19 more times, and take the average of the makes to give him an
idea of how many he “should” make this season.
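The same procedure can be sketched in Python, with `random.randint` playing the role of RandInt. The function name `simulate_season` and the seed value are assumptions made for this illustration.

```python
import random

def simulate_season(shots=50, make_pct=81, rng=random):
    """Count makes in one simulated season; 1-81 is a make, 82-100 a miss."""
    draws = [rng.randint(1, 100) for _ in range(shots)]
    return sum(1 for d in draws if d <= make_pct)

rng = random.Random(7)  # fixed seed so the run is repeatable
seasons = [simulate_season(rng=rng) for _ in range(20)]  # 1 run + 19 more
average_makes = sum(seasons) / len(seasons)
print(seasons, average_makes)
```

The average should land near 0.81 × 50 = 40.5 makes, though any single run of 20 seasons will vary.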
Slide 5
Randomness
Slide 6
The Language of Probability
• Random
◦ A phenomenon where outcomes are unpredictable,
but a pattern will emerge in the long run.
– What is the pattern when I flip a coin?
• Probability
◦ The proportion (percentage) of times that an event will
occur after many repetitions.
– What is the proportion of heads that we will get?
• Independence
◦ Events are independent if one event has no effect on
another event.
– Flipping a coin twice.
Slide 7
Probability Models
Slide 8
Sample Spaces and Events
• Sample Space
◦ All possible outcomes.
– Flipping a coin: {H, T}
– Rolling a die: {1, 2, 3, 4, 5, 6}
• Event
◦ An outcome or set of outcomes of a random phenomenon.
• Multiplication Principle
◦ When you combine two phenomena, the size of the new sample space
is the product of the sizes of the two original sample spaces.
– When I flip a coin and roll a die, there are 2 × 6 = 12 outcomes in the sample
space.
– We used this last chapter when we were establishing how many
experimental treatments there are
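As a quick check of the multiplication principle, this short Python sketch lists the combined sample space for one coin flip and one die roll. The variable names are just for illustration.

```python
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Every (coin, die) pair is one outcome of the combined phenomenon.
sample_space = list(product(coin, die))
print(len(sample_space))  # 12
print(sample_space[:3])
```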
Slide 9
Probability Rules
1. The Probability of any event must be
between 0 and 1.
2. The Probability of the sample space is 1.
1. P(S) = 1
3. The complement of an event is the event that it does not occur; its
probability is the probability that the event won’t occur.
4. The addition rule.
1. Disjoint (mutually exclusive) events are events
that have no outcomes in common.
Slide 10
Complement rule
• The complement rule states that P(not A) = 1 – P(A)
Example #1
If P(G) = 0.4, then P(not G) = 0.6
Example #2
If P(getting an A) = 0.1, then
P(not getting an A) = 0.9
Slide 11
The Addition Rule
• If two events are disjoint (mutually exclusive),
then the addition rule is P(A or B) = P(A) + P(B)
Slide 12
What is Independence?
Slide 13
Independent and “And”
A Partner Question
If 50% of the population are men and 20% of
the population have a college degree, what
percent of the population falls under both
categories? It might help to pretend that there
are only 100 people in the entire population.
Slide 14
The Question
• What did we have to assume in order to get
an answer of 10%?
– That men were just as likely to have a degree as
women.
• That is independence
• How does our answer change if we know that
30% of women have a degree, while only 10%
of men have a degree?
– That still gives us the 20% of the population that have
a degree (0.5 × 0.3 + 0.5 × 0.1 = 0.2), but it gives us a different answer to our
problem: 0.5 × 0.1 = 0.05, or 5%.
Slide 15
What Statistically is Independence?
• Two events are independent if P(A and B) = P(A) × P(B)
• In our example
P(man and degree) = P(man) × P(degree)
P(man and degree) = 0.5 × 0.2
P(man and degree) = 0.1 or 10%
Slide 16
What If They’re Not Independent?
• If two events are not independent then P(A and B) ≠ P(A) × P(B)
• If we know that 50% are men and 20% have a
degree, but that only 5% have both, are these
events independent?
P(man and degree) ?=? P(man) × P(degree)
0.05 ?=? 0.5 × 0.2
These are not equal. So, being a man and having a degree are not
independent. Knowing that a randomly chosen person is a man changes the likelihood
that the person has a degree.
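The check P(A and B) = P(A) × P(B) is easy to automate. The sketch below is only an illustration: the function name `is_independent` and the tolerance are assumptions, and the inputs are the two versions of the men/degree example.

```python
import math

def is_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """Return True when P(A and B) equals P(A) * P(B) within a small tolerance."""
    return math.isclose(p_a_and_b, p_a * p_b, abs_tol=tol)

print(is_independent(0.5, 0.2, 0.10))  # True: 0.5 * 0.2 = 0.10
print(is_independent(0.5, 0.2, 0.05))  # False: 0.05 is not 0.10
```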
Slide 17
What is independence?
• Events are independent if one event
has no effect on another event.
– Essentially that means that if one
event occurs, it does not change the
likelihood of another event occurring
Slide 18
General Probability Rules
Slide 19
The General Addition Rule
• If two events are NOT disjoint, then P(A or B) = P(A) + P(B) – P(A and B)
Slide 20
Our Question
Example
If the probability of choosing a man from a population is 50%,
the probability of choosing a college grad from a population is
20%, and the probability of choosing someone who is both is
5%, what is the probability that we would choose an
individual who has at least one of those qualities?
Solution
P(M or D) = P(M) + P(D) – P(M and D)
P(M or D) = 0.5 + 0.2 - 0.05
P(M or D) = 0.65
Slide 21
What if events are Independent?
Example
In a population, 25% of the group are seniors
and 40% are Hispanic. If the two events have
been shown to be independent, what is the
probability of randomly choosing someone
who is either a senior or Hispanic?
Solution
P(S or H) = P(S) + P(H) – P(S and H)
P(S or H) = 0.25 + 0.4 – (0.25)(0.4)
P(S or H) = 0.55 or 55%
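Both addition-rule examples can be checked with one small function. This is a sketch; the name `p_or` is made up, and the second call uses independence to get P(S and H) = P(S) × P(H).

```python
def p_or(p_a, p_b, p_a_and_b):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

man_or_degree = p_or(0.5, 0.2, 0.05)
senior_or_hispanic = p_or(0.25, 0.4, 0.25 * 0.4)  # independence gives P(S and H)
print(round(man_or_degree, 2), round(senior_or_hispanic, 2))  # 0.65 0.55
```

For disjoint events the last argument is 0, which reduces this to the simple addition rule.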
Slide 22
A conditional probability
• A conditional probability is the probability of an
event given that a different event
has already occurred
• If I have a bag full of blue and red marbles and
I have already pulled a red marble, then the
probability of pulling a blue marble is written
– P(Blue|Red) = Probability of Blue “given” that a
red has been drawn
Slide 23
General Multiplication Rule
• The general multiplication rule is P(A and B) = P(A) × P(B|A)
Example
There are 6 red marbles and 4 blue marbles in a
bag. What is the probability of pulling a red marble
followed by a blue marble, without replacing the first marble?
Solution
P(R and B) = P(R) × P(B|R)
P(R and B) = (6/10) × (4/9) = 0.266… or about 27%
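The marble calculation can be reproduced with exact fractions. This sketch assumes the first marble is not put back, which is what the 4/9 in the solution implies.

```python
from fractions import Fraction

red, blue = 6, 4
total = red + blue

p_red_first = Fraction(red, total)            # P(R) = 6/10
p_blue_given_red = Fraction(blue, total - 1)  # P(B|R) = 4/9, one red is gone
p_red_then_blue = p_red_first * p_blue_given_red

print(p_red_then_blue, float(p_red_then_blue))  # 4/15, about 0.267
```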
Slide 24
Know how and when to find a
conditional probability
• Given the following data on M&Ms

          Plain   Peanut
Red        10       5
Yellow     12       8
Blue       15      10

• What percent of blue M&Ms are peanut?
Solution
Notice that it says “of blue,” not “are blue and peanut.” That’s why it is a
conditional probability, not a joint probability.
P(peanut | blue) = 10/(15 + 10) = 10/25 = 0.40, so 40% of blue M&Ms are peanut.
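To see the difference between the conditional and the joint probability, here is a small Python sketch built from the M&M counts. It assumes the table reads Plain then Peanut across each color row (Red 10/5, Yellow 12/8, Blue 15/10).

```python
counts = {
    "Red": {"Plain": 10, "Peanut": 5},
    "Yellow": {"Plain": 12, "Peanut": 8},
    "Blue": {"Plain": 15, "Peanut": 10},
}

total = sum(sum(row.values()) for row in counts.values())  # all M&Ms
blue_total = sum(counts["Blue"].values())                  # only the blue row

# Conditional: restrict to blue first, then ask about peanut.
p_peanut_given_blue = counts["Blue"]["Peanut"] / blue_total
# Joint: blue AND peanut out of everything.
p_blue_and_peanut = counts["Blue"]["Peanut"] / total

print(p_peanut_given_blue, round(p_blue_and_peanut, 3))
```

The conditional probability divides by the 25 blue M&Ms, while the joint probability divides by all 60.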
Slide 25
More On Independence
Slide 26
More on the Multiplication Rule
• Here are two things that we know…
– The general multiplication rule says P(A and B) = P(A) × P(B|A)
– If two events are independent, then P(A and B) = P(A) × P(B)
• Our Conclusion: If two events are
independent, then A has no effect on B. So, P(B|A) = P(B)
Slide 27
Two Ways to Check Independence
• Two events are independent if either of the
following is true:
– P(A and B) = P(A) × P(B)
– P(B|A) = P(B)
Slide 28
Tree Diagrams
Slide 29
Tree Diagrams
• Tree diagrams can be used to show
probabilities when you have two or more
events
– Here is an example of pulling marbles out of a bag
with replacement
Slide 30
Tree Diagrams Part II
• Here is a probability tree based on the likelihood of meeting with an individual on
three house visits
– What is the likelihood of missing the person every time?
Slide 31
Items We Might Have Missed
Slide 32
Know how to find the union
probabilities of disjoint events
An HIV test has a 99% chance of giving a negative result if an individual
does not have HIV. If an individual fails (tests positive), they take a second test
to make sure, and if that one is negative, the individual is cleared.
How likely is it that an individual who is not HIV positive will
pass?
Solution
In this case, there are two disjoint events: passing the first
time, and failing the first time but passing the second time.
Assuming the two tests are independent,
P(pass 1st) + P(fail 1st) × P(pass 2nd)
= 0.99 + (0.01)(0.99) = 0.9999
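The arithmetic can be sketched in Python as follows. The variable names are made up, and the calculation assumes the two tests are independent, as the solution implicitly does.

```python
p_negative = 0.99  # chance a person without HIV passes one test

p_pass_first = p_negative
p_fail_then_pass = (1 - p_negative) * p_negative  # assumes independent tests
p_cleared = p_pass_first + p_fail_then_pass        # disjoint events, so add

print(round(p_cleared, 4))  # 0.9999
```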
Slide 33
Know what it means for events to be
mutually exclusive and how that
affects their joint/union probabilities
• Disjoint (mutually exclusive) events have P(A and B) = 0
– Thus P(A and B) ≠ P(A)*P(B) whenever both events have nonzero probability
– Disjoint events are not independent and independent events cannot
be disjoint
• Events that are not disjoint have
P(A and B) > 0
– They could be independent or dependent
• Independent events must NOT be disjoint
• Dependent events could be disjoint or not
Chapter 7
Know how to find the expected outcome of a discrete random variable, (08, 3a; 05, 2a; 05b, 2a; 04, 4b; 03b, 5b; M2, 5)
Know how to find the standard deviation when you average two variables, (08, 4c)
If you have two distributions that are both normal and you add or subtract them, what is the shape of the new
distribution? How do you find the average of the new distribution? How do you find the standard deviation of the new
distribution? (08b, 5a; 05b, 2b; M7, 26)
Know how to create and read a graph of the probability distribution for a discrete random variable, (07b, 1a)
Know what the law of large numbers is and how to describe in the context of a problem, (05, 2b)
Know how to find the median, quartiles, and interquartile range by looking at a probability distribution for a discrete
random variable, (05, 2c; M7, 12; M7, 24)
Know how to find the standard deviation of a discrete random variable, (05b, 2a)
Know how to find the average and standard deviation when you combine distributions and change the
units ( ), (05b, 2c)
2008 #3
2008B #5
2004 #4
Probability Distribution (non specific)
o Probabilities must add to 1
Expected Outcome (Mean of the distribution)
o $\mu_x = \sum x_i p_i$
o Multiply each outcome by its probability and add together
Standard deviation
o $\sigma_x = \sqrt{\sum (x_i - \mu_x)^2 p_i}$
Example: The number of hats sold at Landry’s per week is as follows

X      0     1     2     3
P(X)   0.4   0.3   0.2   0.1

$\mu_x = (0)(0.4) + (1)(0.3) + (2)(0.2) + (3)(0.1) = 1.0$
$\sigma_x = \sqrt{(0-1)^2(0.4) + (1-1)^2(0.3) + (2-1)^2(0.2) + (3-1)^2(0.1)} = \sqrt{1.0} = 1.0$
On average, Landry’s sells 1 hat per week with a standard deviation of 1 hat. He should buy fifty-two hats per year.
Other notes: We know that the five number summary is 0 0 1 2 3 based on where the percentiles are.
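The hat example can be verified with a short Python sketch. The helper `percentile` is an assumption made for illustration: it returns the smallest outcome whose cumulative probability reaches the requested level, which is one common way to read percentiles from a discrete distribution.

```python
import math

outcomes = [0, 1, 2, 3]
probs = [0.4, 0.3, 0.2, 0.1]

mean = sum(x * p for x, p in zip(outcomes, probs))
variance = sum((x - mean) ** 2 * p for x, p in zip(outcomes, probs))
sd = math.sqrt(variance)

def percentile(q):
    """Smallest outcome whose cumulative probability reaches q."""
    cumulative = 0.0
    for x, p in zip(outcomes, probs):
        cumulative += p
        if cumulative >= q - 1e-12:  # small slack for floating-point sums
            return x
    return outcomes[-1]

five_number = [outcomes[0], percentile(0.25), percentile(0.5),
               percentile(0.75), outcomes[-1]]
print(round(mean, 2), round(sd, 2), five_number)
```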
Slide 1
Chapter 7
Random Variables
Slide 2
Discrete and Continuous Random
Variables
Slide 3
Random Variables
• A random variable is a variable whose value is
a numerical outcome of a random
phenomenon.
– Remember that a random phenomenon is where
outcomes are unpredictable, but a pattern will
emerge in the long run.
Slide 4
Discrete Random Variable
• A discrete random variable has a countable
number of possible values.
– I know when I roll a die that there are exactly six
possibilities.
– I know when I pick an integer from 1 to 10 that
there are exactly ten possibilities.
• These individual probabilities are all between 0
and 1, and they add up to 1.
• Discrete probability histograms use bars to show
all the individual probabilities.
Slide 5
Probability Histograms
Slide 6
Continuous Random Variable
• A continuous random variable has an uncountable
number of individual outcomes, but probabilities
of intervals can be found.
– These distributions are described by density curves.
– Probabilities of intervals are found by finding the area
under the curve.
• These include normal distributions, as well as other
distributions with uncountable outcomes.
Slide 7
Continuous Random Variable
Slide 8
Means and Variances of Random
Variables
Slide 9
Means of Discrete Random
Variables
• The mean of a random variable can also be
described as the expected outcome.
• To find the expected outcome of a discrete
random variable…
– Multiply each possible outcome by its individual
probability.
– Add up those numbers.
Slide 10
Finding Expected Value
• Here is the distribution for the AP scores from
2010.

Score   1     2     3     4     5
P(X)    .23   .18   .24   .22   .13

What was the average score (expected
outcome)?
= 1(.23) + 2(.18) + 3(.24) + 4(.22) + 5(.13) = 2.84
Slide 11
Standard Deviation of a Discrete
Random Variable
• Whenever you can find the average (center) of
a distribution you can also find its
standard deviation (spread)
• Finding the standard deviation of a discrete
random variable is similar to finding the
standard deviation of a set of numbers.
• The formula is $\sigma_x = \sqrt{\sum (x_i - \mu_x)^2 p_i}$
Slide 12
Finding Standard Deviation
• Here is the distribution for the AP scores from
2010.

Score   1     2     3     4     5
P(X)    .23   .18   .24   .22   .13

What was the standard deviation of these test
scores?
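One way to work this out is to compute the mean first and then the probability-weighted squared deviations. The Python sketch below uses the score distribution from the slide; the variable names are made up for illustration.

```python
import math

scores = [1, 2, 3, 4, 5]
probs = [0.23, 0.18, 0.24, 0.22, 0.13]

# Expected outcome: each score times its probability, added together.
mean = sum(x * p for x, p in zip(scores, probs))
# Variance: squared distance from the mean, weighted by probability.
variance = sum((x - mean) ** 2 * p for x, p in zip(scores, probs))
sd = math.sqrt(variance)

print(round(mean, 2), round(variance, 4), round(sd, 3))  # 2.84 1.8144 1.347
```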
Slide 13
Law of Large Numbers
• The House always wins.
• The Law of Large Numbers says that if you
continue to make observations of a random
event, the proportion of outcomes should
approach the expected probabilities.
• What would happen if you flipped a coin ten
times?
– How about 100 times?
– How about 1000 times?
– How about 1,000,000 times?
• Is there a law of small numbers?
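A quick simulation shows the Law of Large Numbers at work for the coin-flip question above. This is a sketch: the function name `heads_proportion` and the seed are assumptions, and the exact proportions depend on the random draws.

```python
import random

def heads_proportion(flips, rng):
    """Proportion of heads in the given number of fair coin flips."""
    heads = sum(1 for _ in range(flips) if rng.random() < 0.5)
    return heads / flips

rng = random.Random(0)  # fixed seed so the run is repeatable
for flips in (10, 100, 1000, 1_000_000):
    print(flips, heads_proportion(flips, rng))
```

With only ten flips the proportion can be far from 0.5; with a million flips it should sit very close to it.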
Slide 14
What happens if I change data
• Let’s say that I added a certain number to all
pieces of data or multiplied all pieces of data by a
number. We would follow this formula
for the average: $\mu_{a+bx} = a + b\mu_x$
– This formula might be used for converting
between units of measurement.
• In this same situation, the standard deviation
also changes: $\sigma_{a+bx} = |b|\sigma_x$
Slide 15
Example
• A sports store near the beach makes money
by renting boats to patrons. Each patron must
pay an initial $20 fee and then $10 per hour.
If the average customer rents a boat for 3.4
hours, on average how much money does the
sports store make per boat customer?
Slide 16
Example Continued
• Using the information in the previous
problem, what would be the standard
deviation of the dollar amount, if you knew
that the standard deviation of the time is 0.5
hours?
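The boat example is a linear change of the form a + bX, with a = 20 (the fee) and b = 10 (dollars per hour). Here is a minimal Python sketch; the function name `linear_transform` is made up. Adding the fee shifts the mean but not the spread, while the hourly rate scales both.

```python
def linear_transform(mean_x, sd_x, a, b):
    """Mean and s.d. of a + b*X given the mean and s.d. of X."""
    return a + b * mean_x, abs(b) * sd_x

mean_dollars, sd_dollars = linear_transform(mean_x=3.4, sd_x=0.5, a=20, b=10)
print(mean_dollars, sd_dollars)  # 54.0 5.0
```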
Slide 17
What happens if I try to combine
two groups of data
• If there are two groups of INDEPENDENT data,
then the mean of the sum or difference of their data can
be described by $\mu_{x \pm y} = \mu_x \pm \mu_y$
Slide 18
Example
• Mr. Merlo is trying to compare his first period
class from this year to his first period class
from last year. He knows that their class
average last year was 58% and this year it is
63%. What is the average difference between
the two classes?
Slide 19
What about standard deviation
• In this same situation the standard deviation
also changes, but you have to use the
variances to find out how much: $\sigma^2_{x \pm y} = \sigma^2_x + \sigma^2_y$
• Notice that it is the same formula whether
you are adding or subtracting the two sets of
data.
Slide 20
Example of Standard Deviation
• Mr. Merlo is trying to compare his first period
class from this year to his first period class
from last year. He knows that their class
standard deviation last year was 6% and this
year it is 8%. What is the standard deviation
of the difference between the two classes?
Slide 21
Standard Deviations of Sums
• In order to find the standard deviation of the
sum or difference of two independent variables,
you must add their variances, not their
standard deviations.
Slide 22
Two Normal Distributions
• If we know that two distributions are normal,
then we can use our rules of normal
distributions to find different probabilities
– The sum or difference of two independent normal variables is also normal
Slide 23
Let’s Go Bowling
• Mr. Merlo and Mr. Furutani have entered a team bowling tournament. If
Mr. Merlo has a bowling score with a normal distribution N(120, 30) and
Mr. Furutani has a bowling score with normal distribution N(160, 40), what
is the probability, assuming that their scores are independent, that they
will bowl a combined score of at least 320?
– Mean: 120 + 160 = 280
– Standard deviation: √(30² + 40²) = √2500 = 50
• So, now we know that the distribution of their combined scores will be N(280,
50)
Slide 24
Let’s Go Bowling Continued
– Now we know that the combined distribution is N(280, 50)
P(X ≥ 320)
= P(Z ≥ (320 – 280)/50)
= P(Z ≥ 0.80)
= 0.2119
There is about a 21% chance that they will combine for a score of at least 320.
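The bowling calculation can be checked with Python's standard-library `statistics.NormalDist`, which supports adding two independent normal variables. This is a sketch; the variable names are made up, and the addition is only valid because the scores are assumed independent.

```python
from statistics import NormalDist

merlo = NormalDist(mu=120, sigma=30)
furutani = NormalDist(mu=160, sigma=40)

# Adding NormalDist objects adds the means and adds the variances.
combined = merlo + furutani
p_at_least_320 = 1 - combined.cdf(320)

print(combined.mean, combined.stdev, round(p_at_least_320, 4))  # 280.0 50.0 0.2119
```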
Slide 25
What We Missed
Slide 26
Know how to find the standard
deviation when you average 2
variables
• Mr. Merlo and Mr. Furutani have entered a team bowling
tournament. If Mr. Merlo has a bowling score with a normal
distribution N(120, 30) and Mr. Furutani has a bowling score
with normal distribution N(160, 40), what is the distribution
of the average of their two scores? Assume that they are
independent.
• Since they are both normal, we know that the
distribution of the average will also be normal
Slide 27
Know how to…continued
• In terms of mean and s.d., we are looking at a
distribution of an average
– That means we want the distribution of (M + F)/2
• The average is what you would expect it to be: (120 + 160)/2 = 140
• We have to think about the standard deviation
a little more: add the variances first, then divide by 2, so the s.d. is √(30² + 40²)/2 = 25
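A short Python sketch of the average (M + F)/2, assuming the two scores are independent. Dividing by 2 halves both the mean of the sum and the standard deviation of the sum; the variable names are made up for illustration.

```python
import math

mean_m, sd_m = 120, 30  # Mr. Merlo
mean_f, sd_f = 160, 40  # Mr. Furutani

mean_avg = (mean_m + mean_f) / 2
# Add the variances first (independence), then scale the s.d. by 1/2.
sd_avg = math.sqrt(sd_m**2 + sd_f**2) / 2

print(mean_avg, sd_avg)  # 140.0 25.0
```

So the average of the two scores follows N(140, 25), not N(140, 35).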
Slide 28
Know how to find median and
quartiles by looking at a probability
distribution
• Here is the distribution of scores on a 5 point test
Grade   0     1     2     3     4     5
P(G)    0.1   0.1   0.4   0.14  0.16  0.1
• What is the median?
– The median is the number that has 50% of the data above it and 50% below it
– So, that number would occur at 2, because 60% of the data is 2 or smaller and only 20% is 1 or smaller, so the 50th percentile has to be at 2
– By the same idea, the 25th percentile (Q1) will also be 2, because 25% also falls between the 20% at 1 or smaller and the 60% at 2 or smaller
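The cumulative-probability reasoning above can be sketched in code. This is an illustration only: the function name `percentile` is hypothetical, and it uses one common convention (the smallest value whose cumulative probability reaches the target), which may differ from other textbook conventions when the cumulative probability lands exactly on the target.

```python
from fractions import Fraction

# Probability distribution of grades on the 5 point test (from the slide).
grades = [0, 1, 2, 3, 4, 5]
probs = ["0.1", "0.1", "0.4", "0.14", "0.16", "0.1"]

def percentile(values, probabilities, q):
    """Smallest value whose cumulative probability is at least q."""
    cumulative = Fraction(0)
    for value, p in zip(values, probabilities):
        cumulative += Fraction(p)   # exact arithmetic avoids rounding surprises
        if cumulative >= Fraction(q):
            return value
    raise ValueError("probabilities must sum to at least q")

q1 = percentile(grades, probs, "0.25")
median = percentile(grades, probs, "0.5")
q3 = percentile(grades, probs, "0.75")
print(q1, median, q3)   # 2 2 4
```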
Chapter 8
Can you find the expected outcome and standard deviation of a binomial distribution? (10, 4a;10b, 3b)—525*
Can you set up and solve a binomial problem that is cumulative using a calculator (P(X ≤ 30) = binomcdf(n, p, 30))? (2010, 4b; 10b, 3c; 09, 2b)
Know how to solve a binomial problem that is the complement of a cumulative problem (for example, P(X ≥ k) = 1 - binomcdf(n, p, k - 1)), (06, 3b; 06b, 3b)
If you believe that you are doing a binomial question, state so immediately in your answer, B(n, p), (10b, 3a;09, 2b; 07b,
1b; 06b, 6c; 03, 3c)
Know how to solve a basic binomial problem (P(X=1)), (07b, 2b; 06b, 6c; 03, 3c; M2, 32)
Know the conditions for a binomial problem and when a problem is not binomial, (04, 3a; M7, 7; M7, 11)
Binomial Distribution
Both of these formulas are in your packet
Solving a binomial problem
• You must state that it is binomial, state what a success is, what the probability of success is, and how many observations are being made.
o This can be done with the shorthand B(n, p). Make sure to state what p represents.
• You must show that it meets the four criteria
o Each observation is a success or a failure
o There is a set number (n) of observations
o The probability of success never changes
o Each observation is independent
• Plug into the formula
o P(X = k) = (n choose k) * p^k * (1 - p)^(n - k)
• Describe your answer in the context of the question
Example 1
The probability of making any money in a state lottery is 0.4. There is a drawing once a week. What is the probability that you would win at least six times in a seven week period?
B(7, 0.4), where n = 7 is the number of weeks being observed and p = the probability of making any money = 0.4
1. Success = making money; failure = not making money
2. There are seven weeks of observations
3. p = probability of winning = 0.4
4. There is no reason to believe that the drawings are not independent
P(X ≥ 6) = P(X = 6) + P(X = 7) = 7 * (0.4)^6 * (0.6) + (0.4)^7 = 0.0172 + 0.0016 = 0.0188
There is a 1.88% chance of randomly winning 6 or more times
Example 2
You could also do this same problem the following way:
Example 3
What is the expected outcome and the standard deviation of this distribution?
Expected = 7(0.4) = 2.8
s.d. = sqrt(7(0.4)(0.6)) = 1.296
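The lottery examples can be reproduced with a few lines of code. This is a minimal sketch; `binom_pmf` is a hypothetical helper name written for this illustration, and `math.comb` requires Python 3.8 or later.

```python
import math

def binom_pmf(n, p, k):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 7, 0.4   # seven weekly drawings, 0.4 chance of making money each week

# Example 1: win at least six times = P(X = 6) + P(X = 7)
p_at_least_6 = binom_pmf(n, p, 6) + binom_pmf(n, p, 7)

# Example 3: expected outcome and standard deviation
mean = n * p
sd = math.sqrt(n * p * (1 - p))

print(round(p_at_least_6, 4))         # 0.0188
print(round(mean, 1), round(sd, 3))   # 2.8 1.296
```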
2010 #4
2010B #3
2006 #3
2006B #3
2004 #3
2003 #3
Slide 1
Chapter 8
The Binomial and Geometric
Distributions
Slide 2
The Binomial Distributions
Slide 3
Let’s Flip A Coin
• If I decide to flip a coin three times, what is the probability that I will get exactly 2 heads?
• This is what we call a binomial distribution.
Slide 4
What makes it binomial?
1. Each observation falls into one of just two
categories: “success” or “failure.”
2. There is a fixed number of observations.
3. All observations are independent.
4. The probability of “success” never changes.
Slide 5
The Binomial Distribution
• We make a set number of observations (n) and each “success” has the same probability (p).
• Example
– Let’s say that the probability that a girl says she will go out with me this week is 0.1, and I ask out 7 girls.
▪ B(7, 0.1)
• What are we assuming is true?
▪ Each response is independent of the others.
Slide 6
Finding Probabilities
• To find a binomial probability, we can use the formula P(X = k) = (n choose k) * p^k * (1 - p)^(n - k)
• Let’s look at the dating example: B(7, 0.1)
• First, define X = # of times Merlo gets a date.
• Find P(X = 2)
▪ This is a little funky because any of the girls could be the ones who say yes.
P(X = 2)
= 21 * (0.1)^2 * (0.9)^5
21 ways it could happen (7 choose 2), 2 yes, 5 no.
So, you get 0.1240
Slide 7
Dating Continued
• Find P(X = 1)
– P(X = 1) = 7 * (0.1) * (0.9)^6 = 0.3720
• Find P(X = 0)
– P(X = 0) = (0.9)^7 = 0.4783
– We don’t have to use factorials because there is only one way that this could happen
• I can use my calculator
– 2nd/Distr/0:binompdf
▪ binompdf(7, 0.1, 3) finds P(X = 3)
– 2nd/Distr/1:binomcdf
▪ binomcdf(7, 0.1, 3) finds P(X ≤ 3)
▪ This is cumulative
Slide 8
Desperate Dating
• Let’s go ahead and assume the information from the
previous problems is true about my dating life: B(7,
0.1). What is the probability that at least one girl will
say yes?
P(X≥1)=P(X=1) +P(X=2) + P(X=3)+…+ P(X=7)
= 1- P(X < 1)
= 1 – P(X = 0)
= 1 – (0.9)^7
= 0.5217
There is a 52% chance that at least one girl will say yes.
Slide 9
What if he asks 100 girls?
• Now we are looking at B(100, 0.1)
• Let’s find the probability that at most 15 will
say yes
– P(X≤15) = Binomcdf(100, 0.1, 15) = 0.9601
• Let’s find the probability that at least 8 will say
yes
– P(X≥8) = 1-P(X≤7) = 1-binomcdf(100, 0.1, 7) = 0.7939
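For readers without the calculator at hand, the two calculator functions can be imitated in code. This is a sketch only: the Python functions below borrow the calculator's names and argument order for readability, but they are written for this illustration and are not a calculator API.

```python
import math

def binompdf(n, p, k):
    """P(X = k) for X ~ B(n, p), mirroring the calculator's binompdf."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binomcdf(n, p, k):
    """P(X <= k): the sum of binompdf from 0 through k (cumulative)."""
    return sum(binompdf(n, p, i) for i in range(k + 1))

# B(7, 0.1): at least one yes = 1 - P(X = 0)
print(round(1 - binomcdf(7, 0.1, 0), 4))     # 0.5217

# B(100, 0.1): at most 15, then at least 8 as a complement
at_most_15 = binomcdf(100, 0.1, 15)
at_least_8 = 1 - binomcdf(100, 0.1, 7)
print(round(at_most_15, 4), round(at_least_8, 4))
```

The complement uses 7, not 8, because "at least 8" excludes only the outcomes 0 through 7.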
Slide 10
Mean of a binomial distribution
• To find the mean, you multiply n*p
– The formula is μ = np
• On average, you would expect me to get 7*0.1 = 0.7 dates per week.
Slide 11
Standard Deviation of a binomial
distribution
• The formula for finding the standard deviation of a binomial distribution is σ = sqrt(np(1 - p))
• In our problem, Mr. Merlo averages 0.7 dates per week with a standard deviation of sqrt(7(0.1)(0.9)) = 0.7937 dates per week
Slide 12
Geometric Distributions
Slide 13
How Long Until We…
• Josh and Eli are playing a game where they flip
a coin. If it’s heads Josh wins, if it’s tails Eli
wins. On a bright and shiny Wednesday, they
play the game, and Josh keeps losing. In fact,
he doesn’t win until the 8th flip of the game.
He exclaims, “What is the probability of that?”
What exactly is the probability of that?
• This is a geometric distribution
Slide 14
What makes it geometric?
1. Each observation falls into one of just two
categories: “success” or “failure.”
2. The probability of “success” never changes.
3. All observations are independent.
4. The variable of interest (X) is the number of
trials required to obtain the first success.
Slide 15
The Geometric Distribution
• We are going to keep making observations until we succeed.
• The formula for this is P(X = n) = (1 - p)^(n - 1) * p
• Example
– I am going to keep asking out girls until one says yes.
– P(X = 1) = 0.1
▪ That’s the likelihood that the first girl I ask will say yes.
– P(X = 5) = (0.9)(0.9)(0.9)(0.9)(0.1)
▪ That’s the probability that I have to ask five girls until one says yes.
Slide 16
Adding an infinite amount of numbers
• The probability that it takes more than n trials to see the first success is (1 - p)^n
• Let’s find the probability that it will take me at least 6 tries to get a date.
– P(X ≥ 6) = P(X > 5) = (1 – 0.1)^5 = (0.9)^5 = 0.59
Slide 17
Expected Outcome Again
• The mean of a geometric distribution is the average number of trials it should take until you succeed.
– It can be found by μ = 1/p
• On average it would take me 1/0.1 = 10 tries until I got a date.
– This makes sense, because there is a one in ten chance that I will actually get a date on any one try.
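The geometric calculations from the last few slides can be collected into three small functions. This is a minimal sketch; the function names are hypothetical and chosen only for this illustration.

```python
def geom_pmf(p, n):
    """P(X = n): n - 1 failures followed by the first success on trial n."""
    return (1 - p) ** (n - 1) * p

def geom_more_than(p, n):
    """P(X > n): the first n trials are all failures."""
    return (1 - p) ** n

def geom_mean(p):
    """Expected number of trials until the first success."""
    return 1 / p

print(round(geom_pmf(0.1, 5), 5))         # 0.06561  first yes on the 5th ask
print(round(geom_more_than(0.1, 5), 5))   # 0.59049  at least 6 tries needed
print(geom_mean(0.1))                     # 10.0
print(geom_pmf(0.5, 8))                   # 0.00390625  Josh's first win on flip 8
```

The last line answers the coin-flip question: seven losses followed by a win has probability (0.5)^8, or about 0.4%.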
Chapter 9
How is a sampling distribution of a sample mean different than a distribution of individual observations? (10, 2a; M7, 23)
Know how to find the probability of averaging a certain amount in a Normal distribution, (10, 2b; 09, 2c; 07, 3b; 06, 3c; 04b, 3c)
Know how to find the standard deviation of a proportion, (08, 4c)
What does it mean to be considered a biased or unbiased estimator? (08b, 2; M7, 33)
Know how sampling distributions of sample means change as your sample size gets larger, and how that affects the
likelihood of extreme events occurring, (07, 3a; M2, 9)
Know that when you are referring to a standard deviation of a sampling distribution of the sample mean, you do not
have to divide by √n; it has already been done, (07, 3b)
Know that when you have a large sample size, even the most nonnormal distributions will have a sampling distribution
of the sample mean close to a normal distribution by the Central Limit Theorem, (07, 3c; 07b, 2c)
Know how to find the average and standard deviation of a sampling distribution of a sample mean if you know the
average and standard deviation of the original data, (07b, 2c; M2, 30)
Know the difference between finding the probability that a group of 10 will average more than 150 lbs and a group of
ten will all weigh more than 150 lbs, (05b, 6c)
2010 #2
2009 #2
2008B #2
2007 #3
2007B #2
2006 #3
2004B #3
Sampling Distributions of Sample Means
As your sample size becomes larger, these are the things that you will notice:
1) It becomes more normal (but will never be exactly normal, only approximately normal). This is the Central Limit Theorem.
a. If something is already symmetric, then you only have to make a few observations for the sampling distribution of the sample means to become approximately normal (notice the picture on the right)
b. If something is not at all symmetric, then you need about 30 observations for the sampling distribution of the sample means to become approximately normal
2) The average remains the same (if the overall mean is μ, then the mean of the sample means is also μ, no matter how large the sample size)
3) It becomes less spread out (if the overall standard deviation is σ, then the standard deviation of the sample means is σ/√n, no matter what n is)
Points 2 and 3 are true whether or not the sampling distribution of the sample means is normal
Example 1 (Know the difference between finding the probability that a group of 10 will average more than 150 and a
group of ten will all weigh more than 150lb)
A group of 10,000 people has a normal distribution of weights with an average of 145 lb and a standard deviation of 20 lb.
1) What is the probability that a randomly chosen person will weigh more than 150lb?
N(145, 20)
P(X > 150) = P(Z > (150 - 145)/20) = P(Z > 0.25) = 0.4013
2) What is the probability of 10 random people averaging more than 150lb?
N(145, 20/√10) = N(145, 6.32)
P(x-bar > 150) = P(Z > (150 - 145)/6.32) = P(Z > 0.79) = 0.2148
3) What is the probability that 10 random people will all weigh more than 150?
Since the probability that one individual will weigh more than 150 is 0.4013, the probability that 10 individuals
will all weigh more than 150 is (0.4013)^10 ≈ 0.0001
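The three questions in this example differ only in which distribution is used, which a short script can make explicit. This is a sketch using the standard library; `normal_cdf` is a hypothetical helper, and its unrounded results can differ in the last digit from answers read off a z table.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma, n = 145, 20, 10

# 1) One randomly chosen person weighs more than 150 lb
p_one = 1 - normal_cdf(150, mu, sigma)

# 2) The average of 10 people is more than 150 lb: use sigma / sqrt(n)
p_average = 1 - normal_cdf(150, mu, sigma / math.sqrt(n))

# 3) All 10 people individually weigh more than 150 lb (independent people)
p_all = p_one ** n

print(round(p_one, 4), round(p_average, 4), round(p_all, 6))
```

Averaging more than 150 is far more likely than all ten exceeding 150, because a heavy person can offset a light one in an average.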
Slide 1
Chapter 9
Sampling Distributions
Slide 2
Sampling Distributions
Slide 3
Parameter and Statistic
• A parameter is a number that describes a population. It is usually unknown.
– 53% of CV students like Mr. Merlo.
• 53% is a parameter. We write that p = 0.53
• A statistic is a number taken from sample data. This is usually what we can get.
– In a survey of 100 students, 53% say that they like Mr. Merlo.
• 53% is the statistic. We write that p-hat = 0.53
• Statistics are used to estimate parameters.
Slide 4
Sampling Variability
• A statistic is merely a good guess at a parameter.
• If I took several surveys of students, looking at different groups of 100, I might get different statistics (53%, 47%, 58%).
– This is called sampling variability
• In the ideal world we would have a survey of every possible group of 100, and we would create a histogram of those statistics.
– This is called a sampling distribution.
– I understand that this is unreasonable, because it would be easier just to ask every person, but I’m setting up something for later.
Slide 5
Sampling distribution
Slide 6
Sampling Distribution
Slide 7
Bias and Variability
• If the mean of a statistic’s sampling distribution is far from the actual parameter, the statistic is said to be biased.
• It is unbiased if the mean of its sampling distribution is equal to the true value of the parameter.
• Variability is how spread out the statistics gathered are.
Slide 8
Bias And Variability
Slide 9
Sample Proportions
Slide 10
The Distribution of a Statistic
• Let’s look at the Mr. Merlo likers.
• If I took as many samples of groups of 100 as I could, that sampling distribution (histogram) would be close to normal.
– Not only would it be close to normal, but its mean should be the parameter that I’m looking for.
– And, the standard deviation of the distribution would be sqrt(p(1 - p)/n)
Slide 11
Rule of Thumb
1. Use the formula for standard deviation of a statistic only when the population is at least 10 times bigger than the sample.
– In the Merlo example, I would have to use a sample of 300 or fewer because the population of the school is 3000.
2. We can only really assume that it’s normal if np and n(1-p) are both at least 10.
Slide 12
Example
• If 100 students are chosen at random, how likely is it that 60 or more will like Mr. Merlo if we know that 53% actually do like him?
• Since 100 is less than 10% of the school population, it would be appropriate to use the standard deviation formula.
• Since 100(.53)>10 and 100(.47)>10, we can also use the normal approximation.
• So, now we know that this distribution is N(0.53, sqrt((0.53)(0.47)/100)) = N(0.53, 0.0499)
Slide 13
Example Continued
• Now that we know the distribution is N(0.53, 0.0499)
• We want to find P(p-hat ≥ 0.60) = P(Z ≥ (0.60 - 0.53)/0.0499) = P(Z ≥ 1.40) = 0.0808
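The sample-proportion example (100 students, 53% true proportion, school of 3000) can be sketched in code as a condition check followed by a normal calculation. The variable names are illustrative, and the unrounded answer differs slightly from a table lookup: rounding z to 1.40 gives 0.0808, while the script keeps the extra digits.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p, n = 0.53, 100
population = 3000   # school size from the Rule of Thumb slide

# Conditions: sample at most 10% of the population; np and n(1 - p) at least 10
ten_percent_ok = n <= population / 10
normal_ok = n * p >= 10 and n * (1 - p) >= 10

# Standard deviation of the sample proportion, then the tail probability
sd_p_hat = math.sqrt(p * (1 - p) / n)
z = (0.60 - p) / sd_p_hat
p_60_or_more = 1 - normal_cdf(z)

print(ten_percent_ok, normal_ok)
print(round(sd_p_hat, 4), round(z, 2), round(p_60_or_more, 4))
```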
Slide 14
Sample Means
Slide 15
Sample means
• If I know the true values…
– The population mean (a parameter) is usually represented with μ.
– The standard deviation of the population is σ.
• If I’m using sample data…
– The mean of the sampling distribution of x-bar is μ.
– If the standard deviation of the population is σ, then the standard deviation of the sampling distribution of x-bar is σ/√n.
• What happens to the standard deviation as our sample size gets bigger?
Slide 16
Know how to find the prob of
averaging a certain amount
A soda factory releases soda out of a machine and the amount of liquid
that comes out has a normal distribution with an average of 12 oz and a
standard deviation of 0.4 oz. How likely is it that a six pack of soda will
average more than 12.2 oz per bottle?
Solution
The average of six bottles has the distribution N(12, 0.4/√6) = N(12, 0.163)
P(x-bar > 12.2) = P(Z > (12.2 - 12)/0.163) = P(Z > 1.22)
=0.1112
Slide 17
Central Limit Theorem
• As your n gets bigger, the sampling distribution of the sample mean, no matter how skewed the original distribution, will begin to become normal.
• These sampling distributions will have the same mean as the original distribution and a standard deviation of σ/√n.
• An example of the Central Limit Theorem can be found at http://www.stat.sc.edu/~west/javahtml/CLT.html
Slide 18
CLT
Slide 19
CLT
Slide 20
Example
• Mr. Merlo is going to roll a die one hundred times and find the average. What is the probability that the average will be greater than 3.6 if you know that the average of rolling one die is 3.5 with a standard deviation of 1.71?
• The average of 100 rolls will have the same mean as a single roll, 3.5.
• Even though rolling a die is definitely not normal, by the CLT, we can say the distribution of the average is approximately normal.
• So, now we have a new distribution of the average of rolling a die 100 times. It is N(3.5, 1.71/√100) = N(3.5, 0.171)
Slide 21
Example Continued
• Mr. Merlo is going to roll a die one hundred times and
find the average. What is the probability that the
average will be greater than 3.6 if you know that the
average of rolling one die is 3.5 with a standard
deviation of 1.71?
• We now know that the CLT tells us the average is N(3.5, 0.171)
• P(x-bar > 3.6)
• =P(z>(3.6-3.5)/.171)
• =P(z>0.58)
• =.2810
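The CLT claim for the die example can be tested by simulation. The sketch below is an illustration under stated assumptions: the seed, the number of repetitions, and all names are arbitrary choices, not part of the slides. It compares the simulated share of averages above 3.6 with the normal approximation N(3.5, 0.171).

```python
import math
import random

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

random.seed(1)            # fixed seed so the run is repeatable
faces = [1, 2, 3, 4, 5, 6]
rolls_per_sample = 100
samples = 10000

# An average above 3.6 over 100 rolls means a total above 360.
hits = 0
for _ in range(samples):
    total = sum(random.choices(faces, k=rolls_per_sample))
    if total > 360:
        hits += 1
simulated = hits / samples

# Normal approximation from the CLT: N(3.5, 1.71 / sqrt(100))
approx = 1 - normal_cdf((3.6 - 3.5) / (1.71 / math.sqrt(rolls_per_sample)))

print(round(simulated, 3), round(approx, 3))
```

The two numbers should be close but need not match exactly: the simulation has random noise, and totals of die rolls are whole numbers while the normal curve is continuous.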
Chapter 10
Know how to interpret a confidence level (i.e. What does it mean to be 95% confident)? (10, 3a; 03b, 6a; M2, 37)—622
What exactly is a confidence interval? How do I use it to argue for or against a claim (e.g. Based on your confidence
interval is there evidence that boys are smarter than girls?)? (10, 3b;10b, 6d; 03, 6b)
If I need to have a certain margin of error, how do I establish how many observations to make? (10, 3c; 08b, 3a; 05, 5b;
03b, 6b; M2, 26)—635, 671
Know the conditions for a 1-proportion z interval, specifically how to check normality, (10b, 4a; 03, 6b; M7, 21)
If we are taking a sample of 50 from a population, why don’t we consider the fact that the probability is changing
because we are not replacing? (10b, 4b)
Know how to calculate a one proportion z interval, (03, 6b; 03b, 6a)
Know how to interpret a one proportion z interval, (03, 6b; 03b, 6a; M7, 34)
Know how to check the conditions for a one proportion z interval, (03b, 6a)
Know how changing the sample size affects the margin of error, (M7, 30)
Know what happens to a confidence interval and the margin of error if you increase the confidence level, (M2, 13)
Know how to calculate a confidence interval for a one sample t-interval, (04, 6a; M2, 8)
Know how to interpret the confidence interval for a one sample t-interval, (04, 6a)
Know when to do a t-test/interval instead of a z-test/interval, (M7, 25; M2, 33)
Know what a t-distribution is and how it is similar and different to a normal distribution, (M2, 18)
2010 #3
2010B #4
2008B #3
2008B #4
2005 #5
2003 #6
2003B #6
One sample (paired) t-confidence interval—Quantitative Data
I) Name the test and state the formula
a. One sample (paired) t-confidence interval
b. x-bar ± t* · s/√n, with n - 1 degrees of freedom
II) Determine whether or not conditions are met
a. Determine whether the n samples were taken at random
b. Determine whether or not the distribution is normal
i. If n < 10, make a boxplot and verify that it’s normal
ii. If n < 30, make a boxplot and verify that it’s close to normal
iii. If n > 30, state that the CLT allows us to say that it’s normal
c. Determine whether the n observations were taken independently. If sampling with replacement this can be deduced. If sampling without replacement, then you must verify that the sample is less than 10% of the population.
III) Do the math
a. Plug the given numbers into the formula, state the degrees of freedom that you are using, and use a calculator to verify
IV) Draw a conclusion in context
a. We are _____% confident that the avg. ____________________ is between _____ and _____
One sample z-confidence interval for proportions—Categorical Data
I) Name the test and state the formula
a. One sample z-confidence interval for proportions
b. p-hat ± z* · sqrt(p-hat(1 - p-hat)/n)
II) Determine whether or not conditions are met
a. Determine whether the n samples were taken at random
b. Determine whether or not the distribution is normal
i. np ≥ 10
ii. n(1 - p) ≥ 10
c. Determine whether the n observations were taken independently. If sampling with replacement this can be deduced. If sampling without replacement, then you must verify that the sample is less than 10% of the population.
III) Do the math
a. Plug the given numbers into the formula, and use a calculator to verify
IV) Draw a conclusion in context
a. We are _____% confident that the proportion of ____________________ is between _____ and _____
Slide 1
Estimating With Confidence
Slide 2
Definitions
• Inference
– The process of arriving at some conclusion that,
though it is not logically derivable from the
assumed premises, possesses some degree of
probability relative to the premises.
• Statistical Inference
– Provides methods for drawing conclusions about a
population from sample data.
– We will have several “inference procedures” that
we will learn in the next couple of months.
Slide 3
Questions
• What does the following statement mean to you?
– I am 95% confident that the average age of high school teachers is between 30 and 36.
• How would the statement change if I altered the first part to, “I am 99% confident…”?
Slide 4
Confidence Interval
• Confidence Intervals
– This is a guess at a parameter using a statistic.
• We think that the average age is between 30 and 36.
• Margin of Error
– This is the distance up and down from our sample mean that we are willing to go.
• This is the + or – you see when you watch elections.
• In our example above, we collected a statistic of 33 from a sample.
– This is just a guess, though.
• So, in our study we had a margin of error of 3 years.
– That’s where the 30 and 36 came from.
Slide 5
Confidence Level
• A confidence level is a percentage: the share of intervals built by this method, over many samples, that would contain the true parameter.
• We are 95% confident that the average age of high school teachers is between 30 and 36 years.
– This means that if we were to take several samples (where we would get different statistics: 32, 29, 35,…) about 95% of the resulting intervals would contain the true parameter.
Slide 6
Know how to interpret a confidence
level
• We are 95% confident that the average age of high school seniors is between 17.1 and 17.8
• What does it mean?
– If we took many samples and built an interval from each one, about 95% of those intervals would contain the true average age of high school seniors
• What does it not mean?
– It does not mean that 95% of seniors are between 17.1 and 17.8
– It does not mean that if you took several samples 95% would have an average between 17.1 and 17.8
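The "95% of intervals capture the truth" reading can be illustrated with a simulation. Everything numeric here is a hypothetical assumption made up for the sketch (a true mean age of 17.45, a population standard deviation of 0.9, samples of 40, a fixed seed); only the idea of repeated sampling comes from the slide.

```python
import math
import random

random.seed(2)   # fixed seed so the run is repeatable

true_mean, sigma, n = 17.45, 0.9, 40   # hypothetical population values
z_star = 1.960                         # 95% confidence
trials = 2000

margin = z_star * sigma / math.sqrt(n)
covered = 0
for _ in range(trials):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    # Does this sample's interval contain the true mean?
    if x_bar - margin <= true_mean <= x_bar + margin:
        covered += 1

coverage = covered / trials
print(coverage)   # expected to land near 0.95
```

Each sample gives a different interval; the confidence level describes how often the method succeeds, not the chance that any one finished interval is right.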
Slide 7
Confidence Interval Conditions
• In order for us to be able to create a confidence interval we need three things
– 1) The data come from an SRS of the population
– 2) The sampling distribution is close to normal, or the sample is large enough to use the CLT.
– 3) The individual observations are independent
• The sample must be less than 10% of the population
Slide 8
Confidence Interval for a Population
Mean
• Once we have checked the conditions, we then want to find our interval by using our formula
x-bar ± z* · σ/√n
Slide 9
Example
• We sampled 100 students and found that the average SAT score was 1800. We are told that the population standard deviation is 200.
◦ 1800 is a statistic. Is it the true average of all students?
– No. So we will take a good guess at the actual parameter.
• First establish how confident you want to be.
◦ 95% is pretty good. Using the empirical rule, that is two standard deviations above and below the average.
– Since it’s a sample of 100, the standard deviation of our sample mean is 200/(square root of 100) = 20.
Slide 10
Example Continued
Slide 11
Example Continued
• We are 95% confident that the average SAT score is between 1760 and 1840.
– So, we think that the parameter, the actual average, is between 1760 and 1840.
– This means that if we took a lot of samples, our intervals would miss the true average about 5% of the time.
• 95% is the confidence level, and (1760, 1840) is the confidence interval.
Slide 12
Writing Confidence Intervals
• This is how I want you to write answers to confidence intervals…
– We are ____% confident that the average _________________ is between _____ and ____.
– You fill in the blanks to make it question specific.
Slide 13
Being more specific
• Instead of using the empirical rule, we are
going to use more specific z scores.
– 90% confidence --- z = 1.645
– 95% confidence --- z = 1.960
– 99% confidence --- z = 2.576
• So we would actually be 95% confident that
the average SAT score is between 1760.8 and
1839.2
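The effect of the more specific z scores on the SAT interval can be sketched as follows. The function name `z_interval` is hypothetical; the sample mean of 1800, population standard deviation of 200, and sample size of 100 come from the example.

```python
import math

def z_interval(x_bar, sigma, n, z_star):
    """Confidence interval for a mean when the population sigma is known."""
    margin = z_star * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

for level, z_star in [("90%", 1.645), ("95%", 1.960), ("99%", 2.576)]:
    low, high = z_interval(1800, 200, 100, z_star)
    print(level, round(low, 1), round(high, 1))
```

Raising the confidence level widens the interval, since a larger z score multiplies the same standard deviation of 20.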
Slide 14
Margin Of Error
• The margin of error is the last part of the formula: z* · σ/√n
• We use this to establish how many people we need to interview
Slide 15
Example
• I am studying SAT scores and I want to be more precise without lowering my confidence level. So, how many people do I need to interview in order to reduce my margin of error to 10?
• Set 1.96 · 200/√n = 10, so √n = 39.2 and n = 1536.64, which rounds up.
• We would have to interview 1537 people.
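The sample-size calculation can be written as a one-line function. This is a minimal sketch; `sample_size` is a hypothetical name, and the rounding always goes up because a smaller sample would give a margin of error larger than the target.

```python
import math

def sample_size(z_star, sigma, margin):
    """Smallest n for which z* * sigma / sqrt(n) is at most the margin."""
    return math.ceil((z_star * sigma / margin) ** 2)

print(sample_size(1.960, 200, 10))   # 1537
```

Because n grows with the square of 1/margin, cutting the margin of error in half takes about four times as many interviews.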
Slide 16
Inference for the Mean of a
Population
Slide 17
Standard Error
• The sample standard deviation, s, is the standard deviation of a set of sample data.
– It is s, instead of σ
• s is our best guess at the standard deviation of the population.
• The standard error of the sample mean is s/√n, the estimate of σ/√n that we get when we have to use s.
• What do we do with confidence intervals and hypothesis tests if we don’t know σ, and we have to use s?
Slide 18
T-scores and Degrees of Freedom
• A t-score is what we use instead of a z-score
when we have to use s instead of σ.
– It is similar to a z-score, but it comes from a non-normal
distribution that is shorter and wider
– The larger your sample size, the closer to normal it
becomes
• In order to use a t statistic, you have to
know the distribution’s degrees of freedom
– This can be found by taking n − 1.
– The larger your degrees of freedom, the closer to
normal your distribution is
Slide 19
T-distribution
• The smaller your sample size (df), the more area there is in the tails
• The bigger the sample size (df), the more normal the distribution becomes
– That’s why if you have infinite degrees of freedom, you just use the z
score
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 20
Confidence Interval for a Population
Mean
• Once we have checked the conditions, we then
find our interval by using our formula: x̄ ± t* · s/√n, with df = n − 1
• Notice how this differs from the formula when
you know the population standard deviation: it uses t* and s in place of z* and σ
___________________________________
___________________________________
Slide 21
Example
• We sampled 30 random students and found
that the average SAT score was 1800 and the
standard deviation of our sample was 200.
Find a 95% confidence interval.
– This is different than what we’ve done before,
because we do not know the population standard
deviation, only that from our sample.
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 22
Example Continued
• State your test
– One sample t-interval
• State your Conditions
– 1) Random: randomness is given in the problem
– 2) Normal: since our sample is large (n = 30), the distribution of
sample means will be approximately normal
– 3) Independent: since 30 is less than 10% of the population that takes
the SAT, it is safe to say that our observations are
independent
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 23
Example Continued
• Do the math
– 1800 ± 2.045 · (200/√30), which is about 1800 ± 74.67
– Notice that we used df = 29, but n = 30 inside the radical
• Draw a Conclusion
– We are 95% confident that the average SAT score is between 1725.33 and 1874.67.
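The arithmetic for this one sample t-interval can be checked with a short sketch. It takes t* = 2.045 for df = 29 from a standard t table; a calculator carries more decimal places for t*, so its endpoints can differ from these in the last digit.

```python
import math

x_bar, s, n = 1800, 200, 30
df = n - 1      # 29 degrees of freedom
t_star = 2.045  # 95% critical value for df = 29, read from a standard t table

# n (not df) goes inside the radical
margin = t_star * s / math.sqrt(n)
interval = (x_bar - margin, x_bar + margin)
print(f"margin = {margin:.2f}, interval = ({interval[0]:.2f}, {interval[1]:.2f})")
```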
___________________________________
Slide 24
___________________________________
Estimating a Population
Proportion
___________________________________
___________________________________
___________________________________
___________________________________
Slide 25
Conditions Again
• The three conditions needed in order to do a
confidence interval for a proportion are
– 1) Random (SRS)
– 2) Normal
• np̂ ≥ 10 and n(1 − p̂) ≥ 10
– 3) Independent
• Less than 10% of the population
• These are all the same as for C.I.’s for sample
means, except the normal check
Slide 26
Confidence Intervals for a Population
Proportion
• If all of the conditions are met, you can use a
confidence interval to estimate the
population proportion: p̂ ± z* · √(p̂(1 − p̂)/n)
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
• Notice that the formula for standard deviation
is in the equation and that you will always use
a z-score, never a t-score, when you do a confidence interval for a proportion
Slide 27
Example
• A statistician was trying to estimate the
proportion of students that pass the AP exam.
He had results from 80 randomly chosen students and saw
that 60 had a passing score. Create a 95%
confidence interval for the proportion of
students that will pass the exam.
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
• We have a sample proportion of 60/80 = 0.75.
Slide 28
Example Continued
• Since the problem fulfills the three conditions,
we may use the formula
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
• We are 95% confident that the proportion of
students that pass the AP exam is between
___________________________________
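As a rough check of this example, the sketch below applies the one proportion z-interval formula to 60 passing scores out of 80. The endpoints it prints come from this calculation, not from the slide.

```python
import math

successes, n = 60, 80
p_hat = successes / n  # sample proportion
z_star = 1.960         # z* for 95% confidence

# Normal condition: at least 10 successes and 10 failures in the sample
normal_ok = n * p_hat >= 10 and n * (1 - p_hat) >= 10

standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z_star * standard_error
interval = (p_hat - margin, p_hat + margin)
print(f"normal check passed: {normal_ok}")
print(f"p_hat = {p_hat}, interval = ({interval[0]:.4f}, {interval[1]:.4f})")
```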
Slide 29
Margin of Error
• As with sample means, you can also use the
margin of error formula for sample
proportions.
___________________________________
___________________________________
___________________________________
• You can use your best guess at the population
proportion for p star, or you can use 0.5.
– I recommend using 0.5 because it is the most
conservative guess.
___________________________________
Slide 30
___________________________________
Example
• How many randomly chosen college students would you need to interview to estimate
the proportion of college students who lived off campus their freshman year
within 3% at 95% confidence?
Solution
• This is asking for the sample size that gives this interval a 0.03 margin of error.
• We use 0.5, because that is the proportion that will have the widest interval,
making it the most conservative guess.
• We always round up in sample size problems. So, we need to interview 1068 college
students.
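The solution can be sketched by solving z* · √(p*(1 − p*)/n) ≤ 0.03 for n. The other guesses for p* in the loop are illustrative only, included to show why 0.5 is the conservative choice.

```python
import math

z_star = 1.960  # z* for 95% confidence
margin = 0.03   # "within 3%"

def sample_size(p_guess):
    """Smallest whole n whose margin of error is at most `margin` for this guess of p."""
    return math.ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))

n_needed = sample_size(0.5)
for p_guess in (0.3, 0.5, 0.7):  # illustrative guesses; 0.5 demands the largest sample
    print(f"p* = {p_guess}: n = {sample_size(p_guess)}")
```

Since p*(1 − p*) is largest at 0.5, any other guess asks for fewer people, so 0.5 can never leave you with too small a sample.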
___________________________________
___________________________________
___________________________________
Slide 31
___________________________________
___________________________________
What We Missed
___________________________________
___________________________________
___________________________________
Slide 32
Know how to use a confidence interval
to make a conclusion
___________________________________
• We are 95% confident that the average age of
high school seniors is between 17.1 and 17.8
___________________________________
Question
Is there evidence that the average high school
senior is 18 years old?
___________________________________
Solution
No, there is not evidence that the average age is
18, because 18 is not in the confidence interval.
___________________________________
___________________________________
Slide 33
Know how to use a confidence interval
to make a conclusion
___________________________________
• We are 95% confident that the average age of
high school seniors is between 17.1 and 17.8
___________________________________
Question
Is there evidence that the average high school
senior is 17.5 years old?
___________________________________
Solution
There is not evidence against it, but we cannot
conclude that 17.5 is the answer. We just
___________________________________
___________________________________
Chapter 11/12
Know how to write a pair of hypotheses for a one sample/paired t-test, including a definition of parameters, (09, 6a;
09b, 5a; 08b, 6c; 07, 4; 06, 6a; 06b, 4; 05B, 4a; 05b, 6a; 03, 1c; M7, 5)
Know what the conditions are and how to check them for a one sample/paired t-test or confidence interval, (09b,
5a;08b, 6c; 07, 4; 06b, 4; 05b, 4a; 05b, 6a; 04, 6a; 04b, 5b)
Know how to calculate a test statistic for a one sample/paired t-test, find its degrees of freedom, and calculate its P-value, (09b, 5a;08b, 6c; 07, 4; 06b, 4; 05b, 4a; 05b, 6a; M2, 39)
Know how to draw a conclusion in context based on a P-value in a one sample/paired t-test, (09b, 5a;08b, 6c; 07, 4; 06b,
4; 05b, 4a; 05b, 6a; M2, 24)
Know what a P-value is and how to interpret it when analyzing a large number of simulations or large number of
observations, (10, 6e; 09, 6c; 09b, 5b; 06b, 6d)
Know when to do a paired t-test as opposed to a 2 sample t-test, (08b, 6c)
Know how to write a pair of hypotheses for a one proportion z test, including how to define parameters, (06b, 6a; 05, 4;
03, 2a; M2, 2)
Know how to check the conditions for a one proportion z-test, (06b, 6b; 05, 4)
Know how to find a z-statistic and a P-value for one proportion z-test, (06b, 6f; 05, 4)
Know how to draw a conclusion based on a P-value for a one proportion z-test in context, (06b, 6f; 05, 4)
Know that a matched paired design experiment will need a paired t-test, (05b, 3a)
Know how a confidence interval relates to two sided test of the same data, and how it relates to a one sided test of the
same data, (04, 6b; M7, 27; M2, 29)
Know what Type 1 and Type 2 errors are, and be able to distinguish which one has worse consequences in the context of a
problem if given a null and alternative hypothesis, (09, 5c; 08b, 4b; 03, 2b)
Know what power is and what can be done to increase it in an observational study or an experiment, (09b, 4b; M7, 32;
M2, 35)
2010 #6
2009 #6
2009B #5
2006B #6
2005B #6
2004 #6
2003 #1
2003 #2
2009B #4
One sample (paired) t-test for a sample mean—Quantitative Data
I) Name the Test and state the formula
a. One sample (paired) t-test
b. t = (x̄ − μ0) / (s/√n), with df = n − 1
II) Write your pair of hypotheses
III) Determine whether or not conditions are met
a. Determine whether the n samples were taken at random
b. Determine whether or not the distribution is normal
i. If n < 10, make a boxplot and verify that it’s normal
ii. If n < 30, make a boxplot and verify that it’s close to normal
iii. If n > 30, state that the CLT allows us to say that it’s normal
c. Determine whether the n observations were taken independently. If sampling with replacement this can be deduced. If sampling without
replacement, then you must verify that the sample is less than 10% of the population.
IV) Do the math
a. Plug the given numbers into the formula, state the t statistic, degrees of freedom that you are using, and your P-value. Use a calculator to verify
V) Draw a conclusion in context
a. We can/cannot reject the null hypothesis at the ____% significance level. There is/isn’t evidence to say that ____________________________
One sample z-test for a proportion—Categorical Data
I) Name the Test and state the formula
a. One sample z-test for a proportion
b. z = (p̂ − p0) / √(p0(1 − p0)/n)
II) State your pair of hypotheses
III) Determine whether or not conditions are met
a. Determine whether the n samples were taken at random
b. Determine whether or not the distribution is normal
i. np0 ≥ 10
ii. n(1 − p0) ≥ 10, where p0 is the proportion in the null hypothesis
c. Determine whether the n observations were taken independently. If sampling with replacement this can be deduced. If sampling without
replacement, then you must verify that the sample is less than 10% of the population.
IV) Do the math
a. Plug the given numbers into the formula, state your z statistic, and your P-value. Use a calculator to verify.
V) Draw a conclusion in context
a. We can/cannot reject the null hypothesis at the ____% significance level. There is/isn’t evidence to say that ____________________________
Slide 1
___________________________________
___________________________________
Chapter 11
Testing a Claim
___________________________________
___________________________________
___________________________________
Slide 2
___________________________________
Using Inference to Make
Decisions
___________________________________
___________________________________
___________________________________
___________________________________
Slide 3
Type 1 error (told I’m wrong when I’m
right)
• A Type 1 error occurs when the null hypothesis is
actually true, but you get a small P-value and reject the
null hypothesis.
• Example
– An actual fair coin is being tossed.
• If you tossed it a million times, the proportion of heads would
be very close to 0.5.
• Mr. Merlo took this coin and tossed it 100 times and got 90 heads.
• This would give an extremely low P-value in a hypothesis test, even
though the coin is actually fair.
• Mr. Merlo thinks the coin is unfair, even though it isn’t.
• We just happened to get the craziest 100 flips ever.
– This is a type 1 error.
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 4
Finding the probability of a Type 1
error
• Assuming that we are right, what is the
probability that we would be told that we are
wrong?
– It depends on the point at which you consider a group special
or different.
– It’s your level of significance.
• It’s the probability of getting that far away from the actual
average.
• If your level of significance (alpha) is 0.05 or 5%,
then the probability of a type 1 error is 0.05 or
5%.
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 5
Type 2 Error (told I’m right when I’m
wrong)
• A type 2 error occurs when the null hypothesis is
actually wrong, but you get a big P-value and do not
reject the null hypothesis.
• Example
– An unfair coin is being tossed. In fact, for this particular
coin, the probability of getting heads is 0.75.
• Mr. Merlo tossed this unfair coin 100 times and got 50 heads.
• Remember that he has to assume that the coin is fair.
• This gives a very high P-value in a hypothesis test, even though the
coin is actually not fair.
• Mr. Merlo thinks the coin is fair, even though it isn’t.
• We just happened to get a crazy 100 flips, because we should have
gotten a number near 75.
– This is a type 2 error
___________________________________
___________________________________
Slide 6
Power (told I’m wrong when I’m
wrong)
• This isn’t an error, because this is what you
want to happen.
• Finding the probability
– This is just the complement of the probability of a type 2 error:
power = 1 − P(type 2 error).
– It’s the opposite of being told I’m right when I’m
wrong (type 2 error)
• This tells you how good your test is if your
alternative hypothesis is true.
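To tie the three ideas together, here is a minimal sketch for the coin examples. It assumes a two-sided one proportion z-test at the 5% level on 100 flips (the slides do not fix a significance level) and uses exact binomial probabilities for the number of heads.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def rejection_probability(p_true, n=100, p_null=0.5, z_crit=1.960):
    """Chance that a two-sided z-test of H0: p = p_null rejects, given the true p."""
    se_null = sqrt(p_null * (1 - p_null) / n)
    total = 0.0
    for heads in range(n + 1):
        z = (heads / n - p_null) / se_null
        if abs(z) > z_crit:  # this number of heads leads to rejecting H0
            total += binomial_pmf(heads, n, p_true)
    return total

type_1 = rejection_probability(p_true=0.5)  # fair coin, but we reject
power = rejection_probability(p_true=0.75)  # unfair coin, and we reject
type_2 = 1 - power                          # unfair coin, but we fail to reject
print(f"P(type 1) = {type_1:.4f}, power = {power:.4f}, P(type 2) = {type_2:.4f}")
```

The type 1 probability lands close to the 5% significance level rather than exactly on it, because the number of heads can only take whole-number values.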
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 7
How to Increase Power
1) Have a larger significance level (go from 1%
to 5%)
2) Increase the sample size
3) Decrease the standard deviation
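Each of the three changes can be seen numerically. The sketch below uses a one-sided z-test with a known σ as a simplifying assumption, and the effect size, σ, n, and α values are hypothetical numbers chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(effect, sigma, n, alpha):
    """Power of a z-test of H0: mu = mu0 vs Ha: mu > mu0 when the true mean is mu0 + effect."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(effect * sqrt(n) / sigma - z_crit)

# Hypothetical baseline, then one change at a time
baseline = power_one_sided_z(effect=0.25, sigma=0.9, n=25, alpha=0.01)
bigger_alpha = power_one_sided_z(effect=0.25, sigma=0.9, n=25, alpha=0.05)
bigger_n = power_one_sided_z(effect=0.25, sigma=0.9, n=100, alpha=0.01)
smaller_sigma = power_one_sided_z(effect=0.25, sigma=0.6, n=25, alpha=0.01)

print(f"baseline power:       {baseline:.3f}")
print(f"alpha 1% -> 5%:       {bigger_alpha:.3f}")
print(f"n 25 -> 100:          {bigger_n:.3f}")
print(f"sigma 0.9 -> 0.6:     {smaller_sigma:.3f}")
```

All three changes raise the power above the baseline.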
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 8
Know how a confidence interval
relates to a two sided test of the same
data
___________________________________
• 95% C.I. relates to a 5% significance level
• 99% C.I. relates to a 1% significance level
• 96% C.I. relates to a 4% significance level
___________________________________
• If you reject at the 5% level, then your confidence
interval would not contain the parameter from
the null hypothesis
___________________________________
– Think of doing a test and a confidence interval for the
same set of data, you would get the same result
– Let’s look at P. 710-711
___________________________________
___________________________________
Slide 1
___________________________________
___________________________________
Chapter 12
Significance Tests
___________________________________
___________________________________
___________________________________
Slide 2
___________________________________
___________________________________
Using t-scores
___________________________________
___________________________________
___________________________________
Slide 3
Standard Error
• The sample standard deviation, s, is the standard deviation of
our set of sample data.
– It is s, instead of σ
• s is our best guess at the standard
deviation of the population.
• The standard error of the sample mean is s/√n, which is what
we get when we use s in place of σ in σ/√n.
• What do we do with confidence intervals and
hypothesis tests if we don’t know σ, and we
have to use s?
___________________________________
___________________________________
___________________________________
Slide 4
T-scores and Degrees of Freedom
• A t-score is what we use instead of a z-score
when we have to use s instead of σ.
• In order to use a t statistic, you have to
know the distribution’s degrees of freedom
– This can be found by taking n-1.
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 5
Using T-scores
• We can find t-scores using the same formula we do for
z-scores, with s in place of σ: t = (x̄ − μ) / (s/√n)
• We don’t have a chart for t statistics like we do
for z statistics, because we’d have to have a
chart for every degree of freedom.
• So, we pick a range of P-values instead of an
exact P-value.
• Example
– We have a sample size of 12 and a t statistic of 1.78.
– This gives 11 degrees of freedom, and the area above t = 1.78
gives a P-value between 0.05 and 0.10 using Table C in our
book.
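Reading a range of P-values off a t table can be sketched as a lookup. The critical values below are the df = 11 entries of a standard t table for a few upper-tail probabilities; the function name `bracket_p_value` is made up for this illustration.

```python
# (upper-tail probability, critical value) pairs for df = 11 from a standard t table
T_TABLE_DF_11 = [
    (0.10, 1.363),
    (0.05, 1.796),
    (0.025, 2.201),
    (0.01, 2.718),
    (0.005, 3.106),
]

def bracket_p_value(t_stat, table_row):
    """Return (low, high) bounds on the one-sided P-value for a positive t statistic."""
    low, high = 0.0, 1.0
    for tail_prob, critical in table_row:
        if t_stat >= critical:
            high = min(high, tail_prob)  # t is at least this extreme
        else:
            low = max(low, tail_prob)    # t is not this extreme
    return low, high

n = 12
df = n - 1
p_low, p_high = bracket_p_value(1.78, T_TABLE_DF_11)
print(f"df = {df}: P-value is between {p_low} and {p_high}")
```

Because t = 1.78 falls between 1.363 and 1.796, the area above it falls between 0.05 and 0.10.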
___________________________________
___________________________________
Slide 6
___________________________________
___________________________________
Basics of a hypothesis Test
___________________________________
___________________________________
___________________________________
Slide 7
What is a Hypothesis Test?
• A hypothesis test is an inference procedure
where we take data collected from a sample and
determine if it is extreme enough to say that
something is not true
• Example
– The national AP test score average is 2.87 with a
standard deviation of 0.9
– 100 randomly chosen students from California
averaged a 3.12
– Is there statistical evidence that Californians do better
than the nation on the AP test?
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 8
Steps of a Hypothesis Test
1) Name the test and state the formula
2) State your hypotheses
3) Check the conditions
4) Do the math
5) Draw a conclusion in context
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 9
Name the test and state the formula
• Since we collected our data from one sample,
California AP test takers, we are going to run a
one sample t-test for a sample mean
– It is a t test, because we do not know the
population standard deviation
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 10
___________________________________
State your hypotheses
• You always state a null hypothesis (H0),
which is what you assume is true
– This is the assumption that your sample is not
different than the usual population
– H0: µ = 2.87
• You always state an alternative
hypothesis (Ha), which is what you
think might be true.
– Ha: µ > 2.87
• µ = the average AP Stats
score for all California
students
___________________________________
___________________________________
___________________________________
___________________________________
Slide 11
Check Conditions
1) Your sample must be random
2) Your distribution must be approximately
normal
a. If your sample is larger than 30, you can use the CLT to justify
approximate normality
b. If your sample is smaller than 30, it must already be known to be
normal, or you must graph the data to see if it appears normal
3) Your observations must be independent
a. This is usually checked by having less than 10%
of the population
___________________________________
___________________________________
Slide 12
Check Conditions Continued
1) Students were chosen at random
2) Since our sample is large (n = 100) the
distribution will be approximately normal by
the central limit theorem
3) Since there were more than 1000 students
that took the AP Statistics exam in California,
our sample of 100 is less than 10% of the population,
so the students are considered independent
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 13
Do the math
• Find a standardized statistic
– t = (3.12 − 2.87) / (0.9/√100) = 2.78
• State your degrees of freedom
– df = 80: n = 100 gives df = 99, and we use the closest smaller df listed in the table
• Find your P-value
– Between .0025 and .005
___________________________________
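The test statistic for the California example can be checked with a short sketch. It treats 0.9 as the sample standard deviation s, which is an assumption made here because the slides run a t-test; the two critical values are the df = 80 entries of a standard t table.

```python
import math

mu_0 = 2.87   # national average (the null hypothesis value)
x_bar = 3.12  # California sample mean
s = 0.9       # ASSUMPTION: treated as the sample standard deviation
n = 100

t_stat = (x_bar - mu_0) / (s / math.sqrt(n))

# df = 80 row of a standard t table: upper-tail areas 0.005 and 0.0025
t_at_005, t_at_0025 = 2.639, 2.887
p_between = t_at_005 < t_stat < t_at_0025  # True means 0.0025 < P-value < 0.005

print(f"t = {t_stat:.2f}, P-value between 0.0025 and 0.005: {p_between}")
```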
Slide 14
Draw A Conclusion
• You either reject or you cannot reject
– You cannot accept, because there is always a
chance that you got bad data
– You just say, there is evidence that… or there is
not evidence that…
• Since our P-value is smaller than .05, we can
reject at the 5% significance level. There is
evidence that California students did better
than the national average.
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 15
Calculator
• Our calculator can do this
– 1 sample t test
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 16
What is a P-value
• A P-value is a probability
– The probability of getting the data that you got (or
more extreme) assuming that the null hypothesis
is true
• Our P-value
– If we assume that the California average is the same as
the national average (2.87), then there is a 0.33%
chance that a group of 100 students would
randomly score 3.12 or more.
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 17
___________________________________
___________________________________
One Proportion z-test
___________________________________
___________________________________
___________________________________
Slide 18
The only difference
• The first difference is that you never use a t
statistic for a proportion. You always use a z-statistic
• The second difference is checking normality.
You cannot use CLT with
proportions
• You must check by
– np>10
– n(1-p)>10
___________________________________
Slide 19
Example
• Mr. Merlo’s lucky quarter was flipped 100
times and 60 times it came up heads. Is there
evidence that the coin is unfair?
• State your test
___________________________________
___________________________________
___________________________________
– One proportion z test
___________________________________
___________________________________
Slide 20
Example Continued
___________________________________
• Write your hypotheses
– H0: p = 0.5 and Ha: p ≠ 0.5, where p = the true proportion of heads for the coin
• Check your conditions
– 1) We will assume that the coin flips are random
– 2) 100(0.5)>10 and 100(1−0.5)>10, using the p from the null hypothesis
– 3) It’s safe to say that all coin flips are
independent
___________________________________
___________________________________
Slide 21
Example continued
• The z score is
– z = (0.6 − 0.5) / √(0.5(0.5)/100) = 2
– P = 0.0456
___________________________________
We cannot reject the null hypothesis at the 1%
significance level. There is not evidence that
Mr. Merlo’s coin is unfair.
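The coin example can be reproduced with a short sketch. It assumes a two-sided alternative, since "unfair" could mean too many or too few heads. The exact normal calculation gives a P-value of about 0.0455; the slide's 0.0456 comes from doubling the rounded table value 0.0228.

```python
from math import sqrt
from statistics import NormalDist

heads, n, p_null = 60, 100, 0.5
p_hat = heads / n

# The standard deviation uses the null value p = 0.5, not p_hat
z = (p_hat - p_null) / sqrt(p_null * (1 - p_null) / n)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided P-value

reject_at_5 = p_value < 0.05
reject_at_1 = p_value < 0.01
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
print(f"reject at 5%: {reject_at_5}, reject at 1%: {reject_at_1}")
```

The conclusion depends on the significance level: this P-value is too large to reject at the 1% level, but it would be small enough to reject at the 5% level.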
___________________________________
Chapter 13
How do I create a set of hypotheses for a two sample t-test? (10, 5; 08, 6a; 07b, 5)
What are the conditions for a two sample t-test/interval and how do I check them? (10, 5;09, 4a;08, 6a; 08b, 1b; 07b, 5;
06, 4; 05, 6a; 04b, 4a; 04b, 5c)—782
Know how to check normality if you have less than 30 pieces of data and if you have more than 30 pieces of data, (09,
4a; 08, 6a; 07b, 5; 06, 4; 05, 6a)
How do I calculate the t statistic for a two sample t-test? (10, 5; 08, 6a; 07b, 5)—788
Know how to write a conclusion in context for a two sample t test, (10, 5; 08, 6a; 07b, 5; M7, 13; M7, 37)
Know how to calculate a confidence interval comparing two sample means, (09, 4a; 06, 4; 05, 6a; 04b, 4a)—788
Know how to write a conclusion in context for a two sample/proportion t/z interval (09, 4a; 09, 5b; 06, 4; 06b, 2a; 05, 6a;
04b, 4a)
Know how to use a confidence interval to determine if there is evidence that there is a difference between two
populations, (09, 4b; 07, 1c; 06, 4; 06b, 2b, 05b, 4b)
Know how to interpret a P-value in the context of a two sample/proportion problem. How is this different from just
drawing a conclusion?, (09, 5a; 07, 5c; 07b, 6a)
Know how to write a pair of hypotheses for a two prop z test, (09b, 3b; 07, 5b; 07b, 6a; 04b, 6a; 03b, 3b)
Know how to check the conditions for a 2 proportion z test/interval, (09b, 3a; 07, 5c; 07b, 6a; 06b, 2a; 04b, 6a; M7, 39;
M2, 22; M2, 40)
Know how to find a test statistic for a 2 prop z test, and how to look up its corresponding p-value, (09b, 3b; 07b, 6a; 04b,
6a)
Know how to draw a conclusion in context for a 2 prop z test, (09b, 3b; 07, 5c;07b, 6a; 04b, 6a)
Know how to calculate a z confidence interval for a difference of two proportions (2 prop z
interval), (09b, 6b; 06b, 2a; M7, 4)
Know how to distinguish a two sample t-test from a paired t-test, (07, 4; 06b, 4; 05b, 4a; M2, 12)
2010 #5
2009 #4
2009 #5
2009B #3
2008B #1
2007 #1
2007 #4
2007 #5
2006B #2
2006B #4
2005 #6
2005B #3
2005B #4
2004B #4
2004B #5
2003B #3
2003B #4
Two Sample t-confidence interval for a difference in sample means—Quantitative Data
I) Name the Test and state the formula
a. Two-sample t-confidence interval for a difference in sample means
b. (x̄1 − x̄2) ± t* · √(s1²/n1 + s2²/n2)
II) Determine whether or not conditions are met
a. Determine whether the n samples were taken at random
b. Determine whether or not the distribution is normal for both of the samples (you must check twice)
i. If n < 10, make a boxplot and verify that it’s normal
ii. If n < 30, make a boxplot and verify that it’s close to normal
iii. If n > 30, state that the CLT allows us to say that it’s normal
c. Determine whether the n observations were taken independently. If sampling with replacement this can be deduced. If sampling without
replacement, then…
i. If there are two distinct populations this can be done by verifying that each sample is less than 10% of its respective population
ii. If the two groups come from the same population, this can be done by verifying that the individuals were placed in their respective
groups at random
III) Do the math
a. Plug the given numbers into the formula, state the degrees of freedom that you are using, and use a calculator to verify
IV) Draw a conclusion in context
a. We are _____% confident that the avg. difference between ____ and _____ is between _____ and _____
Two sample z-confidence interval for a difference of proportions—Categorical Data
I) Name the Test and state the formula
a. Two sample z-interval for a difference of proportions
b. (p̂1 − p̂2) ± z* · √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)
II) Determine whether or not conditions are met
a. Determine whether the n samples were taken at random
b. Determine whether or not the distribution is normal
i. n1p̂1 ≥ 10 and n1(1 − p̂1) ≥ 10
ii. n2p̂2 ≥ 10 and n2(1 − p̂2) ≥ 10
c. Determine whether the n observations were taken independently. If sampling with replacement this can be deduced. If sampling without
replacement, then…
i. If there are two distinct populations this can be done by verifying that each sample is less than 10% of its respective population
ii. If the two groups come from the same population, this can be done by verifying that the individuals were placed in their respective
groups at random
III) Do the math
a. Plug the given numbers into the formula, and use a calculator to verify
IV) Draw a conclusion in context
a. We are _____% confident that the difference in the proportion of ________ and _______ is between _____ and _____
Two Sample t-test for a difference of sample means—Quantitative Data
I) Name the Test and state the formula
a. Two-sample t-test
b. t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
II) State your pair of hypotheses
III) Determine whether or not conditions are met
a. Determine whether the n samples were taken at random
b. Determine whether or not the distribution is normal for both of the samples (you must check twice)
i. If n < 10, make a boxplot and verify that it’s normal
ii. If n < 30, make a boxplot and verify that it’s close to normal
iii. If n > 30, state that the CLT allows us to say that it’s normal
c. Determine whether the n observations were taken independently. If sampling with replacement this can be deduced. If sampling without
replacement, then…
i. If there are two distinct populations this can be done by verifying that each sample is less than 10% of its respective population
ii. If the two groups come from the same population, this can be done by verifying that the individuals were placed in their respective
groups at random
IV) Do the math
a. Plug the given numbers into the formula, state your t-statistic, the degrees of freedom that you are using, and your P-value. Use a calculator to
verify
V) Draw a conclusion in context
a. We can/cannot reject the null hypothesis at the _____% significance level. There is/isn’t evidence to say that_______________________
Two sample z-test for a difference of proportions—Categorical Data
I) Name the Test and state the formula
a. Two sample z-test for a difference of proportions
b. z = (p̂1 − p̂2) / √(p̂c(1 − p̂c)(1/n1 + 1/n2)), where p̂c is the combined (pooled) sample proportion
II) Write your pair of hypotheses
III) Determine whether or not conditions are met
a. Determine whether the n samples were taken at random
b. Determine whether or not the distribution is normal
i. n1p̂1 ≥ 10 and n1(1 − p̂1) ≥ 10
ii. n2p̂2 ≥ 10 and n2(1 − p̂2) ≥ 10
c. Determine whether the n observations were taken independently. If sampling with replacement this can be deduced. If sampling without
replacement, then…
i. If there are two distinct populations this can be done by verifying that each sample is less than 10% of its respective population
ii. If the two groups come from the same population, this can be done by verifying that the individuals were placed in their respective
groups at random
IV) Do the math
a. Plug the given numbers into the formula, state your z statistic, and your P-value. Use a calculator to verify.
V) Draw a conclusion in context
a. We can/cannot reject the null hypothesis at the ____% significance level. There is/isn’t evidence to say that ____________________________
Slide 1
___________________________________
___________________________________
Chapter 13
Comparing Two Population
Parameters
___________________________________
___________________________________
___________________________________
Slide 2
___________________________________
___________________________________
Comparing Two Means
___________________________________
___________________________________
___________________________________
Slide 3
What does it mean to compare two
means?
• In the previous section we were comparing our
results to an expected outcome.
– For example, we might know that a machine that fills
bottles with soda should put an average of 300 mL in
each bottle.
• We tested to see if it was underfilling.
– Null: Average = 300
– Alternative: Average < 300
• In this section we will test to see how two
different populations compare to each other.
– Do women study more than men?
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 4
Here’s the Data
• We collected a random sample of 35 men and 30
women and recorded their study habits.
• We found that men averaged 4 hours of studying
per week with a sample standard deviation of 1.5.
– $\bar{x} = 4$, $s = 1.5$
• We found that women averaged 5 hours of
studying per week with a sample standard
deviation of 1.
– $\bar{x} = 5$, $s = 1.0$
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 5
State your test
• We are doing a two sample t test, comparing
sample means
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 6
Write your hypotheses
• In this case, we have two averages that we are
testing.
– $\mu_M$ = The average number of hours per week that
men study.
– $\mu_W$ = The average number of hours per week that
women study.
• So, the test is this…
– $H_0: \mu_M = \mu_W$
– $H_a: \mu_M < \mu_W$
___________________________________
___________________________________
Slide 7
Check the Conditions
• Random
– It was given that both samples were chosen at
random
___________________________________
___________________________________
• Normality
– There is a large sample of men (n=35). So, by CLT our
distribution should be approximately normal
– There is a large sample of women (n=30). So, by CLT
our distribution is approximately normal
___________________________________
• Independence
– There are more than 350 men who study
– There are more than 300 women who study
___________________________________
___________________________________
Slide 8
Do the math
• Find your t statistic
$t = \dfrac{4 - 5}{\sqrt{\dfrac{1.5^2}{35} + \dfrac{1^2}{30}}} = -3.20$
• State your degrees of freedom (df = 59.6104)
• P = 0.0011
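The t statistic and degrees of freedom on this slide can be reproduced from the summary statistics. The sketch below uses only Python's standard library and assumes the fractional df comes from the Welch-Satterthwaite approximation, which is what calculators typically report for an unpooled two-sample t test; the variable names are illustrative.

```python
import math

# Summary statistics from the study-hours example (men first, women second).
n_m, mean_m, s_m = 35, 4.0, 1.5
n_w, mean_w, s_w = 30, 5.0, 1.0

# Each sample contributes its variance divided by its sample size.
var_m = s_m**2 / n_m
var_w = s_w**2 / n_w

std_error = math.sqrt(var_m + var_w)
t_stat = (mean_m - mean_w) / std_error

# Welch-Satterthwaite approximation for the degrees of freedom (an assumption
# about how the slide's df was obtained).
welch_df = (var_m + var_w) ** 2 / (var_m**2 / (n_m - 1) + var_w**2 / (n_w - 1))

print(f"t = {t_stat:.2f}, df = {welch_df:.2f}")
```

The P-value itself needs a t-distribution table or a calculator, since the standard library has no t CDF.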
___________________________________
___________________________________
Slide 9
Draw a conclusion in context
• Since we have a small P-value, we can reject
the null hypothesis in favor of the alternative
at the 1% significance level. There is evidence
that women study more than men.
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 10
What’s up with two standard
deviations?
• We have a problem with standard deviation,
though… There are two of them.
• Since we are going to look at a difference of averages,
we need to look at a difference of standard deviations.
• Remember, though, we can’t do that; we have to add
their variances.
• So, we get this formula…
$\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
• This gives us the following…
$\sqrt{\dfrac{1.5^2}{25} + \dfrac{1^2}{30}} \approx 0.3512$
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 11
Find our t-statistic
• In this case
$t = \dfrac{4 - 5}{0.3511884584} \approx -2.847$
• How many degrees of freedom?
– Since there are two samples we will use the df of
the smaller sample, because that will be a more
conservative guess.
– So, we will say that df = 24.
• So, our P-value is between .0025 and .005.
• This is definitely enough evidence to reject
the null hypothesis, and we can say pretty
securely that women study more than men!
___________________________________
___________________________________
___________________________________
Slide 12
Our Calculator Can Do This
• This is a 2-SampTTest
• Notice that we get the same t-statistic, but
that they use df=40.
– They get that number from the formula on page
633.
– We aren’t going to worry about that. We’ll let the
calculator do it.
• Our P-Value was a pretty good guess, though,
even with the conservative degrees of
freedom.
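To see where a calculator's df of about 40 could come from, here is a small sketch that redoes the hand calculation (with 25 men and 30 women, as in the standard-error arithmetic) and compares the conservative df with the Welch-Satterthwaite approximation. That the textbook formula is this approximation is an assumption, and the variable names are illustrative.

```python
import math

# Values used in the hand calculation: 25 men and 30 women.
n1, mean1, s1 = 25, 4.0, 1.5   # men
n2, mean2, s2 = 30, 5.0, 1.0   # women

v1, v2 = s1**2 / n1, s2**2 / n2
std_error = math.sqrt(v1 + v2)
t_stat = (mean1 - mean2) / std_error

# Conservative choice: one less than the smaller sample size.
conservative_df = min(n1, n2) - 1

# Welch-Satterthwaite approximation (assumed to be the calculator's formula).
welch_df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"SE = {std_error:.4f}, t = {t_stat:.3f}")
print(f"conservative df = {conservative_df}, Welch df = {welch_df:.2f}")
```

The smaller df gives heavier tails, so the conservative P-value is slightly larger than the calculator's.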
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 13
Two sample t-interval
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 14
Two proportions
• Two proportion z-interval
• Two proportion z-test
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Chapter 14
Know when to use a chi-squared test of independence and how to find the degrees of freedom, (r-1)(c-1), (10b, 5d)
Know how to write a null and alternative hypothesis for a chi-square test of independence (09, 1c; 04, 5a)
Know how to write a null and alternative hypothesis for a goodness of fit test, (08, 4a; 03b, 5c)
Know how to find expected counts for a GOF test if each possibility has a different proportion, (08, 4a; 03b, 5c; M2, 19)
Know how to find a chi squared statistic, the degrees of freedom, and the P-value for a GOF test, (08, 4a; 06, 6c; 03b, 5c;
M7, 17; M2, 19)
Know how to draw a conclusion in context for a GOF test, (08, 4a; 03b, 5c)
What does an individual chi squared value represent? Does it tell you if the expected was higher or lower? (08, 4b)
Know how to interpret a P-value for a GOF test in the context of the problem, (06, 6f)
Know how to check the conditions for a chi-squared test of independence, (04, 5a, 03, 5)
Know how to find a chi squared statistic, the degrees of freedom, and the P-value for a chi squared test of
independence, (04, 5a; 03, 5)
Know how to interpret a P-value for a chi squared test of independence, (04, 5a; 03, 5)
Know how to write a pair of hypotheses for a chi squared test of independence (03, 5)
Know how to find the expected outcomes for a chi squared test of independence, (M2, 11)
2010B #5
2009 #1
2008 #5
2004 #5
2003 #5
2003B #5
Chi Squared Goodness of Fit Test—Categorical Data
I) Name the Test and state the formula
a. Chi Squared Goodness of Fit Test
b. $\chi^2 = \sum \dfrac{(\text{observed} - \text{expected})^2}{\text{expected}}$
II) Write your pair of hypotheses
III) Determine whether or not conditions are met
a. Determine whether the n samples were taken at random
b. Determine whether the sample is large enough
i. At most 20% of the expected outcomes are less than 5
ii. All the expected outcomes are more than 1
IV) Do the math
a. Find your expected outcomes
b. Plug the given numbers into the formula and find your chi squared value, state your degrees of freedom, and state your P-value. Use a calculator to verify
V) Draw a conclusion in context
a. We can/cannot reject the null hypothesis at the ____% significance level. There is/isn’t evidence to say that ____________________________
Chi Squared Test of Independence—Categorical Data (Two way table)
I) Name the Test and state the formula
a. Chi Squared test of independence
b. $\chi^2 = \sum \dfrac{(\text{observed} - \text{expected})^2}{\text{expected}}$
II) Write your pair of hypotheses
III) Determine whether or not conditions are met
a. Determine whether the n samples were taken at random
b. Determine whether the sample is large enough
i. At most 20% of the expected outcomes are less than 5
ii. All the expected outcomes are more than 1
IV) Do the math
a. Find your expected outcomes (row total × column total ÷ table total)
b. Plug the given numbers into the formula and find your chi squared value, state your degrees of freedom, (r-1)(c-1), and state your P-value. Use a calculator to verify
V) Draw a conclusion in context
a. We can/cannot reject the null hypothesis at the ____% significance level. There is/isn’t evidence to say that ____________________________
Slide 1
___________________________________
___________________________________
Chapter 14
Distributions of Categorical Variables:
Chi-Square Procedures
___________________________________
___________________________________
___________________________________
Slide 2
___________________________________
___________________________________
What is a Chi-Square Distribution
___________________________________
___________________________________
___________________________________
Slide 3
Chi Square Distributions
• Chi Square Distributions are a family of
distributions that are skewed right, always
positive, and specified by degrees of freedom
• I think of it as a special type of t or z statistic
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 4
Chi Square Distributions
• We treat them similar to t distributions
• If we have df = 4 and
, we get a P-value
between .005 and .010
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 5
___________________________________
___________________________________
Chi Square Goodness of Fit Test
___________________________________
___________________________________
___________________________________
Slide 6
When to do a GOF
• You do a goodness of fit when you have a
categorical variable with more than 2 options
• You are trying to see if the distribution is not
what you expected
• This usually takes place when you see a one
way table
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 7
Example
• Historically, the distribution of CV soccer
games is as follows: 50% win, 30% loss, and
20% tie
• This last season, they had the following
distribution
Win   Loss   Tie
16    5      4
• Is there statistical evidence that this year is
different?
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 8
How to do a chi square test
1) Name your test and state the formula
2) Write your hypotheses
3) Check the conditions
4) Do the math
5) Make a conclusion
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 9
Name your test
• We are doing a chi square Goodness of Fit Test
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 10
Write your hypotheses
$H_0$: The distribution of wins, losses, and ties is the
same as it has been historically. (The
distribution of wins, losses, and ties for this
year’s team is 50%/30%/20%)
$H_a$: The distribution of wins, losses, and ties is not
the same as it has been historically
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 11
Check Conditions
• For chi square tests, there are only two conditions:
Random and all expected values are greater than 1
(and less than 20% are smaller than 5)
• For this problem, we will assume the games
are random
• The expected outcomes are all 5 or larger
Wins   Losses   Ties
12.5   7.5      5
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 12
Do the math
___________________________________
• Find your test statistic
$\chi^2 = \dfrac{(16 - 12.5)^2}{12.5} + \dfrac{(5 - 7.5)^2}{7.5} + \dfrac{(4 - 5)^2}{5} \approx 2.01$
• We have two degrees of freedom
• So, our P-value is about 0.37 (larger than 0.25 in the table)
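The goodness of fit arithmetic for the soccer example can be checked with a short Python sketch that works directly from the observed counts (16, 5, 4) and the historical percentages. It uses the fact that with 2 degrees of freedom the chi-square upper-tail probability is exactly exp(-x/2); the dictionary names are illustrative.

```python
import math

observed = {"win": 16, "loss": 5, "tie": 4}
historical = {"win": 0.50, "loss": 0.30, "tie": 0.20}

# Expected count = number of games times the historical proportion.
n_games = sum(observed.values())
expected = {k: n_games * p for k, p in historical.items()}

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_sq = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1

# Closed form valid only for df = 2: P(chi-square > x) = exp(-x / 2).
p_value = math.exp(-chi_sq / 2)

print(f"chi-square = {chi_sq:.2f}, df = {df}, P = {p_value:.2f}")
```

Each term of the sum shows how far one category is from its expected count, but squaring hides whether the observed count was above or below it.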
___________________________________
___________________________________
___________________________________
Slide 13
Draw a Conclusion
• Since we have a P-value larger than 5% we
cannot reject the null hypothesis at the 1% or
5% level. There is not evidence that this year’s
team has a different distribution of wins, losses,
and ties than the historical one
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 14
___________________________________
___________________________________
Chi Square Test of Independence
___________________________________
___________________________________
___________________________________
Slide 15
When to do a Chi Square Test of
Independence
• You use this test when you have a two way
table and you want to check if the two
variables in the table are independent
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 16
___________________________________
Example
• Based on the following distribution, is there
evidence that gender and grade are not independent
in Mr. Merlo’s class?
         A    B    C    D    F
Male     20   15   30   10   10
Female   10   13   20   8    8
___________________________________
___________________________________
___________________________________
___________________________________
Slide 17
___________________________________
Name your test
• We are doing a chi square test of
independence
___________________________________
___________________________________
___________________________________
___________________________________
Slide 18
Write your hypotheses
$H_0$: Gender and the grade one receives in Mr.
Merlo’s class are independent
$H_a$: Gender and the grade one receives in Mr.
Merlo’s class are not independent
___________________________________
___________________________________
Slide 19
___________________________________
Check Conditions
• For chi square tests, there are only two conditions:
Random and all expected values are greater than 1
(and less than 20% are smaller than 5)
• For this problem, we will assume the sample of
students is random
• The expected outcomes are all 5 or larger
         A       B       C       D       F
Male     17.71   16.53   29.51   10.63   10.63
Female   12.29   11.47   20.49   7.38    7.38
___________________________________
___________________________________
___________________________________
___________________________________
Slide 20
How to get the expected values
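• If you do not use your calculator, the best way
to find the expected values is to use the
formula
$\text{expected} = \dfrac{\text{row total} \times \text{column total}}{\text{table total}}$
• To find the expected number of men who
should get C’s:
$\dfrac{85 \times 50}{144} \approx 29.51$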
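The expected counts for the whole two way table follow from the row totals, column totals, and table total. A minimal sketch, assuming the usual rule expected = row total × column total ÷ table total, with illustrative variable names:

```python
# Observed grades from the example, in the order A, B, C, D, F.
grades = ["A", "B", "C", "D", "F"]
observed = {
    "Male":   [20, 15, 30, 10, 10],
    "Female": [10, 13, 20, 8, 8],
}

row_totals = {group: sum(counts) for group, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
table_total = sum(row_totals.values())

# Expected count for each cell: row total * column total / table total.
expected = {
    group: [row_totals[group] * ct / table_total for ct in col_totals]
    for group in observed
}

male_c = expected["Male"][grades.index("C")]
print(f"Expected number of men with a C: {male_c:.2f}")
```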
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 21
Do the math
• Our calculator gives us $\chi^2 \approx 1.27$
• The degree of freedom can be found by the
following formula: df = (r-1)(c-1)
– Our problem has df = (2-1)(5-1) = 4
• So, the P-value is .8669
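The statistic and P-value for the gender and grade table can be checked without a calculator. The sketch below is illustrative: it relies on the closed form of the chi-square upper tail for 4 degrees of freedom, exp(-x/2)(1 + x/2), which would not apply to other df values.

```python
import math

observed = [
    [20, 15, 30, 10, 10],  # Male:   A, B, C, D, F
    [10, 13, 20, 8, 8],    # Female: A, B, C, D, F
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

# Sum (observed - expected)^2 / expected over every cell of the table.
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp_count = row_totals[i] * col_totals[j] / table_total
        chi_sq += (obs - exp_count) ** 2 / exp_count

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Closed form valid only for df = 4: P(chi-square > x) = exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi_sq / 2) * (1 + chi_sq / 2)

print(f"chi-square = {chi_sq:.2f}, df = {df}, P = {p_value:.4f}")
```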
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 22
Draw a Conclusion
• We cannot reject the null hypothesis in favor
of the alternative. There is no evidence that
gender and grade in Mr. Merlo’s class are not
independent.
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 23
Know how to interpret a P value in
context
• If gender and grade in Mr. Merlo’s class were
independent, there would be an 86.69%
chance that the distribution that actually
occurred or one more extreme would occur
randomly
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Chapter 15
Know what the t score and P-value on the Minitab printout are the results of, (08, 6c)
Know how to write a null and alternative hypothesis for testing slope and when this is an appropriate test, (07, 6c; M7,
28)
Know how to find a t-statistic, degrees of freedom (n-2), and P-value for a test on slope, (07, 6c)
Know how to draw a conclusion from a p-value for a test on slope, (07, 6c)
Know how to calculate a confidence interval for a slope when given a Minitab printout, (07b, 6b; 05b, 5c; M2, 21)
Know what it means if 0 is a possibility in a confidence interval or if you fail to reject the test against $\beta = 0$, (07b, 6c)
Know that the SE in a Minitab printout is the standard error of the slope, and know how to interpret that in the context
of a problem, (06, 2c)
2008 #6 (If you do Chapter 13 before)
2007 #6
2007B #6
2005B #5
Slide 1
___________________________________
___________________________________
Chapter 15
Inference for Regression
___________________________________
___________________________________
___________________________________
Slide 2
Confidence Intervals for slope
• We will do a 95% confidence interval for the
slope of the regression line of cost to lay tile in
a house on the square inches that need to be
covered (x = square inches, y = cost). We will
say that there are 11 observations
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Slide 3
Confidence Intervals Continued
C.I. = your statistic ± (t statistic) × (standard deviation)
Since there are n = 11 observations, we will use df = n − 2 = 9 because there are two
variables
We are 95% confident that the slope of the LSRL of cost on square
inches of tile is between 0.4746 and 0.7931 dollars per square inch
Since 0 is not in the interval, it is safe to say that there is a
relationship between the two variables
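The printout values behind this interval are not reproduced in the text, so the sketch below works backwards from the stated endpoints. It assumes the standard table value t* = 2.262 for 95% confidence with df = 9; the slope estimate and standard error it prints are implied by the interval, not read from a Minitab printout.

```python
# Endpoints of the 95% interval stated on the slide.
lower, upper = 0.4746, 0.7931

n = 11
df = n - 2          # two parameters (slope and intercept) are estimated
t_star = 2.262      # assumed t table value for 95% confidence, df = 9

# An interval of the form estimate +/- t* x SE is centered on the estimate.
slope_estimate = (lower + upper) / 2
margin = (upper - lower) / 2
implied_se = margin / t_star

# Rebuild the interval from its pieces as a check.
rebuilt = (slope_estimate - t_star * implied_se,
           slope_estimate + t_star * implied_se)

print(f"slope = {slope_estimate:.4f}, margin = {margin:.4f}, SE = {implied_se:.4f}")
print(f"interval = ({rebuilt[0]:.4f}, {rebuilt[1]:.4f})")
```

On a real problem the direction is reversed: read the slope and its SE from the printout, look up t*, and compute the endpoints.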
___________________________________
___________________________________
___________________________________
Slide 4
Hypothesis Test
• Most of the time, we will run the following hypothesis
test: $H_0: \beta = 0$ versus $H_a: \beta \neq 0$
• This is what the Minitab printout gives us
• Which gives us a P-value = 0.000
• We can reject the null hypothesis in favor of the
alternative. There is evidence that there is a
relationship between square inches of tile and cost
___________________________________
___________________________________
___________________________________
___________________________________
___________________________________
Confidence Intervals
Every confidence interval can be done using this formula:
C.I. = your statistic ± (z or t score) × (standard deviation)
All problems will be done in the following format:
I) State what confidence interval you are doing
II) Check the conditions to make sure it is appropriate
III) Do the math
IV) State a conclusion in context
One sample (paired) z-confidence interval—Quantitative Data
I) Name the Test and state the formula
a. One sample (Matched pairs) z-confidence interval
b. $\bar{x} \pm z^* \dfrac{\sigma}{\sqrt{n}}$
II) Determine whether or not conditions are met
a. Determine whether the n samples were taken at random
b. Determine whether or not the distribution is normal
i. If n < 10, make a boxplot and verify that it’s normal
ii. If n < 30, make a boxplot and verify that it’s close to normal
iii. If n > 30, state that the CLT allows us to say that it’s approximately normal
c. Determine whether the n observations were taken independently. If sampling with replacement this can be deduced. If sampling without replacement, then you must verify that the sample is less than 10% of the population.
III) Do the math
a. Plug the given numbers into the formula, and use a calculator to verify
IV) Draw a conclusion in context
a. We are _____% confident that the avg. ____________________ is between _____ and _____
One sample (paired) t-confidence interval—Quantitative Data
I) Name the Test and state the formula
a. One sample (paired) t-confidence interval
b. $\bar{x} \pm t^* \dfrac{s}{\sqrt{n}}$, with df = n − 1
II) Determine whether or not conditions are met
a. Determine whether the n samples were taken at random
b. Determine whether or not the distribution is normal
i. If n < 10, make a boxplot and verify that it’s normal
ii. If n < 30, make a boxplot and verify that it’s close to normal
iii. If n > 30, state that the CLT allows us to say that it’s approximately normal
c. Determine whether the n observations were taken independently. If sampling with replacement this can be deduced. If sampling without replacement, then you must verify that the sample is less than 10% of the population.
III) Do the math
a. Plug the given numbers into the formula, state the degrees of freedom that you are using, and use a calculator to verify
IV) Draw a conclusion in context
a. We are _____% confident that the avg. ____________________ is between _____ and _____
Two Sample t-confidence interval for a difference in sample means—Quantitative Data
I) Name the Test and state the formula
a. Two-sample t-confidence interval for a difference in sample means
b. $(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
II) Determine whether or not conditions are met
a. Determine whether the n samples were taken at random
b. Determine whether or not the distribution is normal for both of the samples (you must check twice)
i. If n < 10, make a boxplot and verify that it’s normal
ii. If n < 30, make a boxplot and verify that it’s close to normal
iii. If n > 30, state that the CLT allows us to say that it’s approximately normal
c. Determine whether the n observations were taken independently. If sampling with replacement this can be deduced. If sampling without replacement, then…
i. If there are two distinct populations, this can be done by verifying that each sample is less than 10% of its respective population
ii. If the two groups come from the same population, this can be done by verifying that the individuals were placed in their respective groups at random
III) Do the math
a. Plug the given numbers into the formula, state the degrees of freedom that you are using, and use a calculator to verify
IV) Draw a conclusion in context
a. We are _____% confident that the avg. difference between ____ and _____ is between _____ and _____
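As an illustration of the "Do the math" step, the sketch below builds a 95% two-sample t interval from the study-hours summary statistics in Chapter 13 (35 men, 30 women). It assumes the conservative df (smaller sample size minus 1 = 29) and the table value t* = 2.045 for that df; a calculator that uses a larger df would give a slightly narrower interval.

```python
import math

# Study-hours summary statistics (men first, women second).
n1, mean1, s1 = 35, 4.0, 1.5
n2, mean2, s2 = 30, 5.0, 1.0

std_error = math.sqrt(s1**2 / n1 + s2**2 / n2)

# Conservative degrees of freedom and the matching 95% table value (assumed).
df = min(n1, n2) - 1
t_star = 2.045

diff = mean1 - mean2
margin = t_star * std_error
lower, upper = diff - margin, diff + margin

print(f"df = {df}, interval for (men - women): ({lower:.3f}, {upper:.3f})")
```

Because the whole interval is below 0, it agrees with the test's conclusion that women study more than men on average.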
One sample z-confidence interval for proportions—Categorical Data
I) Name the Test and state the formula
a. One sample z-confidence interval for proportions
b. $\hat{p} \pm z^* \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
II) Determine whether or not conditions are met
a. Determine whether the n samples were taken at random
b. Determine whether or not the distribution is normal
i. $n\hat{p} \ge 10$
ii. $n(1 - \hat{p}) \ge 10$
c. Determine whether the n observations were taken independently. If sampling with replacement this can be deduced. If sampling without replacement, then you must verify that the sample is less than 10% of the population.
III) Do the math
a. Plug the given numbers into the formula, and use a calculator to verify
IV) Draw a conclusion in context
a. We are _____% confident that the proportion of ____________________ is between _____ and _____
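A one sample z-interval for a proportion can be sketched as follows. The function name and the counts (120 successes out of 400) are hypothetical, chosen only to show the arithmetic; z* = 1.96 is the usual value for 95% confidence.

```python
import math

def one_proportion_z_interval(successes, n, z_star=1.96):
    """Return (lower, upper, conditions_ok) for a z-interval for a proportion."""
    p_hat = successes / n
    # Large-sample check: expected successes and failures both at least 10.
    conditions_ok = n * p_hat >= 10 and n * (1 - p_hat) >= 10
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, conditions_ok

# Hypothetical data, made up for illustration only.
lower, upper, ok = one_proportion_z_interval(120, 400)
print(f"conditions met: {ok}, interval: ({lower:.4f}, {upper:.4f})")
```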
Two sample z-confidence interval for a difference of proportions—Categorical Data
I) Name the Test and state the formula
a. Two sample z-interval for a difference of proportions
b. $(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
II) Determine whether or not conditions are met
a. Determine whether the n samples were taken at random
b. Determine whether or not the distribution is normal
i. $n_1\hat{p}_1 \ge 10$ and $n_1(1 - \hat{p}_1) \ge 10$
ii. $n_2\hat{p}_2 \ge 10$ and $n_2(1 - \hat{p}_2) \ge 10$
c. Determine whether the n observations were taken independently. If sampling with replacement this can be deduced. If sampling without replacement, then…
i. If there are two distinct populations, this can be done by verifying that each sample is less than 10% of its respective population
ii. If the two groups come from the same population, this can be done by verifying that the individuals were placed in their respective groups at random
III) Do the math
a. Plug the given numbers into the formula, and use a calculator to verify
IV) Draw a conclusion in context
a. We are _____% confident that the difference in the proportion of ________ and _______ is between _____ and _____
Hypothesis Tests
Every test statistic can be calculated using this formula:
test statistic = (statistic − parameter) ÷ (standard deviation of the statistic)
All problems will be done in the following format:
I) State what test you are doing
II) State your null and alternative hypotheses and define parameters
III) Check the conditions to make sure it is appropriate
IV) Do the math
V) State a conclusion in context
One sample (paired) z-test—Quantitative Data
I) Name the Test and state the formula
a. One sample z-test for a sample mean
b. $z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
II) Write your pair of hypotheses
III) Determine whether or not conditions are met
a. Determine whether the n samples were taken at random
b. Determine whether or not the distribution is normal
i. If n < 10, make a boxplot and verify that it’s normal
ii. If n < 30, make a boxplot and verify that it’s close to normal
iii. If n > 30, state that the CLT allows us to say that it’s approximately normal
c. Determine whether the n observations were taken independently. If sampling with replacement this can be deduced. If sampling without replacement, then you must verify that the sample is less than 10% of the population.
IV) Do the math
a. Plug the given numbers into the formula, state your z score, and your P-value. Use a calculator to verify
V) Draw a conclusion in context
a. We can/cannot reject the null hypothesis at the ____% significance level. There is/isn’t evidence to say that ____________________________
One sample (paired) t-test for a sample mean—Quantitative Data
I) Name the Test and state the formula
a. One sample (paired) t-test
b. $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$, with df = n − 1
II) Write your pair of hypotheses
III) Determine whether or not conditions are met
a. Determine whether the n samples were taken at random
b. Determine whether or not the distribution is normal
i. If n < 10, make a boxplot and verify that it’s normal
ii. If n < 30, make a boxplot and verify that it’s close to normal
iii. If n > 30, state that the CLT allows us to say that it’s approximately normal
c. Determine whether the n observations were taken independently. If sampling with replacement this can be deduced. If sampling without replacement, then you must verify that the sample is less than 10% of the population.
IV) Do the math
a. Plug the given numbers into the formula, state the t statistic, degrees of freedom that you are using, and your P-value. Use a calculator to verify
V) Draw a conclusion in context
a. We can/cannot reject the null hypothesis at the ____% significance level. There is/isn’t evidence to say that ____________________________
Two Sample t-test—Quantitative Data
I) Name the Test and state the formula
a. Two-sample t-test
b. $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
II) State your pair of hypotheses
III) Determine whether or not conditions are met
a. Determine whether the n samples were taken at random
b. Determine whether or not the distribution is normal for both of the samples (you must check twice)
i. If n < 10, make a boxplot and verify that it’s normal
ii. If n < 30, make a boxplot and verify that it’s close to normal
iii. If n > 30, state that the CLT allows us to say that it’s approximately normal
c. Determine whether the n observations were taken independently. If sampling with replacement this can be deduced. If sampling without replacement, then…
i. If there are two distinct populations, this can be done by verifying that each sample is less than 10% of its respective population
ii. If the two groups come from the same population, this can be done by verifying that the individuals were placed in their respective groups at random
IV) Do the math
a. Plug the given numbers into the formula, state your t-statistic, the degrees of freedom that you are using, and your P-value. Use a calculator to verify
V) Draw a conclusion in context
a. We can/cannot reject the null hypothesis at the _____% significance level. There is/isn’t evidence to say that _______________________
One sample z-test for a proportion—Categorical Data
I) Name the Test and state the formula
a. One sample z-test for a proportion
b. $z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$
II) State your pair of hypotheses
III) Determine whether or not conditions are met
a. Determine whether the n samples were taken at random
b. Determine whether or not the distribution is normal
i. $np_0 \ge 10$
ii. $n(1 - p_0) \ge 10$
c. Determine whether the n observations were taken independently. If sampling with replacement this can be deduced. If sampling without replacement, then you must verify that the sample is less than 10% of the population.
IV) Do the math
a. Plug the given numbers into the formula, state your z statistic, and your P-value. Use a calculator to verify.
V) Draw a conclusion in context
a. We can/cannot reject the null hypothesis at the ____% significance level. There is/isn’t evidence to say that ____________________________
Two sample z-test for a difference of proportions—Categorical Data
I) Name the Test and state the formula
a. Two sample z-test for a difference of proportions
b. $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1 - \hat{p}_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $\hat{p}_c = \dfrac{x_1 + x_2}{n_1 + n_2}$ is the pooled sample proportion
II) Write your pair of hypotheses
III) Determine whether or not conditions are met
a. Determine whether the n samples were taken at random
b. Determine whether or not the distribution is normal
i. $n_1\hat{p}_1 \ge 10$ and $n_1(1 - \hat{p}_1) \ge 10$
ii. $n_2\hat{p}_2 \ge 10$ and $n_2(1 - \hat{p}_2) \ge 10$
c. Determine whether the n observations were taken independently. If sampling with replacement this can be deduced. If sampling without replacement, then…
i. If there are two distinct populations, this can be done by verifying that each sample is less than 10% of its respective population
ii. If the two groups come from the same population, this can be done by verifying that the individuals were placed in their respective groups at random
IV) Do the math
a. Plug the given numbers into the formula, state your z statistic, and your P-value. Use a calculator to verify.
V) Draw a conclusion in context
a. We can/cannot reject the null hypothesis at the ____% significance level. There is/isn’t evidence to say that ____________________________
Chi-Squared Test
Every test statistic can be calculated using this formula:
$\chi^2 = \sum \dfrac{(\text{observed} - \text{expected})^2}{\text{expected}}$
All problems will be done in the following format:
I) State what test you are doing
II) State your null and alternative hypotheses
III) Check the conditions to make sure it is appropriate
IV) Do the math
V) State a conclusion in context
Chi Squared Goodness of Fit Test—Categorical Data
I) Name the Test and state the formula
a. Chi Squared Goodness of Fit Test
b. $\chi^2 = \sum \dfrac{(\text{observed} - \text{expected})^2}{\text{expected}}$
II) Write your pair of hypotheses
III) Determine whether or not conditions are met
a. Determine whether the n samples were taken at random
b. Determine whether the sample is large enough
i. At most 20% of the expected outcomes are less than 5
ii. All the expected outcomes are more than 1
IV) Do the math
a. Find your expected outcomes
b. Plug the given numbers into the formula and find your chi squared value, state your degrees of freedom, and state your P-value. Use a calculator to verify
V) Draw a conclusion in context
a. We can/cannot reject the null hypothesis at the ____% significance level. There is/isn’t evidence to say that ____________________________
Chi Squared Test of Independence—Categorical Data (Two way table)
I) Name the Test and state the formula
a. Chi Squared test of independence
b. $\chi^2 = \sum \dfrac{(\text{observed} - \text{expected})^2}{\text{expected}}$
II) Write your pair of hypotheses
III) Determine whether or not conditions are met
a. Determine whether the n samples were taken at random
b. Determine whether the sample is large enough
i. At most 20% of the expected outcomes are less than 5
ii. All the expected outcomes are more than 1
IV) Do the math
a. Find your expected outcomes (row total × column total ÷ table total)
b. Plug the given numbers into the formula and find your chi squared value, state your degrees of freedom, (r-1)(c-1), and state your P-value. Use a calculator to verify
V) Draw a conclusion in context
a. We can/cannot reject the null hypothesis at the ____% significance level. There is/isn’t evidence to say that ____________________________