Week 6 - School of Mathematical Sciences

STATS 1000 / STATS 1004 / STATS 1504
Statistical Practice 1
Lecture notes
Week 6
Jonathan Tuke
School of Mathematical Sciences, University of Adelaide
Semester 1, 2015
[W6-1]
Sampling distributions
[W6-2]
Parameters and Statistics
A parameter is a number that is calculated from the population.
In statistical practice, the value of a parameter is not known
because we cannot examine the entire population.
A statistic is a number that is calculated from a sample. The
value of a statistic can be computed directly from the sample data.
We often use a statistic to estimate an unknown parameter.
[W6-3]
Population and Sample
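[Figure: a population, described by the parameters µ, σ, p, and a sample drawn from it, described by the statistics x̄, s, p̂.]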
[W6-4]
Statistical Estimation
The process of statistical inference involves using information
from a sample to draw conclusions about a wider population.
Different random samples yield different statistics. We need to be
able to describe the sampling distribution of the possible values
of a statistic in order to perform statistical inference.
[W6-5]
Sampling Variability
Different random samples yield different statistics. This basic fact
is called sampling variability: the value of a statistic varies in
repeated random sampling.
To make sense of sampling variability, we ask, “What would
happen if we took many samples?”
[W6-6]
Sampling Distributions
If we took every one of the possible samples of a certain size,
calculated the sample mean for each, and graphed all of those
values, we’d have a sampling distribution.
The population distribution of a variable is the distribution of
values of the variable among all individuals in the
population.
The sampling distribution of a statistic is the distribution of
values taken by the statistic in all possible samples of the same size
from the same population.
[W6-7]
Example
Weights of Cats
3.72, 4.16, 4.06, 4.68, 3.36, 4.16, 4.13, 4.20, 4.23, 5.08
4.54, 4.02, 4.19, 3.93, 3.64, 4.46, 3.92, 4.92, 4.07, 4.33
4.05, 4.54, 4.72, 3.85, 4.04, 5.24, 5.04, 4.17, 3.37, 4.67
4.40, 4.11, 4.25, 3.28, 3.93, 4.09, 4.43, 4.25, 3.86, 3.45
3.47, 4.49, 3.96, 3.85, 3.63, 4.11, 3.10, 4.02, 3.96, 3.57
4.23, 4.17, 3.57, 4.59, 3.90, 4.41, 4.39, 4.02, 4.05, 4.06
4.14, 3.56, 3.30, 4.08, 4.11, 5.27, 3.51, 3.55, 4.76, 3.36
3.16, 4.07, 4.04, 3.05, 4.08, 5.32, 4.07, 4.77, 4.33, 4.16
2.84, 4.52, 3.63, 2.85, 3.65, 4.40, 4.26, 4.44, 3.88, 2.37
3.64, 4.35, 4.47, 4.53, 4.39, 3.49, 4.27, 5.11, 4.61, 3.78
Population mean: µ = 4 kg
[W6-8]
Example
Weights of Cats
[Figure: density histogram of the cat weights; x-axis: weight (2 to 6), y-axis: density (0.0 to 0.8).]
[W6-9]
Example
Sample 10 cats
Weights
3.26, 3.32, 3.51, 4.56, 3.86, 3.47, 4.05, 3.24, 3.59, 3.84
Sample mean
x̄ = 3.67 kg
[W6-10]
Example
1000 samples each of 10 cats
[Figure: histogram of the 1000 sample means; x-axis: means (3.3 to 4.5), y-axis: count (0 to 100).]
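The repeated-sampling experiment on this slide can be imitated in a few lines of code. This is only a sketch: it assumes the cat weights follow a Normal distribution with mean 4 kg and standard deviation 0.5 kg, and the names `MU`, `SIGMA`, `N_CATS` and `N_SAMPLES` are invented for the illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

MU, SIGMA = 4.0, 0.5   # assumed Normal population of cat weights (kg)
N_CATS = 10            # cats per sample
N_SAMPLES = 1000       # number of repeated samples

# Draw 1000 samples of 10 cats and record the mean of each sample.
sample_means = [
    statistics.mean(random.gauss(MU, SIGMA) for _ in range(N_CATS))
    for _ in range(N_SAMPLES)
]

# Summarise the simulated sampling distribution of the sample mean.
centre = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)
print(f"mean of the sample means: {centre:.3f} kg")
print(f"sd of the sample means:   {spread:.3f} kg")
```

The sample means cluster around 4 kg and vary much less than the individual weights do.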
[W6-11]
Example
1000 samples each of 10 cats
[Figure: histogram of the 1000 sample means; x-axis: means (2 to 6), y-axis: count (0 to 100).]
[W6-12]
Law of Large Numbers
Draw independent observations at random from any population
with finite mean µ. Decide how accurately you would like to
estimate µ. As the number of observations drawn increases, the
mean x̄ of the observed values eventually approaches the mean µ
of the population as closely as you specified and then stays that
close.
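A small simulation can illustrate the law of large numbers. The sketch below assumes a Normal population with mean 4 kg and standard deviation 0.5 kg (chosen to resemble the cat weights; any population with a finite mean would do) and tracks the running mean at a few hypothetical checkpoints.

```python
import random

random.seed(2)  # fixed seed so the run is repeatable

MU, SIGMA = 4.0, 0.5  # assumed population mean and sd (kg)
checkpoints = (10, 100, 1000, 10000, 100000)

total = 0.0
running_mean = {}
for n in range(1, max(checkpoints) + 1):
    total += random.gauss(MU, SIGMA)   # draw one more observation
    if n in checkpoints:
        running_mean[n] = total / n    # mean of the first n observations

for n, m in running_mean.items():
    print(f"n = {n:>6}: running mean = {m:.4f}")
```

As n grows, the running mean settles closer and closer to µ = 4.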
[W6-13]
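[Figure: a population, described by the parameters µ, σ, p, and a sample drawn from it, described by the statistics x̄, s, p̂.]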
[W6-14]
Law of Large Numbers
Cat weight example
[W6-15]
Mean and Standard Deviation of a Sample Mean
Mean of a sampling distribution of a sample mean
There is no tendency for a sample mean to fall systematically
above or below µ, even if the distribution of the raw data is
skewed. The mean of the sampling distribution of x̄ is equal to µ,
so the sample mean x̄ is an unbiased estimator of the population mean µ.
[W6-16]
[Figure: histogram; x-axis: x (0 to 15), y-axis: count (0 to 3000).]
[W6-17]
[Figure: scatter plot of sample means (about 0.75 to 1.50) against sample number (0 to 100).]
[W6-18]
Mean and Standard Deviation of a Sample Mean
Standard deviation of a sampling distribution of a sample mean
The standard deviation of the sampling distribution measures how
much the sample statistic varies from sample to sample. It is
smaller than the standard deviation of the population by a factor
of √n; that is, the standard deviation of x̄ is σ/√n.
Averages are less variable than individual observations.
[W6-19]
[Figure: Normal density f(x) of the sample mean x̄, centred at µ with standard deviation σ/√n.]
[W6-20]
Example
Cats’ weights
It is known that the population standard deviation of cats’ weights
is σ = 0.5 kg.
If we take a sample of 10 cats, what is the standard deviation of
the sample mean?
If we take a sample of 30 cats, what is the standard deviation of
the sample mean?
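Both questions use the rule that the standard deviation of the sample mean is σ/√n. A minimal sketch of the arithmetic follows; the function name `sd_of_sample_mean` is made up for this illustration.

```python
import math

SIGMA = 0.5  # population standard deviation of cats' weights (kg)

def sd_of_sample_mean(sigma, n):
    """Standard deviation of the mean of n independent observations."""
    return sigma / math.sqrt(n)

sd_10 = sd_of_sample_mean(SIGMA, 10)
sd_30 = sd_of_sample_mean(SIGMA, 30)
print(f"n = 10: sd of sample mean = {sd_10:.3f} kg")
print(f"n = 30: sd of sample mean = {sd_30:.3f} kg")
```

Tripling the sample size from 10 to 30 divides the standard deviation by √3, not by 3.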
[W6-21]
Distribution Of The Sample Mean
The average of independent Normal random variables is also
Normally distributed.
[W6-22]
Example
Cats’ weights
It is known that the population standard deviation of cats’ weights
is σ = 0.5 kg and the population mean is µ = 4 kg.
If we take a sample of 10 cats, what is the distribution of the
sample mean?
If we take a sample of 30 cats, what is the distribution of the
sample mean?
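If we assume that individual cat weights are Normally distributed (an assumption, since the slide states only µ and σ), the sample mean is Normal with mean µ and standard deviation σ/√n. The sketch below builds that distribution with Python's `statistics.NormalDist` and, as an extra illustration not asked for on the slide, finds the chance that the sample mean lands within 0.2 kg of µ.

```python
import math
from statistics import NormalDist

MU, SIGMA = 4.0, 0.5  # population mean and sd of cats' weights (kg)

def sample_mean_distribution(n):
    """Distribution of the mean of n weights, assuming Normal weights."""
    return NormalDist(MU, SIGMA / math.sqrt(n))

prob_within = {}
for n in (10, 30):
    dist = sample_mean_distribution(n)
    # Probability that the sample mean is within 0.2 kg of the population mean.
    prob_within[n] = dist.cdf(MU + 0.2) - dist.cdf(MU - 0.2)
    print(f"n = {n}: N({dist.mean}, {dist.stdev:.3f}), "
          f"P(|mean - 4| < 0.2) = {prob_within[n]:.3f}")
```

The larger sample gives a narrower distribution, so the sample mean is more likely to be close to µ.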
[W6-23]
Central Limit Theorem
Most population distributions are not Normal. What is the shape
of the sampling distribution of sample means when the population
distribution isn’t Normal?
It is a remarkable fact that, as the sample size increases, the
distribution of sample means begins to look more and more like a
Normal distribution!
When the sample is large enough, the distribution of sample means
is very close to Normal, no matter what shape the population
distribution has, as long as the population has a finite standard
deviation.
[W6-24]
Central Limit Theorem
Draw an SRS of size n from a population with mean µ and finite
standard deviation σ. The central limit theorem (CLT) says that
when n is large, the sampling distribution of the sample mean x̄ is
approximately Normal:
$$\bar{x} \mathrel{\dot\sim} N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$
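The central limit theorem can be checked by simulation. The sketch below assumes a strongly right-skewed population, an exponential distribution with mean 1 and standard deviation 1 (a choice made for this illustration only), and measures how the skewness of the sample means shrinks as n grows. The helper names `skewness` and `simulate_means` are hypothetical.

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

def skewness(values):
    """Sample skewness: the average cubed standardised deviation."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def simulate_means(n, repeats=5000):
    """Means of `repeats` samples of size n from an exponential population."""
    return [
        statistics.mean(random.expovariate(1.0) for _ in range(n))
        for _ in range(repeats)
    ]

skew_by_n = {}
sd_by_n = {}
for n in (5, 10, 30):
    means = simulate_means(n)
    skew_by_n[n] = skewness(means)
    sd_by_n[n] = statistics.stdev(means)
    print(f"n = {n:>2}: sd of means = {sd_by_n[n]:.3f}, "
          f"skewness = {skew_by_n[n]:.3f}")
```

A Normal distribution has skewness 0, so skewness moving towards 0 is one sign that the distribution of sample means is becoming more Normal. The standard deviation of the means should also be close to σ/√n.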
[W6-25]
CLT
[Figure: histogram; x-axis: x (0 to 15), y-axis: count (0 to 120000).]
[W6-26]
CLT
Sample size n = 5
[Figure: histogram of sample means; x-axis: means (0 to 4), y-axis: count (0 to 6000).]
[W6-27]
CLT
Sample size n = 10
[Figure: histogram of sample means; x-axis: means (0 to 3), y-axis: count (0 to 3000).]
[W6-28]
CLT
Sample size n = 30
[Figure: histogram of sample means; x-axis: means (0.5 to 1.5), y-axis: count (0 to 1000).]
[W6-29]
Confidence Intervals
[W6-30]
The problem
Suppose that we have a variable with a Normal distribution.
Assume that we know the population standard deviation σ, but we
do not know the population mean µ.
How can we estimate the value of µ?
[W6-31]
Point estimate
If we want a single point estimate of the population mean µ, we
can take a simple random sample (SRS) from the population and
then use the sample mean x̄ to estimate the population
mean.
[W6-32]
Example
Consider the case of estimating the population mean amount of
active ingredient in manufactured tablets. You know that the
population standard deviation is 0.5 mg. You have taken a random
sample of 10 tablets and obtained the following amounts of active
ingredient (in mg):
29.57, 29.82, 30.45, 30.87, 30.46, 29.41, 29.03, 31.05, 30.11, 30.59
What is your estimate of the population mean active
ingredient?
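The point estimate is the sample mean of the ten measurements. A minimal sketch of the calculation (the variable names are illustrative):

```python
import statistics

# Active ingredient (mg) in the 10 sampled tablets, copied from the slide.
tablets_mg = [29.57, 29.82, 30.45, 30.87, 30.46,
              29.41, 29.03, 31.05, 30.11, 30.59]

x_bar = statistics.mean(tablets_mg)  # point estimate of the population mean
print(f"sample mean = {x_bar:.3f} mg")
```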
[W6-33]
Confidence intervals
What if you would like a range for the population mean rather
than a point estimate?
Use a confidence interval.
[W6-34]
Confidence intervals
A confidence interval gives a range of plausible values for the
population mean, together with a confidence level C%, usually
95%.
[W6-35]
What do we mean by 95% confident?
[Figure: results for 100 repeated samples (vertical axis: sample number, 0 to 100) plotted against the sample means (horizontal axis: means, 29.6 to 30.4).]
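One way to see what "95% confident" means is to repeat the sampling many times and count how often the interval x̄ ± 1.96 σ/√n contains the true mean. The sketch below does this under stated assumptions: a Normal population with known σ = 0.5 mg, samples of size 10, and a hypothetical true mean of 30 mg (in practice the true mean is unknown).

```python
import math
import random
import statistics

random.seed(4)  # fixed seed so the run is repeatable

TRUE_MU = 30.0   # hypothetical true population mean (mg)
SIGMA = 0.5      # known population standard deviation (mg)
N = 10           # tablets per sample
Z_STAR = 1.96    # critical value for 95% confidence
REPEATS = 2000   # number of repeated samples

margin = Z_STAR * SIGMA / math.sqrt(N)

covered = 0
for _ in range(REPEATS):
    x_bar = statistics.mean(random.gauss(TRUE_MU, SIGMA) for _ in range(N))
    # Does this sample's interval contain the true mean?
    if x_bar - margin <= TRUE_MU <= x_bar + margin:
        covered += 1

coverage = covered / REPEATS
print(f"proportion of intervals containing the true mean: {coverage:.3f}")
```

The proportion should be close to 0.95: the confidence level describes how often the method succeeds over repeated samples, not the chance that one particular interval is correct.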
[W6-36]
Confidence interval
All confidence intervals we construct will have a similar form:
estimate ± critical value × standard error
[W6-37]
Confidence interval for population mean with known
population standard deviation
In the case of a population mean for a Normal distribution with a
known population standard deviation, we have
estimate: the sample mean x̄
critical value: obtained from the standard Normal distribution and
denoted z*
standard error: the standard deviation of the sample mean, σ/√n
[W6-38]
How to calculate z*
Consider confidence level 95%
[Figure: standard Normal density curve; the central area is 0.95, with area 0.025 in each tail.]
[W6-39]
How to calculate z*
• Calculate a = (1 − C/100)/2
• Enter NORMINV(a,0,1) into Excel
• Remove the minus sign and you have z*
For example, for a 95% confidence interval, z* = 1.96
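The same steps can be carried out outside Excel. The sketch below uses `NormalDist.inv_cdf` from Python's standard `statistics` module in place of NORMINV; the function name `z_star` is made up for this illustration.

```python
from statistics import NormalDist

def z_star(confidence_percent):
    """Critical value z* for a confidence level given in percent."""
    a = (1 - confidence_percent / 100) / 2   # area in one tail
    # inv_cdf(a) is negative for a < 0.5, so flip the sign.
    return -NormalDist(0, 1).inv_cdf(a)

for level in (90, 95, 99):
    print(f"C = {level}%: z* = {z_star(level):.3f}")
```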
[W6-40]
Example
Calculate the 95% confidence interval for the tablet example.
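A sketch of this calculation, using the form estimate ± critical value × standard error with the ten tablet measurements, σ = 0.5 mg and z* = 1.96 (variable names are illustrative):

```python
import math
import statistics

# Active ingredient (mg) in the 10 sampled tablets, copied from the tablet example.
tablets_mg = [29.57, 29.82, 30.45, 30.87, 30.46,
              29.41, 29.03, 31.05, 30.11, 30.59]
SIGMA = 0.5    # known population standard deviation (mg)
Z_STAR = 1.96  # critical value for 95% confidence

x_bar = statistics.mean(tablets_mg)                   # estimate
std_error = SIGMA / math.sqrt(len(tablets_mg))        # standard error
margin = Z_STAR * std_error                           # margin of error
lower, upper = x_bar - margin, x_bar + margin

print(f"95% CI: {lower:.2f} mg to {upper:.2f} mg")
```

Rounded to two decimal places, the interval runs from about 29.83 mg to 30.45 mg.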
[W6-41]
Interpretation of confidence interval
We are <CI level> confident that the true <parameter> of
<population> lies between <lower> and <upper>
<units>.
What is the interpretation of the 95% CI in this case?
[W6-42]
Summary
The formula for calculating the C% confidence interval for the
population mean of a Normal distribution with known population
standard deviation is
$$\bar{x} \pm z^{*} \frac{\sigma}{\sqrt{n}}$$
[W6-43]