
Chapter 7: Inference for Distributions
[Cartoon: A visual comparison of normal and paranormal distribution. Lower caption says 'Paranormal Distribution'; no idea why the graphical artifact is occurring.]
Source: http://stats.stackexchange.com/questions/423/what-is-your-favorite-data-analysis-cartoon
7.1: Inference for the Mean of a Population - Goals
• Be able to distinguish the standard deviation from the standard error of the sample mean.
• Be able to construct a level C confidence interval (without knowing σ) and interpret the results.
• Perform a one-sample t significance test and summarize the results.
• Be able to determine when the t procedure is valid.
Conditions for Inference (Chapter 6)
1. The variable we measure has a Normal distribution with mean μ and standard deviation σ.
2. We don’t know μ, but we do know σ.
3. We have an SRS from the population of interest.
Shape of t-distribution
[Figure: Student's t probability density function]
http://upload.wikimedia.org/wikipedia/commons/thumb/4/41/Student_t_pdf.svg/1000px-Student_t_pdf.svg.png
t-Table (Table D)
Table A vs. Table D
Table A: standard normal (z) distribution; gives P(Z ≤ z); df not required.
Table D: t-distribution; gives P(T > t); df required.
Example: t critical values
What is the t critical value for the following:
a) Central area = 0.95, df = 10
b) Central area = 0.95, df = 60
c) Central area = 0.95, df = 100
d) Central area = 0.95, z curve
e) Upper area = 0.99, df = 10
f) Lower area = 0.99, df = 10
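Parts a) to f) can also be checked numerically. The sketch below is an illustration only, assuming nothing beyond the Python standard library: it integrates the t density with Simpson's rule and inverts the CDF by bisection, a swapped-in numerical method rather than how Table D is produced. The helper names (`t_pdf`, `t_cdf`, `t_quantile`, `central_t_star`) are hypothetical, and "upper area" is read as the area to the right of t, "lower area" as the area to the left.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=1000):
    """P(T <= x): symmetry about 0 plus Simpson's rule on [0, |x|] (steps even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df, lo=-50.0, hi=50.0, tol=1e-7):
    """Value t with P(T <= t) = p, found by bisection on t_cdf."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def central_t_star(central_area, df):
    """t* leaving (1 - C)/2 in each tail, so P(-t* < T < t*) = C."""
    return t_quantile(1 - (1 - central_area) / 2, df)


for df in (10, 60, 100):                                   # parts a) to c)
    print(f"C = 0.95, df = {df}: t* = {central_t_star(0.95, df):.3f}")
print(f"C = 0.95, z curve: z* = {NormalDist().inv_cdf(0.975):.3f}")  # part d)
# e) area 0.99 to the RIGHT of t, so P(T <= t) = 0.01
print(f"upper area 0.99, df = 10: t = {t_quantile(0.01, 10):.3f}")
# f) area 0.99 to the LEFT of t, so P(T <= t) = 0.99
print(f"lower area 0.99, df = 10: t = {t_quantile(0.99, 10):.3f}")
```

As df grows, t* moves toward z* = 1.960, which is why the last row of Table D matches the normal values.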
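Summary: CI
Confidence interval: $\bar{x} \pm t^*(df)\,\frac{s}{\sqrt{n}}$
Upper confidence bound: $\mu < \bar{x} + t^*(df)\,\frac{s}{\sqrt{n}}$
Lower confidence bound: $\mu > \bar{x} - t^*(df)\,\frac{s}{\sqrt{n}}$
Sample size: $n = \left(\frac{t^* s}{m}\right)^2$
Here df = n − 1, m is the desired margin of error, and for a one-sided bound t* is the value with upper-tail area 1 − C.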
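A minimal sketch of the interval and the two one-sided bounds, assuming t* has already been read from Table D with df = n − 1. The function names are hypothetical, and the numbers reuse the cherry-tomato pilot sample from the next example purely for illustration (t* = 3.182 for a two-sided 95% interval with df = 3; t* = 2.353 for a one-sided 95% bound with df = 3).

```python
import math


def t_interval(xbar, s, n, t_star):
    """Two-sided interval xbar ± t* s/sqrt(n), with t* taken from Table D (df = n - 1)."""
    margin = t_star * s / math.sqrt(n)
    return xbar - margin, xbar + margin


def upper_bound(xbar, s, n, t_star):
    """One-sided bound mu < xbar + t* s/sqrt(n); t* uses upper-tail area 1 - C."""
    return xbar + t_star * s / math.sqrt(n)


def lower_bound(xbar, s, n, t_star):
    """One-sided bound mu > xbar - t* s/sqrt(n); t* uses upper-tail area 1 - C."""
    return xbar - t_star * s / math.sqrt(n)


# Illustration only: cherry-tomato pilot sample (xbar = 222 g, s = 5 g, n = 4, df = 3).
low, high = t_interval(222, 5, 4, t_star=3.182)   # C = 95%, two-sided
ub = upper_bound(222, 5, 4, t_star=2.353)         # 95% one-sided, upper-tail area 0.05
print(f"95% CI: ({low:.3f}, {high:.3f}); 95% upper bound: mu < {ub:.3f}")
```

The only difference between the interval and a bound is which t* is used: the bound puts all of 1 − C in one tail, so its t* is smaller.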
Example: Sample size
You are in charge of quality control in your food company. You randomly sample four packs of cherry tomatoes. The average weight of the four packs is 222 g with a sample standard deviation of 5 g.
a) What sample size is required to obtain a margin of error of 2 g at a 95% confidence level?
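The slide does not say which t* to plug into the sample-size formula. The sketch below shows two common choices, both labeled as assumptions: t* from the pilot sample (df = 3) and the large-sample z* = 1.96. `required_n` is a hypothetical helper that always rounds up, since rounding down would give a margin of error larger than 2 g.

```python
import math


def required_n(t_star, s, m):
    """Smallest whole n satisfying n >= (t* s / m)^2; always round up."""
    return math.ceil((t_star * s / m) ** 2)


s, m = 5, 2                          # pilot sd (g) and target margin of error (g)
n_t = required_n(3.182, s, m)        # assumption: t* from Table D, C = 95%, df = 3
n_z = required_n(1.96, s, m)         # assumption: large-sample z* = 1.96
print(f"n with pilot t*: {n_t}; n with z*: {n_z}")
```

Because t* itself depends on n, one could recompute t* with df = n − 1 for the new n and repeat until n stops changing.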
Single mean test: Summary
Null hypothesis: H0: μ = μ0
Test statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, with df = n − 1
Alternative hypothesis and P-value:
One-sided, upper-tailed: Ha: μ > μ0, P-value = P(T ≥ t)
One-sided, lower-tailed: Ha: μ < μ0, P-value = P(T ≤ t)
Two-sided: Ha: μ ≠ μ0, P-value = 2P(T ≥ |t|)
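A sketch of the test statistic and the three P-value rules in the summary, assuming only the Python standard library (the t CDF is approximated with Simpson's rule rather than read from Table D). `one_sample_t` and `p_value` are hypothetical names, and μ0 = 225 g in the usage line is an invented label weight, not a value from the slides.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=1000):
    """P(T <= x) via symmetry about 0 and Simpson's rule on [0, |x|]."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def one_sample_t(xbar, mu0, s, n):
    """Return t = (xbar - mu0) / (s / sqrt(n)) and df = n - 1."""
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1


def p_value(t, df, alternative):
    """P-value for Ha: mu > mu0 ('greater'), mu < mu0 ('less'), mu != mu0 ('two-sided')."""
    if alternative == "greater":
        return 1 - t_cdf(t, df)             # P(T >= t)
    if alternative == "less":
        return t_cdf(t, df)                 # P(T <= t)
    if alternative == "two-sided":
        return 2 * (1 - t_cdf(abs(t), df))  # 2P(T >= |t|)
    raise ValueError(f"unknown alternative: {alternative!r}")


# Hypothetical: tomato pilot sample (xbar = 222 g, s = 5 g, n = 4) tested against
# an assumed label weight mu0 = 225 g, which does not appear in the slides.
t, df = one_sample_t(222, 225, 5, 4)
print(f"t = {t:.2f}, df = {df}, two-sided P = {p_value(t, df, 'two-sided'):.3f}")
```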
Robustness of the t-procedure
• A statistical value or procedure is robust if its results (confidence level or P-value) are insensitive to violations of the conditions.
• The t-procedure is robust against non-normality:
– n < 15: the population distribution should be close to normal.
– 15 ≤ n < 40: mild skewness is acceptable.
– n ≥ 40: the procedure is usually valid.
Inferences for Non-Normal Distributions
• If you know what the distribution is, use the appropriate model.
• If the data is skewed, you can transform the variable.
• Use a nonparametric procedure.
7.2: Comparing Two Means - Goals
• Be able to construct a level C confidence interval for the difference between two means and interpret the results.
• Perform a two-sample t significance test and summarize the results.
• Be able to construct a level C confidence interval for a matched-pairs design and interpret the results.
• Perform a matched-pairs t significance test and summarize the results.
• Be able to determine when the t procedure is valid.
Conditions for Inference: 2-sample
1. Each group is considered to be a sample from a distinct population.
• We have an SRS from the population of interest for each group.
2. The responses in each group are independent of those in the other group.
3. The variable we measure has a Normal distribution in each population, with mean μi and standard deviation σi (i = 1, 2).
Df for 2-sample t test
$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$
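The df formula is easy to mistype, so here is a direct transcription as a sketch; `welch_df` is a hypothetical name. Applied to the dexterity example in this section it gives a non-integer df of about 28.2, which would typically be rounded down before looking up Table D.

```python
def welch_df(s1, n1, s2, n2):
    """Degrees of freedom for the two-sample t, from the formula on this slide."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2          # squared standard errors of each mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Dexterity example: students (n = 15, s = 4.31), workers (n = 20, s = 3.83).
df = welch_df(4.31, 15, 3.83, 20)
print(f"df = {df:.1f}")
```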
Two-sample Test (independent): Summary
Null hypothesis: H0: μ1 – μ2 = Δ
Test statistic: $t = \frac{(\bar{x}_1 - \bar{x}_2) - \Delta}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
Alternative hypothesis and P-value:
Upper-tailed: Ha: μ1 – μ2 > Δ, P-value = P(T ≥ t)
Lower-tailed: Ha: μ1 – μ2 < Δ, P-value = P(T ≤ t)
Two-sided: Ha: μ1 – μ2 ≠ Δ, P-value = 2P(T ≥ |t|)
Note: If we are testing whether the two population means are equal, then Δ = 0.
Example: two-sample Independent t
A group of 15 college seniors is selected to participate in a manual dexterity skill test against a group of 20 industrial workers. Skills are assessed by scores obtained on a test taken by both groups. The data are shown in the following table:
Group      n    x̅      s
Students   15   35.12  4.31
Workers    20   37.32  3.83
a) Perform a significance test to determine if the skills are the same for college students and industrial workers at a significance level of 0.05.
b) Calculate and interpret the 95% confidence interval.
Example: two-sample Independent t (cont)
The data do not provide support (P = 0.128) for the claim that there is a difference between the population mean test scores of students and workers.
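The slide states only the conclusion. As a check, the sketch below computes SE and t for the dexterity data and compares |t| with a Table D critical value instead of computing a P-value, which gives the same decision at a fixed α. Rounding the df of about 28.2 down to 28 (so t* = 2.048) is an assumption, and `two_sample_t` is a hypothetical helper.

```python
import math


def two_sample_t(x1, s1, n1, x2, s2, n2, delta=0.0):
    """Return t = ((x1 - x2) - delta) / SE and SE = sqrt(s1^2/n1 + s2^2/n2)."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return ((x1 - x2) - delta) / se, se


t, se = two_sample_t(35.12, 4.31, 15, 37.32, 3.83, 20)   # students minus workers
# Two-sided test at alpha = 0.05: t* = 2.048 is the Table D value for df = 28.
t_star = 2.048
print(f"SE = {se:.3f}, t = {t:.3f}, reject H0: {abs(t) > t_star}")
```

Since |t| ≈ 1.57 is below 2.048, H0 is not rejected, in line with P = 0.128 > 0.05.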
Two-sample Test (independent): CI Summary
$\text{point estimate} \pm m = \text{point estimate} \pm t^*(df)\,SE$
$= (\bar{x}_1 - \bar{x}_2) \pm t^*(df)\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Example: two-sample Independent t (CI) (cont)
We are 95% confident that the difference between the population mean test scores of students and workers (students minus workers) is between -5.08 and 0.68.
P-value = 0.128, (-5.08, 0.68)
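The interval (-5.08, 0.68) can be reproduced with the CI summary formula. This sketch assumes t* = 2.048 from Table D (df = 28, the two-sample df of about 28.2 rounded down); software that uses df ≈ 28.2 directly gives essentially the same interval.

```python
import math

x1, s1, n1 = 35.12, 4.31, 15   # students
x2, s2, n2 = 37.32, 3.83, 20   # workers
t_star = 2.048                 # assumption: Table D, C = 95%, df = 28

se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)   # standard error of x1 - x2
estimate = x1 - x2                            # point estimate
margin = t_star * se                          # margin of error m
low, high = estimate - margin, estimate + margin
print(f"({low:.2f}, {high:.2f})")             # (-5.08, 0.68)
```

Because the interval contains 0, it agrees with the two-sided test failing to reject H0 at α = 0.05.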
Matched Pairs Procedures
• To compare the responses to the two treatments in a matched-pairs design, find the difference between the responses within each pair. Then apply the one-sample t procedures to these differences.
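Applied to the nurse data from the paired example in this section, the recipe looks like the sketch below. It uses Python's `statistics.mean` and `statistics.stdev` (sample standard deviation with the n − 1 divisor); t* = 2.998 is the Table D value for upper-tail area 0.01 with df = 7. The output can be compared with the table's summary rows and the reported bound of -0.39.

```python
import math
from statistics import mean, stdev

# Nurse scores from the paired example (pre- and post-training).
pre = [2.56, 3.22, 3.45, 5.55, 5.63, 7.89, 7.66, 6.20]
post = [4.54, 5.33, 4.32, 7.45, 7.00, 9.80, 7.33, 6.80]

d = [a - b for a, b in zip(pre, post)]   # pre - post for each nurse
n = len(d)
d_bar, s_d = mean(d), stdev(d)           # stdev uses the n - 1 divisor
se = s_d / math.sqrt(n)                  # SE = s_d / sqrt(n)

t = (d_bar - 0) / se                     # H0: mu_D = 0, df = n - 1 = 7
upper = d_bar + 2.998 * se               # 99% upper confidence bound for mu_D
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.3f}, t = {t:.2f}, df = {n - 1}")
print(f"99% upper bound: mu_D < {upper:.2f}")
```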
Conditions for Inference: Matched Pairs
1. Each pair is considered to be a sample from a population of pairs.
• We have an SRS from the population of pairs.
2. Each pair is independent of the other pairs.
3. The difference within each pair has a Normal distribution with mean μD and standard deviation σD.
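Two-sample Matched Pair
$SE = \frac{s_d}{\sqrt{n}}$
Confidence interval: $\bar{d} \pm t^*(df)\,\frac{s_d}{\sqrt{n}}$, with df = n − 1 (n = number of pairs)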
Two-sample matched pair Test: Summary
Null hypothesis: H0: μD = Δ
Test statistic: $t = \frac{\bar{d} - \Delta}{s_d/\sqrt{n}}$
Alternative hypothesis and P-value:
One-sided, upper-tailed: Ha: μD > Δ, P-value = P(T ≥ t)
One-sided, lower-tailed: Ha: μD < Δ, P-value = P(T ≤ t)
Two-sided: Ha: μD ≠ Δ, P-value = 2P(T ≥ |t|)
Note: If we are testing whether the two population means are equal, then Δ = 0.
Example: Paired t test Procedure
In an effort to determine whether sensitivity training for nurses would improve the quality of nursing provided at an area hospital, the following study was conducted. Eight different nurses were selected and their nursing skills were given a score from 1 to 10. After this initial screening, a training program was administered, and then the same nurses were rated again. The table below lists their pre- and post-training scores.
a) Conduct a test to determine whether the training could on average improve the quality of nursing provided in the population at a 0.01 significance level.
b) Calculate and interpret the 99% upper confidence bound of the population mean difference in nursing scores.
Individual   Pre-Training   Post-Training   Pre - Post
1            2.56           4.54            -1.98
2            3.22           5.33            -2.11
3            3.45           4.32            -0.87
4            5.55           7.45            -1.90
5            5.63           7.00            -1.37
6            7.89           9.80            -1.91
7            7.66           7.33             0.33
8            6.20           6.80            -0.60
mean         5.27           6.57            -1.30
stdev        2.018          1.803            0.861
Example: Paired t test Procedure (cont)
The data provide strong support (P = 0.002) for the claim that the population average score improved after training.
Example: Paired t test Procedure (cont)
We are 99% confident that the population mean difference in scores (pre-training minus post-training) is less than -0.39.
P = 0.00185, μD < -0.39
Independent vs. Paired
1. If there is great heterogeneity between experimental units and a large correlation within experimental units, then a paired experiment is preferable.
2. If the experimental units are relatively homogeneous and the correlation within pairs is not large, then an unpaired experiment should be used.
Robustness of the 2-sample t-procedure
• The 2-sample t-procedure is very robust against non-normality. Let n = n1 + n2.
– n < 15: the population distributions should be close to normal.
– 15 ≤ n < 40: mild skewness is acceptable.
– n ≥ 40: the procedure is usually valid.
• Best when n1 = n2.
• Best when the two distributions have similar shapes.
In Class (or HW): 2-sample Independent or Paired
For the following questions, state which method is better, independent or paired, and why. The following explanations are wrong: 1) there is no information for one of the methods, 2) the data are matched in the exercise, 3) the number of data points is different (or the same).
Example 1
Example 1: Does dress affect competence and
intelligence ratings? Researchers performed a
study to examine whether or not women are
perceived as less competent and less intelligent
when they dress in a sexy manner versus a
business-like manner. Competence was rated
from 1 (not at all) to 7 (extremely), and a 1 to 5
scale was used for intelligence. Under each
condition, 17 subjects provided data.
Example 2
Example 2: Perceived quality of high- and low-performing restaurants. A study classified 394 quick-service restaurants (QSR) into high-performing and low-performing groups based on their total sales. Each restaurant was rated on a collection of perceived measures of quality by a large number of diners using a 1 to 7 scale. In this study we view the diners as a measuring instrument, and our major interest is in comparing the 170 high-sales restaurants with the 224 low-sales restaurants.
Example 3
Example 3: Air in poultry-processing plants. The air
in poultry-processing plants often contains fungus
spores. If the ventilation is inadequate, this can
affect the health of the workers. The problem is
most serious during the summer. To measure the
presence of spores, air samples are pumped to an
agar plate and “colony-forming units (CFUs)” are
counted after an incubation period. Here are data
from two locations in a plant that processes 37,000
turkeys per day, taken on four days in the summer.
Example 4
Example 4: The manufacture of dyed clothing
fabrics. Different fabrics respond differently
when dyed. This matters to clothing
manufacturers, who want the color of the fabric
to be just right. Fabrics made of cotton and of
ramie are dyed with the same “procion blue” dye
applied in the same way. A colorimeter is used
to measure the lightness of the color on a scale
in which black is 0 and white is 100.
Example 5
Example 5: Durable press and breaking
strength. “Durable press” cotton fabrics are
treated to improve their recovery from wrinkles
after washing. Unfortunately, the treatment also
reduces the strength of the fabric. A study
compared the breaking strength of fabric
treated by two commercial durable press
processes. Five specimens of the same fabric
were assigned at random to each process.
Example 6
Example 6: Brain training. The assessment of
computerized brain-training programs is a rapidly
growing area of research. Researchers are now
focusing on who this training benefits most, what
brain functions are most susceptible to improvement,
and which products are most effective. A recent study
looked at 487 community-dwelling adults aged 65 and
older, each randomly assigned to one of two training
groups. In one group, the participants used a
computerized program 1 hour per day. In the other,
DVD-based educational programs were shown and
quizzes were administered after each video. The
training period lasted 8 weeks. The response was the
improvement in a composite score obtained from an
auditory memory/attention survey given before and
after the 8 weeks.
Example 7
Example 7: Occupation and diet. Do various
occupational groups differ in their diets? A
British study of this question compared 98
drivers and 83 conductors of London double-decker buses. The conductors’ jobs require more
physical activity.