Review for the Final Exam STAT E-150 Statistical Methods

The final exam will be on Wednesday, May 15 during our regular class
time. You will have two hours to complete the exam.
Please arrive on time and take alternating seats so that you have room
for your materials.
The exam is open book: you may use your own notes, handouts,
homework, textbook, etc. Basically, you may use any of your own
materials from this course, from this semester.
Topics will include: simple linear regression, multiple regression, one-way ANOVA, two-way ANOVA, repeated measures, logistic regression,
multiple logistic regression, experimental design, and nonparametric
tests. No specific topics from the first half of the course will be tested,
except as they relate to the topics listed here.
What to expect
• Multiple-choice, matching, and short-answer questions, as on the midterm
• Using SPSS output instead of doing calculations
• Interpreting graphs
You’ll be tested on what you’ve seen (assuming you’ve done your
homework …)
If you would like to have your exam returned to you, please bring a
large self-addressed envelope to the exam, with sufficient postage
attached (not affixed).
If you forget to bring an envelope to the exam, you can send one to me
at Pine Manor College, 400 Heath St, Chestnut Hill, MA 02467.
WELL BEFORE THE EXAM
Organize the course handouts, your homework, homework solutions,
and any section materials from the class website in a 3-ring binder,
tabbed for quick reference. Materials from before the Midterm could be
helpful as well.
Make a few pages of your own key notes, such as particular formulas,
definitions or concepts. Consider making your own guides to
hypothesis tests, such as the example in this document.
WELL BEFORE THE EXAM
Review the class notes and homework solutions, especially for concepts
that need more careful review.
Be sure that you have taken the Practice Final using the materials
(calculator, notes, section handouts...) that you will use for the exam.
Attend some or all of these sections:
Fri. 5/10   53 Church St., Rm 201   5:30 - 6:30   Kela
Mon. 5/13   53 Church St., Rm 202   5:30 - 6:30   Kela
Wed. 5/15   Harvard Hall, Rm 201    4:30 - 5:20   Stephanie
BEFORE THE EXAM
Organize all materials, notebooks, textbook, section handouts, etc.
Only your own materials from this course, this semester, are permitted.
Be sure you have your calculator, and possibly an extra one. You may
not use a graphing calculator for statistics functions, and you may not
use a cell phone or PDA calculator.
When you are ready to begin, RELAX! Continue to think positive thoughts
about the outcome of the exam; research has shown that this technique
contributes to better scores (Meichenbaum, 1996).
DURING THE EXAM
Read questions carefully and thoroughly.
Pace yourself, and keep track of your progress and the clock. Don’t get
bogged down. Consider noting any difficult questions and coming back
later.
Think carefully about the appropriate analysis: Hypothesis test or
confidence interval? Is the response variable quantitative or
categorical? How many treatments? Which is the coefficient of
interest? Are you concerned with means or relationships between
variables?
When you are finished, go back to check that you haven’t skipped any
questions.
AFTER THE EXAM
CONGRATULATIONS!! IT’S TIME TO CELEBRATE!!!
SAMPLE HYPOTHESIS TEST GUIDE - create your own for each
type of test
Multiple Regression
Test 1: Overall significance of multiple regression model; use an F-test
for the model (ANOVA).
H0: β1 = β2 = … = βk = 0
Ha: The slopes are not all zero
Test 2: Specific significance of single coefficient; use a t-test for each
coefficient.
H0: βj = 0
Ha: βj ≠ 0
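As a study aid, the two statistics in this guide can be computed from quantities SPSS prints. This is a minimal Python sketch; the function names and the numbers in the example are hypothetical, and on the exam you would read F, t, and Sig. directly from the SPSS ANOVA and Coefficients tables rather than compute them.

```python
def overall_f(ss_regression: float, ss_error: float, n: int, k: int) -> float:
    """Test 1: overall F for a multiple regression with k predictors and n cases.

    F = (SSR / k) / (SSE / (n - k - 1)), compared with an F(k, n - k - 1) distribution.
    """
    ms_regression = ss_regression / k
    ms_error = ss_error / (n - k - 1)
    return ms_regression / ms_error


def coefficient_t(b: float, se: float) -> float:
    """Test 2: t statistic for H0: beta_j = 0, the estimate divided by its standard error.

    Compared with a t distribution on n - k - 1 degrees of freedom.
    """
    return b / se


if __name__ == "__main__":
    # Hypothetical values for illustration only (not from any course data set).
    print(overall_f(ss_regression=120.0, ss_error=60.0, n=34, k=3))  # (120/3)/(60/30) = 20.0
    print(coefficient_t(b=1.5, se=0.5))                               # 3.0
```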
Review Questions
For each question choose the best method of analysis and write the
appropriate hypotheses.
Choose from the statistical methods we have discussed this semester:
Simple linear regression
Multiple regression
Logistic regression
Multiple logistic regression
One-way ANOVA
Two-way ANOVA
Repeated measures ANOVA
1. A survey asked subjects to report their political ideology, measured
with seven categories in which 1 = extremely liberal, 4 = moderate, and
7 = extremely conservative. The subjects also reported their gender
(male, female) and their level of education (no college, some college,
college graduate). The data was used to investigate any differences in
the political ideologies of these groups.
What is the best method of analysis? Two-way ANOVA
What would be your hypotheses?
H0: μmn = μfn = μms = μfs = μmg = μfg (m/f = male/female; n/s/g = no college/some college/college graduate)
Ha: the means are not all equal
2. A survey asked subjects to report their political ideology, measured
with seven categories in which 1 = extremely liberal, 4 = moderate, and
7 = extremely conservative. The subjects also reported their gender
(male, female) and their level of education (no college, some college,
college graduate). The data was used to see if there were differences
in the political ideologies of people with different levels of education.
What is the best method of analysis? One-way ANOVA
What would be your hypotheses?
H0: μnone = μsome = μgrad
Ha: the means are not all equal
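A one-way ANOVA compares the variation among group means with the variation inside the groups. The Python sketch below computes that F statistic; the function name is hypothetical, and the three groups in the example are made-up scores, not the survey data.

```python
from statistics import mean


def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA.

    F = MS_between / MS_within: variation among the group means relative to
    the pooled variation inside the groups.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical scores for three education groups (not the survey data).
    print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))  # (3.0, 2, 6)
```

Here the group means 2, 3 and 4 give MS_between = 6/2 = 3 and the pooled MS_within = 6/6 = 1, so F = 3.0 on 2 and 6 degrees of freedom.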
3. In a survey related to Jeb Bush’s possible candidacy for President,
subjects were asked for their annual income and whether they would
vote for Bush, to see if there is any relationship between income and
interest in voting for Bush.
What is the best method of analysis? Logistic regression
What would be your hypotheses?
H0: β1 = 0
Ha: β1 ≠ 0
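In SPSS logistic regression output, the test of H0: β1 = 0 is typically reported as a Wald chi-square with its Sig. value, alongside Exp(B), the estimated odds ratio. The Python sketch below shows how those three numbers relate to the coefficient B and its standard error; the function name and the example coefficient (income measured in hypothetical $10,000 units) are assumptions for illustration only.

```python
import math


def wald_test(b: float, se: float) -> tuple[float, float, float]:
    """Wald test of H0: beta_1 = 0 for a single logistic regression coefficient.

    Returns (wald, p_value, odds_ratio).
    """
    wald = (b / se) ** 2                      # chi-square statistic with 1 df
    p_value = math.erfc(math.sqrt(wald / 2))  # upper tail of chi-square(1) = two-sided normal p
    odds_ratio = math.exp(b)                  # multiplicative change in the odds per unit of x
    return wald, p_value, odds_ratio


if __name__ == "__main__":
    # Hypothetical estimate: B = 0.5 per $10,000 of income, S.E. = 0.25.
    print(wald_test(0.5, 0.25))
```

With these hypothetical values the Wald statistic is 4.0, p ≈ 0.046, and the odds ratio is about 1.65: each additional $10,000 of income would multiply the odds of intending to vote for Bush by roughly 1.65.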
4. Many women give birth to more than one child. In research on the
birthweights of children, data was gathered on the birthweights of
children born to six different mothers. The data looked like this:
Birthweights
Mother   Child 1   Child 2   Child 3   Child 4
1        6.4       6.9       6.7       7.1
2        8.5       7.8       7.8       8.3
:        :         :         :         :
6        7.0       7.8       8.6       6.6
What is the best method of analysis? Repeated measures ANOVA
What would be your hypotheses?
H0: μ1 = μ2 = μ3 = μ4
Ha: the means are not all equal
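In a repeated measures design each mother serves as her own block: the variation between mothers is removed before the birth-order means are compared. Only mothers 1, 2 and 6 appear in the table, so the Python sketch below uses made-up data; the function name is hypothetical.

```python
from statistics import mean


def repeated_measures_f(data: list[list[float]]) -> tuple[float, int, int]:
    """One-way repeated measures ANOVA.

    data[i][j] is the measurement for subject i (e.g. a mother) under
    condition j (e.g. birth order). Returns (F, df_conditions, df_error).
    """
    n, k = len(data), len(data[0])
    grand_mean = mean(x for row in data for x in row)
    subject_means = [mean(row) for row in data]
    condition_means = [mean(row[j] for row in data) for j in range(k)]

    ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_conditions = n * sum((m - grand_mean) ** 2 for m in condition_means)
    # What remains after removing subject and condition effects is the error term.
    ss_error = ss_total - ss_subjects - ss_conditions

    df_conditions, df_error = k - 1, (n - 1) * (k - 1)
    f_stat = (ss_conditions / df_conditions) / (ss_error / df_error)
    return f_stat, df_conditions, df_error


if __name__ == "__main__":
    # Hypothetical data: 3 subjects measured under 3 conditions.
    print(repeated_measures_f([[2, 4, 6], [3, 6, 6], [4, 5, 9]]))  # (12.0, 2, 4)
```

For this hypothetical data the condition means are 3, 5 and 7, SS_conditions = 24, SS_subjects = 6 and SS_error = 4, so F = (24/2)/(4/4) = 12.0 on 2 and 4 degrees of freedom.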
5. An investigator for the state police wants to compare the
effectiveness of three different defensive driving programs and to see
whether there are gender differences. Five subjects of each gender who
recently received speeding tickets are assigned to each program. At
the end of the program each is given a written test on his or her
knowledge of defensive driving.
The scores (out of 100) are given here:
Scores
Program               Female                 Male
One 8-hour session    88, 92, 98, 99, 91     89, 96, 95, 90, 96
Two 4-hour sessions   87, 92, 91, 94, 93     95, 87, 90, 91, 92
Two 2-hour sessions   80, 82, 79, 86, 88     77, 78, 83, 78, 78
Use the SPSS results shown to answer these questions:
1. Is there interaction between program and gender?
a. No, because the p-value is close to zero.
b. Yes, because .196 is greater than .05
c. No, because .196 is greater than .05
d. Yes, because .374 is greater than .05
e. No, because .374 is greater than .05
Tests of Between-Subjects Effects
Dependent Variable: score

Source              Type III Sum of Squares   df   Mean Square   F           Sig.
Corrected Model     935.500a                  5    187.100       15.923      .000
Intercept           234967.500                1    234967.500    19997.234   .000
gender              20.833                    1    20.833        1.773       .196
sessions            890.600                   2    445.300       37.898      .000
gender * sessions   24.067                    2    12.033        1.024       .374
Error               282.000                   24   11.750
Total               236185.000                30
Corrected Total     1217.500                  29
a. R Squared = .768 (Adjusted R Squared = .720)
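Every entry in this table except Sig. can be reproduced from the raw scores listed with the exercise. The Python sketch below does this for a balanced design (five scores in every cell); the function and variable names are hypothetical. It illustrates the sums-of-squares decomposition rather than SPSS's general Type III algorithm, but for balanced data the two give the same sums of squares.

```python
from statistics import mean

# Defensive-driving scores from the exercise, keyed by gender, then program.
SCORES = {
    "Female": {"8-hour": [88, 92, 98, 99, 91],
               "4-hour": [87, 92, 91, 94, 93],
               "2-hour": [80, 82, 79, 86, 88]},
    "Male":   {"8-hour": [89, 96, 95, 90, 96],
               "4-hour": [95, 87, 90, 91, 92],
               "2-hour": [77, 78, 83, 78, 78]},
}


def two_way_anova(cells):
    """Balanced two-way ANOVA with interaction.

    cells[a][b] lists the observations for level a of the first factor (gender)
    and level b of the second factor (program). Returns ({source: (SS, df, F)}, R^2, adjusted R^2).
    """
    a_levels = list(cells)
    b_levels = list(cells[a_levels[0]])
    reps = len(cells[a_levels[0]][b_levels[0]])  # observations per cell
    values = [x for a in a_levels for b in b_levels for x in cells[a][b]]
    grand = mean(values)

    a_means = [mean([x for b in b_levels for x in cells[a][b]]) for a in a_levels]
    b_means = [mean([x for a in a_levels for x in cells[a][b]]) for b in b_levels]
    cell_means = {(a, b): mean(cells[a][b]) for a in a_levels for b in b_levels}

    ss_a = len(b_levels) * reps * sum((m - grand) ** 2 for m in a_means)
    ss_b = len(a_levels) * reps * sum((m - grand) ** 2 for m in b_means)
    ss_model = reps * sum((m - grand) ** 2 for m in cell_means.values())  # "Corrected Model"
    ss_ab = ss_model - ss_a - ss_b
    ss_error = sum((x - cell_means[a, b]) ** 2
                   for a in a_levels for b in b_levels for x in cells[a][b])

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab = df_a * df_b
    df_error = len(values) - len(a_levels) * len(b_levels)
    ms_error = ss_error / df_error

    table = {
        "gender": (ss_a, df_a, ss_a / df_a / ms_error),
        "sessions": (ss_b, df_b, ss_b / df_b / ms_error),
        "gender * sessions": (ss_ab, df_ab, ss_ab / df_ab / ms_error),
        "Error": (ss_error, df_error, None),
    }
    ss_total = ss_model + ss_error  # "Corrected Total"
    r2 = ss_model / ss_total
    adj_r2 = 1 - (ss_error / df_error) / (ss_total / (len(values) - 1))
    return table, r2, adj_r2


if __name__ == "__main__":
    table, r2, adj_r2 = two_way_anova(SCORES)
    for source, (ss, df, f) in table.items():
        print(f"{source:<18} SS={ss:10.3f} df={df:3d}" + (f" F={f:.3f}" if f is not None else ""))
    print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```

The printed sums of squares, degrees of freedom, F values, R² and adjusted R² should match the SPSS table to the three decimals shown; the Sig. column would additionally require the F distribution.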
2. How does the interaction plot support your results in the previous
question?
a. There is evidence of interaction because for part of the plot the
lines appear to be parallel.
b. There is no evidence of interaction because for part of the plot
the lines are not parallel.
c. There is evidence of interaction because the lines do not
intersect.
d. There is no evidence of interaction because the lines do not
intersect.
e. None of the above.
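An interaction plot graphs the mean score in each program-by-gender cell, with one line per gender. The Python sketch below computes those cell means from the exercise data so you can sketch the plot yourself; the function name is hypothetical.

```python
from statistics import mean

# Defensive-driving scores from the exercise: program -> (female scores, male scores)
SCORES = {
    "One 8-hour session": ([88, 92, 98, 99, 91], [89, 96, 95, 90, 96]),
    "Two 4-hour sessions": ([87, 92, 91, 94, 93], [95, 87, 90, 91, 92]),
    "Two 2-hour sessions": ([80, 82, 79, 86, 88], [77, 78, 83, 78, 78]),
}


def cell_means(scores):
    """Mean score for each (program, gender) cell: the points an interaction plot connects."""
    return {program: (mean(female), mean(male))
            for program, (female, male) in scores.items()}


if __name__ == "__main__":
    for program, (f, m) in cell_means(SCORES).items():
        # Parallel lines correspond to a female-minus-male gap that stays constant across programs.
        print(f"{program:<20} female={f:5.1f} male={m:5.1f} gap={f - m:+.1f}")
```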
3. Is there a significant difference in the mean scores for the three
types of sessions?
a. Yes, because F = 1.773 and p is large.
b. No, because F = 1.773 and p is large.
c. Yes, because F = 37.898 and p is close to 0.
d. No, because F = 37.898 and p is close to 0.
e. None of the above.
4. Is there a significant difference in the scores by gender?
Yes/No, because F = 1.773 and the p-value is large/small.