PubHlth640 - Spring 2015
Intermediate Biostatistics
Unit 5 – Logistic Regression
WEEKS 10 - Practice Problems
SOLUTIONS – Stata version 13
Source: Afifi A., Clark VA and May S. Computer Aided Multivariate Analysis, Fourth Edition.
Boca Raton: Chapman and Hall, 2004.
Exercises #1-#3 utilize a data set provided by Afifi, Clark and May (2004). The data come from a longitudinal
study of depression. The purpose of the study was to obtain estimates of the prevalence and incidence
of depression and to explore its risk factors. The study variables were of several types: demographics, life
events, stressors, physical health, health services utilization, medication use, lifestyle, and social support.
Tip - To access this data set in STATA, type the following into the command window:
use "http://people.umass.edu/biep640w/datasets/depress.dta"
Consider the following three variables.

Variable   Codings                               Label in STATA
drink      1 = yes, 2 = no                       Regular Drinker
sex        1 = male, 2 = female
cases      0 = Normal, 1 = Case of Depression    Depressed is cesd > 16
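The codings in the table above can be checked directly in Stata. The following is a minimal sketch, assuming the course URL is still reachable and that the three variables carry value labels, as the later output suggests:

```stata
* Load the depression data (clear drops any data already in memory)
use "http://people.umass.edu/biep640w/datasets/depress.dta", clear
* Show storage type, value-label name, and variable label
describe drink sex cases
* Show the numeric codes that sit behind the value labels
tabulate drink, nolabel
tabulate sex, nolabel
tabulate cases, nolabel
```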
1. Source: Afifi A., Clark VA and May S. Computer Aided Multivariate Analysis, Fourth Edition.
Boca Raton: Chapman and Hall, 2004, Problem 12.9, page 330.
Using Stata, load the depression data set and execute the commands needed to fill in the following table:
                Regular Drinker
Sex         Yes      No     Total
Female      139      44      183
Male         95      16      111
Total       234      60      294
. tabulate drink sex

   regular |          sex
  drinker? |      male     female |     Total
-----------+----------------------+----------
       yes |        95        139 |       234
        no |        16         44 |        60
-----------+----------------------+----------
     Total |       111        183 |       294
What are the odds that a woman is a regular drinker? 139 / 44 = 3.2
What are the odds that a man is a regular drinker? 95 / 16 = 5.9
What is the odds ratio? That is, compared to a man, what are the relative odds (odds ratio) that a woman is a
regular drinker? OR = [odds for woman] / [odds for man] = 3.2/5.9 = 0.54 (0.53 if the unrounded odds 139/44 and 95/16 are used)
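The arithmetic above can be reproduced with Stata's display command. This is only a sketch; the cell counts are typed in by hand from the table rather than read from the data:

```stata
* Odds of being a regular drinker, by sex (counts from the 2x2 table)
display "Odds, women: " 139/44
display "Odds, men:   " 95/16
* Odds ratio, women compared to men, using the unrounded odds
display "OR:          " (139/44)/(95/16)
```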
2. Repeat the tabulation that you produced for problem #1 two times, one for persons who are depressed
and the other for persons who are not depressed.
. * Use command SORT to sort the data by case status (depressed or not depressed)
. sort case
. * Use the command BY in front of the command TABULATE
. by case: tabulate drink sex
-------------------------------------------------------------------------------
-> cases = normal

   regular |          sex
  drinker? |      male     female |     Total
-----------+----------------------+----------
       yes |        87        106 |       193
        no |        14         37 |        51
-----------+----------------------+----------
     Total |       101        143 |       244

-------------------------------------------------------------------------------
-> cases = depressed

   regular |          sex
  drinker? |      male     female |     Total
-----------+----------------------+----------
       yes |         8         33 |        41
        no |         2          7 |         9
-----------+----------------------+----------
     Total |        10         40 |        50
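As an aside, the sort and by steps can be combined, or a single stratum can be requested with an if qualifier. A short sketch using the same variables:

```stata
* bysort sorts by cases, then repeats the table within each level
bysort cases: tabulate drink sex
* Alternatively, tabulate one stratum at a time (1 = case of depression)
tabulate drink sex if cases == 1
tabulate drink sex if cases == 0
```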
Among Persons Who are Depressed
                Regular Drinker
Sex         Yes      No     Total
Female       33       7       40
Male          8       2       10
Total        41       9       50

OR (Relative odds, compared to a man, that a woman is a regular drinker):
OR = [(33)(2)] / [(7)(8)] = 1.18
Among Persons Who are NOT Depressed
                Regular Drinker
Sex         Yes      No     Total
Female      106      37      143
Male         87      14      101
Total       193      51      244

OR (Relative odds, compared to a man, that a woman is a regular drinker):
OR = [(106)(14)] / [(37)(87)] = 0.46
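Both stratum-specific odds ratios are cross-product ratios of the cell counts. A minimal sketch of the calculation, again with the counts typed in by hand:

```stata
* Depressed: (female yes x male no) / (female no x male yes)
display "OR, depressed:     " (33*2)/(7*8)
* Not depressed: same cross-product ratio
display "OR, not depressed: " (106*14)/(37*87)
```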
3. Fit a logistic regression model using these variables. Use DRINK as the dependent variable and CASES
and SEX as independent variables. Also include as an independent variable the appropriate interaction
term.
.* Some variable creation commands
.* Create 0/1 indicators of drinker and female gender
. generate drink01=.
(294 missing values generated)
. replace drink01=1 if drink==1
(234 real changes made)
. replace drink01=0 if drink==2
(60 real changes made)
. label define drinkf 0 "0=nondrinker" 1 "1=drinker"
. label values drink01 drinkf
. generate female=.
(294 missing values generated)
. replace female=0 if sex==1
(111 real changes made)
. replace female=1 if sex==2
(183 real changes made)
. label define sexf 0 "0=male" 1 "1=female"
. label values female sexf
.* Create a new variable called FEM_CASE that is the interaction of FEMALE and CASES
. generate fem_case=female*cases
. * Use the command LOGISTIC if you want output to include ODDS RATIOS
. * Use the command LOGIT if you want the output to include BETAs and SEs
. * LOGISTIC OUTCOME PREDICTOR PREDICTOR etc..
. logit drink01 cases female fem_case
Logistic regression                               Number of obs   =        294
                                                  LR chi2(3)      =       5.62
                                                  Prob > chi2     =     0.1318
Log likelihood = -145.95772                       Pseudo R2       =     0.0189

------------------------------------------------------------------------------
     drink01 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       cases |  -.4405564   .8413815    -0.52   0.601    -2.089634    1.208521
      female |  -.7743296   .3455196    -2.24   0.025    -1.451536   -.0971237
    fem_case |   .9386327   .9578851     0.98   0.327    -.9387877    2.816053
       _cons |   1.826851   .2879632     6.34   0.000     1.262453    2.391248
------------------------------------------------------------------------------
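The same model can also be requested without creating fem_case by hand. The sketch below assumes Stata 11 or later, where factor-variable notation is available, and uses the drink01 and female indicators created above:

```stata
* ## requests both main effects and their interaction
logit drink01 i.cases##i.female
* Same model as fitted above, reported as exponentiated coefficients
logistic drink01 cases female fem_case
```

In the logistic output, the exponentiated interaction coefficient is a ratio of two odds ratios, not an odds ratio for either stratum.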
Fitted Model:
logit [ pr (drinker=yes) ] = 1.8269 - 0.4406 [CASES] - 0.7743[FEMALE] + 0.9386 [FEM_CASE]
where CASES =1 if depressed; 0 otherwise
FEMALE = 1 if female; 0 otherwise
FEM_CASE = (CASES) * (FEMALE)
Is the interaction term in your model significant? No. Estimated β3 = 0.9386, SE(estimated β3) = 0.96, and p-value = .33
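The p-value quoted above is the Wald test from the logit output. A sketch of how the interaction could also be checked with an explicit Wald test and a likelihood ratio test; the stored-estimates name full is an arbitrary choice:

```stata
* Wald test of the interaction coefficient
quietly logit drink01 cases female fem_case
test fem_case
* Likelihood ratio test: full model versus the model without the interaction
estimates store full
quietly logit drink01 cases female
lrtest full .
```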
How does your answer to problem #3 compare to your answer to problem #2? Comment.
The answers match.
Among Depressed: OR = 1.18
Among NON-depressed: OR = 0.46
logit [ pr (drinker=yes) ] = 1.8269 - 0.4406 [CASES] - 0.7743 [FEMALE] + 0.9386 [FEM_CASE]

Among Depressed
                 CASES   FEMALE   FEM_CASE
"1" = Female       1        1        1
"0" = Male         1        0        0

logit [ female ] = 1.8269 - 0.4406 - 0.7743 + 0.9386 = 1.5506
logit [ male ] = 1.8269 - 0.4406 = 1.3863
logit [ female ] - logit [ male ] = 1.5506 - 1.3863 = +0.1643
OR [ women compared to men ] = exp { logit [ p1 ] - logit [ p0 ] }
 = exp { +0.1643 }
 = 1.1786
Among NON Depressed
                 CASES   FEMALE   FEM_CASE
"1" = Female       0        1        0
"0" = Male         0        0        0

logit [ female ] = 1.8269 - 0.7743 = 1.0526
logit [ male ] = 1.8269
logit [ female ] - logit [ male ] = 1.0526 - 1.8269 = -0.7743
OR [ women compared to men ] = exp { logit [ p1 ] - logit [ p0 ] }
 = exp { -0.7743 }
 = 0.4610
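The hand calculations for the two strata can be cross-checked with lincom after the fit. This is a sketch that assumes the model from problem #3 is refitted first; the results should agree with the values above up to rounding:

```stata
quietly logit drink01 cases female fem_case
* Not depressed (cases = 0): log OR for women vs men is the female coefficient
lincom female, or
* Depressed (cases = 1): log OR is female + fem_case
lincom female + fem_case, or
```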
4. Source: Kleinbaum, Kupper, Miller, and Nizam. Applied Regression Analysis and Other Multivariable
Methods, Third Edition. Pacific Grove: Duxbury Press, 1998. p 683 (problem 2).
A five-year follow-up study of 600 disease-free subjects was carried out to assess the effect
of a 0/1 exposure E on the development (or not) of a certain disease. The variables AGE (continuous)
and obesity status (OBS), the latter a 0/1 variable, were determined at the start of the follow-up and were
to be considered as control variables in analyzing the data.
(A) State the logit form of a logistic regression model that assesses the effect of the 0/1 exposure
variable E controlling for the confounding effects of AGE and OBS and the interaction effects
of AGE with E and OBS with E.
Solution:
logit[π] = β0 + β1*E + β2*AGE + β3*OBS + β4*AGEE + β5*OBSE
I used the following notation:
π = Probability [ disease ]
AGEE = AGE * E. This is a created variable that is the interaction of AGE with E.
OBSE = OBS * E. Similarly, this is the interaction of OBS with E.
(B) Given the model you have for part “A”, give a formula for the odds ratio for the exposure-disease
relationship that controls for the confounding and interactive effects of AGE and OBS.
Solution:
The solution here follows the ideas on pp 9-11 in Lecture Notes 5, Logistic Regression.
              Value of Predictor for Person who is
Predictor     Exposed     Not Exposed
E             1           0
AGE           AGE1        AGE0
OBS           OBS1        OBS0
AGEE          AGE1        0
OBSE          OBS1        0
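Then
OR = exp { logit[π for exposed person] - logit[π for NON exposed person] }
 = exp { [ β0 + β1 + β2*AGE1 + β3*OBS1 + β4*AGE1 + β5*OBS1 ] - [ β0 + β2*AGE0 + β3*OBS0 ] }
 = exp { β1 + β2*(AGE1 - AGE0) + β3*(OBS1 - OBS0) + β4*AGE1 + β5*OBS1 }
Controlling for AGE and OBS means comparing an exposed and a non-exposed person who share the same values,
AGE1 = AGE0 = AGE and OBS1 = OBS0 = OBS. The β2 and β3 terms then vanish, leaving
OR = exp { β1 + β4*AGE + β5*OBS }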
(C) Now use the formula that you have for part “B” to write an expression for the estimated odds
ratio for the exposure-disease relationship that considers both confounding and interaction
when AGE=40 and OBS=1.
Solution:
Estimated OR = exp { β1 + (40)β4 + β5 }, with each β replaced by its estimate from the fitted model.

              Value of Predictor for Person who is
Predictor     Exposed     Not Exposed
E             1           0
AGE           40          40
OBS           1           1
AGEE          40          0
OBSE          1           0

OR = exp { β1 + β2*(40 - 40) + β3*(1 - 1) + β4*40 + β5*1 } = exp { β1 + β4*40 + β5*1 }
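If data were available, the estimated odds ratio at AGE = 40 and OBS = 1 could be obtained with lincom after fitting the model from part (A). The exercise supplies no data set, so every variable name below (disease, exposure, age, obs, agee, obse) is hypothetical and the sketch only illustrates the steps:

```stata
* Hypothetical variable names; the exercise does not provide a data set
generate agee = age*exposure
generate obse = obs*exposure
logit disease exposure age obs agee obse
* Estimated OR for exposure at AGE = 40, OBS = 1: exp(b1 + 40*b4 + b5)
lincom exposure + 40*agee + obse, or
```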