
How to Learn Everything You Ever
Wanted to Know About Biostatistics
Daniel W. Byrne
Director of Biostatistics and Study Design
General Clinical Research Center
Vanderbilt University Medical Center
1
The presenter has no financial interests in the products mentioned in this talk.
Objective of This Workshop

To provide a 1-hour overview of the
important practical information that a
clinical investigator needs to know about
biostatistics to be successful.
2
I. You Will Need the Right Tools
3
Install a powerful, yet easy to use, statistical
software package on your computer.

I recommend SPSS for Windows.

Bring an 1180 for $80 to Karen Montefiori
in 143 Hill Student Center (3-1630).

She will lend you the SPSS CD for the day
and you can install this software easily.
4
SPSS is the 2nd most popular package.
It is much easier to use than SAS and Stata.
1. SAS (163)
2. SPSS (52)
3. STATA (48)
4. Epi Info (36)
5. SUDAAN (22)
6. S-PLUS (19)
7. Statxact (12)
8. BMDP (8)
9. Statistica (6)
10. Statview (5)
5
Install additional software for
statistical “odds and ends”

Instat by GraphPad – graphpad.com - $100
for summary data analysis

True Epistat by Epistat Services –
true-epistat.com - $395
for random number table, etc.

CIA (Confidence Interval Analysis)
– bmj.com - $35.95 with book
for confidence intervals
6
Install a sample size program.

If you can afford to spend $400, buy nQuery
Advisor – Statistical Solutions, www.statsol.com

If you can afford to spend $0, download PS from
the Vanderbilt web site –

http://www.mc.vanderbilt.edu/prevmed/
ps/index.htm

Both packages are on the CRC’s statistical
7
II. You Will Need a Plan
8
Use the scientific method to keep
your project focused.






State the problem
Formulate the null hypothesis
Design the study
Collect the data
Interpret the data
Draw conclusions
9
State the Problem


Among patients hospitalized for a hip fracture
who develop pneumonia during their stay in the
hospital, the mortality rate is 2.3 times higher at
non-trauma centers compared with trauma centers
 (48.7% vs. 21.1%, P=0.043.)
It is not clear if, or how, those who will develop
pneumonia could be identified on admission.
10
Formulate the Null Hypothesis

Among patients hospitalized for treatment
of a hip fracture, there are no factors known
upon admission that are statistically
different between those who develop
pneumonia during their stay and those who
do not.
11
Why bother with a null hypothesis?



For the same reason that we assume that a person
is innocent until proven guilty.
The burden of responsibility is on the prosecutor
to demonstrate enough evidence for members of a
jury to be convinced that the charges are true
and to change their minds.
Outcome after treatment with Drug A will not be
significantly different from placebo.
12
Design the Study
Data on 933 patients with a hip fracture
from a New York trauma registry will be
analyzed.
 The 58 patients with pneumonia will be
compared with the 875 without pneumonia.

13
The Most Common Type of Flaw
[Bar chart: Number of Responses for each type of flaw]
Study design: 20
Interpretation of the findings: 4
Importance of the topic: 4
Presentation of the results: 0
14
Example of Recall Bias



A control group is asked,
 “Two weeks ago from today, did you eat X for
breakfast?”
Two weeks after their MI, patients are asked
 “Did you eat X for breakfast on the day of your
heart attack?”
You can prove any food causes an MI using this
method (X=bacon, X=Flintstone vitamins, etc.)
15
John Bailar’s Quote:
“Study design and bias are much more
important than complex statistical
methods.”
 Devote more time to improving the study
design, and minimizing and measuring bias.
 Become an expert at study design issues and
biases in your area of research.

16
What is the statistical power of
the study?
Power
 Beta
 Alpha
 Sample size
 Ratio of treated to control group
 Measure of outcome

17
Sample Size Table

See Table 9-1 in the handout
 “Sample Size Requirements for Each of
Two Groups”.
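
The quantities on the power slide (alpha, beta, sample size) can be tied together with a short calculation. The sketch below uses the common normal-approximation formula for comparing two proportions with equal group sizes. It is not the method behind Table 9-1 or the PS program; the function name `n_per_group` and the 10% vs. 5% rates are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two proportions
    (two-sided test, equal group sizes, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    p_bar = (p1 + p2) / 2                          # pooled proportion
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Hypothetical planning scenario: 10% vs. 5% event rates.
n_80 = n_per_group(0.10, 0.05, power=0.80)
n_90 = n_per_group(0.10, 0.05, power=0.90)
print(n_80, n_90)
```

Raising the desired power, or shrinking the difference to be detected, increases the required sample size.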
18
19
Collect the Data

See the handouts for:
 ITEC Trauma Systems Study
20
III. You Will Need Data
Management Skills
21
Enter your data with statistical
analysis in mind.
For small projects, enter data into Microsoft
Excel or directly into SPSS.
For large projects, create a database with
Microsoft Access.
Keep variable names in the first row, with
<=8 characters, and no internal spaces.
Enter as little text as possible and use codes
for categories, such as 1=male, 2=female.

22
Spreadsheet from Hell
23
Spreadsheet from Heaven
24
IV. You Will Need to Learn
Descriptive Statistics
25
Descriptive vs. Inferential

Descriptive statistics summarize your group.


average age 78.5, 89.3% white.
Inferential statistics use the theory of probability
to make inferences about larger populations from
your sample.

White patients were significantly older than
black and Hispanic patients, P<0.001.
26
Import your data into a statistical
program for screening and analysis.
27
Screen your data thoroughly for errors and
inconsistencies before doing ANY analyses.



Check the lowest and highest value for each
variable.
 For example, age 1-777.
Look at histograms to detect typos.
Cross-check variables to detect impossible
combinations.
 For example, pregnant males, survivors
discharged to the morgue, patients in the ICU
for 25 days with no complications.
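
The screening steps above can also be scripted. The following is a minimal sketch, not part of the workshop materials: the records, field names, and the age limits are hypothetical, and the checks mirror the examples on this slide (range checks and impossible combinations).

```python
# Hypothetical records: the field names and values are invented for illustration.
records = [
    {"id": 1, "age": 81, "sex": "F", "pregnant": False, "survived": True, "disposition": "HOME"},
    {"id": 2, "age": 777, "sex": "F", "pregnant": False, "survived": True, "disposition": "HOME"},
    {"id": 3, "age": 34, "sex": "M", "pregnant": True, "survived": True, "disposition": "HOME"},
    {"id": 4, "age": 90, "sex": "M", "pregnant": False, "survived": True, "disposition": "MORGUE"},
]

def screen(rows, min_age=0, max_age=110):
    """Return (id, problem) pairs for out-of-range values and impossible combinations."""
    problems = []
    for row in rows:
        # Range check: lowest and highest plausible value (limits are assumptions).
        if not min_age <= row["age"] <= max_age:
            problems.append((row["id"], "age out of range"))
        # Cross-checks between variables.
        if row["sex"] == "M" and row["pregnant"]:
            problems.append((row["id"], "pregnant male"))
        if row["survived"] and row["disposition"] == "MORGUE":
            problems.append((row["id"], "survivor discharged to morgue"))
    return problems

problems = screen(records)
for record_id, problem in problems:
    print(record_id, problem)
```

Flagged records should be corrected in the original database or spreadsheet, not in the analysis copy.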
28
Analyze, descriptive statistics,
frequencies, select the variable

Statistics: AGE
N Valid          933
N Missing        0
Mean             79.292
Median           81.300
Mode             90.0
Std. Deviation   26.537
Range            763.0
Minimum          14.0
Maximum          777.0

[Histogram of AGE (y-axis: Frequency): Mean = 79.3, Std. Dev = 26.54, N = 933]
29
Analyze, Descriptive Statistics,
Crosstabs

SURVIVAL * 48-DISPOSITION Crosstabulation (Count)
Disposition categories: HOME, REHABILITATION FACILITY, OTHER HOSPITAL, MORGUE, SKILLED NURSING FACILITY, HOME WITH ASSISTANCE, AMA (DISCHARGE AGAINST MEDICAL ADVICE).
MORGUE: 63 patients, all EXPIRED.
Row totals: EXPIRED 63, SURVIVED 870, Total 933.
30
Correct the data in the original database or
spreadsheet and import a revised version into
the statistical package.

The age of 777 should be checked and
changed to the correct age.

Suspicious values, such as an age of 106,
should be checked. In this case it is correct.
31
Interpret the Data
32
Run descriptive statistics to
summarize your data.
SURVIVAL
            Frequency   Percent   Valid Percent   Cumulative Percent
EXPIRED     63          6.8       6.8             6.8
SURVIVED    870         93.2      93.2            100.0
Total       933         100.0     100.0

Statistics: 49-DAYS IN HOSPITAL
N Valid          933
N Missing        0
Mean             23.34
Median           19.00
Mode             20
Std. Deviation   18.03
Range            236
Minimum          1
Maximum          237

[Histogram of 49-DAYS IN HOSPITAL (y-axis: Frequency): Mean = 23.3, Std. Dev = 18.03, N = 933]
33
V. You Will Need to Learn
Inferential Statistics
34
P Value




A P value is an estimate of the probability that
results such as yours could have occurred by
chance alone if there truly was no difference or
association.
P < 0.05 = less than a 5% chance, 1 in 20.
P < 0.01 = less than a 1% chance, 1 in 100.
Alpha is the threshold. If P is < this threshold, you
consider it statistically significant.
35
Basic formula for inferential tests
$$\text{Test Statistic} = \frac{\text{Observed} - \text{Expected}}{\text{Variability}}$$

Based on the total number of observations
and the size of the test statistic, one can
determine the P value.
36
How many noise units?
$$\text{Test Statistic} = \frac{\text{Signal}}{\text{Noise}}$$

Test statistic & sample size (degrees of
freedom) convert to a probability or P
Value.
37
Use inferential statistics to test for
differences and associations.

There are hundreds of statistical tests.

A clinical researcher does not need to know them
all.

Learn how to perform the most common tests on
SPSS.

Learn how to use the statistical flowchart to
determine which test to use.
38
VI. You Will Need to
Understand the Statistical
Terminology Required to Select
the Proper Inferential Test
39
Univariate vs. Multivariate
Univariate analysis usually refers to one
predictor variable and one outcome variable
 Is gender a predictor of pneumonia?
 Multivariate analysis usually refers to more
than one predictor variable or more than one
outcome variable being evaluated
simultaneously.
 After adjusting for age, is gender a
predictor of pneumonia?

40
Difference vs. Association


Some tests are designed to assess whether there
are statistically significant differences between
groups.
 Is there a statistically significant difference
between the age of patients with and without
pneumonia?
Some tests are designed to assess whether there
are statistically significant associations between
variables.
 Is the age of the patient associated with the
number of days in the hospital?
41
Unmatched vs. Matched
Some statistical tests are designed to assess
groups that are unmatched or independent.
Is the admission systolic blood pressure
different between men and women?
Some statistical tests are designed to assess
groups that are matched or data that are
paired.
Is the systolic blood pressure different
between admission and discharge?
42

Level of Measurement

Categorical vs. continuous variables
 If you take the average of a continuous
variable, it has meaning.
 Average age, blood pressure, days in
the hospital.
 If you take the average of a categorical
variable, it has no meaning.
 Average gender, race, smoker.
43
Level of Measurement
Nominal - categorical
 gender, race, hypertensive
 Ordinal - categories that can be ranked
 none, light, moderate, heavy smoker
 Interval - continuous
 blood pressure, age, days in the hospital

44
Horse race example



Nominal
 Did this horse come in first place?
 0=no, 1=yes
Ordinal
 In what position did this horse finish?
 1=first, 2=second, 3=third, etc.
Interval (scale)
 How long did it take for this horse to finish?
 60 seconds, etc.
45
46
Normal vs. Skewed Distributions
Parametric statistical tests can be used to
assess variables that have a “normal” or
symmetrical bell-shaped distribution curve
for a histogram.
Nonparametric statistical tests can be used
to assess variables that are skewed or
nonnormal.
Look at a histogram to decide.

47
Examples of Normal and Skewed
[Histogram, normal: 35-SYSTOLIC BLOOD PRESSURE FIRST ER (y-axis: Frequency). Mean = 146.9, Std. Dev = 27.74, N = 925]
[Histogram, skewed: 44-DAYS IN ICU (y-axis: Frequency). Mean = .9, Std. Dev = 3.99, N = 933]
48
VII. You Will Need to Know
Which Statistical Test to Use
49
Flowchart of common inferential
statistics

See the handout, Figure 16-1, pages 78-79.
50
Commonly used statistical methods










1. Chi-square
2. Logistic regression
3. Student's t-test
4. Fisher's exact test
5. Cox proportional-hazards
6. Kaplan-Meier method
7. Wilcoxon rank-sum test
8. Log-rank test
9. Linear regression analysis
10. Mantel-Haenszel method
51
Commonly used statistical methods









11. One-way analysis of variance (ANOVA)
12. Mann-Whitney U test
13. Kruskal-Wallis test
14. Repeated-measures analysis of variance
15. Paired t-test
16. Chi-square test for trend
17. Wilcoxon signed-rank test
18. Analysis of variance (two-way)
19. Spearman rank-order correlation
52
Chi-square
The most commonly used statistical test.
Used to test if two or more percentages are
different.
For example, suppose that in a study of 933
patients with a hip fracture, 10% of the men
(22/219) develop pneumonia
compared with 5% of the women (36/714).
What is the probability that this could
happen by chance alone?
Univariate, difference, unmatched, nominal,
>=2 groups, n>=20.
53
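
To make the chi-square example concrete, here is a minimal sketch of the Pearson chi-square calculation for a 2 by 2 table, using the counts from this slide (22 of 219 men and 36 of 714 women with pneumonia). It applies no continuity correction, so the result may differ slightly from a corrected value reported by a statistics package; the function name is an assumption.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: men, women. Columns: pneumonia, no pneumonia.
chi2, p_value = chi_square_2x2(22, 219 - 22, 36, 714 - 36)
print(f"chi-square = {chi2:.2f}, P = {p_value:.4f}")
```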

Chi-square example
[SPSS crosstabulation and chi-square test output]
54
Fisher’s Exact Test

This test can be used for 2 by 2 tables when
the number of cases is too small to satisfy
the assumptions of the chi-square.
 Total number of cases is <20 or
 The expected number of cases in any cell
is <1 or
 More than 25% of the cells have expected
frequencies <5.
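
As an illustration, the sketch below computes a two-sided Fisher's exact P value from the hypergeometric distribution, using the pneumonia-by-cirrhosis counts from the crosstabulation in this deck (3, 55, 5, 870), where one expected count is below 1. The two-sided rule used here (summing all tables no more probable than the observed one) is one common convention and is an assumption about what a given package reports.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 by 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    low, high = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Rows: pneumonia present, absent. Columns: cirrhosis present, absent.
p_fisher = fisher_exact_two_sided(3, 55, 5, 870)
print(f"Fisher's exact P = {p_fisher:.4f}")
```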
55
[SPSS crosstabulation and Fisher's exact test output]
56
How to calculate the expected
number in a cell
PNEUMONIA COMPLICATION 480.00-486.99 * CIRRHOSIS OR CHRONIC LIVER 571 Crosstabulation

                                     CIRRHOSIS OR CHRONIC LIVER 571
PNEUMONIA COMPLICATION 480.00-486.99 ABSENT    PRESENT   Total
ABSENT   Count                       870       5         875
         Expected Count              867.5     7.5       875.0
         % within PNEUMONIA          99.4%     .6%       100.0%
         % within CIRRHOSIS          94.1%     62.5%     93.8%
PRESENT  Count                       55        3         58
         Expected Count              57.5      .5        58.0
         % within PNEUMONIA          94.8%     5.2%      100.0%
         % within CIRRHOSIS          5.9%      37.5%     6.2%
Total    Count                       925       8         933
         Expected Count              925.0     8.0       933.0
         % within PNEUMONIA          99.1%     .9%       100.0%
         % within CIRRHOSIS          100.0%    100.0%    100.0%
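
The expected count for a cell is its row total multiplied by its column total, divided by the grand total. A minimal sketch using the observed counts from this crosstabulation (the function name is an assumption):

```python
def expected_counts(table):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Rows: pneumonia absent, present. Columns: cirrhosis absent, present.
observed = [[870, 5],
            [55, 3]]
expected = expected_counts(observed)
for row in expected:
    print([round(value, 1) for value in row])
```

The smallest expected count here is about 0.5, which is below 1, so this table meets the slide's criteria for using Fisher's exact test instead of the chi-square.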
57
Chi-square for a trend test
Used to assess a nominal variable and an
ordinal variable.
 Does the pneumonia rate increase with the
total number of comorbidities?
 Univariate, association, nominal.
 Analyze, Descriptive Statistics, Crosstabs.

58
Chi-Square Tests
                               Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square             43.381a   5    .000
Likelihood Ratio               34.576    5    .000
Linear-by-Linear Association   30.522    1    .000
N of Valid Cases               933
a. 2 cells (16.7%) have expected count less than 5. The
minimum expected count is .37.

PNEUMONIA COMPLICATION 480.00-486.99 * NUMBER OF COMORBIDITES (0-9) Crosstabulation

                                     NUMBER OF COMORBIDITES (0-9)
PNEUMONIA COMPLICATION 480.00-486.99 .00      1.00     2.00     3.00     4.00     5.00     Total
ABSENT   Count                       250      292      213      98       19       3        875
         % within NUMBER OF COMORB.  98.8%    94.2%    93.0%    86.0%    90.5%    50.0%    93.8%
PRESENT  Count                       3        18       16       16       2        3        58
         % within NUMBER OF COMORB.  1.2%     5.8%     7.0%     14.0%    9.5%     50.0%    6.2%
Total    Count                       253      310      229      114      21       6        933
         % within NUMBER OF COMORB.  100.0%   100.0%   100.0%   100.0%   100.0%   100.0%   100.0%
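
The trend test corresponds to the Linear-by-Linear Association row of the SPSS output. One standard way to compute it is (N - 1) times the squared Pearson correlation between the ordinal score and the 0/1 outcome. The sketch below applies that formula to the counts in this crosstabulation; using the comorbidity counts 0 to 5 as equally spaced scores is an assumption.

```python
import math

scores = [0, 1, 2, 3, 4, 5]            # number of comorbidities (column scores)
present = [3, 18, 16, 16, 2, 3]        # patients with pneumonia in each column
totals = [253, 310, 229, 114, 21, 6]   # all patients in each column

def linear_by_linear(scores, events, totals):
    """Trend statistic (N - 1) * r^2, referred to chi-square with 1 df."""
    n = sum(totals)
    sum_x = sum(s * t for s, t in zip(scores, totals))
    sum_x2 = sum(s * s * t for s, t in zip(scores, totals))
    sum_y = sum(events)                 # outcome is 0/1, so sum(y) equals sum(y^2)
    sum_xy = sum(s * e for s, e in zip(scores, events))
    sxy = sum_xy - sum_x * sum_y / n
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y - sum_y ** 2 / n
    statistic = (n - 1) * sxy ** 2 / (sxx * syy)
    p_value = math.erfc(math.sqrt(statistic / 2))  # upper tail, 1 df
    return statistic, p_value

trend_statistic, trend_p = linear_by_linear(scores, present, totals)
print(f"trend chi-square = {trend_statistic:.3f}, P = {trend_p:.2g}")
```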
59
Mantel-Haenszel Method
Used to assess a factor across a number of 2
by 2 tables.
 Is the mortality rate associated with
pneumonia different between trauma centers
and nontrauma centers?
 Analyze, Descriptive Statistics, Crosstabs.

60
61
Student’s t-test
Used to compare the average (mean) in one
group with the average in another group.
 Is the average age of patients significantly
different between those who developed
pneumonia and those who did not?
 Univariate, Difference, Unmatched,
Interval, Normal, 2 groups.

62
[SPSS independent-samples t-test output]
63
Mann-Whitney U test
Same as the Wilcoxon rank-sum
test
 Used in place of the
Student’s t-test when the
data are skewed.
 A nonparametric test that
uses the rank of the value
rather than the actual value.
Univariate, Difference,
64

Paired t-test
Used to compare the average for
measurements made twice within the same
person - before vs. after.
Used to compare a treatment group and a
matched control group.
For example, did the systolic blood
pressure change significantly from the scene
of the injury to admission?
Univariate, Difference, Matched, Interval,
Normal, 2 groups.
65

Wilcoxon signed-rank
test

Used to compare two skewed continuous variables
that are paired or matched.

Nonparametric equivalent of the paired t-test.

For example, “Was the Glasgow Coma Scale score
different between the scene and admission?”

Univariate, Difference, Matched, Interval,
Nonnormal, 2 groups.
66
ANOVA
One-way: used to compare 3 or more means
from independent groups.
“Is the age different between White, Black,
and Hispanic patients?”
Two-way: used to compare 2 or more means
by 2 or more factors.
“Is the age different between Males and
Females, With and Without Pneumonia?”
67
Tests of Between-Subjects Effects
Dependent Variable: AGE

Source          Type III Sum of Squares   df    Mean Square   F          Sig.
Model           5769944a                  4     1442486       8664.775   .000
SEX             1981.683                  1     1981.683      11.904     .001
PNEUMON         1299.320                  1     1299.320      7.805      .005
SEX * PNEUMON   519.282                   1     519.282       3.119      .078
Error           154657.2                  929   166.477
Total           5924601                   933
a. R Squared = .974 (Adjusted R Squared = .974)
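
Each F value in the Tests of Between-Subjects Effects output is a signal-to-noise ratio: the mean square for the effect divided by the mean square for error. A short sketch that recomputes the F ratios from the sums of squares and degrees of freedom in that output (the variable names are assumptions):

```python
# (sum of squares, degrees of freedom) as reported in the SPSS output
effects = {
    "SEX": (1981.683, 1),
    "PNEUMON": (1299.320, 1),
    "SEX * PNEUMON": (519.282, 1),
}
error_ss, error_df = 154657.2, 929

ms_error = error_ss / error_df  # mean square error: the "noise"

f_ratios = {}
for source, (ss, df) in effects.items():
    mean_square = ss / df       # the "signal" for this effect
    f_ratios[source] = mean_square / ms_error

for source, f in f_ratios.items():
    print(f"{source}: F = {f:.3f}")
```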
68
Kruskal-Wallis One-Way ANOVA
Used to compare continuous variables that
are not normally distributed between more
than 2 groups.
 Nonparametric equivalent to the one-way
ANOVA.
 Is the length of stay different by ethnicity?
 Analyze, nonparametric tests, K
independent samples.

69
Repeated-Measures ANOVA



Used to assess the change in 2 or more continuous
measurements made on the same person. Can also
compare groups and adjust for covariates.
Do changes in the vital signs within the first 24
hours of a hip fracture predict which patients will
develop pneumonia?
Analyze, General Linear Model, Repeated
Measures.
70
Pearson Correlation

Used to assess the linear association
between two continuous variables.
 r=1.0 perfect correlation
 r=0.0 no correlation
 r=-1.0 perfect inverse correlation

Univariate, Association, Interval
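
For readers who want to see the calculation behind r, here is a minimal sketch of the Pearson correlation coefficient. The age and length-of-stay values are hypothetical and chosen so that the association is perfectly linear.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by the two standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for illustration only.
ages = [60, 70, 80, 90]
days_in_hospital = [10, 12, 14, 16]

r_positive = pearson_r(ages, days_in_hospital)        # perfect correlation
r_inverse = pearson_r(ages, days_in_hospital[::-1])   # perfect inverse correlation
print(r_positive, r_inverse)
```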
71
Correlations
Pearson Correlation, with Sig. (2-tailed) in parentheses

Variables:
1 = AGE
2 = 49-DAYS IN HOSPITAL
3 = NUMBER OF COMORBIDITES (0-9)
4 = 43-TOTAL NUMBER OF COMPLICATIONS
5 = 35-SYSTOLIC BLOOD PRESSURE FIRST ER
6 = 35-GLASGOW COMA SCALE FIRST ER
7 = 35-PULSE FIRST ER

    1               2               3               4               5               6                7
1   1.000
2   .088** (.007)   1.000
3   .211** (.000)   .167** (.000)   1.000
4   .137** (.000)   .453** (.000)   .222** (.000)   1.000
5   .149** (.000)   .039 (.237)     .034 (.296)     -.033 (.310)    1.000
6   -.030 (.356)    .016 (.633)     -.079* (.017)   -.028 (.393)    .043 (.196)     1.000
7   -.008 (.809)    .022 (.499)     .055 (.093)     .046 (.161)     .069* (.035)    -.100** (.002)   1.000

N = 933 for pairs among variables 1-4; N = 925 for pairs involving variable 5; N = 926 for pairs involving variable 6 (925 with variable 5); N = 923 for pairs involving variable 7.
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
72
Spearman rank-order
correlation
Used to assess the relationship between two
ordinal variables or two skewed continuous
variables.
 Nonparametric equivalent of the Pearson
correlation.
 Univariate, Association, Ordinal (or
skewed).
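
A minimal sketch of the idea: Spearman's coefficient is the Pearson correlation computed on the ranks of the values, with tied values sharing their average rank. The data are hypothetical, with one extreme length of stay to mimic a skewed variable.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical skewed data: the ordering is perfectly monotonic but not linear.
complications = [1, 2, 3, 4, 5]
days = [2, 3, 5, 9, 237]
rho = spearman_rho(complications, days)
r = pearson_r(complications, days)
print(rho, r)
```

Because only the ranks are used, the single extreme value of 237 days does not dominate the result the way it does for the Pearson coefficient.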

73
Correlations
Spearman's rho: Correlation Coefficient, with Sig. (2-tailed) in parentheses

Variables:
1 = AGE
2 = 49-DAYS IN HOSPITAL
3 = NUMBER OF COMORBIDITES (0-9)
4 = 43-TOTAL NUMBER OF COMPLICATIONS
5 = 35-SYSTOLIC BLOOD PRESSURE FIRST ER
6 = 35-GLASGOW COMA SCALE FIRST ER
7 = 35-PULSE FIRST ER

    1                2               3                4               5               6              7
1   1.000
2   .089** (.007)    1.000
3   .158** (.000)    .142** (.000)   1.000
4   .145** (.000)    .389** (.000)   .229** (.000)    1.000
5   .091** (.005)    .073* (.027)    .037 (.257)      -.014 (.676)    1.000
6   -.146** (.000)   .048 (.149)     -.091** (.006)   -.076* (.020)   .079* (.017)    1.000
7   -.008 (.806)     .037 (.268)     .042 (.202)      .043 (.196)     .080* (.015)    -.038 (.252)   1.000

N = 933 for pairs among variables 1-4; N = 925 for pairs involving variable 5; N = 926 for pairs involving variable 6 (925 with variable 5); N = 923 for pairs involving variable 7.
**. Correlation is significant at the .01 level (2-tailed).
*. Correlation is significant at the .05 level (2-tailed).
74
Summary of Inferential Tests
75
Unpaired vs. Paired





Student’s t-test
Chi-square
One-way ANOVA
Mann-Whitney U test
Kruskal-Wallis H test





Paired t-test
McNemar’s test
Repeated-measures
Wilcoxon signed-rank
Friedman ANOVA
76
Parametric vs. Nonparametric





Student’s t-test
One-way ANOVA
Paired t-test
Pearson correlation
Correlated F ratio
(repeated-measures
ANOVA)





Mann-Whitney U test
Kruskal-Wallis test
Wilcoxon signed-rank
Spearman’s r
Friedman ANOVA
77
A Good Rule to Follow
Always check your results with a
nonparametric test.
If you test your null hypothesis with a
Student’s t-test, also check it with a Mann-Whitney U test.
It will only take an extra 25 seconds.

78
VIII. You Will Need to
Understand Regression
Techniques
79
Linear Regression
Used to assess how one or more predictor
variables can be used to predict a
continuous outcome variable.
 “Do age, number of comorbidities, or
admission vital signs predict the length of
stay in the hospital after a hip fracture?”
 Multivariate, Association, Interval/Ordinal
dependent variable.

80
Coefficients(a)
Model 1

                                      Unstandardized Coefficients   Standardized Coefficients
                                      B            Std. Error       Beta                        t        Sig.
(Constant)                            -4.451       18.889                                       -.236    .814
AGE                                   7.136E-02    .045             .053                        1.571    .117
NUMBER OF COMORBIDITES (0-9)          2.606        .548             .159                        4.757    .000
35-SYSTOLIC BLOOD PRESSURE FIRST ER   1.562E-02    .022             .024                        .726     .468
35-GLASGOW COMA SCALE FIRST ER        1.067        1.170            .030                        .912     .362
35-PULSE FIRST ER                     2.581E-02    .047             .019                        .554     .580
35-RESPIRATION RATE FIRST ER          -8.00E-02    .188             -.014                       -.425    .671
a. Dependent Variable: 49-DAYS IN HOSPITAL
81
Logistic Regression

Used to assess the predictive value of one or more
variables on an outcome that is a yes/no question.

“Do age, gender, and comorbidities predict which
hip fracture patients will develop pneumonia?”

Multivariate, Difference, Nominal dependent
variable, not time-dependent, 2 groups.
82
1. Total number of comorbidities
2. Cirrhosis
3. COPD
4. Gender
5. Age
83
Draw Conclusions
We reject the null hypothesis.
 Patients who are at high risk of developing
pneumonia during their hospitalization for a
hip fracture can be identified by:
 total number of pre-existing conditions
 cirrhosis
 COPD
 male gender

84
How this information could be used
to predict pneumonia on admission
$$\text{Probability of Pneumonia} = \frac{1}{1 + e^{-Z}}$$

Z = -4.899 + (number of comorbidities x 0.469) +
(cirrhosis x 2.275) + (COPD x 0.714) + (age x
0.021) + (gender[female=1, male=0] x -0.715)
e = 2.718
Example: an 80-year-old male with cirrhosis and
one other comorbidity (but not COPD) has about a
50% chance of developing pneumonia.
Z = -4.899 + (2 x 0.469) + (1 x 2.275) + (0 x 0.714)
+ (80 x 0.021) + (0 x -0.715) = -0.006
Probability = 1 / (1 + e^0.006) = 0.4985
85
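
The prediction formula on this slide can be evaluated directly. The sketch below plugs the slide's coefficients into Z and then into 1 / (1 + e^-Z); the function and argument names are assumptions. For the 80-year-old male with cirrhosis and one other comorbidity, evaluating the formula as written gives Z of about -0.006 and a probability of about 0.50.

```python
import math

def pneumonia_probability(comorbidities, cirrhosis, copd, age, female):
    """Evaluate the slide's logistic model; binary inputs are coded 1 = yes, 0 = no."""
    z = (-4.899
         + 0.469 * comorbidities
         + 2.275 * cirrhosis
         + 0.714 * copd
         + 0.021 * age
         - 0.715 * female)
    probability = 1 / (1 + math.exp(-z))
    return z, probability

# 80-year-old male, cirrhosis plus one other comorbidity (2 in total), no COPD.
z, probability = pneumonia_probability(comorbidities=2, cirrhosis=1, copd=0, age=80, female=0)
print(f"Z = {z:.3f}, probability = {probability:.3f}")
```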
Survival Analysis
Kaplan-Meier method
 Used to plot cumulative
survival
 Log-rank test
 Used to compare survival
curves
 Cox proportional-hazards
 Used to adjust for
covariates in survival

86
Odds and Ends You Will Need
87
95% Confidence Intervals



A 95% confidence interval is an estimate that you
make from your sample as to where the true
population value lies.
If your study were to be repeated 100 times, you
would expect the 95% CIs to include the true value
for the population in 95 of these 100 studies.
The value might be a mean, percentage, or RR.
Confidence intervals should be included in
publications for the major findings of the study.
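
As a worked illustration, the sketch below computes a 95% confidence interval for a percentage, using the pneumonia rate from this study (58 of 933 patients). It uses the simple normal-approximation (Wald) interval, which is an assumption; other interval methods give slightly different limits.

```python
import math
from statistics import NormalDist

def proportion_ci(events, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = events / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width

rate, lower, upper = proportion_ci(58, 933)
print(f"{rate:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```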
88
Prevalence vs. Incidence
Prevalence
 How many of you now have the flu?
 Incidence
 How many of you have had the flu in the
past year?

89
Random
Random is not the same as haphazard,
unplanned, incidental.
 Allocating patients to the treatment group
on even days and to the control group on
odd days is systematic – not random.
 Random refers to the idea that each element
in a set has an equal probability of
occurrence.
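
A minimal sketch of random (as opposed to systematic) allocation: a balanced list of group labels is shuffled so that every ordering is equally likely. The patient IDs, the seed, and the function name are hypothetical; a real trial would normally use a documented, concealed randomization procedure.

```python
import random

def randomize(patient_ids, seed=None):
    """Randomly allocate patients to two groups of (nearly) equal size."""
    rng = random.Random(seed)  # a fixed seed makes the list reproducible for auditing
    half = len(patient_ids) // 2
    labels = ["treatment"] * half + ["control"] * (len(patient_ids) - half)
    rng.shuffle(labels)
    return dict(zip(patient_ids, labels))

allocation = randomize(list(range(1, 21)), seed=2024)
for patient_id, group in allocation.items():
    print(patient_id, group)
```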

90
Improving a RCT
See the handout, Table 3-2, pages 18-19.
 “Checklist to Be Used by Authors When
Preparing or by Readers When Analyzing a
Report of a Randomized Controlled Trial”.

91
IX. You Will Need to Continue
Learning About Statistics
92
Recommended books on
statistics
Kuzma – Statistics in the Health Sciences
 Norusis – Data Analysis with SPSS
 Altman – Statistics with Confidence
 Friedman – Fundamentals of Clinical Trials
 Pagano – Principles of Biostatistics
 Encyclopedia of Biostatistics
 SPSS manuals

93
Future Workshops
94
Future CRC Workshops





Oct 11 - How to use wireless hand-helds for clinical
research
(Paul St Jacques, MD, Anesthesiology)
Oct 18 - How to conduct Anova statistical tests - Part 1/3
(Ayumi Shintani, PhD, MPH, Center for Health Services
Research)
Oct 25 - How to conduct Anova statistical tests - Part 2/3
(Ayumi Shintani, PhD, MPH, Center for Health Services
Research)
Nov 1 - How to conduct Anova statistical tests - Part 3/3
(Ayumi Shintani, PhD, MPH, Center for Health Services
Research)
Nov 8 - How to write a data and safety-monitoring plan
(Harvey Murff, MD)
95
X. One Final Skill
You Will Need to Master
96
A response to the comment: “You’re
comparing apples and oranges”

“No – this is comparing apples and
oranges!”
97