Bayesian Methods in Mplus: Introduction
Mplus Users Group
27 October 2010

What is Statistics?
- Statistics is about uncertainty
- "To err is human, to forgive divine, but to include errors in your design is statistical"
  (Leslie Kish, 1977, Presidential address, A.S.A.)
Uncertainty in Classical Statistics
- Uncertainty = sampling distribution
- Estimate the population parameter θ by θ̂
- Imagine drawing an infinity of samples
- The distribution of θ̂ over these samples is the sampling distribution
- But we have only one sample

Inference in Classical Statistics
- What does a 95% confidence interval actually mean?
- Estimate θ̂ and its sampling distribution
- Estimate the 95% confidence interval
- Over an infinity of samples, 95% of these intervals contain the true population value θ
- But we have only one sample: we never know whether our present estimate θ̂ and confidence interval are among those 95% or not
Inference in Classical Statistics
- What does a 95% confidence interval NOT mean?
- It does not mean that there is a 95% probability that the true population value θ is within the limits of our confidence interval
- We only have the aggregate assurance that, in the long run, 95% of our confidence intervals contain the true population value

Uncertainty in Bayesian Statistics
- Uncertainty = probability distribution for the population parameter
- In classical statistics the population parameter θ has one single true value; we simply do not know it
- In Bayesian statistics we imagine a distribution of possible values of the population parameter θ
- Each unknown parameter must have an associated probability distribution
Uncertainty in Bayesian Statistics
- Each unknown parameter must have an associated probability distribution
- Before we have data: the prior distribution
- After we have data: the posterior distribution = f(prior + data)
- The posterior distribution is used to find an estimate for θ and a confidence interval
  - θ̂ = mode, median, or mean of the posterior
  - confidence interval = central 95% region: the credibility interval (percentile method)

Inference in Bayesian Statistics
- Posterior = f(prior + data)
- The prior distribution influences the posterior
- So Bayesian statistical inference depends partly on the prior
- The prior does not depend on the data (in empirical Bayes it does…)
Components of Bayesian Inference
- Prior distribution: a probability distribution that quantifies the uncertainty about the unknown parameters
- Likelihood function: relates all variables in a full probability model
- Posterior distribution: the result of using the data to update the prior information about the unknown parameters

Inference in Bayesian Statistics
- Bayesian statistical inference depends partly on the prior, so: which prior?
- Technical considerations
  - Conjugate prior (the posterior belongs to the same distribution family as the prior)
  - Proper prior (a real probability distribution)
- Fundamental consideration
  - Informative prior or ignorance prior?
  - Total ignorance does not exist: all priors add some information to the data
Inference in Bayesian Statistics
- The posterior distribution is used to find an estimate for θ and a confidence interval
  - θ̂ = mode (comparable to the maximum likelihood estimate)
  - confidence interval = central 95% region
- This assumes a simple posterior distribution, so that we can compute its characteristics
- In realistically complex models, the posterior is often intractable

Bayes' Theorem
Bayes' theorem relates the probability of the Hypothesis (H) and of the Data (D):

  \Pr(H \mid D) = \frac{\Pr(D \mid H)\,\Pr(H)}{\Pr(D)}

http://en.wikipedia.org/wiki/Bayes’ theorem
Likelihood
- The information about μ contained in the data is represented in the likelihood function
[Figure: likelihood function for μ; Data: N(5, 0.4)]

Bayesian Inference in a Nutshell
- Prior distribution p(θ) on the parameters θ
- Likelihood of the data y given the parameter values: f(y | θ)
- Bayes' theorem puts it all together:

  p(\theta \mid y) = \frac{f(y \mid \theta)\, p(\theta)}{f(y)}

- The posterior distribution is proportional to the likelihood × the prior distribution
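For a normal mean this update has a closed form; the posteriors on the next slides follow from the standard conjugate (precision-weighted) result, sketched here in the N(mean, variance) notation used on those slides (a textbook result, not something specific to Mplus):

  \mu_{\text{post}} = \frac{\mu_0/\sigma_0^2 + \bar{y}/\sigma_y^2}{1/\sigma_0^2 + 1/\sigma_y^2},
  \qquad
  \sigma_{\text{post}}^2 = \left(\frac{1}{\sigma_0^2} + \frac{1}{\sigma_y^2}\right)^{-1}

Here N(μ₀, σ₀²) is the prior and N(ȳ, σ_y²) summarizes the likelihood. With the prior N(8, 0.25) and data N(5, 0.4) this gives a posterior mean of 44.5/6.5 ≈ 6.8 and a posterior variance of 1/6.5 ≈ 0.15, matching the posterior shown two slides ahead.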
Prior Distribution (1)
- Bayesian analysis requires the specification of a prior distribution for μ, representing the knowledge about μ before observing the data
[Figure: Data: N(5, 0.4); Prior: N(8, 0.25)]

Posterior Distribution
- The posterior distribution combines the information in the prior and the likelihood; it represents the knowledge about μ after observing the data
[Figure: Data: N(5, 0.4); Prior: N(8, 0.25); Posterior: N(6.8, 0.15)]
An Uninformative (Vague) Prior
- Bayesian analysis requires the specification of a prior distribution for μ, representing the knowledge about μ before observing the data
[Figure: Data: N(5, 0.4); Prior: N(5, 2)]

Posterior Distribution
- The posterior distribution combines the information in the prior and the likelihood; it represents the knowledge about μ after observing the data
[Figure: Data: N(5, 0.4); Prior: N(5, 2); Posterior: N(5, 0.33)]
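A minimal numerical sketch of the two examples above (plain Python, not Mplus), assuming N(m, v) on these slides means mean m and variance v for both the prior and the likelihood summary:

def normal_posterior(prior_mean, prior_var, data_mean, data_var):
    """Combine a normal prior and a normal likelihood by precision weighting."""
    prior_prec = 1.0 / prior_var
    data_prec = 1.0 / data_var
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * data_mean)
    return post_mean, post_var

# Informative prior N(8, 0.25) with data N(5, 0.4) -> roughly N(6.8, 0.15)
print(normal_posterior(8.0, 0.25, 5.0, 0.4))
# Vague prior N(5, 2) with data N(5, 0.4) -> roughly N(5, 0.33)
print(normal_posterior(5.0, 2.0, 5.0, 0.4))

With the vague prior the posterior is essentially determined by the data, which is the point of the comparison.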
What Does 'Intractable' Mean?
"…in realistically complex models, the posterior is often intractable…"

  p(\theta \mid y) = \frac{f(y \mid \theta)\, p(\theta)}{f(y)},
  \qquad
  f(y) = \int f(y \mid \theta)\, p(\theta)\, d\theta

- To calculate the posterior distribution we must integrate a function that is generally very complex

Computational Issues in Bayesian Statistics
- In complex models the posterior is often intractable (impossible to compute exactly)
- Solution: approximate the posterior by simulation
  - Simulate many draws from the posterior distribution
  - Compute the mode, median, mean, 95% interval, et cetera from the simulated draws
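A minimal sketch of this simulation idea in plain Python, with a made-up posterior standing in for real MCMC output:

import numpy as np

rng = np.random.default_rng(1)
draws = rng.gamma(shape=3.0, scale=2.0, size=50_000)   # stand-in for draws from a posterior

posterior_mean = draws.mean()
posterior_median = np.median(draws)
ci_lower, ci_upper = np.percentile(draws, [2.5, 97.5])  # central 95% credibility interval
print(posterior_mean, posterior_median, ci_lower, ci_upper)

Once draws are available, every summary on the slide above is just a sample statistic of those draws.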
Why Bayesian Statistics?
- Can do some things that cannot be done in classical statistics
- Valid in small samples
  - Maximum Likelihood is not: "Asymptotically we are all dead…" (Novick)
- Always proper estimates
  - No negative variances

Bayesian Statistics Reaches Parts that Other Statistics Do Not Reach…
- Complex models with complex constraints
- Estimation including missing data
  - E.g., include a model for the missingness
  - Each missing data point is just another parameter to estimate…
- Multiple Imputation (MI) of complex data
- Estimation of scores for latent variables
  - Which are all missing…
Why Bayesian Statistics?
- Any disadvantages, apart from the computational burden?
- Yes: prior information introduces bias
  - Biased estimates, but hopefully more precise

"In a corner of the forest, Dwells alone my Hiawatha
Permanently cogitating, On the normal law of error
Wondering in idle moments, Whether an increased precision
Might perhaps be rather better, Even at the risk of bias
If thereby one, now and then, Could register upon the target"
  (Kendall, 1959, 'Hiawatha designs an experiment'; italics mine)

Simulating the Posterior Distribution
- Markov Chain Monte Carlo (MCMC)
  - Given a draw from a specific probability distribution, MCMC produces a new pseudorandom draw from that distribution
  - …then Repeat, Repeat, Repeat…
- Gibbs sampling
- Metropolis-Hastings
- The distributions are typically multivariate
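A minimal Gibbs sampling sketch in plain Python (a toy bivariate normal target, not the samplers Mplus uses internally), including the burn-in idea discussed on the next slides:

import numpy as np

rng = np.random.default_rng(0)
rho = 0.8                        # correlation of the toy bivariate normal target
n_iter, burn_in = 5_000, 2_500

x, y = 10.0, -10.0               # deliberately bad starting values
draws = []
for t in range(n_iter):
    # Alternate draws from the two full conditionals of a standard bivariate normal:
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))
    draws.append((x, y))

kept = np.array(draws[burn_in:])                       # discard the burn-in part of the chain
print(kept.mean(axis=0), np.corrcoef(kept.T)[0, 1])    # close to (0, 0) and 0.8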
MCMC Issues: Burn In
- Sequence of draws Z(1), Z(2), …, Z(t) from the target distribution f(Z)
- Even if Z(1) is not from f(Z), the distribution of Z(t) approaches f(Z) as t → ∞
- So, for an arbitrary Z(1), if t is sufficiently large, Z(t) is from the target distribution f(Z)
- MCMC must run t 'burn in' iterations before it reaches the target distribution f(Z)
  - Having good starting values helps

MCMC Issues: Burn In
- How many iterations are needed to converge on the target distribution?
- Diagnostics
  - Examine a graph of the burn in
  - Try different starting values
  - Run several chains in parallel
MCMC Issues: Monitoring
- How many iterations must be monitored?
  - Depends on the required accuracy
  - Problem: successive draws are correlated
- Diagnostics
  - Graph successive draws
  - Compute autocorrelations
  - Raftery-Lewis: nhat = minimum number of iterations for a quantile
  - Brooks-Draper: nhat = minimum number of iterations for the mean

Summing Up
- Probability: degree of belief
- Prior: what is known before observing the data
- Posterior: what is known after observing the data (prior + data)
- Informative prior: a tool to include subjective knowledge
- Non-informative prior: tries to express the absence of prior knowledge
  - The posterior is then mainly determined by the data
- MCMC methods: simulation (sampling) techniques to obtain the posterior distribution and all posterior summary measures
Bayesian Methods in Current Software
- BUGS (Bayesian inference Using Gibbs Sampling)
  - Very general; the user must set up the model
- MLwiN
  - Special implementation for multilevel regression
- Mplus 6.1
  - Very general
- NORM, Amelia
  - Multiple Imputation
- R packages
  - LearnBayes, R2WinBUGS, MCMCpack

Example: Confirmatory Factor Analysis

TITLE: CFA using ML on Holzinger/Swineford data
DATA: FILE IS "Grant.dat";
VARIABLE:
  NAMES ARE visperc cubes lozenges paragrap sentence wordmean gender;
  USEVARIABLES ARE visperc cubes lozenges paragrap sentence wordmean;
ANALYSIS:
  TYPE IS GENERAL; ESTIMATOR IS ML;
MODEL:
  spatial BY visperc@1 cubes lozenges;
  verbal BY paragrap@1 sentence wordmean;
OUTPUT: sampstat standardized;
Selected Output, ML Estimation

Chi-Square Test of Model Fit
  Value                                  3.663
  Degrees of Freedom                         8
  P-Value                               0.8862

Loglikelihood
  H0 Value                           -2575.128
  H1 Value                           -2573.297

Selected Output, ML Estimation

Information Criteria
  Number of Free Parameters                 19
  Akaike (AIC)                        5188.256
  Bayesian (BIC)                      5244.814
  Sample-Size Adjusted BIC            5184.692

RMSEA (Root Mean Square Error Of Approximation)
  Estimate                               0.000
  90 Percent C.I.                  0.000 0.046
  Probability RMSEA <= .05               0.957

CFI/TLI
  CFI                                    1.000
  TLI                                    1.026

SRMR (Standardized Root Mean Square Residual)
  Value                                  0.024
Bayesian CFA, Minimalist Setup

TITLE: CFA using Bayes on Holzinger/Swineford data
DATA: FILE IS "Grant.dat";
VARIABLE:
  NAMES ARE visperc cubes lozenges paragrap sentence wordmean gender;
  USEVARIABLES ARE visperc cubes lozenges paragrap sentence wordmean;
ANALYSIS:
  TYPE IS GENERAL; ESTIMATOR IS BAYES;   ! choose Bayes
MODEL:
  spatial BY visperc@1 cubes lozenges;
  verbal BY paragrap@1 sentence wordmean;
OUTPUT: sampstat standardized tech8;     ! iterations on screen
PLOT: TYPE IS PLOT2;                     ! plots to monitor convergence

Bayesian CFA, Selected Output

TESTS OF MODEL FIT

Bayesian Posterior Predictive Checking using Chi-Square
  95% Confidence Interval for the Difference Between
  the Observed and the Replicated Chi-Square Values    -23.549   22.553
  Posterior Predictive P-Value                           0.500
  (proportion of observed chi-squares larger than the replicated ones)

Information Criterion
  Number of Free Parameters                   19
  Deviance (DIC, "Bayesian AIC")        5187.886
  Estimated Number of Parameters (pD)     17.703
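A minimal sketch of the posterior predictive idea in plain Python, with placeholder arrays standing in for the observed-data and replicated-data chi-square values computed at each MCMC iteration:

import numpy as np

rng = np.random.default_rng(2)
obs_chisq = rng.chisquare(8, size=1000)   # placeholder values for illustration only
rep_chisq = rng.chisquare(8, size=1000)   # in practice both come from the MCMC iterations

# Proportion of iterations where the observed value exceeds the replicated one,
# as described on the slide above (the complementary proportion is also used).
ppp = np.mean(obs_chisq > rep_chisq)
diff = obs_chisq - rep_chisq
print(ppp, np.percentile(diff, [2.5, 97.5]))   # PPP and 95% interval for the difference

A well-fitting model yields a PPP near 0.5 and a difference interval that straddles zero, as in the output above.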
Assessing Convergence of MCMC: Chain on Correct Distribution
- Burn-in: Mplus deletes the first half of each chain
- Run multiple chains (Mplus default: 2)
- The PSR statistic compares the variance within chains to the variance between chains
  - Must be close to 1
- Graphical evaluation
  - Plots of the chain

Autocorrelation Plot
[Figure: autocorrelation plot of the chain for one parameter]
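A minimal sketch of a PSR-type check in plain Python (a simplified Gelman-Rubin-style statistic, not Mplus's exact formula):

import numpy as np

def psr(chains):
    """chains: array of shape (n_chains, n_iterations) for one parameter."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled variance estimate
    return np.sqrt(var_hat / W)             # close to 1.0 => chains agree

rng = np.random.default_rng(3)
good = rng.normal(0, 1, size=(2, 1000))     # two chains sampling the same target
print(psr(good))                            # roughly 1.0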
Trace Plot, Two Chains
[Figure: trace plot of two MCMC chains for one parameter]

Posterior Distribution, Kernel Plot
[Figure: kernel density plot of the posterior distribution]
Some Mplus Commands for Bayesian Estimation
- POINT is mode, median, or mean
- CHAINS is 2 (4 is nicer)
- STVALUES is unperturbed, perturbed, or ML
- ALGORITHM is Gibbs or MH
- PROCESSORS is 1 (2 is generally faster)
- THIN is 1 (use every #th iteration)
- BCONVERGENCE is 0.05 (PSR criterion)
  - Make it stricter if convergence seems a problem
  - I prefer 0.01 for more precision (more stable)

Let's analyze!
An example where Bayesian analysis is an improvement
- The intervention program ATLAS (Adolescent Training and Learning to Avoid Steroids) was administered to high school football players to prevent the use of anabolic steroids. Data are from 861 high school football players.
- Example from Bengt Muthén, Bayesian Analysis In Mplus: A Brief Introduction (Incomplete Draft, Version 3, May 17, 2010). Data from MacKinnon, D.P., Lockwood, C.M., & Williams, J. (2004). Confidence limits for the indirect effect: Distribution of the product and resampling methods. Multivariate Behavioral Research, 39, 99-128.

ATLAS Example, cntd.
- The focus is the indirect effect of the intervention on nutrition via the perceived severity of using steroids
- The ML estimate is 0.02 (0.011), p = 0.056 (ns)
ATLAS Example, cntd.

Bayesian Posterior Predictive Checking using Chi-Square
  95% Confidence Interval for the Difference Between
  the Observed and the Replicated Chi-Square Values     -4.579    5.914
  Posterior Predictive P-Value                           0.583

MODEL RESULTS (default settings)
                          Posterior  One-Tailed        95% C.I.
                Estimate     S.D.     P-Value    Lower 2.5%  Upper 2.5%
New/Additional Parameters
  INDIRECT        0.018      0.012      0.010       0.002       0.045

ATLAS Example, cntd.

MODEL RESULTS (default settings)
                          Posterior  One-Tailed        95% C.I.
                Estimate     S.D.     P-Value    Lower 2.5%  Upper 2.5%
New/Additional Parameters
  INDIRECT        0.018      0.012      0.010       0.002       0.045

MODEL RESULTS (chains = 4; bconvergence = 0.01)
                          Posterior  One-Tailed        95% C.I.
                Estimate     S.D.     P-Value    Lower 2.5%  Upper 2.5%
New/Additional Parameters
  INDIRECT        0.019      0.011      0.011       0.002       0.043
ATLAS Example, cntd.
- Why is the Bayes estimate significant and the ML estimate not?
- Because the indirect effect does not have a nice symmetric distribution!
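A minimal simulation sketch of that point in plain Python; the path coefficients below are invented for illustration and are not the ATLAS estimates:

import numpy as np

rng = np.random.default_rng(4)
a = rng.normal(0.6, 0.25, size=200_000)    # hypothetical path: group -> severity
b = rng.normal(0.5, 0.20, size=200_000)    # hypothetical path: severity -> nutrition
indirect = a * b                           # product of two (roughly) normal draws

mean, sd = indirect.mean(), indirect.std()
wald = (mean - 1.96 * sd, mean + 1.96 * sd)               # symmetric, normal-theory interval
percentile = tuple(np.percentile(indirect, [2.5, 97.5]))  # interval reported in Bayesian output
print(wald, percentile)   # the percentile interval is asymmetric around the mean

Because the product a*b is skewed, the percentile (credibility) interval and a symmetric estimate ± 1.96·SD interval can disagree near zero, which is what happens in the ATLAS results.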
ATLAS Example, setup

TITLE: Mediation using Bayesian analysis, ATLAS data from MacKinnon et al.;
DATA: file = mbr2004atlas.dat;
VARIABLE: names = obs group severity nutrit; usevariables = group - nutrit;
ANALYSIS: estimator = bayes;
  processors = 2; chains = 4; bconvergence = 0.01;
MODEL:
  severity on group (a);
  nutrit on severity (b) group;
MODEL CONSTRAINT:      ! with Bayes there is currently no INDIRECT or VIA command
  new(indirect);
  indirect = a*b;
OUTPUT: tech8 standardized;
PLOT: type = plot2;
How about the prior?
- Most software uses uninformative or vague priors by default (but all priors add some information)

Default priors in Mplus 6
[Figures: default priors in Mplus 6 — a Normal prior with μ = 0 and a very large variance (10^10), and an Inverse Gamma prior with shape α = -1 and scale β = 0]

Let's analyze!
Two-level CFA, Sibling Data
- 37 families, 187 children
- Scores on 6 intelligence tests
- Multilevel structure: children nested in families
- Problematic because of the small family-level sample size
- Example from Hox, Multilevel Analysis, 1st Ed., 2002
- Source: Van Peet, A.A.J. (1992). De potentieeltheorie van intelligentie [The potentiality theory of intelligence]. Amsterdam: University of Amsterdam, unpublished Ph.D. thesis.

Path Diagram, CFA Sibling Data
[Figure: path diagram with a Between (family) part and a Within (child) part]
ML estimation 2-level CFA

TITLE: two level factor analysis Van Peet data, using ML estimators;
DATA: FILE IS ggkind.dat;
VARIABLE:
  NAMES ARE famnr wordlist cards figures matrices animals occup;
  CLUSTER IS famnr;
ANALYSIS:
  TYPE IS TWOLEVEL;
  ESTIMATOR IS ML;
MODEL:
  %BETWEEN%
  general by wordlist cards figures matrices animals occup;
  %WITHIN%
  numeric by wordlist cards matrices;
  percept by occup animals figures;
OUTPUT: SAMPSTAT STANDARDIZED;

ML estimation 2-level CFA

MODEL RESULTS
                                               Two-Tailed
                 Estimate    S.E.   Est./S.E.   P-Value
Residual Variances
  WORDLIST          1.598   1.323      1.208     0.227
  CARDS             3.871   1.769      2.188     0.029
  FIGURES           2.315   1.496      1.548     0.122
  MATRICES         -0.160   0.673     -0.237     0.813
  ANIMALS           1.085   1.400      0.775     0.438
  OCCUP             5.705   1.988      2.870     0.004
Bayes estimation 2-level CFA

TITLE: two level factor analysis Van Peet data, using Bayes estimator;
DATA: FILE IS ggkind.dat;
VARIABLE:
  NAMES ARE famnr wordlist cards figures matrices animals occup;
  CLUSTER IS famnr;
ANALYSIS:
  TYPE IS TWOLEVEL;
  ESTIMATOR IS Bayes; STVALUES = ML;
MODEL:
  %BETWEEN%
  general by wordlist cards figures matrices animals occup;
  %WITHIN%
  numeric by wordlist cards matrices;
  percept by occup animals figures;
OUTPUT: SAMPSTAT STANDARDIZED;

Bayes estimation 2-level CFA

MODEL RESULTS
                          Posterior  One-Tailed        95% C.I.
                Estimate     S.D.     P-Value      L 2.5%    U 2.5%
Residual Variances
  WORDLIST         4.618    2.235      0.000       1.322     9.820
  CARDS            3.991    2.369      0.000       0.434     9.468
  FIGURES          2.150    1.625      0.000       0.201     6.299
  MATRICES         0.712    0.669      0.000       0.101     2.604
  ANIMALS          3.104    1.891      0.000       0.676     7.825
  OCCUP            6.389    2.511      0.000       2.888    12.652

- Note: the inadmissible negative residual variance for MATRICES under ML is now a proper (positive) estimate
Specifying your own prior
- Informative priors are used to incorporate prior knowledge
- Flexible in Mplus, but (Win)BUGS is more flexible and offers more tools
- Angels fear to tread here… (= be careful)
- In models with small variance parameters and small samples, the posterior is often sensitive to the choice of prior
  - Especially the variance estimates
  - Do a sensitivity analysis (try different priors)

Examples with different priors
- Tihomir Asparouhov and Bengt Muthén, Bayesian Analysis of Latent Variable Models using Mplus (Version 3), August 11, 2010
  - File BayesAdvantages18.pdf on www.statmodel.com
- Evans, Hastings, & Peacock (2000). Statistical Distributions. New York: Wiley
Specifying your own prior
- Parameters must be labeled
- Priors are specified via these labels

MODEL: F by y1-y4* (p1-p4); F@1;
MODEL PRIORS: p1-p4 ~ N(0,5);

- Use the same distribution, but give it a different shape to represent the prior information

Specifying your own prior
- Parameters must be labeled
- Priors are specified via these labels

MODEL: F by y1-y4; F (var);
MODEL PRIORS: var ~ IG(0.001,0.001);

- Use the same distribution, but give it a different shape to represent the prior information
  - IG(.001,.001) is the BUGS default variance prior
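A small illustration of what such variance priors imply, in plain Python; IG(0.001, 0.001) comes from the slide above, while IG(1, 1) is just an illustrative alternative one might try in a sensitivity analysis:

import numpy as np
from scipy import stats

grid = np.array([0.01, 0.1, 1.0, 10.0, 100.0])            # candidate variance values
diffuse = stats.invgamma(a=0.001, scale=0.001).pdf(grid)  # the BUGS default prior
mild = stats.invgamma(a=1.0, scale=1.0).pdf(grid)         # an illustrative mildly informative prior

print(diffuse * grid)   # roughly constant: the density behaves like 1/variance (very diffuse)
print(mild * grid)      # clearly not constant: this prior does carry information

A sensitivity analysis then reruns the model under each prior and compares the posterior variance estimates, as recommended above.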
Multiple Imputation in Mplus using Bayesian Estimation
Mplus Users Group
27 October 2010

Important Distinctions
- Missing Completely At Random (MCAR)
  - the missing values are a random sample of all values (not related to any observed or unobserved variable)
- Missing At Random (MAR)
  - the missing values are a random sample of all values conditional on other observed variables (not related to the unobserved value, but related to other observed variables)
- Not Missing At Random (NMAR)
  - the missingness is related to the unobserved (missing) value
(Little & Rubin, 1987, p. 14)
Consequences for Analysis
- Missing Completely At Random (MCAR)
- Missing At Random (MAR)
- Not Missing At Random (NMAR)
- MCAR or MAR: ignorable
  - analyze all observed data and ignore the missing data
  - MAR is ignorable if a proper model and estimation method are used (sensitive to misspecification; requires ML)
- NMAR: nonignorable/informative
  - construct a model for both the observed data and the missingness process (difficult)

Mplus Approaches to Incomplete Data
- Full Information ML estimation
  - Fine, assumes MAR
  - May require numerical integration (slow/impossible)
- WLS estimation
  - Fine, assumes MCAR
- Bayesian estimation (missing data estimated)
  - Fine, assumes MAR
- Any estimation method on multiply imputed data
  - Fine, assumes MAR
  - Mplus can combine the results automatically
  - Mplus 6.* can generate MI datasets

Single versus Multiple Imputation
- Imputation = fill the holes in the data
  - usually with the best possible estimate
  - followed by a standard analysis
  - this overestimates the sample size and thus underestimates the error
- Multiple Imputation (MI) = do this m times
  - with randomly chosen estimates from the distribution of possible estimates
  - followed by m standard analyses
  - the m outcomes are then combined
  - the variation over the m imputations restores the error

Multiple Imputation: Imputation Step
[Figure: a data matrix (Case 1 … n by Var 1 … p) with observed (!) and missing (?) entries]
- Create m different imputed data sets
Multiple Imputation: Analysis Step
[Figure: the m imputed data sets, each analyzed separately]
- Do the standard complete-data analysis m times
- Combine the results

How can we Create Imputations?
- Parametric method (the Mplus approach)
  - specify a model for the complete data
  - for each missing data point:
    - estimate the predictive distribution of the missing data point
    - impute with a random value drawn from this distribution
- Nonparametric method
  - group similar cases into adjustment cells
  - for each missing data point:
    - collect the non-missing cases from the same adjustment cell
    - impute with the value of a randomly selected non-missing case
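A minimal sketch of the parametric idea in plain Python (not the Mplus imputation model): regress y on an observed covariate x among the complete cases, then impute each missing y with a random draw from its estimated predictive distribution. The variable names and the simple regression model are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(0, 1, 200)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 200)
y[:40] = np.nan                                    # make some y values missing

obs = ~np.isnan(y)
beta = np.polyfit(x[obs], y[obs], deg=1)           # fit y on x with the complete cases
resid_sd = np.std(y[obs] - np.polyval(beta, x[obs]), ddof=2)

miss = np.isnan(y)
pred_mean = np.polyval(beta, x[miss])              # predictive mean for the missing cases
y_imputed = y.copy()
y_imputed[miss] = rng.normal(pred_mean, resid_sd)  # impute with random draws, not the mean

Repeating this m times (ideally also drawing the regression parameters from their posterior, as on the next slide) gives m multiply imputed data sets.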
How can we Create Imputations? (continued)
- Parametric method (the Mplus approach)
  - specify a model for the complete data
  - for each missing data point: estimate its predictive distribution and impute with a random draw from it
- Thus, Bayesian estimation is used
  - missing values are imputed multiple times by taking random draws from their posterior distribution
- Which model?
  - The Mplus 6.1 default is the full covariance matrix
  - The Mplus 6.0 default is the H0 model as specified

How Many Imputations?
- An estimator based on m < ∞ imputations has relative efficiency

  \left(1 + \frac{\gamma}{m}\right)^{-1}

  with γ = the fraction of missing information
- Note that γ ≠ the fraction of missing data
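A minimal numeric check of this formula in plain Python; the m and γ values are those used in the table on the next slide:

def mi_efficiency(m, gamma):
    """Relative efficiency of m imputations versus infinitely many."""
    return 1.0 / (1.0 + gamma / m)

for m in (3, 5, 10, 20):
    row = [round(100 * mi_efficiency(m, g)) for g in (0.1, 0.3, 0.5, 0.7, 0.9)]
    print(m, row)   # reproduces the percentages in the table that follows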
How Many Imputations? About 5 is Often Enough!

Relative efficiency (%) by number of imputations m and fraction of missing information γ:

  m      γ = .1    .3    .5    .7    .9
  3          97    91    86    81    77
  5          98    94    91    88    85
  10         99    97    95    93    92
  20        100    99    98    97    96

Graham, J.W., Olchowski, A.E., & Gilreath, T.D. (2007). How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prevention Science, 8, 206-213. (They advise m > 10.)

Example of Missing Data Analysis
- GPA data (Hox, 2002, 2010)
  - 200 college students
  - GPA measured on 6 occasions
  - Time-varying covariate: Job (number of hours worked in an off-campus job)
  - Time-invariant covariates: HighGPA, Sex
- Variant with the GPA variables incomplete
  - MAR: missingness depends on the previous GPA measure
  - Artificial data, SPSS files GPA2 and GPA2Mis
Latent Curve Model for GPA
[Figure: path diagram of the latent growth curve model for gpa1-gpa6]

Mplus Setup, Incomplete Data

TITLE: Growth model for incomplete GPA data, Hox 2010
DATA: FILE IS "gpa2mis.dat";
VARIABLE:
  NAMES ARE student sex highgpa gpa1 - gpa6;
  USEVARIABLES ARE gpa1 - gpa6;
  MISSING ARE gpa1 - gpa6 (9);
ANALYSIS: ! TYPE IS MISSING; ESTIMATOR IS ML;
MODEL:
  interc slope | gpa1@0 gpa2@1 gpa3@2 gpa4@3 gpa5@4 gpa6@5;
OUTPUT: SAMPSTAT STANDARDIZED;
Mplus Setup, Incomplete Data

TITLE: Growth model for incomplete GPA data, Hox 2010
DATA: FILE IS "gpa2mis.dat";
VARIABLE:
  NAMES ARE student sex highgpa gpa1 - gpa6;
  USEVARIABLES ARE gpa1 - gpa6;
  MISSING ARE gpa1 - gpa6 (9);
ANALYSIS: Estimator is Bayes; Processors = 2;
  Bconvergence = 0.01;
MODEL:
  interc slope | gpa1@0 gpa2@1 gpa3@2 gpa4@3 gpa5@4 gpa6@5;
OUTPUT: TECH8 STANDARDIZED;
PLOT: TYPE IS PLOT2;
Mplus Setup, Incomplete Data: Multiple Imputation of 10 Data Sets
(no MODEL: imputation from the saturated model)

TITLE: Growth model for incomplete GPA data, Hox 2010
DATA: FILE IS "gpa2mis.dat";
VARIABLE:
  NAMES ARE student sex highgpa gpa1 - gpa6;
  USEVARIABLES ARE gpa1 - gpa6;
  MISSING ARE gpa1 - gpa6 (9);
ANALYSIS: Estimator is Bayes; Processors = 2;
  Bconvergence = 0.01; TYPE = Basic;
DATA IMPUTATION: Ndatasets = 10; Impute gpa1 - gpa6;
  Save = gpa2imp*.dat;
Estimated Means and Variances

                       Complete   Incomplete  Incomplete
                          ML          ML         Bayes
Intercept Mean           2.60        2.61        2.61
Intercept Variance       0.04        0.03        0.04
Slope Mean               0.11        0.10        0.10
Slope Variance           0.003       0.004       0.005
Mplus Setup, Incomplete Data: Multiple Imputation of 10 Data Sets
(imputation from the specified model; the imputation model must be at least as complex as the analysis model!)

TITLE: Growth model for incomplete GPA data, Hox 2010
DATA: FILE IS "gpa2mis.dat";
VARIABLE:
  NAMES ARE student sex highgpa gpa1 - gpa6;
  USEVARIABLES ARE gpa1 - gpa6;
  MISSING ARE gpa1 - gpa6 (9);
ANALYSIS: Estimator is Bayes; Processors = 2;
  Bconvergence = 0.01;
MODEL:
  interc slope | gpa1@0 gpa2@1 gpa3@2 gpa4@3 gpa5@4 gpa6@5;
DATA IMPUTATION: Ndatasets = 10; Impute gpa1 - gpa6;
  Save = gpa2imp*.dat;
OUTPUT: TECH8 STANDARDIZED;
PLOT: TYPE IS PLOT2;
Mplus Setup, Multiple Imputation Analysis
(analysis of the 10 MI data sets)

TITLE: Growth model for incomplete GPA data, Hox 2010
DATA: FILE IS "gpa2implist.dat"; TYPE = Imputation;
VARIABLE: NAMES ARE gpa1 - gpa6;
ANALYSIS: ESTIMATOR = ML;
MODEL:
  interc slope | gpa1@0 gpa2@1 gpa3@2 gpa4@3 gpa5@4 gpa6@5;
OUTPUT: SAMPSTAT STANDARDIZED;
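Mplus pools the results from the 10 analyses automatically; a minimal sketch of the standard pooling rules (Rubin's rules) commonly used for this step, in plain Python with hypothetical numbers:

import numpy as np

def pool_mi(estimates, variances):
    """estimates, variances: one value per imputed data set for a single parameter."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    qbar = estimates.mean()                  # pooled point estimate
    within = variances.mean()                # average within-imputation variance
    between = estimates.var(ddof=1)          # between-imputation variance
    total = within + (1 + 1 / m) * between   # total variance
    return qbar, np.sqrt(total)              # pooled estimate and pooled standard error

# Hypothetical slope estimates and squared SEs from m = 10 analyses:
print(pool_mi([0.10, 0.11, 0.09, 0.10, 0.12, 0.10, 0.11, 0.09, 0.10, 0.10],
              [0.0004] * 10))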
Estimated Means and Variances

                      Incomplete  Incomplete  Incomplete
                          ML         Bayes      MI (ML)
Intercept Mean           2.61        2.61        2.61
Intercept Variance       0.03        0.04        0.03
Slope Mean               0.10        0.10        0.10
Slope Variance           0.004       0.005       0.004
Multiple Imputation versus Likelihood Based Procedures
- ML procedures
  + efficient
  - model specific
  - complicated
  + if ML estimation is slow or impossible, use Bayes
- MI procedures
  + general: uses standard complete-data techniques (which need not be likelihood-based)
  + possible to use auxiliary data in the imputation step
  - complicated

Background Reading, Incomplete Data
- McKnight, P.E., McKnight, K.M., Sidani, S., & Figueredo, A.J. (2007). Missing data: A gentle introduction. London: Guilford Press. [Very nice]
- Schafer, J.L., & Graham, J.W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177. [Very readable]
- Schafer, J.L. (1997). Analysis of incomplete multivariate data. London: Chapman & Hall. [Great, but very technical]