
Auxiliary Variables in Mixture Modeling:
Using the BCH Method in Mplus to Estimate a
Distal Outcome Model and an Arbitrary
Secondary Model
Tihomir Asparouhov and Bengt Muthén
Mplus Web Notes: No. 21
Version 2
October 7, 2014
1 Introduction
In mixture modeling, indicator variables are used to identify an underlying latent categorical variable. In many practical applications we are interested in using the latent categorical variable for further analysis and in exploring the relationship between that variable and other, auxiliary, observed variables. If we use a direct approach where the auxiliary variables are included in the mixture model, the latent class variable may undergo an undesirable shift: it is no longer measured simply by the original latent class indicator variables but is also measured by the auxiliary variables. The shift can be so substantial that the analysis yields meaningless results because it is no longer based on the original latent class variable.
Different approaches have been proposed recently to remedy this problem and are discussed in detail in Asparouhov and Muthén (2014). Among these are the 3-step approach proposed by Vermunt (2010) and the approach of Lanza et al. (2013). Both of these approaches are implemented in Mplus and the details of that implementation are discussed in Asparouhov and Muthén (2014). It is pointed out in Asparouhov and Muthén (2014) that the 3-step approach does not completely resolve the problem of shifting classes. In some situations, when the auxiliary variable is included in the final stage, the latent class variable can shift substantially and invalidate the results. Mplus monitors the shift in classes with the 3-step approach, and if this shift is substantial, results are not reported. This monitoring is conducted with the automatic Mplus commands DU3STEP and DE3STEP; if a manual 3-step approach is conducted, the monitoring has to be done manually as well.
Further simulation studies conducted in Bakk and Vermunt (2014) confirm the finding that the 3-step approach fails in certain situations. Bakk and Vermunt (2014) also point out that the approach of Lanza et al. (2013) for distal continuous outcomes, implemented in Mplus with the DCON command, can also fail due to assumptions underlying this method, primarily related to unequal variance across classes. The method yields poor results when the entropy is low and there is a substantial difference between the variances of the distal outcome across classes. If either one of these conditions is absent, Lanza's method works well. With a categorical distal outcome, Lanza's method has no such drawbacks and no assumptions that can be violated.
A method proposed in Bray et al. (2014) appears to yield results similar to
the method in Lanza et al. (2013) for continuous distal outcomes. This method
also fails when the distal outcome has unequal variance across classes.
Bakk and Vermunt (2014) also consider in simulation studies the modified BCH method, BCH for short, described in Vermunt (2010) and in Bakk et al. (2013). For the distal outcome model that evaluates the means across classes for a continuous auxiliary variable, these simulations show that the BCH method substantially outperforms Lanza's method and the 3-step method. The BCH method avoids the shifts in the latent classes in the final stage that the 3-step method is susceptible to. In its final stage the BCH method uses a weighted multiple group analysis, where the groups correspond to the latent classes, and thus a class shift is not possible because the classes are known. In addition, the BCH method performs well when the variance of the auxiliary variable differs substantially across classes, thus resolving the problems that Lanza's method is susceptible to.
The BCH method uses weights w_ij which reflect the measurement error of the
latent class variable. In the estimation of the auxiliary model, the i-th observation in class/group j is assigned a weight of w_ij, and the auxiliary model is estimated as a multiple group model using these weights. The main drawback of the BCH method is that it is based on weighting the observations with weights that can take negative values. If the entropy is large and the latent class variable is measured without error, then the weight w_ij is 1 if the i-th observation belongs to class j and zero otherwise. If the entropy is low, however, the weights w_ij can become negative and the estimates for the auxiliary model can become inadmissible. For example, it is possible that the variance of the distal outcome is estimated to be negative or that the frequency table of a categorical auxiliary variable has a negative entry. In such cases it would be difficult to utilize the BCH method beyond the basic distal outcome mean comparison model. Bakk and Vermunt (2014) show that the means of a continuous distal outcome can be estimated correctly even when the sample group-specific variances are negative. To obtain an admissible solution, the estimated model holds the variances equal across groups/classes. In this simple model the mean and variance estimates are independent, and thus the equal variance restriction has no effect on the mean estimates. However, if one is interested in evaluating the effect of the latent class variable on a more general auxiliary model, it is not clear how to resolve the problems with inadmissible solutions due to negative weights.
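To make the weighting idea concrete, the following is a minimal numerical sketch of the proportional (modified) BCH weighting described in Vermunt (2010) and Bakk et al. (2013): the step-1 posterior class probabilities are multiplied by the inverse of an estimated classification error matrix, and the resulting (possibly negative) weights are then used in a weighted per-class analysis. This is an illustration only, not the Mplus implementation; the array names and the simple weighted-mean computation are assumptions made for the example.

import numpy as np

def bch_weights(post):
    # Compute BCH weights from step-1 posterior class probabilities.
    # post: (N, K) array with post[i, k] = P(C = k | indicators of subject i).
    # Returns an (N, K) array of weights w[i, j] for the class-j "group";
    # the weights can become negative when the entropy is low.
    class_sizes = post.sum(axis=0)              # expected class counts
    # D[k, j]: average posterior probability of class j among subjects
    # proportionally assigned to class k (classification error matrix)
    D = (post.T @ post) / class_sizes[:, None]
    return post @ np.linalg.inv(D)              # w = P D^{-1}

# Toy use: weighted class-specific means of a distal outcome y
rng = np.random.default_rng(0)
post = rng.dirichlet([4.0, 1.0, 1.0], size=1000)      # fake step-1 posteriors
y = rng.normal(size=1000)
w = bch_weights(post)
print((w * y[:, None]).sum(axis=0) / w.sum(axis=0))   # one mean per class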
Two versions of the BCH method are implemented in Mplus. The first version
is referred to as the automatic version. This procedure evaluates the mean of a
continuous distal outcome variable across classes using the approach of Bakk and
Vermunt (2014). In this version one simply specifies the measurement model for
the latent class variable and specifies the auxiliary variable as such. The second
version is the manual version which allows us to estimate the effect of a latent class
variable on an arbitrary auxiliary model. This version requires two separate runs.
In the first run we estimate the latent class measurement model and save the BCH
weights. In the second run we estimate the general auxiliary model conditional on
the latent class variable using the BCH weights. Both BCH versions are illustrated
in the next two sections.
2 The automatic BCH approach for estimating the mean of a distal continuous outcome across latent classes
This approach is very similar to the DU3STEP and DE3STEP commands in
Mplus. With the following input file we estimate a latent class model using the 8
binary indicator variables U1, ..., U8. We also independently estimate the mean of
the auxiliary variable Y across the different classes with the BCH method.
Variable:
Names are U1-U8 Y;
Categorical = U1-U8;
Classes = C(4);
Auxiliary = Y(bch);
Data: file=a1.dat;
Analysis: Type = Mixture;
The model estimates for the latent class model are not affected by the auxiliary variable, and the mean estimates of the auxiliary variable across classes can be found in the output file.
3 Using Mplus to conduct the BCH method with an arbitrary secondary model
In many situations it would be of interest to estimate a more advanced secondary
model with the BCH method. In the Mplus implementation the secondary model
can be an arbitrary model with any number and types of variables. The model
is essentially estimated as a multiple group model, as if the latent class variable were observed. The BCH method uses group-specific weights for each observation
that are computed during the latent class model estimation. An outline of the
procedure is as follows. First estimate a latent class model using only the latent
class indicator variables and save the BCH weights. All variables that will be
used in the secondary model should be placed in the auxiliary variable command
without any specification. That way the auxiliary variables will be saved in the
same file as the BCH weights. This is step 1 of the estimation. In step 2 we simply
specify the auxiliary model and we use the BCH weights as training data.
3.1 Regression auxiliary model
In the following example we estimate the auxiliary regression model of a dependent
variable Y on a covariate X. We measure a 3-class latent variable using an LCA
model with 10 binary items and then use that latent variable to estimate a class-specific regression of Y on X. The example and the data are the same as the example presented on page 332 in Asparouhov and Muthén (2014). In the first step we use the following input file to estimate the LCA model and save the BCH weights.
Variable:
Names=U1-U10 Y X;
Categorical = U1-U10;
Classes = C(3);
Usevar=U1-U10;
Auxiliary=Y X;
Data: file=manBCH.dat;
Analysis: Type = Mixture;
Savedata: File= manBCH2.dat; Save=bchweights;
Here the key command is Save=bchweights; which requests that the BCH weights be saved
for further analysis. In the second step the following input file can be used to
estimate the class specific regression of Y on X.
Variable:
Names = U1-U10 Y X W1-W3 MLC;
Usevar are Y X W1-W3;
Classes = C(3);
Training=W1-W3(bch);
Data: file=manBCH2.dat;
Analysis: Type = Mixture; Starts=0; Estimator=mlr;
Model:
%overall%
Y on X;
%C#1%
Y on X;
%C#2%
Y on X;
%C#3%
Y on X;
Note that the latent class indicator variables U1-U10 are not on the USEVAR
list in this step. The key commands here are Training=W1-W3(bch); which
specifies the BCH weights to be used in this secondary analysis, Starts=0;
because this is a multiple group analysis and random starting values are not
needed, and Estimator=mlr; because that estimator yields better standard errors when the analysis utilizes weights; see Bakk and Vermunt (2014). The
results of the auxiliary model estimation are found as usual in the output file of
the second step run.
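As background for the Estimator=mlr recommendation, the sketch below illustrates the general idea of robust (sandwich) standard errors for a weighted regression, which is why ordinary model-based standard errors are not appropriate once BCH weights enter the analysis. This is a minimal sketch of the general principle, not the MLR formula used by Mplus; the data and the weight column are made up for the illustration.

import numpy as np

def weighted_ols_sandwich(X, y, w):
    # Weighted OLS estimates with robust (sandwich) standard errors.
    # X: (N, p) design matrix (first column of ones for the intercept),
    # y: (N,) outcome, w: (N,) weights (e.g., one column of BCH weights).
    XtW = X.T * w                             # X' diag(w), shape (p, N)
    bread = np.linalg.inv(XtW @ X)            # (X' W X)^{-1}
    beta = bread @ (XtW @ y)                  # weighted OLS estimate
    resid = y - X @ beta
    meat = (XtW * resid) @ (XtW * resid).T    # sum_i w_i^2 e_i^2 x_i x_i'
    cov = bread @ meat @ bread                # sandwich covariance
    return beta, np.sqrt(np.diag(cov))

# Hypothetical use with the class-1 column of the BCH weights
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = 0.5 + 1.0 * x + rng.normal(size=n)
w1 = rng.uniform(0.2, 1.0, size=n)            # placeholder for w[:, 0]
X = np.column_stack([np.ones(n), x])
print(weighted_ols_sandwich(X, y, w1))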
3.2 Regression auxiliary model combined with latent class regression
Distal outcomes are often studied in the presence of covariates, so that the effect of the latent class variable on the distal outcome is controlled for by those covariates. This is a variation on the modeling just discussed in which the covariate X influences not only Y but also the latent class variable. Following is an illustration of the manual BCH estimation for such a model.
The auxiliary model we are interested in estimating with the BCH method is
given by the following two equations

Y | C = αc + βc X

P(C = c | X) = Exp(γ0c + γ1c X) / Σc Exp(γ0c + γ1c X)
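The second equation is simply a multinomial logistic (softmax) model for class membership as a function of X. As a minimal worked example of the formula (the coefficient values below are assumptions chosen for illustration, with the last class as the reference):

import numpy as np

def class_probabilities(x, gamma0, gamma1):
    # P(C = c | X = x) = Exp(gamma0_c + gamma1_c * x) / sum_c Exp(gamma0_c + gamma1_c * x)
    logits = np.asarray(gamma0) + np.asarray(gamma1) * x
    expl = np.exp(logits - logits.max())      # subtract the max for numerical stability
    return expl / expl.sum()

# Illustrative (assumed) values; the last class is the reference (both coefficients 0)
print(class_probabilities(x=1.0, gamma0=[0.0, 0.0, 0.0, 0.0], gamma1=[1.0, 0.5, -0.3, 0.0]))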
We illustrate this manual BCH estimation with a four-class model measured by 8 binary indicators Up, where

P(Up = 1|C = c) = 1/(1 + Exp(scp τ))

with s1p = −1 and s4p = 1 for all p, s2p = 1 for p = 1, ..., 4 and s2p = −1 for p = 5, ..., 8, and s3p = −1 for p = 1, ..., 4 and s3p = 1 for p = 5, ..., 8. We set the value of τ to 1 to generate the data. We generate a single data set of size N = 50000 according to the above model. The first step model input is as follows.
Variable:
Names are U1-U8 y x;
Usevar=U1-U8;
Categorical = U1-U8;
Classes = C(4);
Auxiliary=Y X;
Data: file=1.dat;
Analysis: Type = Mixture; starts=0;
Savedata: File= 2.dat; Save=bchweights;
Model:
%Overall%
%c#1%
U1$1-U8$1*-1.0 ;
%c#2%
U1$1-U4$1*1.0 U5$1-U8$1*-1.0 ;
%c#3%
U1$1-U4$1*-1.0 U5$1-U8$1*1.0 ;
%c#4%
U1$1-U8$1*1.0 ;
Starting values are provided so that the class order does not switch from the generated order. In real data analysis starting values are not needed; instead, a large number of random starting values should be set using the starts command. The second step input is as follows.
Variable:
Names = U1-U8 Y X W1-W4 MLC;
Usevar = Y X W1-W4;
Classes = c(4);
Training=W1-W4(bch);
Data: file=2.dat;
Analysis: Type = Mixture; starts=0;
Model:
%Overall%
C on X;
Y on X;
%c#1%
Y on X;
%c#2%
Y on X;
%c#3%
Y on X;
%c#4%
Y on X;
The results of this simulation are presented in Table 1. All estimates are
close to the true parameter values and all but one of them are within the implied
confidence limits. Thus we conclude that the manual BCH approach can be used
for more complex auxiliary models. If we remove the variable Y from the above example, we obtain an example where the auxiliary variable is a latent class predictor. Thus the manual BCH approach can be used as an alternative to the R3STEP auxiliary command, which uses a 3-step estimation approach.
4 Simulation study with a continuous distal auxiliary outcome
In this section we extend the simulation studies presented in Section 6.1 of Asparouhov and Muthén (2014) to include the BCH method and the Lanza et al. (2013) method, referred to as DCON. For completeness we describe the simulation and include the results already presented in that article.
We estimate a 2-class model with 5 binary indicator variables. The distribution for each binary indicator variable U is determined by the usual logit relationship

P(U = 1|C) = 1/(1 + Exp(τc))

where C is the latent class variable, which takes values 1 or 2, and the threshold value τc is the same for all 5 binary indicators. In addition we set τ2 = −τ1 for all five indicators. We choose three values for τ1 to obtain different levels of class separation/entropy. Using the value τ1 = 1.25 we obtain an entropy of 0.7,
Table 1: Manual BCH estimation

Parameter   True Value   Estimated Value   SE
α1          0            0.013             0.035
α2          1            0.984             0.030
α3          0            0.123             0.042
α4          2            1.979             0.022
β1          1            0.964             0.037
β2          2            2.043             0.047
β3          -1           -0.910            0.046
β4          0            -0.005            0.027
γ11         1            1.004             0.027
γ12         0.5          0.542             0.029
γ13         -0.3         -0.246            0.030
with value τ1 = 1 we obtain an entropy of 0.6, and with value τ1 = 0.75 we obtain
an entropy of 0.5. The latent class variable is generated with proportions 43%
and 57%. In addition to the above latent class model we also generate a normally distributed distal auxiliary variable with mean 0 in class 1, mean 0.7 in class 2, and variance 1 in both classes. We apply the pseudo-class method, the 3-step
method, Lanza’s method, the 1-step method, and the BCH method to estimate
the mean of the auxiliary variable in the two classes.
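As a rough sketch of this data-generating setup (one replication only; an illustration under the stated parameter values, not the Mplus Monte Carlo setup, and the assignment of the 43%/57% proportions to classes 1 and 2 is an assumption):

import numpy as np

rng = np.random.default_rng(2024)
n, tau1 = 500, 1.25                                 # tau1 = 1.25 corresponds to entropy about 0.7
c = (rng.uniform(size=n) < 0.57).astype(int) + 1    # class 1 with 43%, class 2 with 57% (assumed order)
tau = np.where(c == 1, tau1, -tau1)                 # tau_2 = -tau_1, same threshold for all indicators
p = 1.0 / (1.0 + np.exp(tau))                       # P(U = 1 | C) = 1 / (1 + Exp(tau_c))
u = (rng.uniform(size=(n, 5)) < p[:, None]).astype(int)    # 5 binary indicators
y = rng.normal(loc=np.where(c == 1, 0.0, 0.7), scale=1.0)  # distal: means 0 and 0.7, variance 1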
Table 2 presents the results for the mean of the auxiliary variable in class 2. We generate 500 samples of sizes 500 and 2000 and analyze the data with the five methods. The results in Table 2 show that the BCH procedure and the 3-step procedure have almost identical performance in terms of bias, MSE, and coverage. In this simulation the BCH method shows no bias and the coverage is near the nominal level, with the exception of the case of low entropy (0.5) and sample size 500, where a small bias is observed which also leads to a decrease in coverage.
Table 2: Distal outcome simulation study: Bias/Mean Squared Error/Coverage

N      Entropy   PC (E)         3-step (DU3STEP)   Lanza (DCON)   1-step         BCH
500    0.7       .10/.015/.76   .00/.007/.95       .00/.006/.92   .00/.006/.94   .00/.007/.94
500    0.6       .16/.029/.50   .01/.008/.94       .00/.007/.89   .00/.007/.94   .01/.008/.94
500    0.5       .22/.056/.24   .03/.017/.86       .00/.012/.80   .01/.012/.96   .03/.017/.86
2000   0.7       .10/.011/.23   .00/.002/.93       .00/.002/.89   .00/.002/.93   .00/.002/.93
2000   0.6       .15/.025/.03   .00/.002/.93       .00/.002/.87   .00/.002/.94   .00/.002/.94
2000   0.5       .22/.051/.00   .00/.004/.91       .00/.003/.80   .00/.003/.94   .00/.004/.91
Next we conduct a simulation study to compare the performance of the four
different methods, DU3STEP, DE3STEP, Lanza's method, and the BCH method, in the situation where the distal variable variances differ across classes. The
two 3-step approaches DU3STEP and DE3STEP differ in the third step. The
DU3STEP approach estimates different means and variances for the distal variable
in the different classes while the DE3STEP approach estimates different means but
equal variances. The second approach is more robust and more likely to converge
but may suffer from the mis-specification that the variances are held equal in the
different classes. We use the same simulation as above except that we generate a
distal outcome in the second class with variance 20 instead of 1. The results for
the mean in the second class are presented in Table 3.
It is clear from these results that the unequal variance 3-step approach
(DU3STEP) is superior particularly when the class separation is poor (entropy
level of 0.6 or less). The equal variance approach (DE3STEP) can lead to severely
biased estimates when the class separation is poor and the variances are different
across classes. Lanza's method appears to fail completely, particularly when the class separation is poor. The BCH method appears to be slightly worse
Table 3: Distal outcome with unequal variance simulation study: Bias/Mean Squared Error/Coverage

N      Entropy   DE3STEP        DU3STEP        Lanza (DCON)     BCH
500    0.7       .05/.147/.95   .00/.099/.94   .03/.129/.77     .00/.114/.93
500    0.6       .06/.174/.96   .00/.099/.95   .15/.397/.70     .00/.121/.94
500    0.5       .12/.822/.93   .01/.101/.95   1.20/5.755/.46   .04/.160/.94
2000   0.7       .05/.040/.92   .00/.027/.92   .03/.035/.76     .00/.029/.94
2000   0.6       .09/.056/.92   .00/.027/.93   .07/.056/.70     .00/.031/.93
2000   0.5       .11/.094/.95   .00/.029/.92   1.18/4.613/.44   .00/.041/.94
than the DU3STEP approach in terms of bias and MSE, but the coverage remains good, near the nominal level. Thus, for continuous distal variable estimation, if the distal variable variances are unequal across classes we can recommend only the DU3STEP and BCH methods.
5 Simulation study with a non-normal distal auxiliary outcome
In Section 7.1 of Asparouhov and Muthén (2014) it was shown that when the distal outcome is not normally distributed the 3-step estimation can fail due to switching of the classes, and the parameter estimates may be severely biased. Further simulations illustrating this point were conducted in Bakk and Vermunt (2014). In this section we conduct a simulation study similar to those in Bakk and Vermunt (2014).
We generate and analyze data according to a 4-class LCA model with 8 binary
indicators. The class proportions are as follows: 0.375, 0.25, 0.1875 and 0.1875.
The measurement model is described as follows:

P(Up = 1|C = c) = 1/(1 + Exp(scp τ))

where s2p = 1 and s4p = −1 for all p, s1p = −1 for p = 1, ..., 5 and s1p = 1 for p = 6, ..., 8, and s3p = 1 for p = 1, ..., 5 and s3p = −1 for p = 6, ..., 8. We vary the value of τ to obtain different entropy values and class separation. If τ = 1.5 the entropy is 0.7. If τ = 1.25 the entropy is 0.6. If τ = 1 the entropy is 0.5. The distal outcome in class 1 has the bimodal distribution 0.5N(0, 0.1) + 0.5N(−2, 0.1), in class 2 it is also bimodal, 0.75N(−2/3, 0.1) + 0.25N(2, 0.1), in class 3 it is the normal distribution N(2, 0.1), and in class 4 it is the normal distribution N(0.5, 0.1). We use three different sample sizes, N = 2000, 5000, and 10000, and generate and analyze 500 replications for each size. In this simulation we can expect the DU3STEP, DE3STEP, 1-step, and PC methods to fail due to non-normality, and we can expect Lanza's method to fail due to varying variances across classes.
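To make the class-specific distal distributions concrete, the following sketch draws the distal outcome for each class (an illustration only; it assumes the second argument of N(·, ·) above denotes a variance):

import numpy as np

rng = np.random.default_rng(7)
sd = np.sqrt(0.1)                                    # each normal component has variance 0.1

def distal(c, size):
    # Draw the distal outcome for class c = 1, 2, 3, 4.
    if c == 1:    # bimodal: 0.5 N(0, 0.1) + 0.5 N(-2, 0.1)
        means = rng.choice([0.0, -2.0], p=[0.5, 0.5], size=size)
    elif c == 2:  # bimodal: 0.75 N(-2/3, 0.1) + 0.25 N(2, 0.1)
        means = rng.choice([-2.0 / 3.0, 2.0], p=[0.75, 0.25], size=size)
    elif c == 3:  # normal N(2, 0.1)
        means = np.full(size, 2.0)
    else:         # class 4: normal N(0.5, 0.1)
        means = np.full(size, 0.5)
    return rng.normal(loc=means, scale=sd)

y_class1 = distal(1, 1000)                           # e.g., 1000 class-1 outcomes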
In Table 4 we present the results for the distal mean in class 2 for the most favorable case, where the entropy is 0.7 and N = 10000, for all of the estimation methods. No results are presented for DE3STEP and DU3STEP because in almost all replications there was no convergence due to large differences between the step 1 class allocation and the step 3 class allocation. Mplus will not report any results if a substantial shift in the classes occurs in step 3. The remaining methods, with the exception of the BCH method, fail dramatically as well. This simple simulation suggests that BCH may indeed be much more robust than any other method.
Next we evaluate the performance of the BCH method for different sample
Table 4: Non-normal distal outcome simulation study

Method     Bias    MSE     Coverage
DE3STEP    -       -       -
DU3STEP    -       -       -
Lanza      0.663   0.440   0.00
BCH        0.004   0.001   0.89
1-Step     0.643   0.414   0.00
PC         0.155   0.025   0.00
Table 5: Non-normal distal outcome simulation study for the BCH method

N       Entropy   Bias   MSE     Coverage   Std. Err/Std. Dev.
2000    0.7       0.00   0.007   0.89       0.82
5000    0.7       0.00   0.003   0.89       0.83
10000   0.7       0.00   0.001   0.89       0.81
2000    0.6       0.00   0.016   0.80       0.62
5000    0.6       0.00   0.005   0.82       0.66
10000   0.6       0.00   0.003   0.82       0.67
2000    0.5       0.05   0.057   0.58       0.42
5000    0.5       0.01   0.021   0.59       0.43
10000   0.5       0.00   0.010   0.67       0.43
sizes and entropy levels. The results are presented in Table 5. The estimates are essentially unbiased in all cases, with a small bias visible for the smaller sample sizes and lower entropy levels. On the other hand, the coverage drops substantially, particularly when the entropy is low. Also, the ratio of the standard errors to the standard deviation, which should be near 1 for large sample sizes, is consistently smaller and does not improve with increasing sample size. For example, in the last row of Table 5 we see that even when the sample size is 10000 and the entropy is 0.5 the ratio is 0.43, i.e., the standard errors are underestimated by 57% and should be more than twice what the method currently computes. This has also been noted in Bakk and Vermunt (2014), where it is suggested that the underestimation occurs due to unaccounted-for variability of the posterior probabilities that are used to construct the weights. The BCH method depends heavily on these posterior probabilities, and one can expect this effect to be substantial. When the class separation is large the underestimation disappears, which also reflects the diminished variability in the posterior probabilities. At this point no reasonable method is available to resolve this shortcoming, although bootstrapping would resolve the problem; it can be run in Mplus as an external Monte Carlo analysis where the bootstrap samples are obtained separately.
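As a sketch of the bootstrap remedy just mentioned (purely illustrative; in Mplus the bootstrap samples would be drawn separately and analyzed as an external Monte Carlo), the standard error of any BCH-based estimate can be approximated by resampling subjects with replacement and rerunning both estimation steps. The function name and the estimate callback below are hypothetical placeholders:

import numpy as np

def bootstrap_se(data, estimate, n_boot=500, seed=0):
    # Nonparametric bootstrap standard error of a BCH-based estimate.
    # data: (N, ...) array of raw observations (indicators and auxiliary variables);
    # estimate: a user-supplied function that reruns both steps (latent class model,
    #           BCH weights, weighted auxiliary model) and returns the parameter of interest.
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)     # resample subjects with replacement
        reps.append(estimate(data[idx]))
    return np.std(reps, ddof=1)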
6 Summary
Many methods have been proposed in recent years for mixture modeling with
auxiliary variables. To clarify the choice of method, Tables 6 and 7 list the Mplus options, give their intended use, and give recommendations on which method should be used for which purpose.
Table 6: Alternative auxiliary settings for mixture modeling

DU3STEP
Usage: Continuous distal outcomes
Description; reference: Classification-error corrected; Vermunt (2010) and Asparouhov and Muthén (2014)
Pros and cons: Susceptible to class changes. Mplus will not report results if the class formation changes. Manual version also available for an arbitrary auxiliary model, including controlling for covariates. Estimates unequal distal variances across classes.
Recommendation: Preferred method for continuous distal outcomes. Use when Mplus reports results, i.e., there are no class formation changes; otherwise use BCH.

BCH
Usage: Continuous distal outcomes
Description; reference: Measurement-error weighted; Bakk and Vermunt (2014)
Pros and cons: Avoids class changes. Avoids the DCON shortcomings with class-varying variances for distals. Manual version also available for an arbitrary auxiliary model, including controlling for covariates. Possible SE underestimation with low entropy.
Recommendation: Preferred method for continuous distal outcomes.

DCAT
Usage: Categorical distal outcomes
Description; reference: Distal treated as covariate; Lanza et al. (2013)
Pros and cons: Avoids class changes.
Recommendation: Preferred method for categorical distal outcomes.

R3STEP
Usage: Covariates
Description; reference: Classification-error corrected; Vermunt (2010)
Pros and cons: Works well.
Recommendation: Recommended method with covariates.
Table 7: Alternative auxiliary settings for mixture modeling, continued

DE3STEP
Usage: Continuous distal outcomes. Equal distal variances across classes.
Description; reference: Classification-error corrected; Vermunt (2010) and Asparouhov and Muthén (2014)
Pros and cons: Susceptible to class changes and class-varying variances. Mplus will not report results if the class formation changes.
Recommendation: Inferior to BCH and DU3STEP. Use only when DU3STEP does not converge.

DCON
Usage: Continuous distal outcomes
Description; reference: Distal treated as covariate; Lanza et al. (2013) and Asparouhov and Muthén (2014)
Pros and cons: Avoids class changes. Sensitive to class-varying variances for distals when entropy is low.
Recommendation: Inferior to BCH and DU3STEP when DU3STEP does not change the class formation. Use only when entropy is higher than 0.6. If the variance appears to vary across classes by more than a factor of 2, do not use this method. This check can be done using most likely class assignment; it is not done automatically by Mplus. Use only for methods research purposes.

E
Usage: Continuous distal outcomes
Description; reference: Pseudo-class (PC) method; Wang et al. (2005)
Pros and cons: Gives biased results.
Recommendation: Superseded by BCH and DU3STEP. Use only for methods research purposes.

R
Usage: Covariates
Description; reference: Pseudo-class (PC) method; Wang et al. (2005)
Pros and cons: Gives biased results.
Recommendation: Superseded by R3STEP. Use only for methods research purposes.
References

[1] Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21, 329-341.

[2] Bakk, Z. & Vermunt, J.K. (2014). Robustness of stepwise latent class modeling with continuous distal outcomes. Forthcoming in Structural Equation Modeling: A Multidisciplinary Journal. http://members.home.nl/jeroenvermunt/bakk2014.pdf

[3] Bakk, Z., Tekle, F.B., & Vermunt, J.K. (2013). Estimating the association between latent class membership and external variables using bias adjusted three-step approaches. In T.F. Liao (ed.), Sociological Methodology. Thousand Oaks, CA: SAGE Publications.

[4] Bray, B.C., Lanza, S.T., & Tan, X. (2014). Eliminating bias in classify-analyze approaches for latent class analysis. Forthcoming in Structural Equation Modeling: A Multidisciplinary Journal. http://dx.doi.org/10.1080/10705511.2014.935265

[5] Lanza, S.T., Tan, X., & Bray, B.C. (2013). Latent class analysis with distal outcomes: A flexible model-based approach. Structural Equation Modeling, 20, 1-26.

[6] Vermunt, J.K. (2010). Latent class modeling with covariates: Two improved three-step approaches. Political Analysis, 18, 450-469.

[7] Wang, C.P., Brown, C.H., & Bandeen-Roche, K. (2005). Residual diagnostics for growth mixture models: Examining the impact of preventive intervention on multiple trajectories of aggressive behavior. Journal of the American Statistical Association, 100(3), 1054-1076.