An Introduction to LPA Using MplusAutomation
Figure. Gaussian mixture models. Data simulated from a 2-class model.
Preparation
Download repository here: https://github.com/MM4DBER/Intro_to_LPA
Load packages
library(tidyverse)
library(glue)
library(MplusAutomation)
library(here)
library(gt)
library(tidyLPA)
library(patchwork)
Variance-covariance specifications in LPA
Which variance-covariance specification is appropriate?
Figure. Picture adapted from Masyn (2013)
The variance-covariance matrix:
Figure. Picture adapted from tutorial (Rosenberg, 2019)
Terminology for specifying variance-covariance matrix:
Equal variances
: Variances are fixed to equality across classes (i.e., variances are constrained to be equal for each class).

Free variances
: Variance parameters are freely estimated across classes (i.e., variances vary by class).

Covariances fixed to 0
: No covariances are estimated (i.e., the off-diagonal cells of the matrix are zero).

Free covariances
: The covariances are freely estimated (e.g., with 4 indicators, 6 covariance parameters are estimated per class).

Equal covariances
: The covariances are fixed to equality across classes (i.e., covariances are constrained to be equal across classes).
Model presentation order:
- Models are presented in the order described in the book chapter by Masyn (2013, p. 591).
- An additional model, 3a, demonstrates the specification of covariances fixed to equality across classes.
- NOTE: Many model specifications are possible. Within the variance and covariance parameter groups, any combination of these parameters can be freely estimated or fixed.

Example: For an LPA model with 4 indicators, the variance-covariance matrix includes 4 variance and 6 covariance parameters per class. A methods effect may be identified in a survey (e.g., similar item wording), which provides a rationale for freely estimating the associated covariance. However, the remaining 5 covariances may be negligible in magnitude, providing an empirical basis for fixing this subset of covariances at zero. (A parameter-counting sketch follows the list of models below.)
Models 1-4 from the book chapter by Masyn (2013):

model 1
: Equal variances, covariances fixed to 0

model 2
: Free variances, covariances fixed to 0

model 3
: Equal variances, free covariances

model 3a
: Equal variances, equal covariances (extra model)

model 4
: Free variances, free covariances
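To make these specifications concrete, the small R sketch below (not part of the original scripts; the function name lpa_param_count is illustrative) counts the free parameters for a given specification from the number of indicators and classes. The counts can be checked against the Parameters column of the fit tables later in the tutorial.

# Sketch: free-parameter counts for the LPA specifications above.
# p = number of indicators, k = number of classes.
lpa_param_count <- function(p, k, variances = c("equal", "free"),
                            covariances = c("zero", "equal", "free")) {
  variances   <- match.arg(variances)
  covariances <- match.arg(covariances)
  mixing <- k - 1                                   # class proportions
  means  <- p * k                                   # item means, free in every class
  vars   <- if (variances == "equal") p else p * k  # item variances
  covs   <- switch(covariances,                     # pairwise item covariances
                   zero  = 0,
                   equal = p * (p - 1) / 2,
                   free  = p * (p - 1) / 2 * k)
  mixing + means + vars + covs
}

# model 1 (equal variances, covariances at zero), 2 classes, 4 indicators
lpa_param_count(p = 4, k = 2)                                           # 13

# model 4 (free variances, free covariances), 3 classes, 4 indicators
lpa_param_count(p = 4, k = 3, variances = "free", covariances = "free") # 44

Both counts match the Parameters column in the corresponding fit tables below (13 for the 2-class model 1 and 44 for the 3-class model 4).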
Example: Math, Science, Physics, and Biology test score measures (LSAY).
Data source: The example presented in this tutorial uses 4 test score measures (Math, Science, Physics, and Biology) from the public-use data source, the Longitudinal Study of American Youth (LSAY): \(\color{blue}{\text{See documentation here}}\)
Read in data
lsay_data <- read_csv(here("data", "lsay_lpa.csv"))
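Before estimating models it can help to glance at the indicators. The optional sketch below (not in the original workflow) summarizes the four score variables and their pairwise correlations; the moderate-to-high correlations noted later in the tutorial can be reproduced this way. Pairwise deletion is used in case any scores are missing.

# Optional check of the four LPA indicators (sketch)
lsay_data %>%
  select(mth_scor, sci_scor, phy_scor, bio_scor) %>%
  summary()

# Pairwise correlations among the test scores
lsay_data %>%
  select(mth_scor, sci_scor, phy_scor, bio_scor) %>%
  cor(use = "pairwise.complete.obs") %>%
  round(2)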
model 1: Equal variances (fixed) & covariances fixed to zero
In model-1 the item variances are fixed to equality across latent classes and the covariance parameters are fixed to zero (not estimated). In the applied example, each of the 4 items has a unique variance estimated, but each variance is held to equality across the latent classes (4 × 1 = 4 variance parameters). Relating this to the variance-covariance matrix, this model is also named the class-invariant, diagonal model. It is the default model in Mplus, and because it is relatively parsimonious it provides a good entry point to begin an LPA analysis.
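For a k-class run of this specification, the free parameters are (k − 1) class proportions, 4k means, and 4 variances. The 6-class model, for example, estimates 5 + 24 + 4 = 33 parameters, matching the Parameters column of the fit table below.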
Figure. Picture of “diagonal class-invariant model” adapted from Masyn (2013)
Note: Model variance-covariance specifications are stated explicitly here for pedagogical purposes. Model 1 can be replicated by removing the syntax under the MODEL= command, as this is the default specification in Mplus.
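For example, a minimal sketch of the same 2-class model relying entirely on the Mplus defaults might look like the following (the object name m1_default is illustrative). It could be passed to mplusModeler() exactly as in the chunk below; the explicit syntax used in the tutorial fits the same model but makes the constraints visible.

# Sketch: Mplus default specification, i.e., no MODEL command at all.
# Equal variances and zero covariances are implied by default.
m1_default <- mplusObject(
  TITLE = "M1 default: Class2",
  VARIABLE = "usevar = mth_scor-bio_scor;
  classes = c(2);",
  ANALYSIS = "estimator = mlr;
  type = mixture;
  starts = 200 50;",
  OUTPUT = "sampstat residual tech11 tech14;",
  usevariables = colnames(lsay_data),
  rdata = lsay_data)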
lpa_m1 <- lapply(1:6, function(k) {

  # z2-z6 hold a comment character ('!') that switches off the class-specific
  # syntax for classes beyond the current k-class model
  z2 <- if_else(k < 2, '!', '')
  z3 <- if_else(k < 3, '!', '')
  z4 <- if_else(k < 4, '!', '')
  z5 <- if_else(k < 5, '!', '')
  z6 <- if_else(k < 6, '!', '')

  lpa_enum <- mplusObject(

    TITLE = glue("M1: Class{k} "),

    VARIABLE = glue(
      "usevar = mth_scor-bio_scor;
      classes = c({k}); "),

    ANALYSIS =
      "estimator = mlr;
      type = mixture;
      starts = 200 50;
      processors = 10;",

    MODEL = glue(
      "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
      !!! Syntax replicates default Mplus specification                    !!!
      !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

      %c#1%
      [mth_scor-bio_scor];      !!! means are freely estimated (default)
      mth_scor-bio_scor(1-4);   !!! variances are fixed to equality (default)
                                !!! covariances are fixed to zero (default)
      {z2} %c#2%
      {z2} [mth_scor-bio_scor];
      {z2} mth_scor-bio_scor(1-4);
      {z3} %c#3%
      {z3} [mth_scor-bio_scor];
      {z3} mth_scor-bio_scor(1-4);
      {z4} %c#4%
      {z4} [mth_scor-bio_scor];
      {z4} mth_scor-bio_scor(1-4);
      {z5} %c#5%
      {z5} [mth_scor-bio_scor];
      {z5} mth_scor-bio_scor(1-4);
      {z6} %c#6%
      {z6} [mth_scor-bio_scor];
      {z6} mth_scor-bio_scor(1-4);"),

    OUTPUT = "sampstat residual tech11 tech14;",

    PLOT =
      "type = plot3;
      series = mth_scor-bio_scor(*);",

    usevariables = colnames(lsay_data),
    rdata = lsay_data)

  lpa_m1_fit <- mplusModeler(lpa_enum,
                             dataout = glue(here("m1_enum", "c_lpa_m1.dat")),
                             modelout = glue(here("m1_enum", "c{k}_lpa_m1.inp")),
                             check = TRUE, run = TRUE, hashfilename = FALSE)
})
Model 1: Produce model fit table
m1_output <- readModels(here("m1_enum"), quiet = TRUE)

m1_enum <- LatexSummaryTable(m1_output,
                             keepCols = c("Title", "Parameters", "LL", "BIC",
                                          "aBIC", "BLRT_PValue", "T11_VLMR_PValue",
                                          "Observations"),
                             sortBy = "Title")

gt(m1_enum)
Title | Parameters | LL | BIC | aBIC | BLRT_PValue | T11_VLMR_PValue | Observations |
---|---|---|---|---|---|---|---|
M1: Class1 | 8 | -46288.29 | 92640.89 | 92615.47 | NA | NA | 3103 |
M1: Class2 | 13 | -43352.36 | 86809.23 | 86767.93 | 0 | 0.0000 | 3103 |
M1: Class3 | 18 | -42126.11 | 84396.93 | 84339.74 | 0 | 0.0000 | 3103 |
M1: Class4 | 23 | -41433.72 | 83052.37 | 82979.29 | 0 | 0.0000 | 3103 |
M1: Class5 | 28 | -41051.51 | 82328.14 | 82239.17 | 0 | 0.1547 | 3103 |
M1: Class6 | 33 | -40755.99 | 81777.31 | 81672.46 | 0 | 0.0149 | 3103 |
model 2: Free variances & covariances fixed to zero
In model-2 the variances are freely estimated across latent classes (4 item variances per class; e.g., 4 × 4 = 16 variance parameters in the 4-class model) and the covariance parameters are fixed to zero. Relating this to the variance-covariance matrix, this model is also named the class-varying, diagonal model.
lpa_m2 <- lapply(1:6, function(k) {
  z2 <- if_else(k < 2, '!', '')
  z3 <- if_else(k < 3, '!', '')
  z4 <- if_else(k < 4, '!', '')
  z5 <- if_else(k < 5, '!', '')
  z6 <- if_else(k < 6, '!', '')

  lpa_enum <- mplusObject(

    TITLE = glue("M2: Class{k} "),

    VARIABLE = glue(
      "usevar = mth_scor-bio_scor;
      classes = c({k}); "),

    ANALYSIS =
      "estimator = mlr;
      type = mixture;
      starts = 200 50;
      processors = 10;",

    MODEL = glue(
      "%c#1%
      [mth_scor-bio_scor];  !!! means are freely estimated
      mth_scor-bio_scor;    !!! variances are freely estimated
                            !!! covariances are fixed to zero (default)
      {z2} %c#2%
      {z2} [mth_scor-bio_scor];
      {z2} mth_scor-bio_scor;
      {z3} %c#3%
      {z3} [mth_scor-bio_scor];
      {z3} mth_scor-bio_scor;
      {z4} %c#4%
      {z4} [mth_scor-bio_scor];
      {z4} mth_scor-bio_scor;
      {z5} %c#5%
      {z5} [mth_scor-bio_scor];
      {z5} mth_scor-bio_scor;
      {z6} %c#6%
      {z6} [mth_scor-bio_scor];
      {z6} mth_scor-bio_scor;
      "),

    OUTPUT = "sampstat residual tech11 tech14;",

    PLOT =
      "type = plot3;
      series = mth_scor-bio_scor(*);",

    usevariables = colnames(lsay_data),
    rdata = lsay_data)

  lpa_m2_fit <- mplusModeler(lpa_enum,
                             dataout = glue(here("m2_enum", "c_lpa_m2.dat")),
                             modelout = glue(here("m2_enum", "c{k}_lpa_m2.inp")),
                             check = TRUE, run = TRUE, hashfilename = FALSE)
})
Model 2: Produce model fit table
m2_output <- readModels(here("m2_enum"), quiet = TRUE)

m2_enum <- LatexSummaryTable(m2_output,
                             keepCols = c("Title", "Parameters", "LL", "BIC",
                                          "aBIC", "BLRT_PValue", "T11_VLMR_PValue",
                                          "Observations"),
                             sortBy = "Title")

gt(m2_enum)
Title | Parameters | LL | BIC | aBIC | BLRT_PValue | T11_VLMR_PValue | Observations |
---|---|---|---|---|---|---|---|
M2: Class1 | 8 | -46288.29 | 92640.89 | 92615.47 | NA | NA | 3103 |
M2: Class2 | 17 | -43313.36 | 86763.40 | 86709.38 | 0 | 0.0000 | 3103 |
M2: Class3 | 26 | -42030.92 | 84270.88 | 84188.27 | 0 | 0.0000 | 3103 |
M2: Class4 | 35 | -41324.70 | 82930.80 | 82819.59 | 0 | 0.0002 | 3103 |
M2: Class5 | 44 | -40939.64 | 82233.04 | 82093.24 | 0 | 0.0281 | 3103 |
M2: Class6 | 53 | -40668.83 | 81763.79 | 81595.38 | 0 | 0.0842 | 3103 |
model 3: Equal (fixed) variances & free covariances
In model-3 the variances are fixed to equality across latent classes and the covariance parameters are freely estimated. In relation to the variance-covariance matrix, this model is also named the class-invariant, non-diagonal model. In the 4-class example below, a total of 4 variance parameters and 24 covariance parameters are estimated (i.e., 6 covariances per class). Note that although this example demonstrates a model with all covariances estimated, any number of the covariance (or variance) parameters can be either fixed or freely estimated. For example, 2 item covariances may have substantive meaning while no theoretical precedent exists for estimating the remaining 4 pairwise item covariances; a syntax sketch of such a partially free specification follows this paragraph.
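The sketch below (not part of the original tutorial scripts; the object name lpa_m3_partial and the choice of which two covariances to free are purely illustrative) shows a 2-class version of model-3 in which only two covariances are freely estimated. The remaining covariances are fixed at zero, which is the Mplus mixture default; writing them with @0 simply makes the constraint explicit in the input file.

# Sketch (hypothetical): partially free covariance structure, 2 classes
lpa_m3_partial <- mplusObject(
  TITLE = "M3 partial: Class2",
  VARIABLE = "usevar = mth_scor-bio_scor;
  classes = c(2);",
  ANALYSIS = "estimator = mlr;
  type = mixture;
  starts = 200 50;",
  MODEL = "%c#1%
  [mth_scor-bio_scor];        !!! means freely estimated
  mth_scor-bio_scor(1-4);     !!! variances fixed to equality
  mth_scor with sci_scor;     !!! freely estimated (substantive rationale)
  sci_scor with bio_scor;     !!! freely estimated (substantive rationale)
  mth_scor with phy_scor@0;   !!! remaining covariances fixed to zero
  mth_scor with bio_scor@0;
  sci_scor with phy_scor@0;
  phy_scor with bio_scor@0;
  %c#2%
  [mth_scor-bio_scor];
  mth_scor-bio_scor(1-4);
  mth_scor with sci_scor;
  sci_scor with bio_scor;
  mth_scor with phy_scor@0;
  mth_scor with bio_scor@0;
  sci_scor with phy_scor@0;
  phy_scor with bio_scor@0;",
  OUTPUT = "sampstat residual tech11 tech14;",
  usevariables = colnames(lsay_data),
  rdata = lsay_data)

In practice the decision about which covariances to free should follow a substantive rationale, such as the methods effect described earlier.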
lpa_m3 <- lapply(1:4, function(k) {
  z2 <- if_else(k < 2, '!', '')
  z3 <- if_else(k < 3, '!', '')
  z4 <- if_else(k < 4, '!', '')

  lpa_enum <- mplusObject(

    TITLE = glue("M3: Class{k} "),

    VARIABLE = glue(
      "usevar = mth_scor-bio_scor;
      classes = c({k}); "),

    ANALYSIS =
      "estimator = mlr;
      type = mixture;
      starts = 200 50;
      processors = 10;",

    MODEL = glue(
      "%c#1%
      [mth_scor-bio_scor];      !!! means are freely estimated
      mth_scor-bio_scor(1-4);   !!! variances are fixed to equality
      mth_scor with sci_scor;   !!! covariances are freely estimated
      mth_scor with phy_scor;
      mth_scor with bio_scor;
      sci_scor with phy_scor;
      sci_scor with bio_scor;
      phy_scor with bio_scor;
      {z2} %c#2%
      {z2} [mth_scor-bio_scor];
      {z2} mth_scor-bio_scor(1-4);
      {z2} mth_scor with sci_scor;
      {z2} mth_scor with phy_scor;
      {z2} mth_scor with bio_scor;
      {z2} sci_scor with phy_scor;
      {z2} sci_scor with bio_scor;
      {z2} phy_scor with bio_scor;
      {z3} %c#3%
      {z3} [mth_scor-bio_scor];
      {z3} mth_scor-bio_scor(1-4);
      {z3} mth_scor with sci_scor;
      {z3} mth_scor with phy_scor;
      {z3} mth_scor with bio_scor;
      {z3} sci_scor with phy_scor;
      {z3} sci_scor with bio_scor;
      {z3} phy_scor with bio_scor;
      {z4} %c#4%
      {z4} [mth_scor-bio_scor];
      {z4} mth_scor-bio_scor(1-4);
      {z4} mth_scor with sci_scor;
      {z4} mth_scor with phy_scor;
      {z4} mth_scor with bio_scor;
      {z4} sci_scor with phy_scor;
      {z4} sci_scor with bio_scor;
      {z4} phy_scor with bio_scor;"),

    OUTPUT = "sampstat residual tech11 tech14;",

    PLOT =
      "type = plot3;
      series = mth_scor-bio_scor(*);",

    usevariables = colnames(lsay_data),
    rdata = lsay_data)

  lpa_m3_fit <- mplusModeler(lpa_enum,
                             dataout = glue(here("m3_enum", "c_lpa_m3.dat")),
                             modelout = glue(here("m3_enum", "c{k}_lpa_m3.inp")),
                             check = TRUE, run = TRUE, hashfilename = FALSE)
})
Model 3: Produce model fit table
m3_output <- readModels(here("m3_enum"), quiet = TRUE)

m3_enum <- LatexSummaryTable(m3_output,
                             keepCols = c("Title", "Parameters", "LL", "BIC",
                                          "aBIC", "BLRT_PValue", "T11_VLMR_PValue",
                                          "Observations"),
                             sortBy = "Title")

gt(m3_enum)
Title | Parameters | LL | BIC | aBIC | BLRT_PValue | T11_VLMR_PValue | Observations |
---|---|---|---|---|---|---|---|
M3: Class1 | 14 | -39883.58 | 79879.73 | 79835.24 | NA | NA | 3103 |
M3: Class2 | 25 | -39769.44 | 79739.87 | 79660.44 | 0 | 0.0000 | 3103 |
M3: Class3 | 36 | -39724.25 | 79737.94 | 79623.55 | 0 | 0.1976 | 3103 |
M3: Class4 | 47 | -39688.14 | 79754.17 | 79604.83 | 0 | 0.0322 | 3103 |
model 3a: Equal variances & equal (fixed) covariances
In model-3a the variances are fixed to equality across latent classes and the covariance parameters are also fixed to equality across classes. In other words, this model estimates 4 item variances and 6 item covariances in total: each item has its own variance and each item pair has its own covariance, but all variance-covariance parameters are held to equality across classes.
lpa_m3a <- lapply(1:4, function(k) {
  z2 <- if_else(k < 2, '!', '')
  z3 <- if_else(k < 3, '!', '')
  z4 <- if_else(k < 4, '!', '')

  lpa_enum <- mplusObject(

    TITLE = glue("M3a: Class{k} "),

    VARIABLE = glue(
      "usevar = mth_scor-bio_scor;
      classes = c({k}); "),

    ANALYSIS =
      "estimator = mlr;
      type = mixture;
      starts = 200 50;
      processors = 10;",

    MODEL = glue(
      "%c#1%
      [mth_scor-bio_scor];        !!! means are freely estimated
      mth_scor-bio_scor(1-4);     !!! variances are fixed to equality
      mth_scor with sci_scor(5);  !!! covariances are fixed to equality
      mth_scor with phy_scor(6);
      mth_scor with bio_scor(7);
      sci_scor with phy_scor(8);
      sci_scor with bio_scor(9);
      phy_scor with bio_scor(10);
      {z2} %c#2%
      {z2} [mth_scor-bio_scor];
      {z2} mth_scor-bio_scor(1-4);
      {z2} mth_scor with sci_scor(5);
      {z2} mth_scor with phy_scor(6);
      {z2} mth_scor with bio_scor(7);
      {z2} sci_scor with phy_scor(8);
      {z2} sci_scor with bio_scor(9);
      {z2} phy_scor with bio_scor(10);
      {z3} %c#3%
      {z3} [mth_scor-bio_scor];
      {z3} mth_scor-bio_scor(1-4);
      {z3} mth_scor with sci_scor(5);
      {z3} mth_scor with phy_scor(6);
      {z3} mth_scor with bio_scor(7);
      {z3} sci_scor with phy_scor(8);
      {z3} sci_scor with bio_scor(9);
      {z3} phy_scor with bio_scor(10);
      {z4} %c#4%
      {z4} [mth_scor-bio_scor];
      {z4} mth_scor-bio_scor(1-4);
      {z4} mth_scor with sci_scor(5);
      {z4} mth_scor with phy_scor(6);
      {z4} mth_scor with bio_scor(7);
      {z4} sci_scor with phy_scor(8);
      {z4} sci_scor with bio_scor(9);
      {z4} phy_scor with bio_scor(10);"),

    OUTPUT = "sampstat residual tech11 tech14;",

    PLOT =
      "type = plot3;
      series = mth_scor-bio_scor(*);",

    usevariables = colnames(lsay_data),
    rdata = lsay_data)

  lpa_m3a_fit <- mplusModeler(lpa_enum,
                              dataout = glue(here("m3a_enum", "c_lpa_m3a.dat")),
                              modelout = glue(here("m3a_enum", "c{k}_lpa_m3a.inp")),
                              check = TRUE, run = TRUE, hashfilename = FALSE)
})
Model 3a: Produce model fit table
m3a_output <- readModels(here("m3a_enum"), quiet = TRUE)

m3a_enum <- LatexSummaryTable(m3a_output,
                              keepCols = c("Title", "Parameters", "LL", "BIC",
                                           "aBIC", "BLRT_PValue", "T11_VLMR_PValue",
                                           "Observations"),
                              sortBy = "Title")

gt(m3a_enum)
Title | Parameters | LL | BIC | aBIC | BLRT_PValue | T11_VLMR_PValue | Observations |
---|---|---|---|---|---|---|---|
M3a: Class1 | 14 | -39883.58 | 79879.73 | 79835.24 | NA | NA | 3103 |
M3a: Class2 | 19 | -39821.05 | 79794.86 | 79734.49 | 0 | 0.000 | 3103 |
M3a: Class3 | 24 | -39776.46 | 79745.89 | 79669.63 | 0 | 0.022 | 3103 |
M3a: Class4 | 29 | -39745.99 | 79725.14 | 79633.00 | 0 | 0.003 | 3103 |
model 4: Free variances & free covariances
In model-4 the variances are freely estimated across latent classes (4 item variances per class; e.g., 4 × 4 = 16 variance parameters in the 4-class model) and the covariances are freely estimated (6 pairwise item covariances per class; 6 × 4 = 24 covariance parameters). Note that although this example demonstrates a model with all variance/covariance parameters estimated, any number of the covariance (or variance) parameters can be either fixed or freely estimated.
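As a concrete count, the 3-class version of model-4 estimates (3 − 1) class proportions + 3 × 4 means + 3 × 4 variances + 3 × 6 covariances = 2 + 12 + 12 + 18 = 44 free parameters, which matches the Parameters column of the model-4 fit table below.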
Figure. Picture adapted from Masyn (2013)
lpa_m4 <- lapply(1:4, function(k) {
  z2 <- if_else(k < 2, '!', '')
  z3 <- if_else(k < 3, '!', '')
  z4 <- if_else(k < 4, '!', '')

  lpa_enum <- mplusObject(

    TITLE = glue("M4: Class{k} "),

    VARIABLE = glue(
      "usevar = mth_scor-bio_scor;
      classes = c({k}); "),

    ANALYSIS =
      "estimator = mlr;
      type = mixture;
      starts = 200 50;
      processors = 10;",

    MODEL = glue(
      "%c#1%
      [mth_scor-bio_scor];     !!! means are freely estimated
      mth_scor-bio_scor;       !!! variances are freely estimated
      mth_scor with sci_scor;  !!! covariances are freely estimated
      mth_scor with phy_scor;
      mth_scor with bio_scor;
      sci_scor with phy_scor;
      sci_scor with bio_scor;
      phy_scor with bio_scor;
      {z2} %c#2%
      {z2} [mth_scor-bio_scor];
      {z2} mth_scor-bio_scor;
      {z2} mth_scor with sci_scor;
      {z2} mth_scor with phy_scor;
      {z2} mth_scor with bio_scor;
      {z2} sci_scor with phy_scor;
      {z2} sci_scor with bio_scor;
      {z2} phy_scor with bio_scor;
      {z3} %c#3%
      {z3} [mth_scor-bio_scor];
      {z3} mth_scor-bio_scor;
      {z3} mth_scor with sci_scor;
      {z3} mth_scor with phy_scor;
      {z3} mth_scor with bio_scor;
      {z3} sci_scor with phy_scor;
      {z3} sci_scor with bio_scor;
      {z3} phy_scor with bio_scor;
      {z4} %c#4%
      {z4} [mth_scor-bio_scor];
      {z4} mth_scor-bio_scor;
      {z4} mth_scor with sci_scor;
      {z4} mth_scor with phy_scor;
      {z4} mth_scor with bio_scor;
      {z4} sci_scor with phy_scor;
      {z4} sci_scor with bio_scor;
      {z4} phy_scor with bio_scor;"),

    OUTPUT = "sampstat residual tech11 tech14;",

    PLOT =
      "type = plot3;
      series = mth_scor-bio_scor(*);",

    usevariables = colnames(lsay_data),
    rdata = lsay_data)

  lpa_m4_fit <- mplusModeler(lpa_enum,
                             dataout = glue(here("m4_enum", "c_lpa_m4.dat")),
                             modelout = glue(here("m4_enum", "c{k}_lpa_m4.inp")),
                             check = TRUE, run = TRUE, hashfilename = FALSE)
})
Model 4: Produce model fit table
m4_output <- readModels(here("m4_enum"), quiet = TRUE)

m4_enum <- LatexSummaryTable(m4_output,
                             keepCols = c("Title", "Parameters", "LL", "BIC",
                                          "aBIC", "BLRT_PValue", "T11_VLMR_PValue",
                                          "Observations"),
                             sortBy = "Title")

gt(m4_enum)
Title | Parameters | LL | BIC | aBIC | BLRT_PValue | T11_VLMR_PValue | Observations |
---|---|---|---|---|---|---|---|
M4: Class1 | 14 | -39883.58 | 79879.73 | 79835.24 | NA | NA | 3103 |
M4: Class2 | 29 | -39750.04 | 79733.26 | 79641.11 | 0 | 0.0000 | 3103 |
M4: Class3 | 44 | -39667.68 | 79689.12 | 79549.31 | 0 | 0.0000 | 3103 |
M4: Class4 | 59 | -39622.03 | 79718.43 | 79530.96 | 0 | 0.0028 | 3103 |
Final table of model fit (all models)
allFit <- rbind(m1_enum, m2_enum, m3_enum, m3a_enum, m4_enum)

allFit1 <- allFit %>%
  mutate(aBIC = -2*LL + Parameters*log((Observations + 2)/24)) %>%
  mutate(CAIC = -2*LL + Parameters*(log(Observations) + 1)) %>%
  mutate(AWE  = -2*LL + 2*Parameters*(log(Observations) + 1.5)) %>%
  mutate(SIC  = -.5*BIC) %>%
  mutate(expSIC = exp(SIC - max(SIC))) %>%
  mutate(BF = exp(SIC - lead(SIC))) %>%
  mutate(cmPk = expSIC/sum(expSIC)) %>%
  select(1:5, 9:10, 6:7, 13, 14)
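For reference, the hand-computed columns above correspond to the following formulas (with \(LL\) the model log likelihood, \(p\) the number of parameters, and \(n\) the number of observations), mirroring the code and the definitions discussed by Masyn (2013):

\[
\begin{aligned}
\text{aBIC} &= -2LL + p\,\ln\!\left(\tfrac{n+2}{24}\right), &
\text{CAIC} &= -2LL + p\,(\ln n + 1), \\
\text{AWE} &= -2LL + 2p\,(\ln n + 1.5), &
\text{SIC} &= -\tfrac{1}{2}\,\text{BIC}, \\
BF_{k,\,k+1} &= \exp\!\left(\text{SIC}_k - \text{SIC}_{k+1}\right), &
cmP_k &= \frac{\exp\!\left(\text{SIC}_k - \text{SIC}_{\max}\right)}{\sum_j \exp\!\left(\text{SIC}_j - \text{SIC}_{\max}\right)}.
\end{aligned}
\]

Note that because lead() and the cmPk sum run over the stacked table, the Bayes factor in the last row of each specification compares it to the first row of the next specification, and the cmPk values are relative to all 24 models in the table.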
Format the model fit table
allFit1 %>%
  mutate(Title = str_remove_all(Title, "M...")) %>%
gt() %>%
tab_header(
title = md("**Model Fit Summary Table**"), subtitle = md(" ")) %>%
cols_label(
Title = "Classes",
Parameters = md("Par"),
LL = md("*LL*"),
T11_VLMR_PValue = "VLMR",
BLRT_PValue = "BLRT",
BF = md("BF"),
cmPk = md("*cmPk*")) %>%
tab_footnote(
footnote = md(
"*Note.* Par = Parameters; *LL* = model log likelihood;
BIC = Bayesian information criterion;
aBIC = sample size adjusted BIC; CAIC = consistent Akaike information criterion;
AWE = approximate weight of evidence criterion;
BLRT = bootstrapped likelihood ratio test p-value;
VLMR = Vuong-Lo-Mendell-Rubin adjusted likelihood ratio test p-value;
*cmPk* = approximate correct model probability."),
locations = cells_title()) %>%
tab_options(column_labels.font.weight = "bold") %>%
fmt_number(10,decimals = 2,
drop_trailing_zeros=TRUE,
suffixing = TRUE) %>%
fmt_number(c(3:9,11),
decimals = 2) %>%
sub_missing(1:11,
missing_text = "--") %>%
fmt(c(8:9,11),
fns = function(x)
ifelse(x<0.001, "<.001",
           scales::number(x, accuracy = 0.01))) %>%
  fmt(10, fns = function(x)
    ifelse(x > 100, ">100",
           scales::number(x, accuracy = .1))) %>%
  tab_row_group(
label = "Model-1",
rows = 1:6) %>%
tab_row_group(
label = "Model-2",
rows = 7:12) %>%
tab_row_group(
label = "Model-3",
rows = 13:16) %>%
tab_row_group(
label = "Model-3a",
rows = 17:20) %>%
tab_row_group(
label = "Model-4",
rows = 21:24) %>%
row_group_order(
groups = c("Model-1","Model-2","Model-3",
"Model-3a","Model-4")) %>%
tab_style(style = list(cell_text(weight = "bold")),
locations = list(
cells_body(columns = BIC, row = BIC == min(BIC[c(1:6)])),
cells_body(columns = BIC, row = BIC == min(BIC[c(7:12)])),
cells_body(columns = BIC, row = BIC == min(BIC[c(13:16)])),
cells_body(columns = BIC, row = BIC == min(BIC[c(17:20)])),
cells_body(columns = BIC, row = BIC == min(BIC[c(21:24)])),
cells_body(columns = aBIC, row = aBIC == min(aBIC[c(1:6)])),
cells_body(columns = aBIC, row = aBIC == min(aBIC[c(7:12)])),
cells_body(columns = aBIC, row = aBIC == min(aBIC[c(13:16)])),
cells_body(columns = aBIC, row = aBIC == min(aBIC[c(17:20)])),
cells_body(columns = aBIC, row = aBIC == min(aBIC[c(21:24)])),
cells_body(columns = CAIC, row = CAIC == min(CAIC[c(1:6)])),
cells_body(columns = CAIC, row = CAIC == min(CAIC[c(7:12)])),
cells_body(columns = CAIC, row = CAIC == min(CAIC[c(13:16)])),
cells_body(columns = CAIC, row = CAIC == min(CAIC[c(17:20)])),
cells_body(columns = CAIC, row = CAIC == min(CAIC[c(21:24)])),
cells_body(columns = AWE, row = AWE == min(AWE[c(1:6)])),
cells_body(columns = AWE, row = AWE == min(AWE[c(7:12)])),
cells_body(columns = AWE, row = AWE == min(AWE[c(13:16)])),
cells_body(columns = AWE, row = AWE == min(AWE[c(17:20)])),
cells_body(columns = AWE, row = AWE == min(AWE[c(21:24)])),
cells_body(columns = cmPk, row = cmPk > .001),
cells_body(columns = BF, row = BF > 10),
cells_body(columns = T11_VLMR_PValue,
row =ifelse(T11_VLMR_PValue < .001 & lead(T11_VLMR_PValue) > .001, T11_VLMR_PValue < .001, NA)),
cells_body(columns = BLRT_PValue,
row =ifelse(BLRT_PValue < .001 & lead(BLRT_PValue) > .001, BLRT_PValue < .001, NA))))
Model Fit Summary Table1 | ||||||||||
Classes | Par | LL | BIC | aBIC | CAIC | AWE | BLRT | VLMR | BF | cmPk |
---|---|---|---|---|---|---|---|---|---|---|
Model-1 | ||||||||||
Class1 | 8 | −46,288.29 | 92,640.89 | 92,615.47 | 92,648.89 | 92,729.21 | – | – | 0.0 | <.001 |
Class2 | 13 | −43,352.36 | 86,809.23 | 86,767.93 | 86,822.23 | 86,952.76 | <.001 | <.001 | 0.0 | <.001 |
Class3 | 18 | −42,126.11 | 84,396.93 | 84,339.74 | 84,414.93 | 84,595.65 | <.001 | <.001 | 0.0 | <.001 |
Class4 | 23 | −41,433.72 | 83,052.37 | 82,979.29 | 83,075.37 | 83,306.29 | <.001 | <.001 | 0.0 | <.001 |
Class5 | 28 | −41,051.51 | 82,328.14 | 82,239.17 | 82,356.14 | 82,637.26 | <.001 | 0.15 | 0.0 | <.001 |
Class6 | 33 | −40,755.99 | 81,777.31 | 81,672.46 | 81,810.31 | 82,141.64 | <.001 | 0.01 | >100 | <.001 |
Model-2 | ||||||||||
Class1 | 8 | −46,288.29 | 92,640.89 | 92,615.47 | 92,648.89 | 92,729.21 | – | – | 0.0 | <.001 |
Class2 | 17 | −43,313.36 | 86,763.40 | 86,709.38 | 86,780.40 | 86,951.08 | <.001 | <.001 | 0.0 | <.001 |
Class3 | 26 | −42,030.92 | 84,270.88 | 84,188.27 | 84,296.88 | 84,557.93 | <.001 | <.001 | 0.0 | <.001 |
Class4 | 35 | −41,324.70 | 82,930.80 | 82,819.59 | 82,965.80 | 83,317.20 | <.001 | <.001 | 0.0 | <.001 |
Class5 | 44 | −40,939.64 | 82,233.04 | 82,093.24 | 82,277.04 | 82,718.81 | <.001 | 0.03 | 0.0 | <.001 |
Class6 | 53 | −40,668.83 | 81,763.79 | 81,595.39 | 81,816.79 | 82,348.92 | <.001 | 0.08 | 0.0 | <.001 |
Model-3 | ||||||||||
Class1 | 14 | −39,883.58 | 79,879.73 | 79,835.24 | 79,893.73 | 80,034.29 | – | – | 0.0 | <.001 |
Class2 | 25 | −39,769.44 | 79,739.87 | 79,660.44 | 79,764.88 | 80,015.88 | <.001 | <.001 | 0.4 | <.001 |
Class3 | 36 | −39,724.25 | 79,737.94 | 79,623.55 | 79,773.94 | 80,135.38 | <.001 | 0.20 | >100 | <.001 |
Class4 | 47 | −39,688.14 | 79,754.17 | 79,604.83 | 79,801.17 | 80,273.05 | <.001 | 0.03 | >100 | <.001 |
Model-3a | ||||||||||
Class1 | 14 | −39,883.58 | 79,879.73 | 79,835.24 | 79,893.73 | 80,034.29 | – | – | 0.0 | <.001 |
Class2 | 19 | −39,821.05 | 79,794.86 | 79,734.49 | 79,813.86 | 80,004.62 | <.001 | <.001 | 0.0 | <.001 |
Class3 | 24 | −39,776.46 | 79,745.89 | 79,669.63 | 79,769.88 | 80,010.85 | <.001 | 0.02 | 0.0 | <.001 |
Class4 | 29 | −39,745.99 | 79,725.14 | 79,633.00 | 79,754.14 | 80,045.31 | <.001 | 0.00 | >100 | <.001 |
Model-4 | ||||||||||
Class1 | 14 | −39,883.58 | 79,879.73 | 79,835.24 | 79,893.73 | 80,034.29 | – | – | 0.0 | <.001 |
Class2 | 29 | −39,750.04 | 79,733.26 | 79,641.11 | 79,762.25 | 80,053.42 | <.001 | <.001 | 0.0 | <.001 |
Class3 | 44 | −39,667.68 | 79,689.12 | 79,549.31 | 79,733.12 | 80,174.88 | <.001 | <.001 | >100 | 1.00 |
Class4 | 59 | −39,622.03 | 79,718.43 | 79,530.96 | 79,777.43 | 80,369.79 | <.001 | 0.00 | – | <.001 |
1 Note. Par = Parameters; LL = model log likelihood; BIC = Bayesian information criterion; aBIC = sample size adjusted BIC; CAIC = consistent Akaike information criterion; AWE = approximate weight of evidence criterion; BLRT = bootstrapped likelihood ratio test p-value; VLMR = Vuong-Lo-Mendell-Rubin adjusted likelihood ratio test p-value; cmPk = approximate correct model probability. |
Interpreting results:
For model-1, the information criteria (BIC, aBIC, CAIC, AWE) decrease incrementally as the number of classes increases, with the minimum value for each index at the 6-class model (note: 1-6 class models were enumerated). The model comparison tests (BLRT, VLMR, BF, cmPk) show mixed results: the VLMR endorses the 4-class solution while the BLRT, BF, and cmPk are inconclusive. Note that, with the exception of the VLMR, the remaining summary statistics indicate best fit only among the 6 models enumerated for this variance/covariance specification. From this summary table it is not clear whether a model with a greater number of classes would fit better than the set of solutions shown. This pattern of incremental decreases in fit values is not uncommon in LPA/LCA analyses, and in this context substantive factors such as the discrimination of the class patterns should also be considered. One common method is to plot the IC values and look for "diminishing returns" with each additional class (see figure; Nylund-Gibson & Choi, 2018). However, when looking across variance-covariance specifications, this set of solutions has considerably higher values on all information criteria, indicating that the other variance/covariance models (model-2 through model-4) fit the data better.
Model-2 shows a similar pattern to model-1 with 7 of the 8 fit summary statistics endorsing the 6-class solution. The exception is the VLMR which endorses the 4-class solution. To demonstrate the specification difference between models 1 and 2, namely the freeing of the variance parameters across class, these solutions have been plotted side-by-side (see plot below).
Model-3 shows mixed endorsement across the fit indices: the 3-class solution is endorsed by the BIC and BF, while the 2-class solution is endorsed by the CAIC and AWE. Looking across the variance/covariance specifications, model-3 shows a considerable improvement in model fit relative to model-1 and model-2.
Model-3a shows mixed support for the 2-class and 4-class solutions across the summary statistics. The 4-class solution is endorsed by the BIC, aBIC, CAIC, and BF. In contrast, the 2-class solution is endorsed by the AWE and VLMR. Overall, models 3 and 3a have IC values that are very close in magnitude, showing no clear preference between the two specifications.
Finally, for model-4 the most consistently endorsed solution is the 3-class solution, with five of the seven fit indices in agreement (BIC, CAIC, VLMR, BF, cmPk). The aBIC, in contrast, endorses the 4-class solution. Overall, model-4 has the lowest relative values across the information criteria, so based on these statistics it provides the best fit to the sample data. However, substantive considerations and parsimony should also be weighed in choosing the final model.
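If the 3-class model-4 solution is carried forward, classification quality is worth checking before interpretation. The short sketch below (not in the original scripts) pulls the entropy and the model-estimated class counts from the readModels() object created above; the element names follow MplusAutomation's parsed output and may need adjusting depending on the package version.

# Sketch: classification diagnostics for the candidate 3-class model-4 solution
c3_m4 <- m4_output$c3_lpa_m4.out

# Entropy (values closer to 1 indicate better class separation)
c3_m4$summaries$Entropy

# Model-estimated class counts and proportions
c3_m4$class_counts$modelEstimated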
Plot information criteria:
allFit1 %>%
  dplyr::select(1, 4:7) %>%
  rowid_to_column() %>%
  pivot_longer(`BIC`:`AWE`,
               names_to = "Index",
               values_to = "ic_value") %>%
  separate(Title, c('Model', 'Class'), sep = ": Class") %>%
  mutate(Class = as.numeric(Class)) %>%
  mutate(Index = factor(Index,
                        levels = c("AWE", "CAIC", "BIC", "aBIC"))) %>%
  ggplot(aes(x = Class, y = ic_value,
             color = Index, shape = Index, group = Index, lty = Index)) +
  facet_wrap(~Model, scales = "free") +
  geom_point(size = 2.0) + geom_line(size = .8) +
  scale_x_continuous(breaks = 1:nrow(allFit)) +
  theme_minimal() +
  labs(x = "Number of Classes", y = "Information Criteria Value",
       title = "Information Criteria") +
  theme(text = element_text(family = "serif", size = 12),
        legend.text = element_text(family = "serif", size = 12),
        legend.key.width = unit(3, "line"), legend.title = element_blank(),
        legend.position = "top")
Plot comparison of Model-1 and Model-2 (variances fixed to equality vs. freely estimated):
The plot on the left illustrates variances constrained to equality across class while the plot on the right has variances which are freely estimated across the 6 classes.
NOTE: The plotMixtures() function is used for plotting LPA models only (i.e., means & variances).
a <- plotMixtures(m1_output$c6_lpa_m1.out,
                  ci = 0.95, bw = FALSE)

b <- plotMixtures(m2_output$c6_lpa_m2.out,
                  ci = 0.95, bw = FALSE)

a + labs(title = "Model-1: Variances fixed to equality") +
  theme(plot.title = element_text(size = 12)) +
b + labs(title = "Model-2: Variances freely estimated") +
  theme(plot.title = element_text(size = 12))
Figure. Here we see an ordered solution for model-1 and model-2.
This result might be anticipated given that the correlation between test scores ranges from moderate to high (r = .57 - .95).
Plot conditional means & variances: Model-4, C=3 (free variances & free covariances)
plotMixtures(m4_output$c3_lpa_m4.out, ci = 0.95, bw = FALSE) +
labs(title = "Model-4: Free variances & covariances")
Figure. Here we see an ordered solution with variation in item means across each of the four items within-class as well as between-class differences across the 3 classes.
References
Hallquist, M. N., & Wiley, J. F. (2018). MplusAutomation: An R Package for Facilitating Large-Scale Latent Variable Analyses in Mplus. Structural equation modeling: a multidisciplinary journal, 25(4), 621-638.
Masyn, K. E. (2013). Latent class analysis and finite mixture modeling. In T. D. Little (Ed.), The Oxford handbook of quantitative methods (Vol. 2, pp. 551-611). Oxford University Press.
Miller, J. D., Hoffer, T., Suchner, R., Brown, K., & Nelson, C. (1992). LSAY codebook. Northern Illinois University.
Muthén, B. O., Muthén, L. K., & Asparouhov, T. (2017). Regression and mediation analysis using Mplus. Los Angeles, CA: Muthén & Muthén.
Muthén, L.K. and Muthén, B.O. (1998-2017). Mplus User’s Guide. Eighth Edition. Los Angeles, CA: Muthén & Muthén
Rosenberg, J. M., van Lissa, C. J., Beymer, P. N., Anderson, D. J., Schell, M. J. & Schmidt, J. A. (2019). tidyLPA: Easily carry out Latent Profile Analysis (LPA) using open-source or commercial software [R package]. https://data-edu.github.io/tidyLPA/
R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/
Wickham et al., (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686