An_Introduction_to_LPA_Using_MplusAutomation

Figure. Gaussian mixture models. Data simulated from a 2-class model.

Preparation

Download repository here: https://github.com/MM4DBER/Intro_to_LPA

Load packages

library(tidyverse)
library(glue)
library(MplusAutomation)
library(here)
library(gt)
library(tidyLPA)
library(patchwork)

Variance-covariance specifications in `LPA`

Which variance-covariance specification is appropriate?

Figure. Picture adapted from Masyn (2013)

The variance-covariance matrix:

Figure. Picture adapted from tutorial (Rosenberg, 2019)

Terminology for specifying variance-covariance matrix:

Equal variances: Variances are fixed to equality across class. (i.e., variances are constrained to be equal for each class)
Free variances: Variances parameters are freely estimated across class (i.e., variances vary by class)
Covariances fixed to 0: No covariances are estimated (i.e., the off-diagonal cells of the matrix are zero)
Free covariances: The covariances are freely estimated (e.g., a 4-class model has 6 covariance parameters estimated per class)
Equal covariances: The covariances are fixed to equality across class (i.e., covariances are constrained to be equal across class)

Model presentation order:

Models are presented in the order described in the book chapter by Masyn (2013; p.591).
An additional model 3a demonstrates the specification of covariances fixed to equality across class.
NOTE: Many model specifications are possible. Among variance and covariance parameter groups any combination of these parameters can be freely estimated or fixed.

Example: For an LPA model with 4 indicators the variance-covariance matrix includes 4 variance and 7 covariance parameters per class. A methods effect may be indentified in a survey (e.g., similar item wording) which provides a rationale to freely estimate the associated covariance. However, the remaining 6 covariances may be negligible in magnitude providing an empirical basis to fix this sub-set of covariances at zero.

Models 1-4 from book chapter in Masyn (2013):

model 1: Equal variances, and covariances fixed to 0
model 2: Free variances and covariances fixed to 0
model 3: Equal variances, and free covariances
model 3a: Equal variances, and equal covariances (extra model)
model 4: Free variances and free covariances

Example: Math, Science, Physics, and Biology test score measures (LSAY).

Data source: The example presented in this tutorial utilizes 4 test score measures (Math, Science, Physics, and Biology) from the public-use data source, The Longitudinal Survey of American Youth (LSAY): \(\color{blue}{\text{See documentation here}}\)

Read in data

lsay_data <- read_csv(here("data", "lsay_lpa.csv"))

`model 1`: Equal variances (fixed) & covariances fixed to zero

In model-1 the item variances are fixed to equality across latent classes (4 item variances per-class) and covariance parameters are fixed to zero (not estimated). In the applied example, each of the 4 items has a unique variance estimated but the variances are held to equality across each of the latent classes in the example (4*1= 4 variance parameters). Relating this to the variance-covariance matrix this model is also named the, class-invariant, diagonal model. This is the default model in Mplus. Because it is a relatively parsimonious model it provides a good entry point to begin an LPA analysis.

Figure. Picture of “diagonal class-invariant model” adapted from Masyn (2013)

Note: Model variance-covariance specifications are stated explicitly here for pedagogical purposes. Model 1 can be replicated by removing the syntax under the MODEL= command as this is the default specification in Mplus.

lpa_m1  <- lapply(1:6, function(k) { 
  z2 <- if_else(k < 2 , '!' , '')
  z3 <- if_else(k < 3 , '!' , '')
  z4 <- if_else(k < 4 , '!' , '')
  z5 <- if_else(k < 5 , '!' , '')
  z6 <- if_else(k < 6 , '!' , '')
  
lpa_enum  <- mplusObject(
      
  TITLE = glue("M1: Class{k} "), 
  
  VARIABLE = glue(
   "usevar = mth_scor-bio_scor;
    classes = c({k}); "),
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 200 50; 
    processors = 10;",
  
  MODEL = glue(
    "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
     !!!          Syntax replicates default Mplus specification            !!!
     !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    
     %c#1%
     [mth_scor-bio_scor];     !!! means are are freely estimated  (default)
     mth_scor-bio_scor(1-4);  !!! variances are fixed to equality (default)
                              !!! covariances are fixed to zero   (default)
{z2} %c#2%
{z2} [mth_scor-bio_scor];
{z2} mth_scor-bio_scor(1-4);
     
{z3} %c#3%
{z3} [mth_scor-bio_scor];
{z3} mth_scor-bio_scor(1-4);
     
{z4} %c#4%
{z4} [mth_scor-bio_scor];
{z4} mth_scor-bio_scor(1-4);

{z5} %c#5%
{z5} [mth_scor-bio_scor];
{z5} mth_scor-bio_scor(1-4);

{z6} %c#6%
{z6} [mth_scor-bio_scor];
{z6} mth_scor-bio_scor(1-4);"),
  
  OUTPUT = "sampstat residual tech11 tech14;",
  
  PLOT = 
    "type = plot3; 
     series = mth_scor-bio_scor(*);",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)

lpa_m1_fit <- mplusModeler(lpa_enum, 
    dataout=glue(here("m1_enum", "c_lpa_m1.dat")),
    modelout=glue(here("m1_enum", "c{k}_lpa_m1.inp")) ,
    check=TRUE, run = TRUE, hashfilename = FALSE)
})

Model 1: Produce model fit table

m1_output <- readModels(here("m1_enum"), quiet = TRUE)

m1_enum <- LatexSummaryTable(m1_output,                               
                keepCols=c("Title","Parameters", "LL", "BIC",         
                           "aBIC", "BLRT_PValue", "T11_VLMR_PValue",
                           "Observations"),    
                sortBy = "Title")                                     

gt(m1_enum)

Title	Parameters	LL	BIC	aBIC	BLRT_PValue	T11_VLMR_PValue	Observations
M1: Class1	8	-46288.29	92640.89	92615.47	NA	NA	3103
M1: Class2	13	-43352.36	86809.23	86767.93	0	0.0000	3103
M1: Class3	18	-42126.11	84396.93	84339.74	0	0.0000	3103
M1: Class4	23	-41433.72	83052.37	82979.29	0	0.0000	3103
M1: Class5	28	-41051.51	82328.14	82239.17	0	0.1547	3103
M1: Class6	33	-40755.99	81777.31	81672.46	0	0.0149	3103

`model 2`: Free variances & covariances fixed to zero

In model-2 the variances are freely estimated across latent classes (4 item variances per-class; 4*4=16 variance parameters) and covariance parameters are fixed to zero. In regards to the variance-covariance matrix this model is also named the, class-varying, diagonal model.

lpa_m2  <- lapply(1:6, function(k) { 
  z2 <- if_else(k < 2 , '!' , '')
  z3 <- if_else(k < 3 , '!' , '')
  z4 <- if_else(k < 4 , '!' , '')
  z5 <- if_else(k < 5 , '!' , '')
  z6 <- if_else(k < 6 , '!' , '')
  
lpa_enum  <- mplusObject(
      
  TITLE = glue("M2: Class{k} "), 
  
  VARIABLE = glue(
   "usevar = mth_scor-bio_scor;
    classes = c({k}); "),
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 200 50; 
    processors = 10;",
  
  MODEL = glue(
   
   "%c#1%
    [mth_scor-bio_scor];   !!! means are are freely estimated
    mth_scor-bio_scor;     !!! variances are freely estimated
                           !!! covariances are fixed to zero (default)
{z2} %c#2%
{z2} [mth_scor-bio_scor];
{z2} mth_scor-bio_scor;
     
{z3} %c#3%
{z3} [mth_scor-bio_scor];
{z3} mth_scor-bio_scor;
     
{z4} %c#4%
{z4} [mth_scor-bio_scor];
{z4} mth_scor-bio_scor;

{z5} %c#5%
{z5} [mth_scor-bio_scor];
{z5} mth_scor-bio_scor;

{z6} %c#6%
{z6} [mth_scor-bio_scor];
{z6} mth_scor-bio_scor;

"),
  
  OUTPUT = "sampstat residual tech11 tech14;",
  
  PLOT = 
    "type = plot3; 
     series = mth_scor-bio_scor(*);",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)

lpa_m2_fit <- mplusModeler(lpa_enum, 
    dataout=glue(here("m2_enum", "c_lpa_m2.dat")),
    modelout=glue(here("m2_enum", "c{k}_lpa_m2.inp")) ,
    check=TRUE, run = TRUE, hashfilename = FALSE)
})

Model 2: Produce model fit table

m2_output <- readModels(here("m2_enum"), quiet = TRUE)

m2_enum <- LatexSummaryTable(m2_output,                               
                keepCols=c("Title","Parameters", "LL", "BIC",         
                           "aBIC", "BLRT_PValue", "T11_VLMR_PValue",
                           "Observations"),     
                sortBy = "Title")                                     

gt(m2_enum)

Title	Parameters	LL	BIC	aBIC	BLRT_PValue	T11_VLMR_PValue	Observations
M2: Class1	8	-46288.29	92640.89	92615.47	NA	NA	3103
M2: Class2	17	-43313.36	86763.40	86709.38	0	0.0000	3103
M2: Class3	26	-42030.92	84270.88	84188.27	0	0.0000	3103
M2: Class4	35	-41324.70	82930.80	82819.59	0	0.0002	3103
M2: Class5	44	-40939.64	82233.04	82093.24	0	0.0281	3103
M2: Class6	53	-40668.83	81763.79	81595.38	0	0.0842	3103

`model 3`: Equal (fixed) variances & free covariances

In model-3 the variances are fixed to equality across latent classes and covariance parameters are freely estimated. In relation to the variance-covariance matrix this model is also named the, class-invariant, non-diagonal model. In the example below a total of 4 variance parameters and 24 covariance parameters are estimated (i.e., 6 covariances per-class). Note that although this example demonstrates a model with all covariances estimated, any number of the covariance (or variance) parameters can be either fixed or freely estimated. For example, 2 item covariances may have substantive meaning while no theoretical precedent exists for estimating the remaining 4 pairwise item covariances.

lpa_m3  <- lapply(1:4, function(k) { 
  z2 <- if_else(k < 2 , '!' , '')
  z3 <- if_else(k < 3 , '!' , '')
  z4 <- if_else(k < 4 , '!' , '')
  
lpa_enum  <- mplusObject(
      
  TITLE = glue("M3: Class{k} "), 
  
  VARIABLE = glue(
   "usevar = mth_scor-bio_scor;
    classes = c({k}); "),
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 200 50; 
    processors = 10;",
  
  MODEL = glue(
    
   "%c#1%
    [mth_scor-bio_scor];          !!! means are freely estimated
    mth_scor-bio_scor(1-4);       !!! variances are fixed to equality
    mth_scor with sci_scor;       !!! covariances are freely estimated
    mth_scor with phy_scor;
    mth_scor with bio_scor;
    sci_scor with phy_scor;
    sci_scor with bio_scor;
    phy_scor with bio_scor;
   
{z2} %c#2%
{z2} [mth_scor-bio_scor];
{z2} mth_scor-bio_scor(1-4);
{z2} mth_scor with sci_scor;  
{z2} mth_scor with phy_scor;
{z2} mth_scor with bio_scor;
{z2} sci_scor with phy_scor;
{z2} sci_scor with bio_scor;
{z2} phy_scor with bio_scor;
     
{z3} %c#3%
{z3} [mth_scor-bio_scor];
{z3} mth_scor-bio_scor(1-4);
{z3} mth_scor with sci_scor;   
{z3} mth_scor with phy_scor;
{z3} mth_scor with bio_scor;
{z3} sci_scor with phy_scor;
{z3} sci_scor with bio_scor;
{z3} phy_scor with bio_scor;
     
{z4} %c#4%
{z4} [mth_scor-bio_scor];
{z4} mth_scor-bio_scor(1-4);
{z4} mth_scor with sci_scor;  
{z4} mth_scor with phy_scor;
{z4} mth_scor with bio_scor;
{z4} sci_scor with phy_scor;
{z4} sci_scor with bio_scor;
{z4} phy_scor with bio_scor;"),
  
  OUTPUT = "sampstat residual tech11 tech14;",
  
  PLOT = 
    "type = plot3; 
     series = mth_scor-bio_scor(*);",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)

lpa_m3_fit <- mplusModeler(lpa_enum, 
    dataout=glue(here("m3_enum", "c_lpa_m3.dat")),
    modelout=glue(here("m3_enum", "c{k}_lpa_m3.inp")) ,
    check=TRUE, run = TRUE, hashfilename = FALSE)
})

Model 3: Produce model fit table

m3_output <- readModels(here("m3_enum"), quiet = TRUE)

m3_enum <- LatexSummaryTable(m3_output,                                   
                keepCols=c("Title","Parameters", "LL", "BIC",             
                           "aBIC", "BLRT_PValue", "T11_VLMR_PValue",
                           "Observations"),    
                sortBy = "Title")                                         

gt(m3_enum)

Title	Parameters	LL	BIC	aBIC	BLRT_PValue	T11_VLMR_PValue	Observations
M3: Class1	14	-39883.58	79879.73	79835.24	NA	NA	3103
M3: Class2	25	-39769.44	79739.87	79660.44	0	0.0000	3103
M3: Class3	36	-39724.25	79737.94	79623.55	0	0.1976	3103
M3: Class4	47	-39688.14	79754.17	79604.83	0	0.0322	3103

`model 3a`: Equal variances & equal (fixed) covariances

In model-3a the variances are fixed to equality across latent classes and covariance parameters are fixed to equality across the 4 latent classes. In-other-words, this model estimates 4 item variances & 6 item covariances. Each unique item has a separate variance estimated and each pairwise covariance has a separate covariance estimated but all the variance-covariance parameters are held to equality across class.

lpa_m3a  <- lapply(1:4, function(k) { 
  z2 <- if_else(k < 2 , '!' , '')
  z3 <- if_else(k < 3 , '!' , '')
  z4 <- if_else(k < 4 , '!' , '')
  
lpa_enum  <- mplusObject(
      
  TITLE = glue("M3a: Class{k} "), 
  
  VARIABLE = glue(
   "usevar = mth_scor-bio_scor;
    classes = c({k}); "),
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 200 50; 
    processors = 10;",
  
  MODEL = glue(
    
   "%c#1%
    [mth_scor-bio_scor];          !!! means are freely estimated
    mth_scor-bio_scor(1-4);       !!! variances are fixed to equality
    mth_scor with sci_scor(5);    !!! covariances are fixed to equality
    mth_scor with phy_scor(6);
    mth_scor with bio_scor(7);
    sci_scor with phy_scor(8);
    sci_scor with bio_scor(9);
    phy_scor with bio_scor(10);
   
{z2} %c#2%
{z2} [mth_scor-bio_scor];
{z2} mth_scor-bio_scor(1-4);
{z2} mth_scor with sci_scor(5);   
{z2} mth_scor with phy_scor(6);
{z2} mth_scor with bio_scor(7);
{z2} sci_scor with phy_scor(8);
{z2} sci_scor with bio_scor(9);
{z2} phy_scor with bio_scor(10);
     
{z3} %c#3%
{z3} [mth_scor-bio_scor];
{z3} mth_scor-bio_scor(1-4);
{z3} mth_scor with sci_scor(5);   
{z3} mth_scor with phy_scor(6);
{z3} mth_scor with bio_scor(7);
{z3} sci_scor with phy_scor(8);
{z3} sci_scor with bio_scor(9);
{z3} phy_scor with bio_scor(10);
     
{z4} %c#4%
{z4} [mth_scor-bio_scor];
{z4} mth_scor-bio_scor(1-4);
{z4} mth_scor with sci_scor(5);   
{z4} mth_scor with phy_scor(6);
{z4} mth_scor with bio_scor(7);
{z4} sci_scor with phy_scor(8);
{z4} sci_scor with bio_scor(9);
{z4} phy_scor with bio_scor(10);"),
  
  OUTPUT = "sampstat residual tech11 tech14;",
  
  PLOT = 
    "type = plot3; 
     series = mth_scor-bio_scor(*);",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)

lpa_m3a_fit <- mplusModeler(lpa_enum, 
    dataout=glue(here("m3a_enum", "c_lpa_m3a.dat")),
    modelout=glue(here("m3a_enum", "c{k}_lpa_m3a.inp")) ,
    check=TRUE, run = TRUE, hashfilename = FALSE)
})

Model 3a: Produce model fit table

m3a_output <- readModels(here("m3a_enum"), quiet = TRUE)

m3a_enum <- LatexSummaryTable(m3a_output,                                     
                keepCols=c("Title","Parameters", "LL", "BIC",               
                           "aBIC", "BLRT_PValue", "T11_VLMR_PValue",
                           "Observations"),
                sortBy = "Title")                                           

gt(m3a_enum)

Title	Parameters	LL	BIC	aBIC	BLRT_PValue	T11_VLMR_PValue	Observations
M3a: Class1	14	-39883.58	79879.73	79835.24	NA	NA	3103
M3a: Class2	19	-39821.05	79794.86	79734.49	0	0.000	3103
M3a: Class3	24	-39776.46	79745.89	79669.63	0	0.022	3103
M3a: Class4	29	-39745.99	79725.14	79633.00	0	0.003	3103

`model 4`: Free variances & free covariances

In model-4 the variances are freely estimated across latent classes (4 item variances per-class; 4*4=16 variance parameters) and covariance are freely estimated (6 pairwise item covariances per-class; 6*4=24 covariance parameters). Note that although this example demonstrates a model with all variance/covariance parameters estimated, any number of the covariance (or variance) parameters can be either fixed or freely estimated.

Figure. Picture adapted from Masyn (2013)

lpa_m4  <- lapply(1:4, function(k) { 
  z2 <- if_else(k < 2 , '!' , '')
  z3 <- if_else(k < 3 , '!' , '')
  z4 <- if_else(k < 4 , '!' , '')
  
lpa_enum  <- mplusObject(
      
  TITLE = glue("M4: Class{k} "), 
  
  VARIABLE = glue(
   "usevar = mth_scor-bio_scor;
    classes = c({k}); "),
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 200 50; 
    processors = 10;",
  
  MODEL = glue(
    
   "%c#1%
    [mth_scor-bio_scor];          !!! means are freely estimated
    mth_scor-bio_scor;            !!! variances are freely estimated
    mth_scor with sci_scor;       !!! covariances are freely estimated
    mth_scor with phy_scor;
    mth_scor with bio_scor;
    sci_scor with phy_scor;
    sci_scor with bio_scor;
    phy_scor with bio_scor;
   
{z2} %c#2%
{z2} [mth_scor-bio_scor];
{z2} mth_scor-bio_scor;
{z2} mth_scor with sci_scor;  
{z2} mth_scor with phy_scor;
{z2} mth_scor with bio_scor;
{z2} sci_scor with phy_scor;
{z2} sci_scor with bio_scor;
{z2} phy_scor with bio_scor;
     
{z3} %c#3%
{z3} [mth_scor-bio_scor];
{z3} mth_scor-bio_scor;
{z3} mth_scor with sci_scor;   
{z3} mth_scor with phy_scor;
{z3} mth_scor with bio_scor;
{z3} sci_scor with phy_scor;
{z3} sci_scor with bio_scor;
{z3} phy_scor with bio_scor;
     
{z4} %c#4%
{z4} [mth_scor-bio_scor];
{z4} mth_scor-bio_scor;
{z4} mth_scor with sci_scor;  
{z4} mth_scor with phy_scor;
{z4} mth_scor with bio_scor;
{z4} sci_scor with phy_scor;
{z4} sci_scor with bio_scor;
{z4} phy_scor with bio_scor;"),
  
  OUTPUT = "sampstat residual tech11 tech14;",
  
  PLOT = 
    "type = plot3; 
     series = mth_scor-bio_scor(*);",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)

lpa_m4_fit <- mplusModeler(lpa_enum, 
    dataout=glue(here("m4_enum", "c_lpa_m4.dat")),
    modelout=glue(here("m4_enum", "c{k}_lpa_m4.inp")) ,
    check=TRUE, run = TRUE, hashfilename = FALSE)
})

Model 4: Produce model fit table

m4_output <- readModels(here("m4_enum"), quiet = TRUE)

m4_enum <- LatexSummaryTable(m4_output,                                 
                keepCols=c("Title","Parameters", "LL", "BIC",           
                           "aBIC", "BLRT_PValue", "T11_VLMR_PValue",    
                           "Observations"),                             
                sortBy = "Title")                                       

gt(m4_enum)

Title	Parameters	LL	BIC	aBIC	BLRT_PValue	T11_VLMR_PValue	Observations
M4: Class1	14	-39883.58	79879.73	79835.24	NA	NA	3103
M4: Class2	29	-39750.04	79733.26	79641.11	0	0.0000	3103
M4: Class3	44	-39667.68	79689.12	79549.31	0	0.0000	3103
M4: Class4	59	-39622.03	79718.43	79530.96	0	0.0028	3103

Final table of model fit (`all models`)

allFit <- rbind(m1_enum,m2_enum,m3_enum,m3a_enum,m4_enum)
                   
allFit1 <- allFit %>% 
  mutate(aBIC = -2*LL+Parameters*log((Observations+2)/24)) %>% 
  mutate(CAIC = -2*LL+Parameters*(log(Observations)+1)) %>% 
  mutate(AWE = -2*LL+2*Parameters*(log(Observations)+1.5)) %>%
  mutate(SIC = -.5*BIC) %>% 
  mutate(expSIC = exp(SIC - max(SIC))) %>% 
  mutate(BF = exp(SIC-lead(SIC))) %>% 
  mutate(cmPk = expSIC/sum(expSIC)) %>% 
  select(1:5,9:10,6:7,13,14)

Format the model fit table

allFit1 %>% 
  mutate(Title = str_remove_all(Title, "M...")) %>% 
  gt() %>%
  tab_header(
    title = md("**Model Fit Summary Table**"), subtitle = md("&nbsp;")) %>% 
  cols_label(
    Title = "Classes",
    Parameters = md("Par"),
    LL = md("*LL*"),
    T11_VLMR_PValue = "VLMR",
    BLRT_PValue = "BLRT",
    BF = md("BF"),
    cmPk = md("*cmPk*")) %>%
  tab_footnote(
    footnote = md(
    "*Note.* Par = Parameters; *LL* = model log likelihood;
      BIC = Bayesian information criterion;
      aBIC = sample size adjusted BIC; CAIC = consistent Akaike information criterion;
      AWE = approximate weight of evidence criterion;
      BLRT = bootstrapped likelihood ratio test p-value;
      VLMR = Vuong-Lo-Mendell-Rubin adjusted likelihood ratio test p-value;
      *cmPk* = approximate correct model probability."), 
    locations = cells_title()) %>% 
  tab_options(column_labels.font.weight = "bold") %>% 
  fmt_number(10,decimals = 2,
             drop_trailing_zeros=TRUE,
             suffixing = TRUE) %>% 
  fmt_number(c(3:9,11), 
             decimals = 2) %>% 
  sub_missing(1:11,
              missing_text = "--") %>% 
  fmt(c(8:9,11),
    fns = function(x) 
    ifelse(x<0.001, "<.001",
           scales::number(x, accuracy = 0.01))) %>%
  fmt(10, fns = function(x) 
    ifelse(x>100, ">100",
           scales::number(x, accuracy = .1))) %>%
  tab_row_group(
    label = "Model-1",
    rows = 1:6) %>%
  tab_row_group(
    label = "Model-2",
    rows = 7:12) %>%
  tab_row_group(
    label = "Model-3",
    rows = 13:16) %>%
  tab_row_group(
    label = "Model-3a",
    rows = 17:20) %>%
  tab_row_group(
    label = "Model-4",
    rows = 21:24) %>%
  row_group_order(
      groups = c("Model-1","Model-2","Model-3",
                 "Model-3a","Model-4")) %>% 
tab_style(style = list(cell_text(weight = "bold")),
  locations = list(
  cells_body(columns = BIC, row = BIC == min(BIC[c(1:6)])),
  cells_body(columns = BIC, row = BIC == min(BIC[c(7:12)])),
  cells_body(columns = BIC, row = BIC == min(BIC[c(13:16)])),
  cells_body(columns = BIC, row = BIC == min(BIC[c(17:20)])),
  cells_body(columns = BIC, row = BIC == min(BIC[c(21:24)])),
  cells_body(columns = aBIC, row = aBIC == min(aBIC[c(1:6)])),
  cells_body(columns = aBIC, row = aBIC == min(aBIC[c(7:12)])),
  cells_body(columns = aBIC, row = aBIC == min(aBIC[c(13:16)])),
  cells_body(columns = aBIC, row = aBIC == min(aBIC[c(17:20)])),
  cells_body(columns = aBIC, row = aBIC == min(aBIC[c(21:24)])),
  cells_body(columns = CAIC, row = CAIC == min(CAIC[c(1:6)])),
  cells_body(columns = CAIC, row = CAIC == min(CAIC[c(7:12)])),
  cells_body(columns = CAIC, row = CAIC == min(CAIC[c(13:16)])),
  cells_body(columns = CAIC, row = CAIC == min(CAIC[c(17:20)])),
  cells_body(columns = CAIC, row = CAIC == min(CAIC[c(21:24)])),
  cells_body(columns = AWE, row = AWE == min(AWE[c(1:6)])),
  cells_body(columns = AWE, row = AWE == min(AWE[c(7:12)])),
  cells_body(columns = AWE, row = AWE == min(AWE[c(13:16)])),
  cells_body(columns = AWE, row = AWE == min(AWE[c(17:20)])),
  cells_body(columns = AWE, row = AWE == min(AWE[c(21:24)])),
  cells_body(columns = cmPk, row =  cmPk > .001),
  cells_body(columns = BF, row =  BF > 10),
  cells_body(columns =  T11_VLMR_PValue,
  row =ifelse(T11_VLMR_PValue < .001 & lead(T11_VLMR_PValue) > .001, T11_VLMR_PValue < .001, NA)),
  cells_body(columns =  BLRT_PValue,
  row =ifelse(BLRT_PValue < .001 & lead(BLRT_PValue) > .001, BLRT_PValue < .001, NA))))

Classes	Par	LL	BIC	aBIC	CAIC	AWE	BLRT	VLMR	BF	cmPk
Model Fit Summary Table¹
Model-1
Class1	8	−46,288.29	92,640.89	92,615.47	92,648.89	92,729.21	–	–	0.0	<.001
Class2	13	−43,352.36	86,809.23	86,767.93	86,822.23	86,952.76	<.001	<.001	0.0	<.001
Class3	18	−42,126.11	84,396.93	84,339.74	84,414.93	84,595.65	<.001	<.001	0.0	<.001
Class4	23	−41,433.72	83,052.37	82,979.29	83,075.37	83,306.29	<.001	<.001	0.0	<.001
Class5	28	−41,051.51	82,328.14	82,239.17	82,356.14	82,637.26	<.001	0.15	0.0	<.001
Class6	33	−40,755.99	81,777.31	81,672.46	81,810.31	82,141.64	<.001	0.01	>100	<.001
Model-2
Class1	8	−46,288.29	92,640.89	92,615.47	92,648.89	92,729.21	–	–	0.0	<.001
Class2	17	−43,313.36	86,763.40	86,709.38	86,780.40	86,951.08	<.001	<.001	0.0	<.001
Class3	26	−42,030.92	84,270.88	84,188.27	84,296.88	84,557.93	<.001	<.001	0.0	<.001
Class4	35	−41,324.70	82,930.80	82,819.59	82,965.80	83,317.20	<.001	<.001	0.0	<.001
Class5	44	−40,939.64	82,233.04	82,093.24	82,277.04	82,718.81	<.001	0.03	0.0	<.001
Class6	53	−40,668.83	81,763.79	81,595.39	81,816.79	82,348.92	<.001	0.08	0.0	<.001
Model-3
Class1	14	−39,883.58	79,879.73	79,835.24	79,893.73	80,034.29	–	–	0.0	<.001
Class2	25	−39,769.44	79,739.87	79,660.44	79,764.88	80,015.88	<.001	<.001	0.4	<.001
Class3	36	−39,724.25	79,737.94	79,623.55	79,773.94	80,135.38	<.001	0.20	>100	<.001
Class4	47	−39,688.14	79,754.17	79,604.83	79,801.17	80,273.05	<.001	0.03	>100	<.001
Model-3a
Class1	14	−39,883.58	79,879.73	79,835.24	79,893.73	80,034.29	–	–	0.0	<.001
Class2	19	−39,821.05	79,794.86	79,734.49	79,813.86	80,004.62	<.001	<.001	0.0	<.001
Class3	24	−39,776.46	79,745.89	79,669.63	79,769.88	80,010.85	<.001	0.02	0.0	<.001
Class4	29	−39,745.99	79,725.14	79,633.00	79,754.14	80,045.31	<.001	0.00	>100	<.001
Model-4
Class1	14	−39,883.58	79,879.73	79,835.24	79,893.73	80,034.29	–	–	0.0	<.001
Class2	29	−39,750.04	79,733.26	79,641.11	79,762.25	80,053.42	<.001	<.001	0.0	<.001
Class3	44	−39,667.68	79,689.12	79,549.31	79,733.12	80,174.88	<.001	<.001	>100	1.00
Class4	59	−39,622.03	79,718.43	79,530.96	79,777.43	80,369.79	<.001	0.00	–	<.001
¹ Note. Par = Parameters; LL = model log likelihood; BIC = Bayesian information criterion; aBIC = sample size adjusted BIC; CAIC = consistent Akaike information criterion; AWE = approximate weight of evidence criterion; BLRT = bootstrapped likelihood ratio test p-value; VLMR = Vuong-Lo-Mendell-Rubin adjusted likelihood ratio test p-value; cmPk = approximate correct model probability.

Interpreting results:

For model-1 the information criteria fit indices (BIC, aBIC, CAIC, AWE) show incremental decreases in the statistic value as class number increases with the minimum value for each IC index at the 6-class model (note; 1-6 class models enumerated). The model comparison tests (BLRT, VLMR, BF, cmPk) show mixed results: The VLMR endorses the 4-class solution while the BLRT, BF, and cmPk are inconclusive. Note, that with the exception of the VLMR the remaining summary statistics indicate best fit only among the 6 models enumerated for this model variance/covariance specification. From this summary table, it is not clear whether a model with a greater number of classes would fit better than the set of solutions shown. This pattern of incremental drop in fit values is not uncommon in LPA/LCA analyses and in this context substantive factors such as the discrimination of the class patterns should also be considered. One common method is to plot the IC values and look for “diminishing returns” in the ICs for each additional class (See figure; Nylund-Gibson & Choi, 2018). However, when looking across variance-covariance models this set of solutions has significantly higher values across all information criteria indicating that the other variance/covariance models have better fit to the data (i.e., model-2,…, model-4).

Model-2 shows a similar pattern to model-1 with 7 of the 8 fit summary statistics endorsing the 6-class solution. The exception is the VLMR which endorses the 4-class solution. To demonstrate the specification difference between models 1 and 2, namely the freeing of the variance parameters across class, these solutions have been plotted side-by-side (see plot below).

Model-3 shows mixed endorsement across the fit indices. The 3-class solution is endorsed by the BIC and BF and the 2-class solution is endorsed by the CAIC and AWE. Model 3a shows the most consistent agreement for the 4-class solution (BIC, aBIC, CAIC, BF). However, the AWE and VLMR indicate the 2-class solution. Looking across the variance/covariance specifications, model-3 shows a significant improvement in model fit relative to model-1 & model-2.

Model-3a shows mixed support for the 2-class and 4-class solutions across the summary statistics. The 4-class solution is endorsed by the BIC, aBIC, CAIC, and BF. In contrast, the 2-class solution is endorsed by the AWE and VLMR. Models 3 and 3a overall have IC values that are very close in magnitude showing no clear support between the two models.

Finally, for model-4 the most consistent solution endorsed is the 3-class solution with five of the seven fit indices in agreement (BIC, CAIC, VLMR, BF, cmPK). For model-4, the CAIC is contra-indicative and endorses the 3-class solution. Overall, model-4 has the lowest relative values across the information criteria so based on these statistics it has better fit to the sample data. However, it is important to note that substantive factors and parsimony should be considered in choosing the final model.

Plot information criteria:

allFit1 %>%
  dplyr::select(1, 4:7) %>%
  rowid_to_column() %>%
  pivot_longer(`BIC`:`AWE`,
               names_to = "Index",
               values_to = "ic_value") %>% 
  separate(Title, c('Model', 'Class'), sep = ": Class")  %>% 
  mutate(Class = as.numeric(Class)) %>%
  mutate(Index = factor(Index,
                        levels = c ("AWE", "CAIC", "BIC", "aBIC"))) %>%
  ggplot(aes(x = Class, y = ic_value,
    color = Index, shape = Index, group = Index, lty = Index)) +
  facet_wrap(~Model, scales="free") +
  geom_point(size = 2.0) + geom_line(size = .8) +
  scale_x_continuous(breaks = 1:nrow(allFit)) +
  theme_minimal() +
  labs(x = "Number of Classes", y = "Information Criteria Value",
       title = "Information Criteria") +
  theme(text = element_text(family = "serif", size = 12),
    legend.text = element_text(family="serif", size=12),
    legend.key.width = unit(3, "line"), legend.title = element_blank(),
    legend.position = "top")

Plot comparison of `Model-1` and `Model 2` (variances; fixed to equality V. freely estimated):

The plot on the left illustrates variances constrained to equality across class while the plot on the right has variances which are freely estimated across the 6 classes.

NOTE: The plotMixtures() function is used for plotting LPA models only (i.e., means & variances)

a <- plotMixtures(m1_output$c6_lpa_m1.out,
  ci = 0.95, bw = FALSE) 

b <- plotMixtures(m2_output$c6_lpa_m2.out,
  ci = 0.95, bw = FALSE) 

a + labs(title = "Model-1: Variances fixed to equality") +
    theme(plot.title = element_text(size = 12)) +
b + labs(title = "Model-2: Variances freely estimated") +
    theme(plot.title = element_text(size = 12))

Figure. Here we see an ordered solution for model-1 and model-2.

This result might be anticipated given that the correlation between test scores ranges from moderate to high (r = .57 - .95).

Plot conditional means & variances: `Model-4, C=3` (free variances & free covariances)

plotMixtures(m4_output$c3_lpa_m4.out, ci = 0.95, bw = FALSE) + 
  labs(title = "Model-4: Free variances & covariances")

Figure. Here we see an ordered solution with variation in item means across each of the four items within-class as well as between-class differences across the 3 classes.

References

Hallquist, M. N., & Wiley, J. F. (2018). MplusAutomation: An R Package for Facilitating Large-Scale Latent Variable Analyses in Mplus. Structural equation modeling: a multidisciplinary journal, 25(4), 621-638.

Masyn, K. E. (2013). Latent class analysis and finite mixture modeling.

Miller, J. D., Hoffer, T., Suchner, R., Brown, K., & Nelson, C. (1992). LSAY codebook. Northern Illinois University.

Muthén, B. O., Muthén, L. K., & Asparouhov, T. (2017). Regression and mediation analysis using Mplus. Los Angeles, CA: Muthén & Muthén.

Muthén, L.K. and Muthén, B.O. (1998-2017). Mplus User’s Guide. Eighth Edition. Los Angeles, CA: Muthén & Muthén

Rosenberg, J. M., van Lissa, C. J., Beymer, P. N., Anderson, D. J., Schell, M. J. & Schmidt, J. A. (2019). tidyLPA: Easily carry out Latent Profile Analysis (LPA) using open-source or commercial software [R package]. https://data-edu.github.io/tidyLPA/

R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/

Wickham et al., (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686

Introduction to Latent Profile Analysis (LPA)

Adam Garber

February 21, 2024

An_Introduction_to_LPA_Using_MplusAutomation

Preparation

Variance-covariance specifications in `LPA`

Example: Math, Science, Physics, and Biology test score measures (LSAY).

`model 1`: Equal variances (fixed) & covariances fixed to zero

`model 2`: Free variances & covariances fixed to zero

`model 3`: Equal (fixed) variances & free covariances

`model 3a`: Equal variances & equal (fixed) covariances

`model 4`: Free variances & free covariances

Final table of model fit (`all models`)

Plot comparison of `Model-1` and `Model 2` (variances; fixed to equality V. freely estimated):

Plot conditional means & variances: `Model-4, C=3` (free variances & free covariances)

References

An_Introduction_to_LPA_Using_MplusAutomation

Preparation

Variance-covariance specifications in LPA

Example: Math, Science, Physics, and Biology test score measures (LSAY).

model 1: Equal variances (fixed) & covariances fixed to zero

model 2: Free variances & covariances fixed to zero

model 3: Equal (fixed) variances & free covariances

model 3a: Equal variances & equal (fixed) covariances

model 4: Free variances & free covariances

Final table of model fit (all models)

Plot comparison of Model-1 and Model 2 (variances; fixed to equality V. freely estimated):

Plot conditional means & variances: Model-4, C=3 (free variances & free covariances)

References

Variance-covariance specifications in `LPA`

`model 1`: Equal variances (fixed) & covariances fixed to zero

`model 2`: Free variances & covariances fixed to zero

`model 3`: Equal (fixed) variances & free covariances

`model 3a`: Equal variances & equal (fixed) covariances

`model 4`: Free variances & free covariances

Final table of model fit (`all models`)

Plot comparison of `Model-1` and `Model 2` (variances; fixed to equality V. freely estimated):

Plot conditional means & variances: `Model-4, C=3` (free variances & free covariances)