Introduction to Latent Profile Analysis (LPA)

Adam Garber

February 21, 2024


An_Introduction_to_LPA_Using_MplusAutomation

Figure. Gaussian mixture models. Data simulated from a 2-class model.


Preparation

Download repository here: https://github.com/MM4DBER/Intro_to_LPA


Load packages

library(tidyverse)
library(glue)
library(MplusAutomation)
library(here)
library(gt)
library(tidyLPA)
library(patchwork)

Variance-covariance specifications in LPA


Which variance-covariance specification is appropriate?

Figure. Picture adapted from Masyn (2013)


The variance-covariance matrix:

Figure. Picture adapted from tutorial (Rosenberg, 2019)


Terminology for specifying variance-covariance matrix:

  • Equal variances: Variances are fixed to equality across class. (i.e., variances are constrained to be equal for each class)
  • Free variances: Variances parameters are freely estimated across class (i.e., variances vary by class)
  • Covariances fixed to 0: No covariances are estimated (i.e., the off-diagonal cells of the matrix are zero)
  • Free covariances: The covariances are freely estimated (e.g., a 4-class model has 6 covariance parameters estimated per class)
  • Equal covariances: The covariances are fixed to equality across class (i.e., covariances are constrained to be equal across class)

Model presentation order:

  • Models are presented in the order described in the book chapter by Masyn (2013; p.591).
  • An additional model 3a demonstrates the specification of covariances fixed to equality across class.
  • NOTE: Many model specifications are possible. Among variance and covariance parameter groups any combination of these parameters can be freely estimated or fixed.

Example: For an LPA model with 4 indicators the variance-covariance matrix includes 4 variance and 7 covariance parameters per class. A methods effect may be indentified in a survey (e.g., similar item wording) which provides a rationale to freely estimate the associated covariance. However, the remaining 6 covariances may be negligible in magnitude providing an empirical basis to fix this sub-set of covariances at zero.

Models 1-4 from book chapter in Masyn (2013):

  • model 1: Equal variances, and covariances fixed to 0
  • model 2: Free variances and covariances fixed to 0
  • model 3: Equal variances, and free covariances
  • model 3a: Equal variances, and equal covariances (extra model)
  • model 4: Free variances and free covariances

Example: Math, Science, Physics, and Biology test score measures (LSAY).

Data source: The example presented in this tutorial utilizes 4 test score measures (Math, Science, Physics, and Biology) from the public-use data source, The Longitudinal Survey of American Youth (LSAY): \(\color{blue}{\text{See documentation here}}\)


Read in data

lsay_data <- read_csv(here("data", "lsay_lpa.csv"))

model 1: Equal variances (fixed) & covariances fixed to zero

In model-1 the item variances are fixed to equality across latent classes (4 item variances per-class) and covariance parameters are fixed to zero (not estimated). In the applied example, each of the 4 items has a unique variance estimated but the variances are held to equality across each of the latent classes in the example (4*1= 4 variance parameters). Relating this to the variance-covariance matrix this model is also named the, class-invariant, diagonal model. This is the default model in Mplus. Because it is a relatively parsimonious model it provides a good entry point to begin an LPA analysis.

Figure. Picture of “diagonal class-invariant model” adapted from Masyn (2013)


Note: Model variance-covariance specifications are stated explicitly here for pedagogical purposes. Model 1 can be replicated by removing the syntax under the MODEL= command as this is the default specification in Mplus.

lpa_m1  <- lapply(1:6, function(k) { 
  z2 <- if_else(k < 2 , '!' , '')
  z3 <- if_else(k < 3 , '!' , '')
  z4 <- if_else(k < 4 , '!' , '')
  z5 <- if_else(k < 5 , '!' , '')
  z6 <- if_else(k < 6 , '!' , '')
  
lpa_enum  <- mplusObject(
      
  TITLE = glue("M1: Class{k} "), 
  
  VARIABLE = glue(
   "usevar = mth_scor-bio_scor;
    classes = c({k}); "),
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 200 50; 
    processors = 10;",
  
  MODEL = glue(
    "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
     !!!          Syntax replicates default Mplus specification            !!!
     !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    
     %c#1%
     [mth_scor-bio_scor];     !!! means are are freely estimated  (default)
     mth_scor-bio_scor(1-4);  !!! variances are fixed to equality (default)
                              !!! covariances are fixed to zero   (default)
{z2} %c#2%
{z2} [mth_scor-bio_scor];
{z2} mth_scor-bio_scor(1-4);
     
{z3} %c#3%
{z3} [mth_scor-bio_scor];
{z3} mth_scor-bio_scor(1-4);
     
{z4} %c#4%
{z4} [mth_scor-bio_scor];
{z4} mth_scor-bio_scor(1-4);

{z5} %c#5%
{z5} [mth_scor-bio_scor];
{z5} mth_scor-bio_scor(1-4);

{z6} %c#6%
{z6} [mth_scor-bio_scor];
{z6} mth_scor-bio_scor(1-4);"),
  
  OUTPUT = "sampstat residual tech11 tech14;",
  
  PLOT = 
    "type = plot3; 
     series = mth_scor-bio_scor(*);",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)

lpa_m1_fit <- mplusModeler(lpa_enum, 
    dataout=glue(here("m1_enum", "c_lpa_m1.dat")),
    modelout=glue(here("m1_enum", "c{k}_lpa_m1.inp")) ,
    check=TRUE, run = TRUE, hashfilename = FALSE)
})

Model 1: Produce model fit table

m1_output <- readModels(here("m1_enum"), quiet = TRUE)

m1_enum <- LatexSummaryTable(m1_output,                               
                keepCols=c("Title","Parameters", "LL", "BIC",         
                           "aBIC", "BLRT_PValue", "T11_VLMR_PValue",
                           "Observations"),    
                sortBy = "Title")                                     

gt(m1_enum)
Title Parameters LL BIC aBIC BLRT_PValue T11_VLMR_PValue Observations
M1: Class1 8 -46288.29 92640.89 92615.47 NA NA 3103
M1: Class2 13 -43352.36 86809.23 86767.93 0 0.0000 3103
M1: Class3 18 -42126.11 84396.93 84339.74 0 0.0000 3103
M1: Class4 23 -41433.72 83052.37 82979.29 0 0.0000 3103
M1: Class5 28 -41051.51 82328.14 82239.17 0 0.1547 3103
M1: Class6 33 -40755.99 81777.31 81672.46 0 0.0149 3103

model 2: Free variances & covariances fixed to zero

In model-2 the variances are freely estimated across latent classes (4 item variances per-class; 4*4=16 variance parameters) and covariance parameters are fixed to zero. In regards to the variance-covariance matrix this model is also named the, class-varying, diagonal model.

lpa_m2  <- lapply(1:6, function(k) { 
  z2 <- if_else(k < 2 , '!' , '')
  z3 <- if_else(k < 3 , '!' , '')
  z4 <- if_else(k < 4 , '!' , '')
  z5 <- if_else(k < 5 , '!' , '')
  z6 <- if_else(k < 6 , '!' , '')
  
lpa_enum  <- mplusObject(
      
  TITLE = glue("M2: Class{k} "), 
  
  VARIABLE = glue(
   "usevar = mth_scor-bio_scor;
    classes = c({k}); "),
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 200 50; 
    processors = 10;",
  
  MODEL = glue(
   
   "%c#1%
    [mth_scor-bio_scor];   !!! means are are freely estimated
    mth_scor-bio_scor;     !!! variances are freely estimated
                           !!! covariances are fixed to zero (default)
{z2} %c#2%
{z2} [mth_scor-bio_scor];
{z2} mth_scor-bio_scor;
     
{z3} %c#3%
{z3} [mth_scor-bio_scor];
{z3} mth_scor-bio_scor;
     
{z4} %c#4%
{z4} [mth_scor-bio_scor];
{z4} mth_scor-bio_scor;

{z5} %c#5%
{z5} [mth_scor-bio_scor];
{z5} mth_scor-bio_scor;

{z6} %c#6%
{z6} [mth_scor-bio_scor];
{z6} mth_scor-bio_scor;

"),
  
  OUTPUT = "sampstat residual tech11 tech14;",
  
  PLOT = 
    "type = plot3; 
     series = mth_scor-bio_scor(*);",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)

lpa_m2_fit <- mplusModeler(lpa_enum, 
    dataout=glue(here("m2_enum", "c_lpa_m2.dat")),
    modelout=glue(here("m2_enum", "c{k}_lpa_m2.inp")) ,
    check=TRUE, run = TRUE, hashfilename = FALSE)
})

Model 2: Produce model fit table

m2_output <- readModels(here("m2_enum"), quiet = TRUE)

m2_enum <- LatexSummaryTable(m2_output,                               
                keepCols=c("Title","Parameters", "LL", "BIC",         
                           "aBIC", "BLRT_PValue", "T11_VLMR_PValue",
                           "Observations"),     
                sortBy = "Title")                                     

gt(m2_enum)
Title Parameters LL BIC aBIC BLRT_PValue T11_VLMR_PValue Observations
M2: Class1 8 -46288.29 92640.89 92615.47 NA NA 3103
M2: Class2 17 -43313.36 86763.40 86709.38 0 0.0000 3103
M2: Class3 26 -42030.92 84270.88 84188.27 0 0.0000 3103
M2: Class4 35 -41324.70 82930.80 82819.59 0 0.0002 3103
M2: Class5 44 -40939.64 82233.04 82093.24 0 0.0281 3103
M2: Class6 53 -40668.83 81763.79 81595.38 0 0.0842 3103

model 3: Equal (fixed) variances & free covariances

In model-3 the variances are fixed to equality across latent classes and covariance parameters are freely estimated. In relation to the variance-covariance matrix this model is also named the, class-invariant, non-diagonal model. In the example below a total of 4 variance parameters and 24 covariance parameters are estimated (i.e., 6 covariances per-class). Note that although this example demonstrates a model with all covariances estimated, any number of the covariance (or variance) parameters can be either fixed or freely estimated. For example, 2 item covariances may have substantive meaning while no theoretical precedent exists for estimating the remaining 4 pairwise item covariances.

lpa_m3  <- lapply(1:4, function(k) { 
  z2 <- if_else(k < 2 , '!' , '')
  z3 <- if_else(k < 3 , '!' , '')
  z4 <- if_else(k < 4 , '!' , '')
  
lpa_enum  <- mplusObject(
      
  TITLE = glue("M3: Class{k} "), 
  
  VARIABLE = glue(
   "usevar = mth_scor-bio_scor;
    classes = c({k}); "),
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 200 50; 
    processors = 10;",
  
  MODEL = glue(
    
   "%c#1%
    [mth_scor-bio_scor];          !!! means are freely estimated
    mth_scor-bio_scor(1-4);       !!! variances are fixed to equality
    mth_scor with sci_scor;       !!! covariances are freely estimated
    mth_scor with phy_scor;
    mth_scor with bio_scor;
    sci_scor with phy_scor;
    sci_scor with bio_scor;
    phy_scor with bio_scor;
   
{z2} %c#2%
{z2} [mth_scor-bio_scor];
{z2} mth_scor-bio_scor(1-4);
{z2} mth_scor with sci_scor;  
{z2} mth_scor with phy_scor;
{z2} mth_scor with bio_scor;
{z2} sci_scor with phy_scor;
{z2} sci_scor with bio_scor;
{z2} phy_scor with bio_scor;
     
{z3} %c#3%
{z3} [mth_scor-bio_scor];
{z3} mth_scor-bio_scor(1-4);
{z3} mth_scor with sci_scor;   
{z3} mth_scor with phy_scor;
{z3} mth_scor with bio_scor;
{z3} sci_scor with phy_scor;
{z3} sci_scor with bio_scor;
{z3} phy_scor with bio_scor;
     
{z4} %c#4%
{z4} [mth_scor-bio_scor];
{z4} mth_scor-bio_scor(1-4);
{z4} mth_scor with sci_scor;  
{z4} mth_scor with phy_scor;
{z4} mth_scor with bio_scor;
{z4} sci_scor with phy_scor;
{z4} sci_scor with bio_scor;
{z4} phy_scor with bio_scor;"),
  
  OUTPUT = "sampstat residual tech11 tech14;",
  
  PLOT = 
    "type = plot3; 
     series = mth_scor-bio_scor(*);",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)

lpa_m3_fit <- mplusModeler(lpa_enum, 
    dataout=glue(here("m3_enum", "c_lpa_m3.dat")),
    modelout=glue(here("m3_enum", "c{k}_lpa_m3.inp")) ,
    check=TRUE, run = TRUE, hashfilename = FALSE)
})

Model 3: Produce model fit table

m3_output <- readModels(here("m3_enum"), quiet = TRUE)

m3_enum <- LatexSummaryTable(m3_output,                                   
                keepCols=c("Title","Parameters", "LL", "BIC",             
                           "aBIC", "BLRT_PValue", "T11_VLMR_PValue",
                           "Observations"),    
                sortBy = "Title")                                         

gt(m3_enum)
Title Parameters LL BIC aBIC BLRT_PValue T11_VLMR_PValue Observations
M3: Class1 14 -39883.58 79879.73 79835.24 NA NA 3103
M3: Class2 25 -39769.44 79739.87 79660.44 0 0.0000 3103
M3: Class3 36 -39724.25 79737.94 79623.55 0 0.1976 3103
M3: Class4 47 -39688.14 79754.17 79604.83 0 0.0322 3103

model 3a: Equal variances & equal (fixed) covariances

In model-3a the variances are fixed to equality across latent classes and covariance parameters are fixed to equality across the 4 latent classes. In-other-words, this model estimates 4 item variances & 6 item covariances. Each unique item has a separate variance estimated and each pairwise covariance has a separate covariance estimated but all the variance-covariance parameters are held to equality across class.

lpa_m3a  <- lapply(1:4, function(k) { 
  z2 <- if_else(k < 2 , '!' , '')
  z3 <- if_else(k < 3 , '!' , '')
  z4 <- if_else(k < 4 , '!' , '')
  
lpa_enum  <- mplusObject(
      
  TITLE = glue("M3a: Class{k} "), 
  
  VARIABLE = glue(
   "usevar = mth_scor-bio_scor;
    classes = c({k}); "),
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 200 50; 
    processors = 10;",
  
  MODEL = glue(
    
   "%c#1%
    [mth_scor-bio_scor];          !!! means are freely estimated
    mth_scor-bio_scor(1-4);       !!! variances are fixed to equality
    mth_scor with sci_scor(5);    !!! covariances are fixed to equality
    mth_scor with phy_scor(6);
    mth_scor with bio_scor(7);
    sci_scor with phy_scor(8);
    sci_scor with bio_scor(9);
    phy_scor with bio_scor(10);
   
{z2} %c#2%
{z2} [mth_scor-bio_scor];
{z2} mth_scor-bio_scor(1-4);
{z2} mth_scor with sci_scor(5);   
{z2} mth_scor with phy_scor(6);
{z2} mth_scor with bio_scor(7);
{z2} sci_scor with phy_scor(8);
{z2} sci_scor with bio_scor(9);
{z2} phy_scor with bio_scor(10);
     
{z3} %c#3%
{z3} [mth_scor-bio_scor];
{z3} mth_scor-bio_scor(1-4);
{z3} mth_scor with sci_scor(5);   
{z3} mth_scor with phy_scor(6);
{z3} mth_scor with bio_scor(7);
{z3} sci_scor with phy_scor(8);
{z3} sci_scor with bio_scor(9);
{z3} phy_scor with bio_scor(10);
     
{z4} %c#4%
{z4} [mth_scor-bio_scor];
{z4} mth_scor-bio_scor(1-4);
{z4} mth_scor with sci_scor(5);   
{z4} mth_scor with phy_scor(6);
{z4} mth_scor with bio_scor(7);
{z4} sci_scor with phy_scor(8);
{z4} sci_scor with bio_scor(9);
{z4} phy_scor with bio_scor(10);"),
  
  OUTPUT = "sampstat residual tech11 tech14;",
  
  PLOT = 
    "type = plot3; 
     series = mth_scor-bio_scor(*);",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)

lpa_m3a_fit <- mplusModeler(lpa_enum, 
    dataout=glue(here("m3a_enum", "c_lpa_m3a.dat")),
    modelout=glue(here("m3a_enum", "c{k}_lpa_m3a.inp")) ,
    check=TRUE, run = TRUE, hashfilename = FALSE)
})

Model 3a: Produce model fit table

m3a_output <- readModels(here("m3a_enum"), quiet = TRUE)

m3a_enum <- LatexSummaryTable(m3a_output,                                     
                keepCols=c("Title","Parameters", "LL", "BIC",               
                           "aBIC", "BLRT_PValue", "T11_VLMR_PValue",
                           "Observations"),
                sortBy = "Title")                                           

gt(m3a_enum)
Title Parameters LL BIC aBIC BLRT_PValue T11_VLMR_PValue Observations
M3a: Class1 14 -39883.58 79879.73 79835.24 NA NA 3103
M3a: Class2 19 -39821.05 79794.86 79734.49 0 0.000 3103
M3a: Class3 24 -39776.46 79745.89 79669.63 0 0.022 3103
M3a: Class4 29 -39745.99 79725.14 79633.00 0 0.003 3103

model 4: Free variances & free covariances

In model-4 the variances are freely estimated across latent classes (4 item variances per-class; 4*4=16 variance parameters) and covariance are freely estimated (6 pairwise item covariances per-class; 6*4=24 covariance parameters). Note that although this example demonstrates a model with all variance/covariance parameters estimated, any number of the covariance (or variance) parameters can be either fixed or freely estimated.

Figure. Picture adapted from Masyn (2013)

lpa_m4  <- lapply(1:4, function(k) { 
  z2 <- if_else(k < 2 , '!' , '')
  z3 <- if_else(k < 3 , '!' , '')
  z4 <- if_else(k < 4 , '!' , '')
  
lpa_enum  <- mplusObject(
      
  TITLE = glue("M4: Class{k} "), 
  
  VARIABLE = glue(
   "usevar = mth_scor-bio_scor;
    classes = c({k}); "),
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 200 50; 
    processors = 10;",
  
  MODEL = glue(
    
   "%c#1%
    [mth_scor-bio_scor];          !!! means are freely estimated
    mth_scor-bio_scor;            !!! variances are freely estimated
    mth_scor with sci_scor;       !!! covariances are freely estimated
    mth_scor with phy_scor;
    mth_scor with bio_scor;
    sci_scor with phy_scor;
    sci_scor with bio_scor;
    phy_scor with bio_scor;
   
{z2} %c#2%
{z2} [mth_scor-bio_scor];
{z2} mth_scor-bio_scor;
{z2} mth_scor with sci_scor;  
{z2} mth_scor with phy_scor;
{z2} mth_scor with bio_scor;
{z2} sci_scor with phy_scor;
{z2} sci_scor with bio_scor;
{z2} phy_scor with bio_scor;
     
{z3} %c#3%
{z3} [mth_scor-bio_scor];
{z3} mth_scor-bio_scor;
{z3} mth_scor with sci_scor;   
{z3} mth_scor with phy_scor;
{z3} mth_scor with bio_scor;
{z3} sci_scor with phy_scor;
{z3} sci_scor with bio_scor;
{z3} phy_scor with bio_scor;
     
{z4} %c#4%
{z4} [mth_scor-bio_scor];
{z4} mth_scor-bio_scor;
{z4} mth_scor with sci_scor;  
{z4} mth_scor with phy_scor;
{z4} mth_scor with bio_scor;
{z4} sci_scor with phy_scor;
{z4} sci_scor with bio_scor;
{z4} phy_scor with bio_scor;"),
  
  OUTPUT = "sampstat residual tech11 tech14;",
  
  PLOT = 
    "type = plot3; 
     series = mth_scor-bio_scor(*);",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)

lpa_m4_fit <- mplusModeler(lpa_enum, 
    dataout=glue(here("m4_enum", "c_lpa_m4.dat")),
    modelout=glue(here("m4_enum", "c{k}_lpa_m4.inp")) ,
    check=TRUE, run = TRUE, hashfilename = FALSE)
})

Model 4: Produce model fit table

m4_output <- readModels(here("m4_enum"), quiet = TRUE)

m4_enum <- LatexSummaryTable(m4_output,                                 
                keepCols=c("Title","Parameters", "LL", "BIC",           
                           "aBIC", "BLRT_PValue", "T11_VLMR_PValue",    
                           "Observations"),                             
                sortBy = "Title")                                       

gt(m4_enum)
Title Parameters LL BIC aBIC BLRT_PValue T11_VLMR_PValue Observations
M4: Class1 14 -39883.58 79879.73 79835.24 NA NA 3103
M4: Class2 29 -39750.04 79733.26 79641.11 0 0.0000 3103
M4: Class3 44 -39667.68 79689.12 79549.31 0 0.0000 3103
M4: Class4 59 -39622.03 79718.43 79530.96 0 0.0028 3103

Final table of model fit (all models)

allFit <- rbind(m1_enum,m2_enum,m3_enum,m3a_enum,m4_enum)
                   
allFit1 <- allFit %>% 
  mutate(aBIC = -2*LL+Parameters*log((Observations+2)/24)) %>% 
  mutate(CAIC = -2*LL+Parameters*(log(Observations)+1)) %>% 
  mutate(AWE = -2*LL+2*Parameters*(log(Observations)+1.5)) %>%
  mutate(SIC = -.5*BIC) %>% 
  mutate(expSIC = exp(SIC - max(SIC))) %>% 
  mutate(BF = exp(SIC-lead(SIC))) %>% 
  mutate(cmPk = expSIC/sum(expSIC)) %>% 
  select(1:5,9:10,6:7,13,14) 

Format the model fit table

allFit1 %>% 
  mutate(Title = str_remove_all(Title, "M...")) %>% 
  gt() %>%
  tab_header(
    title = md("**Model Fit Summary Table**"), subtitle = md("&nbsp;")) %>% 
  cols_label(
    Title = "Classes",
    Parameters = md("Par"),
    LL = md("*LL*"),
    T11_VLMR_PValue = "VLMR",
    BLRT_PValue = "BLRT",
    BF = md("BF"),
    cmPk = md("*cmPk*")) %>%
  tab_footnote(
    footnote = md(
    "*Note.* Par = Parameters; *LL* = model log likelihood;
      BIC = Bayesian information criterion;
      aBIC = sample size adjusted BIC; CAIC = consistent Akaike information criterion;
      AWE = approximate weight of evidence criterion;
      BLRT = bootstrapped likelihood ratio test p-value;
      VLMR = Vuong-Lo-Mendell-Rubin adjusted likelihood ratio test p-value;
      *cmPk* = approximate correct model probability."), 
    locations = cells_title()) %>% 
  tab_options(column_labels.font.weight = "bold") %>% 
  fmt_number(10,decimals = 2,
             drop_trailing_zeros=TRUE,
             suffixing = TRUE) %>% 
  fmt_number(c(3:9,11), 
             decimals = 2) %>% 
  sub_missing(1:11,
              missing_text = "--") %>% 
  fmt(c(8:9,11),
    fns = function(x) 
    ifelse(x<0.001, "<.001",
           scales::number(x, accuracy = 0.01))) %>%
  fmt(10, fns = function(x) 
    ifelse(x>100, ">100",
           scales::number(x, accuracy = .1))) %>%
  tab_row_group(
    label = "Model-1",
    rows = 1:6) %>%
  tab_row_group(
    label = "Model-2",
    rows = 7:12) %>%
  tab_row_group(
    label = "Model-3",
    rows = 13:16) %>%
  tab_row_group(
    label = "Model-3a",
    rows = 17:20) %>%
  tab_row_group(
    label = "Model-4",
    rows = 21:24) %>%
  row_group_order(
      groups = c("Model-1","Model-2","Model-3",
                 "Model-3a","Model-4")) %>% 
tab_style(style = list(cell_text(weight = "bold")),
  locations = list(
  cells_body(columns = BIC, row = BIC == min(BIC[c(1:6)])),
  cells_body(columns = BIC, row = BIC == min(BIC[c(7:12)])),
  cells_body(columns = BIC, row = BIC == min(BIC[c(13:16)])),
  cells_body(columns = BIC, row = BIC == min(BIC[c(17:20)])),
  cells_body(columns = BIC, row = BIC == min(BIC[c(21:24)])),
  cells_body(columns = aBIC, row = aBIC == min(aBIC[c(1:6)])),
  cells_body(columns = aBIC, row = aBIC == min(aBIC[c(7:12)])),
  cells_body(columns = aBIC, row = aBIC == min(aBIC[c(13:16)])),
  cells_body(columns = aBIC, row = aBIC == min(aBIC[c(17:20)])),
  cells_body(columns = aBIC, row = aBIC == min(aBIC[c(21:24)])),
  cells_body(columns = CAIC, row = CAIC == min(CAIC[c(1:6)])),
  cells_body(columns = CAIC, row = CAIC == min(CAIC[c(7:12)])),
  cells_body(columns = CAIC, row = CAIC == min(CAIC[c(13:16)])),
  cells_body(columns = CAIC, row = CAIC == min(CAIC[c(17:20)])),
  cells_body(columns = CAIC, row = CAIC == min(CAIC[c(21:24)])),
  cells_body(columns = AWE, row = AWE == min(AWE[c(1:6)])),
  cells_body(columns = AWE, row = AWE == min(AWE[c(7:12)])),
  cells_body(columns = AWE, row = AWE == min(AWE[c(13:16)])),
  cells_body(columns = AWE, row = AWE == min(AWE[c(17:20)])),
  cells_body(columns = AWE, row = AWE == min(AWE[c(21:24)])),
  cells_body(columns = cmPk, row =  cmPk > .001),
  cells_body(columns = BF, row =  BF > 10),
  cells_body(columns =  T11_VLMR_PValue,
  row =ifelse(T11_VLMR_PValue < .001 & lead(T11_VLMR_PValue) > .001, T11_VLMR_PValue < .001, NA)),
  cells_body(columns =  BLRT_PValue,
  row =ifelse(BLRT_PValue < .001 & lead(BLRT_PValue) > .001, BLRT_PValue < .001, NA))))
Model Fit Summary Table1
Classes Par LL BIC aBIC CAIC AWE BLRT VLMR BF cmPk
Model-1
Class1 8 −46,288.29 92,640.89 92,615.47 92,648.89 92,729.21 0.0 <.001
Class2 13 −43,352.36 86,809.23 86,767.93 86,822.23 86,952.76 <.001 <.001 0.0 <.001
Class3 18 −42,126.11 84,396.93 84,339.74 84,414.93 84,595.65 <.001 <.001 0.0 <.001
Class4 23 −41,433.72 83,052.37 82,979.29 83,075.37 83,306.29 <.001 <.001 0.0 <.001
Class5 28 −41,051.51 82,328.14 82,239.17 82,356.14 82,637.26 <.001 0.15 0.0 <.001
Class6 33 −40,755.99 81,777.31 81,672.46 81,810.31 82,141.64 <.001 0.01 >100 <.001
Model-2
Class1 8 −46,288.29 92,640.89 92,615.47 92,648.89 92,729.21 0.0 <.001
Class2 17 −43,313.36 86,763.40 86,709.38 86,780.40 86,951.08 <.001 <.001 0.0 <.001
Class3 26 −42,030.92 84,270.88 84,188.27 84,296.88 84,557.93 <.001 <.001 0.0 <.001
Class4 35 −41,324.70 82,930.80 82,819.59 82,965.80 83,317.20 <.001 <.001 0.0 <.001
Class5 44 −40,939.64 82,233.04 82,093.24 82,277.04 82,718.81 <.001 0.03 0.0 <.001
Class6 53 −40,668.83 81,763.79 81,595.39 81,816.79 82,348.92 <.001 0.08 0.0 <.001
Model-3
Class1 14 −39,883.58 79,879.73 79,835.24 79,893.73 80,034.29 0.0 <.001
Class2 25 −39,769.44 79,739.87 79,660.44 79,764.88 80,015.88 <.001 <.001 0.4 <.001
Class3 36 −39,724.25 79,737.94 79,623.55 79,773.94 80,135.38 <.001 0.20 >100 <.001
Class4 47 −39,688.14 79,754.17 79,604.83 79,801.17 80,273.05 <.001 0.03 >100 <.001
Model-3a
Class1 14 −39,883.58 79,879.73 79,835.24 79,893.73 80,034.29 0.0 <.001
Class2 19 −39,821.05 79,794.86 79,734.49 79,813.86 80,004.62 <.001 <.001 0.0 <.001
Class3 24 −39,776.46 79,745.89 79,669.63 79,769.88 80,010.85 <.001 0.02 0.0 <.001
Class4 29 −39,745.99 79,725.14 79,633.00 79,754.14 80,045.31 <.001 0.00 >100 <.001
Model-4
Class1 14 −39,883.58 79,879.73 79,835.24 79,893.73 80,034.29 0.0 <.001
Class2 29 −39,750.04 79,733.26 79,641.11 79,762.25 80,053.42 <.001 <.001 0.0 <.001
Class3 44 −39,667.68 79,689.12 79,549.31 79,733.12 80,174.88 <.001 <.001 >100 1.00
Class4 59 −39,622.03 79,718.43 79,530.96 79,777.43 80,369.79 <.001 0.00 <.001
1 Note. Par = Parameters; LL = model log likelihood; BIC = Bayesian information criterion; aBIC = sample size adjusted BIC; CAIC = consistent Akaike information criterion; AWE = approximate weight of evidence criterion; BLRT = bootstrapped likelihood ratio test p-value; VLMR = Vuong-Lo-Mendell-Rubin adjusted likelihood ratio test p-value; cmPk = approximate correct model probability.

Interpreting results:

For model-1 the information criteria fit indices (BIC, aBIC, CAIC, AWE) show incremental decreases in the statistic value as class number increases with the minimum value for each IC index at the 6-class model (note; 1-6 class models enumerated). The model comparison tests (BLRT, VLMR, BF, cmPk) show mixed results: The VLMR endorses the 4-class solution while the BLRT, BF, and cmPk are inconclusive. Note, that with the exception of the VLMR the remaining summary statistics indicate best fit only among the 6 models enumerated for this model variance/covariance specification. From this summary table, it is not clear whether a model with a greater number of classes would fit better than the set of solutions shown. This pattern of incremental drop in fit values is not uncommon in LPA/LCA analyses and in this context substantive factors such as the discrimination of the class patterns should also be considered. One common method is to plot the IC values and look for “diminishing returns” in the ICs for each additional class (See figure; Nylund-Gibson & Choi, 2018). However, when looking across variance-covariance models this set of solutions has significantly higher values across all information criteria indicating that the other variance/covariance models have better fit to the data (i.e., model-2,…, model-4).

Model-2 shows a similar pattern to model-1 with 7 of the 8 fit summary statistics endorsing the 6-class solution. The exception is the VLMR which endorses the 4-class solution. To demonstrate the specification difference between models 1 and 2, namely the freeing of the variance parameters across class, these solutions have been plotted side-by-side (see plot below).

Model-3 shows mixed endorsement across the fit indices. The 3-class solution is endorsed by the BIC and BF and the 2-class solution is endorsed by the CAIC and AWE. Model 3a shows the most consistent agreement for the 4-class solution (BIC, aBIC, CAIC, BF). However, the AWE and VLMR indicate the 2-class solution. Looking across the variance/covariance specifications, model-3 shows a significant improvement in model fit relative to model-1 & model-2.

Model-3a shows mixed support for the 2-class and 4-class solutions across the summary statistics. The 4-class solution is endorsed by the BIC, aBIC, CAIC, and BF. In contrast, the 2-class solution is endorsed by the AWE and VLMR. Models 3 and 3a overall have IC values that are very close in magnitude showing no clear support between the two models.

Finally, for model-4 the most consistent solution endorsed is the 3-class solution with five of the seven fit indices in agreement (BIC, CAIC, VLMR, BF, cmPK). For model-4, the CAIC is contra-indicative and endorses the 3-class solution. Overall, model-4 has the lowest relative values across the information criteria so based on these statistics it has better fit to the sample data. However, it is important to note that substantive factors and parsimony should be considered in choosing the final model.


Plot information criteria:

allFit1 %>%
  dplyr::select(1, 4:7) %>%
  rowid_to_column() %>%
  pivot_longer(`BIC`:`AWE`,
               names_to = "Index",
               values_to = "ic_value") %>% 
  separate(Title, c('Model', 'Class'), sep = ": Class")  %>% 
  mutate(Class = as.numeric(Class)) %>%
  mutate(Index = factor(Index,
                        levels = c ("AWE", "CAIC", "BIC", "aBIC"))) %>%
  ggplot(aes(x = Class, y = ic_value,
    color = Index, shape = Index, group = Index, lty = Index)) +
  facet_wrap(~Model, scales="free") +
  geom_point(size = 2.0) + geom_line(size = .8) +
  scale_x_continuous(breaks = 1:nrow(allFit)) +
  theme_minimal() +
  labs(x = "Number of Classes", y = "Information Criteria Value",
       title = "Information Criteria") +
  theme(text = element_text(family = "serif", size = 12),
    legend.text = element_text(family="serif", size=12),
    legend.key.width = unit(3, "line"), legend.title = element_blank(),
    legend.position = "top")


Plot comparison of Model-1 and Model 2 (variances; fixed to equality V. freely estimated):

The plot on the left illustrates variances constrained to equality across class while the plot on the right has variances which are freely estimated across the 6 classes.

NOTE: The plotMixtures() function is used for plotting LPA models only (i.e., means & variances)

a <- plotMixtures(m1_output$c6_lpa_m1.out,
  ci = 0.95, bw = FALSE) 

b <- plotMixtures(m2_output$c6_lpa_m2.out,
  ci = 0.95, bw = FALSE) 

a + labs(title = "Model-1: Variances fixed to equality") +
    theme(plot.title = element_text(size = 12)) +
b + labs(title = "Model-2: Variances freely estimated") +
    theme(plot.title = element_text(size = 12))

Figure. Here we see an ordered solution for model-1 and model-2.

This result might be anticipated given that the correlation between test scores ranges from moderate to high (r = .57 - .95).


Plot conditional means & variances: Model-4, C=3 (free variances & free covariances)

plotMixtures(m4_output$c3_lpa_m4.out, ci = 0.95, bw = FALSE) + 
  labs(title = "Model-4: Free variances & covariances")

Figure. Here we see an ordered solution with variation in item means across each of the four items within-class as well as between-class differences across the 3 classes.


References

Hallquist, M. N., & Wiley, J. F. (2018). MplusAutomation: An R Package for Facilitating Large-Scale Latent Variable Analyses in Mplus. Structural equation modeling: a multidisciplinary journal, 25(4), 621-638.

Masyn, K. E. (2013). Latent class analysis and finite mixture modeling.

Miller, J. D., Hoffer, T., Suchner, R., Brown, K., & Nelson, C. (1992). LSAY codebook. Northern Illinois University.

Muthén, B. O., Muthén, L. K., & Asparouhov, T. (2017). Regression and mediation analysis using Mplus. Los Angeles, CA: Muthén & Muthén.

Muthén, L.K. and Muthén, B.O. (1998-2017). Mplus User’s Guide. Eighth Edition. Los Angeles, CA: Muthén & Muthén

Rosenberg, J. M., van Lissa, C. J., Beymer, P. N., Anderson, D. J., Schell, M. J. & Schmidt, J. A. (2019). tidyLPA: Easily carry out Latent Profile Analysis (LPA) using open-source or commercial software [R package]. https://data-edu.github.io/tidyLPA/

R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/

Wickham et al., (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686