R language Access Menu

Title Text Both  

Structural equation modelling

This is an interesting technique where one can make a model and test it for statistical validity. One can also include latent factors which are not measured and hence not in the dataset. The package lavaan makes structural equation modeling quite convenient. 

For example, we can create a hypothetical model for bwdf dataset. Here we say that age, race, ptl and ui affect a latent factor, say lat1, while lwt, smoke, ht and ftv affect another latent factor called lat2. Both lat1 and lat2 then affect low parameter. 

We can write this model in following form: 

Code:

> library(MASS)
> data(birthwt)
> library(lavaan)
> model.bwdf <- '
    lat1 =~ age + race + ptl + ui
    lat2 =~ lwt + smoke + ht + ftv
    low ~ lat1 + lat2
'
> mylavaan
function(model, mydf){
        library(lavaan)
        fit <- lavaan::sem(model, data=mydf)  

      cat('--------- summary of sem(model, data)---------')
        print(summary(fit, fit.measures=TRUE))

      # plot model: 
        library(semPlot)
        print(semPaths(fit,whatLabels="std",style="lisrel", residuals=TRUE, edge.color='black')) # earlier 'est' not 'std'

        fit2 <- cfa(model, data=mydf)

      cat('------- summary of cfa(model, data) –--------')
          print(summary(fit2, fit.measures=TRUE))

          print(rnlavaan_modindices(fit2))
          fit
}

> mylavaan(model.bwdf, birthwt)

Output graph:

                  

------------------- summary of sem(model, data) -------------------
lavaan (0.5-18) converged normally after 1163 iterations

  Number of observations                           189

  Estimator                                         ML
  Minimum Function Test Statistic               77.268
  Degrees of freedom                                25
  P-value (Chi-square)                           0.000

Model test baseline model:

  Minimum Function Test Statistic              129.622
  Degrees of freedom                                36
  P-value                                        0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.442
  Tucker-Lewis Index (TLI)                       0.196

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -2464.504
  Loglikelihood unrestricted model (H1)      -2425.870

  Number of free parameters                         20
  Akaike (AIC)                                4969.008
  Bayesian (BIC)                              5033.843
  Sample-size adjusted Bayesian (BIC)         4970.492

Root Mean Square Error of Approximation:

  RMSEA                                          0.105
  90 Percent Confidence Interval          0.079  0.132
  P-value RMSEA <= 0.05                          0.001

Standardized Root Mean Square Residual:

  SRMR                                           0.089

Parameter estimates:

  Information                                 Expected
  Standard Errors                             Standard

                   Estimate  Std.err  Z-value  P(>|z|)
Latent variables:
  lat1 =~
    age               1.000
    race             -0.157
    ptl              -0.096
    ui               -0.072
  lat2 =~
    lwt               1.000
    smoke            -0.001
    ht                0.003
    ftv               0.010

Regressions:
  low ~
    lat1             28.768
    lat2             -1.535

Covariances:
  lat1 ~~
    lat2             27.640

Variances:
    age              26.504
    race              0.803
    ptl               0.229
    ui                0.119
    lwt             410.470
    smoke             0.237
    ht                0.056
    ftv               1.069
    low               7.336
    lat1              1.463
    lat2            519.224

-----------------------------------------------
From            To      Weight
10       -->     1       1 
10       -->     2       1 
10       -->     3       1 
10       -->     4       1 
11       -->     5       1 
11       -->     6       1 
11       -->     7       1 
11       -->     8       1 
10       -->     9       1 
11       -->     9       1 
1        -->     1       1 
2        -->     2       1 
3        -->     3       1 
4        -->     4       1 
5        -->     5       1 
6        -->     6       1 
7        -->     7       1 
8        -->     8       1 
9        -->     9       1 
10       <->     11      1 
11       <->     10      1 

------------------- summary of cfa(model, data) -------------------
lavaan (0.5-18) converged normally after 1163 iterations

  Number of observations                           189

  Estimator                                         ML
  Minimum Function Test Statistic               77.268
  Degrees of freedom                                25
  P-value (Chi-square)                           0.000

Model test baseline model:

  Minimum Function Test Statistic              129.622
  Degrees of freedom                                36
  P-value                                        0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.442
  Tucker-Lewis Index (TLI)                       0.196

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -2464.504
  Loglikelihood unrestricted model (H1)      -2425.870

  Number of free parameters                         20
  Akaike (AIC)                                4969.008
  Bayesian (BIC)                              5033.843
  Sample-size adjusted Bayesian (BIC)         4970.492

Root Mean Square Error of Approximation:

  RMSEA                                          0.105
  90 Percent Confidence Interval          0.079  0.132
  P-value RMSEA <= 0.05                          0.001

Standardized Root Mean Square Residual:

  SRMR                                           0.089

Parameter estimates:

  Information                                 Expected
  Standard Errors                             Standard

                   Estimate  Std.err  Z-value  P(>|z|)
Latent variables:
  lat1 =~
    age               1.000
    race             -0.157
    ptl              -0.096
    ui               -0.072
  lat2 =~
    lwt               1.000
    smoke            -0.001
    ht                0.003
    ftv               0.010

Regressions:
  low ~
    lat1             28.768
    lat2             -1.535

Covariances:
  lat1 ~~
    lat2             27.640

Variances:
    age              26.504
    race              0.803
    ptl               0.229
    ui                0.119
    lwt             410.470
    smoke             0.237
    ht                0.056
    ftv               1.069
    low               7.336
    lat1              1.463
    lat2            519.224

-----------------------------------------------

------------------- moreFitIndices() -------------------
      gammaHat    adjGammaHat baseline.rmsea     aic.smallN     bic.priorN            hqc            sic 
     0.9418123      0.8952621      0.1173021   4930.4992370   4986.7249763   4965.4541181             NA 
-----------------------------------------------
Error in lavTech(object, "inverted.information.expected") * nobs(object) : 
  non-numeric argument to binary operator
In addition: Warning messages:
1: In lav_data_full(data = data, group = group, group.label = group.label,  :
  lavaan WARNING: some observed variances are (at least) a factor 1000 times larger than others; use varTable(fit) to investigate
2: In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
  lavaan WARNING: could not compute standard errors!
  lavaan NOTE: this may be a symptom that the model is not identified.

3: In lavaan::lavaan(model = model, data = mydf, model.type = "sem",  :
  lavaan WARNING: covariance matrix of latent variables is not positive definite; use inspect(fit,"cov.lv") to investigate.
4: In lav_data_full(data = data, group = group, group.label = group.label,  :
  lavaan WARNING: some observed variances are (at least) a factor 1000 times larger than others; use varTable(fit) to investigate
5: In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
  lavaan WARNING: could not compute standard errors!
  lavaan NOTE: this may be a symptom that the model is not identified.

6: In lavaan::lavaan(model = model, data = mydf, model.type = "cfa",  :
  lavaan WARNING: covariance matrix of latent variables is not positive definite; use inspect(fit,"cov.lv") to investigate.
7: In lav_model_vcov(lavmodel = lavobject@Model, lavsamplestats = lavobject@SampleStats,  :
  lavaan WARNING: could not compute standard errors!
  lavaan NOTE: this may be a symptom that the model is not identified.

In the above output, one must note following points: 

1. The warning at end states 'could not compute standard model; this may be a symptom that the model is not identified'. This indicates that the model is suboptimal.

2. The fit indices (comparative fit index or CFI and Tucker-Lewis index or TLI) should be as large as possible, preferably > 0.95. Here they are much smaller (0.442 and 0.196, respectively). This also indicates a suboptimal model.

3. The RMSEA and SRMR should be as small as possible, preferably < 0.05. Here, they are not too high (0.1 and 0.09, respectively).

Overall, above model is suboptimal and one needs to redesign it. 

References:
mass package: Venables, W. N. & Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. Springer, New York. ISBN 0-387-95457-0 
https://cran.r-project.org/web/packages/MASS/index.html

Yves Rosseel (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2), 1-36. URL http://www.jstatsoft.org/v48/i02/.
 


    Comments & Feedback