##### Citations
Title Text Both

## Structural equation modelling

This is an interesting technique where one can make a model and test it for statistical validity. One can also include latent factors which are not measured and hence not in the dataset. The package lavaan makes structural equation modeling quite convenient.

For example, we can create a hypothetical model for bwdf dataset. Here we say that age, race, ptl and ui affect a latent factor, say lat1, while lwt, smoke, ht and ftv affect another latent factor called lat2. Both lat1 and lat2 then affect low parameter.

We can write this model in following form:

Code:

> library(MASS)
> data(birthwt)
> library(lavaan)
> model.bwdf <- '
lat1 =~ age + race + ptl + ui
lat2 =~ lwt + smoke + ht + ftv
low ~ lat1 + lat2
'
> mylavaan
function(model, mydf){
library(lavaan)
fit <- lavaan::sem(model, data=mydf)

cat('--------- summary of sem(model, data)---------')
print(summary(fit, fit.measures=TRUE))

# plot model:
library(semPlot)
print(semPaths(fit,whatLabels="std",style="lisrel", residuals=TRUE, edge.color='black')) # earlier 'est' not 'std'

fit2 <- cfa(model, data=mydf)

cat('------- summary of cfa(model, data) –--------')
print(summary(fit2, fit.measures=TRUE))

print(rnlavaan_modindices(fit2))
fit
}

> mylavaan(model.bwdf, birthwt)

Output graph:

------------------- summary of sem(model, data) -------------------
lavaan (0.5-18) converged normally after 1163 iterations

Number of observations                           189

Estimator                                         ML
Minimum Function Test Statistic               77.268
Degrees of freedom                                25
P-value (Chi-square)                           0.000

Model test baseline model:

Minimum Function Test Statistic              129.622
Degrees of freedom                                36
P-value                                        0.000

User model versus baseline model:

Comparative Fit Index (CFI)                    0.442
Tucker-Lewis Index (TLI)                       0.196

Loglikelihood and Information Criteria:

Loglikelihood user model (H0)              -2464.504
Loglikelihood unrestricted model (H1)      -2425.870

Number of free parameters                         20
Akaike (AIC)                                4969.008
Bayesian (BIC)                              5033.843

Root Mean Square Error of Approximation:

RMSEA                                          0.105
90 Percent Confidence Interval          0.079  0.132
P-value RMSEA <= 0.05                          0.001

Standardized Root Mean Square Residual:

SRMR                                           0.089

Parameter estimates:

Information                                 Expected
Standard Errors                             Standard

Estimate  Std.err  Z-value  P(>|z|)
Latent variables:
lat1 =~
age               1.000
race             -0.157
ptl              -0.096
ui               -0.072
lat2 =~
lwt               1.000
smoke            -0.001
ht                0.003
ftv               0.010

Regressions:
low ~
lat1             28.768
lat2             -1.535

Covariances:
lat1 ~~
lat2             27.640

Variances:
age              26.504
race              0.803
ptl               0.229
ui                0.119
lwt             410.470
smoke             0.237
ht                0.056
ftv               1.069
low               7.336
lat1              1.463
lat2            519.224

-----------------------------------------------
From            To      Weight
10       -->     1       1
10       -->     2       1
10       -->     3       1
10       -->     4       1
11       -->     5       1
11       -->     6       1
11       -->     7       1
11       -->     8       1
10       -->     9       1
11       -->     9       1
1        -->     1       1
2        -->     2       1
3        -->     3       1
4        -->     4       1
5        -->     5       1
6        -->     6       1
7        -->     7       1
8        -->     8       1
9        -->     9       1
10       <->     11      1
11       <->     10      1

------------------- summary of cfa(model, data) -------------------
lavaan (0.5-18) converged normally after 1163 iterations

Number of observations                           189

Estimator                                         ML
Minimum Function Test Statistic               77.268
Degrees of freedom                                25
P-value (Chi-square)                           0.000

Model test baseline model:

Minimum Function Test Statistic              129.622
Degrees of freedom                                36
P-value                                        0.000

User model versus baseline model:

Comparative Fit Index (CFI)                    0.442
Tucker-Lewis Index (TLI)                       0.196

Loglikelihood and Information Criteria:

Loglikelihood user model (H0)              -2464.504
Loglikelihood unrestricted model (H1)      -2425.870

Number of free parameters                         20
Akaike (AIC)                                4969.008
Bayesian (BIC)                              5033.843

Root Mean Square Error of Approximation:

RMSEA                                          0.105
90 Percent Confidence Interval          0.079  0.132
P-value RMSEA <= 0.05                          0.001

Standardized Root Mean Square Residual:

SRMR                                           0.089

Parameter estimates:

Information                                 Expected
Standard Errors                             Standard

Estimate  Std.err  Z-value  P(>|z|)
Latent variables:
lat1 =~
age               1.000
race             -0.157
ptl              -0.096
ui               -0.072
lat2 =~
lwt               1.000
smoke            -0.001
ht                0.003
ftv               0.010

Regressions:
low ~
lat1             28.768
lat2             -1.535

Covariances:
lat1 ~~
lat2             27.640

Variances:
age              26.504
race              0.803
ptl               0.229
ui                0.119
lwt             410.470
smoke             0.237
ht                0.056
ftv               1.069
low               7.336
lat1              1.463
lat2            519.224

-----------------------------------------------

------------------- moreFitIndices() -------------------
gammaHat    adjGammaHat baseline.rmsea     aic.smallN     bic.priorN            hqc            sic
0.9418123      0.8952621      0.1173021   4930.4992370   4986.7249763   4965.4541181             NA
-----------------------------------------------
Error in lavTech(object, "inverted.information.expected") * nobs(object) :
non-numeric argument to binary operator
1: In lav_data_full(data = data, group = group, group.label = group.label,  :
lavaan WARNING: some observed variances are (at least) a factor 1000 times larger than others; use varTable(fit) to investigate
2: In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
lavaan WARNING: could not compute standard errors!
lavaan NOTE: this may be a symptom that the model is not identified.

3: In lavaan::lavaan(model = model, data = mydf, model.type = "sem",  :
lavaan WARNING: covariance matrix of latent variables is not positive definite; use inspect(fit,"cov.lv") to investigate.
4: In lav_data_full(data = data, group = group, group.label = group.label,  :
lavaan WARNING: some observed variances are (at least) a factor 1000 times larger than others; use varTable(fit) to investigate
5: In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
lavaan WARNING: could not compute standard errors!
lavaan NOTE: this may be a symptom that the model is not identified.

6: In lavaan::lavaan(model = model, data = mydf, model.type = "cfa",  :
lavaan WARNING: covariance matrix of latent variables is not positive definite; use inspect(fit,"cov.lv") to investigate.
7: In lav_model_vcov(lavmodel = lavobject@Model, lavsamplestats = lavobject@SampleStats,  :
lavaan WARNING: could not compute standard errors!
lavaan NOTE: this may be a symptom that the model is not identified.

In the above output, one must note following points:

1. The warning at end states 'could not compute standard model; this may be a symptom that the model is not identified'. This indicates that the model is suboptimal.

2. The fit indices (comparative fit index or CFI and Tucker-Lewis index or TLI) should be as large as possible, preferably > 0.95. Here they are much smaller (0.442 and 0.196, respectively). This also indicates a suboptimal model.

3. The RMSEA and SRMR should be as small as possible, preferably < 0.05. Here, they are not too high (0.1 and 0.09, respectively).

Overall, above model is suboptimal and one needs to redesign it.

References:
mass package: Venables, W. N. & Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. Springer, New York. ISBN 0-387-95457-0
https://cran.r-project.org/web/packages/MASS/index.html

Yves Rosseel (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2), 1-36. URL http://www.jstatsoft.org/v48/i02/.