simsem

R package for simulated structural equation modeling

simsem: SIMulated Structural Equation Modeling

This R package facilitates the simulation and analysis of data within the structural equation modeling (SEM) framework. It helps analysts create simulated data from hypotheses or from analytic results obtained from real data. The simulated data can be used for different purposes, such as power analysis, model fit evaluation, and planned missing data designs.

  1. Building simulated sampling distributions for fit indices. This package helps researchers tailor fit index cutoffs to an a priori alpha level and an a priori definition of trivial model misspecification. In other words, researchers simulate data from their actual model plus the trivial misspecification they have defined. The fit indices from the simulated data are then used to build empirical sampling distributions, and the cutoffs are read from those distributions. Model parameters can be specified manually or obtained from the results of data analysis. When model parameters are obtained from the results of data analysis, this approach is also called the parametric bootstrap or Monte Carlo approach. The Bollen-Stine bootstrap approach for data generation is also available, as is the creation of nonnormal distributions through a Gaussian copula.
  2. Power analysis. This package helps analysts find the power of both parameter estimates and model evaluation. It can also find power when data are missing. Missing data imposed on simulated data can be 1) missing completely at random (MCAR), 2) missing at random (MAR), 3) not missing at random (NMAR), and 4) planned missing data (n-form design or two-method design). Longitudinal missing data (i.e., attrition) can be modeled as an MCAR or MAR process. Missing data can be handled through multiple imputation or full information maximum likelihood. Sample size and percent missing can be varied continuously, and power plots for given values of sample size or percent missing can be built.
  3. Methodological investigations. This package can be used for methodological studies concerning SEM. Researchers can easily vary parameter values and model misspecification across parameters and easily summarize results from simulations. Nested models can be easily tested by comparing the results from two simulation runs. The powerful and flexible missing data options available in simsem make the package extremely useful in methodological investigations concerning SEM and missing data.
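The first two uses above can be sketched in a few lines. This is a minimal illustration rather than a recommended design: the two-factor model, the population values (loadings of 0.7, factor correlation of 0.5), the range of the trivial cross-loadings, the sample size, and the small number of replications are all assumptions chosen only to keep the example quick to run.

```r
library(simsem)

# Assumed population model: two correlated factors, three indicators each.
# NA marks a free parameter; bind() attaches its population value.
loading <- matrix(0, 6, 2)
loading[1:3, 1] <- NA
loading[4:6, 2] <- NA

# Assumed definition of trivial misspecification: small random
# cross-loadings on every loading that the analysis model fixes to zero.
loading.mis <- matrix("runif(1, -0.2, 0.2)", 6, 2)
loading.mis[is.na(loading)] <- 0
LY <- bind(loading, 0.7, misspec = loading.mis)

latent.cor <- matrix(NA, 2, 2)
diag(latent.cor) <- 1
RPS <- binds(latent.cor, 0.5)   # factor correlation of 0.5
RTE <- binds(diag(6))           # uncorrelated measurement errors

CFA.Model <- model(LY = LY, RPS = RPS, RTE = RTE, modelType = "CFA")

# Use 1: simulated sampling distribution of fit indices.
# nRep is kept small here; a real study would use far more replications.
Output <- sim(nRep = 100, model = CFA.Model, n = 200,
              seed = 123321, silent = TRUE)
getCutoff(Output, alpha = 0.05)  # fit index cutoffs at alpha = .05

# Use 2: power of each free parameter, with complete data ...
getPower(Output)

# ... and with 10% of values missing completely at random.
OutputMiss <- sim(nRep = 100, model = CFA.Model, n = 200,
                  miss = miss(pmMCAR = 0.1),
                  seed = 123321, silent = TRUE)
getPower(OutputMiss)
```

Comparing the two getPower() results gives a rough sense of how much power is lost to the missing data under these assumed conditions.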

Announcements

[May 20, 2014] Symposium on the lavaan ecosystem

Thank you to all those who presented and attended the session at the Modern Modeling Methods Conference on the lavaan ecosystem. Presentations, example code and other links are available here.

[March 31, 2014] Latest Update: simsem, Version 0.5-5

This version is available on KRAN and is required for users of lavaan 0.5-16. The generate function has a new option to return latent variable scores; see the help page of the generate function for details. The sim function also provides three new options. First, users may create latent variable scores. This option can be combined with the second new option, outfundata, which extracts extra information from the generated data sets (e.g., latent variable scores). Users may return that information directly or compare it with the analysis output. Finally, users may stop all analyses if any error occurs during estimation. This part of the code was contributed by Mikko Ronkko, who also contributed the code that reports progress for the sim function. Progress is shown only when parallel processing is not used. Users can also set their preference for parallel processing globally through the R option options('simsem.multicore'). We also fixed the minor bugs reported by users.
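A rough sketch of how these options might fit together is shown below. The argument names saveLatentVar, sequential, and stopOnError, and the "latentVar" attribute, are taken from memory of the generate and sim help pages and should be checked there. The model and the helper latentSummary are hypothetical and exist only for illustration.

```r
library(simsem)

# Assumed two-factor CFA template (loadings 0.7, factor correlation 0.5).
loading <- matrix(0, 6, 2)
loading[1:3, 1] <- NA
loading[4:6, 2] <- NA
LY <- bind(loading, 0.7)
latent.cor <- matrix(NA, 2, 2)
diag(latent.cor) <- 1
RPS <- binds(latent.cor, 0.5)
RTE <- binds(diag(6))
CFA.Model <- model(LY = LY, RPS = RPS, RTE = RTE, modelType = "CFA")

# Global preference: do not use parallel processing, so progress is shown.
options(simsem.multicore = FALSE)

# Generate one data set and keep the latent variable scores.
# sequential = TRUE is passed on the assumption that factor scores must be
# generated first for them to be saved.
dat <- generate(CFA.Model, n = 200, sequential = TRUE, saveLatentVar = TRUE)
latent <- attr(dat, "latentVar")
str(latent)

# Hypothetical outfundata helper: its first argument is the analysis output
# and its second is the generated data set of the same replication.
latentSummary <- function(out, data) {
  colMeans(as.matrix(attr(data, "latentVar")))
}

# Run a short simulation that saves latent scores, applies the helper to
# every replication, and stops as soon as any replication raises an error.
Output <- sim(nRep = 20, model = CFA.Model, n = 200,
              sequential = TRUE, saveLatentVar = TRUE,
              outfundata = latentSummary, stopOnError = TRUE,
              seed = 123321)
```

The values returned by the helper are stored with the simulation result; the help page of the sim function describes how to retrieve them.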

[March 17, 2013] Latest Update: simsem, Version 0.5-3

This version is available on CRAN and KRAN. Simulation results in this version can save confidence intervals computed by any method (e.g., bootstrap or profile likelihood). Coverage rates can be investigated with the getCoverage, findCoverage, and plotCoverage functions. Confidence interval widths can be investigated with getCIwidth and plotCIwidth, which allow users to estimate the sample size and percent missing needed for accuracy in parameter estimation. The inspect function extracts information from the simulation result, such as inspect(Output, "fit") for the fit indices of convergent replications. The summaryTime, summarySeed, and coef functions also extract information from the simulation result. The empirical argument has been added to the generate and createData functions to create multivariate normal data whose sample statistics equal the model-implied means and covariance matrix from the provided parameters.
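The extractor functions named above can be tried on a small simulation. The sketch below assumes a two-factor CFA with arbitrary population values and only 50 replications, so the numbers it prints are illustrative and not stable estimates.

```r
library(simsem)

# Assumed two-factor CFA template (loadings 0.7, factor correlation 0.5).
loading <- matrix(0, 6, 2)
loading[1:3, 1] <- NA
loading[4:6, 2] <- NA
LY <- bind(loading, 0.7)
latent.cor <- matrix(NA, 2, 2)
diag(latent.cor) <- 1
RPS <- binds(latent.cor, 0.5)
RTE <- binds(diag(6))
CFA.Model <- model(LY = LY, RPS = RPS, RTE = RTE, modelType = "CFA")

Output <- sim(nRep = 50, model = CFA.Model, n = 200,
              seed = 123321, silent = TRUE)

getCoverage(Output)            # coverage rate of each confidence interval
getCIwidth(Output)             # confidence interval width of each parameter

fit <- inspect(Output, "fit")  # fit indices of convergent replications
head(fit)
head(coef(Output))             # parameter estimates, one row per replication
summaryTime(Output)            # time spent in each stage of the simulation
summarySeed(Output)            # seed information needed to reproduce the run

# empirical = TRUE: the sample statistics of the generated data are meant to
# reproduce the model-implied moments, not just approximate them.
dat <- generate(CFA.Model, n = 200, empirical = TRUE)
round(cov(dat), 2)
```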

Main Developers

Sunthud Pornprasertmanit

Patrick Miller

Alexander Schoemann

Program

The project is still under development. The package can be installed by copying this line into R:

install.packages("simsem")

If you are interested in the source code, please click here.

You may install the recently developed test version from KRAN (KU R Archive Network) by 1) making sure that you have the lavaan and copula packages in your personal library, and 2) copying this line into R:

install.packages("simsem", repos="http://rweb.quant.ku.edu/kran", type="source")
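Step 1 above, checking for the lavaan and copula packages, can be done by hand or scripted. The following is one possible sketch using only base R; the helper name missing_packages is hypothetical, and the install step is guarded so that it runs only in an interactive session.

```r
# Packages that should be in the library before installing from KRAN.
prerequisites <- c("lavaan", "copula")

# Hypothetical helper: return the packages in `pkgs` that cannot be loaded
# from any library currently known to R.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(prerequisites)

if (length(to_install) > 0 && interactive()) {
  # Installs from the CRAN mirror configured for this R session.
  install.packages(to_install)
}
```

Once to_install comes back empty, the install.packages line for the KRAN test version can be run.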

Here is how to install simsemClassic (version 0.2-8 of simsem) from KRAN only:

install.packages("simsemClassic", repos="http://rweb.quant.ku.edu/kran", type="source")

If users wish to use the OpenMx package for data generation or data analysis, please follow this link for the download instructions.

Please report any bugs or send any suggestions by email.

Materials

Presentations and Papers

Version History

Acknowledgement

The development of simsem has been supported by the University of Kansas Center for Research Methods and Data Analysis.

Partial support for this project was provided by grant NSF 1053160 (Wei Wu & Todd D. Little, co-PIs) and the Center for Research Methods and Data Analysis at the University of Kansas (Todd D. Little, director). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies.