R package for simulated structural equation modeling

Version History (Official)

Version 0.5-13 06/06/16

This version is available on CRAN. This is simply a maintenance version.

Version 0.5-12 02/29/16

This version is available on CRAN. The contact information is updated. This is simply a maintenance version.

Version 0.5-11 06/29/15

This version is available on CRAN. Because lavaan changed the way it handles equality constraints in parameter tables, the simsem code that works with lavaan parameter tables had to change as well. This version is not backward compatible with saved objects: scripts that run simsem remain the same, but objects created by previous versions may fail to run or may produce erroneous results in this version.

Version 0.5-9 12/03/14

This version is available on KRAN. The OpenMx features in this package are updated to be compatible with OpenMx 2.0. Backward compatibility is not maintained, so code written for OpenMx versions lower than 2.0 may not work with this version. The sim function now also summarizes standardized estimates, which can be compared with standardized parameter values by using the summaryParam function with std = TRUE.

Version 0.5-8 10/03/14

This version is available on CRAN and KRAN. This is a maintenance version that fixes minor bugs and updates code according to CRAN policy. The additional feature in this version is that the generate argument in the sim function can take a function. The function must take the sample size as its argument and return a data frame with appropriate variable names.
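
As a rough sketch of what such a data-generating function might look like, the following base-R function takes a sample size and returns a data frame with named columns. The function name genOneFactor, the loadings, and the variable names y1 to y3 are illustrative assumptions and are not part of simsem. The commented-out sim call only indicates where the function would be passed, and analysisModel is a hypothetical model object.

```r
# Hypothetical data-generating function: one factor, three indicators.
genOneFactor <- function(n) {
  f <- rnorm(n)                      # latent factor scores
  loadings <- c(0.7, 0.7, 0.7)       # assumed standardized loadings
  resid_sd <- sqrt(1 - loadings^2)   # residual SDs so each indicator has variance 1
  y <- sapply(seq_along(loadings), function(j) {
    loadings[j] * f + rnorm(n, sd = resid_sd[j])
  })
  dat <- as.data.frame(y)
  names(dat) <- c("y1", "y2", "y3")  # names must match the analysis model
  dat
}

set.seed(123)
dat <- genOneFactor(200)
str(dat)

# Assumed usage (not run here; requires simsem and an analysis model):
# Output <- sim(nRep = 100, model = analysisModel, n = 200, generate = genOneFactor)
```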

Version 0.5-5 03/31/14

This version is available on KRAN. This version is required for users of lavaan 0.5-16. The generate function has an option to return the latent variable scores; see the help page of the generate function for how it works. The sim function also provides three more options. First, users may create latent variable scores. This option can be used with the second new option, outfundata. The outfundata argument can be used to extract extra information from the generated data sets (e.g., latent variable scores). Users may return the information directly or compare it with the analysis output. Finally, users may stop all analyses if any error occurs during estimation. This part of the code was contributed by Mikko Ronkko. He also contributed the code that provides progress updates for the sim function. The progress is shown if parallel processing is not used. Users can also globally set their preference for parallel processing by using R options: options('simsem.multicore'). We also fixed all the minor bugs reported by users.

Version 0.5-3 03/09/13

This version is available on CRAN and KRAN. The simulation result in this version can save confidence intervals computed by any method (e.g., bootstrap or profile-likelihood). The coverage rate can be investigated by the getCoverage, findCoverage, and plotCoverage functions. The confidence interval widths can be investigated by getCIwidth and plotCIwidth, which allow users to estimate the sample size and percent missing needed for accuracy in parameter estimation. The inspect function is written to extract information from the simulation result, such as inspect(Output, "fit") for fit indices of convergent replications. The summaryTime, summarySeed, and coef functions are also written to extract information from the simulation result. The empirical argument is added to the generate and createData functions to create multivariate normal data whose sample statistics equal the model-implied means and covariance matrix from the provided parameters.

Version 0.5-2 03/09/13

This version is available on KRAN. The major update in this version is that users can write a function that returns vectors of parameter estimates, standard errors, fit indices, and convergence status and specify it in the model argument of the sim function. See the help page of the sim function for an example. A minor bug in multiple group analysis was fixed.

Version 0.5-0 02/10/13

This version is available on CRAN and KRAN. Here is a list of major updates:

  1. The generate and model arguments for the sim function can take three formats: a simsem template (as in previous versions), a lavaan script (or a lavaan parameter table or a list of arguments for the simulateData, lavaan, cfa, sem, or growth functions), or an OpenMx object. See more details and the example scripts in the vignette.
  2. The generate function can take a simsem template, a lavaan script, or an OpenMx object. If the OpenMx object has definition variables, users need to provide the data of the definition variables (in the covData argument), and the model parameters are specified based on the definition variables.
  3. The analyze function can take a simsem template or an OpenMx object.
  4. In the simsem template format, users can add exogenous covariates for data generation, which can take any distributions (or even dummy variables). The KA and GA arguments are added to the model function to account for the effects of exogenous covariates on indicators or factors respectively. Users need to create their own covariate data and add to the covData argument in the generate or sim functions.
  5. Nonnormal data can be created from Vale and Maurelli's (1983) method. Users can simply specify the skewness and kurtosis arguments in the bindDist function.
  6. All arguments that take characters will be case-insensitive.
  7. The combineSim function is added to combine multiple (similar) simulation results into a single simulation result.
  8. The sim function has a completeRep argument. If the argument is specified as TRUE, the simulation will be run until the number of convergent replications is equal to or greater than the specified number of replications.
  9. Some minor bugs were fixed.
We are still testing this version. We will announce this update at the beginning of February.

Version 0.4-6 12/30/12

This version is available on CRAN and KRAN. The Mair and colleagues (2012) method for nonnormally distributed data generation is implemented. Users can specify any multivariate copula for data generation. Please check Example 12 in the vignette. Normally distributed data are now generated so that the cases shared by two datasets with different sample sizes are replicated if the same random seed is used (thanks to Paul Johnson). We also made some internal code more efficient and less time-consuming. Some error messages and documentation are updated. All examples in simsemClassic have been transcribed to the current simsem. Several minor bugs are fixed.

Version 0.4-1 10/26/12

This version is available on both CRAN and KRAN. We have recovered all functionality from simsemClassic (simsem 0.2-*). Here is a list of updates:

  1. If users have random parameters, equality constraints in data generation can be applied across groups and across matrices. In the previous version, equality constraints were applied within a matrix only.
  2. The model and related functions have the con argument that takes the lavaan script for additional parameters (the ":=" operator), complex equality constraints (the "==" operator), and inequality constraints (the "<" and ">" operators). In data generation, the right-hand-side term will be computed and the value will be assigned to the left-hand-side term for the ":=" and "==" operators. For the "<" and ">" operators, the left-hand-side term will be compared with the right-hand-side term. If the statement is not true, the left-hand-side term will be adjusted to make the statement true with a small difference (i.e., 0.000001). For example, in "load1 > load2", if load1 is 0.5 and load2 is 0.7, load1 will be assigned as 0.7 + 0.000001, which is 0.700001. The additional parameters and the differences between the left-hand-side and right-hand-side terms will also be summarized in the simulation result. See the help page of the model function for further details.
  3. The optMisfit and misfitBounds arguments in the draw, generate, and sim functions are working. The optMisfit argument is used to get the misspecification that attains the maximum amount of population misfit. The misfitBounds argument is used to get the misspecifications that are within the specified range of population misfit.
  4. The createOrder argument in the draw, generate, and sim functions is added. This argument is used to specify the order of the three tasks in the data generation process: 1) imposing equality/inequality constraints, 2) imposing model misspecifications, and 3) filling the unspecified parameters (e.g., filling the residual variances when the total variances are specified). For example, createOrder = c(1, 2, 3), which is the default, means that the data-generation parameters will 1) have constraints imposed, 2) have model misspecification imposed, and 3) have the unspecified parameters filled. See the help page of the draw function for further details.
  5. The labels of free parameters are provided in the summary and summaryParam function of a result object.
  6. Minor bugs have been fixed.
  7. The package is updated to be compatible with the lavaan package version 0.5-10.
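
The adjustment rule for inequality constraints described in item 2 can be sketched in base R. The helper enforceGreater is a hypothetical name used only for illustration; simsem's internal implementation may differ.

```r
# Hypothetical helper mirroring the rule for a constraint such as "load1 > load2":
# if the statement is false, move the left-hand side just past the right-hand side.
enforceGreater <- function(lhs, rhs, eps = 0.000001) {
  if (lhs > rhs) lhs else rhs + eps
}

load1 <- 0.5
load2 <- 0.7
load1 <- enforceGreater(load1, load2)  # statement was false, so load1 is adjusted
print(load1, digits = 7)

# A value that already satisfies the constraint is left untouched.
enforceGreater(0.9, 0.7)
```

With the values from the changelog example (0.5 and 0.7), the adjusted value is 0.7 plus the small difference, in line with the 0.700001 quoted in item 2.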

As of this update, all functionality in simsemClassic is available in the current version of simsem. Please let us know if you have any questions, find any bugs, or have any feedback. Thank you!

Version 0.3-14 10/22/12

This version is available on KRAN only. In this version, a serious bug was fixed. The previous 0.3-* versions, including the one currently on CRAN, have a problem with any model that has a beta matrix (regression coefficients). The independent variable is swapped with the dependent variable during the translation of matrices into a parameter table for lavaan. In the previous versions, BE[2,1] is therefore interpreted as BE[1,2] and vice versa, so any results with a beta matrix from the previous versions cannot be trusted. We are really sorry for this bug. In addition, the problems with indicator and factor labels have been fixed, as well as other minor bugs.

Version 0.3-13 10/17/12

This version is available on KRAN only. This version makes it even easier to impose missing values by the logistic regression approach. Users can use a p() function in the intercept of each logistic equation to set the intercept so that the average missing proportion equals the specified value. For example, 'y1 ~ p(0.2) + 0.5*y2' means that the missing proportion increases when y2 increases and the average missing proportion is 0.2. Users may write the target variable without any independent variable, which corresponds to missing completely at random, such as 'y2 ~ p(0.3)'. The logistic script can be visualized by the plotLogitMiss function.
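
One way to picture what p() does is to solve numerically for the intercept that gives the requested average missing proportion. The base-R sketch below does this with uniroot; the helper solveIntercept is hypothetical, and the approach is an assumption about the idea rather than a description of simsem's internal code.

```r
# Find the logistic intercept b0 so that the mean missing probability equals `target`.
# Assumption: the slope part of the equation (here 0.5 * y2) is already known.
solveIntercept <- function(target, linpred) {
  f <- function(b0) mean(plogis(b0 + linpred)) - target
  uniroot(f, interval = c(-20, 20), tol = 1e-10)$root
}

set.seed(1)
y2 <- rnorm(1000)
b0 <- solveIntercept(0.2, 0.5 * y2)      # analogue of 'y1 ~ p(0.2) + 0.5*y2'
pMiss <- plogis(b0 + 0.5 * y2)           # per-case missing probability
mean(pMiss)                              # close to 0.2 by construction

# With no predictors (MCAR), the intercept is simply the logit of the proportion.
b0_mcar <- solveIntercept(0.3, 0)        # analogue of 'y2 ~ p(0.3)'
all.equal(b0_mcar, qlogis(0.3), tolerance = 1e-6)
```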

Version 0.3-12 10/15/12

This version is available on CRAN and KRAN. This version allows users to impose missing values by the logistic regression approach. Users can write a script similar to a regression equation in lavaan syntax such that (a) each line begins with the variable on which missing values are imposed, (b) '~' is used to represent 'is regressed on', and (c) each line ends with the linear combination of other variables or a constant, such as 'y1 ~ 0.3 + 0.2*y2 + 0.1*y3'. The logistic regression approach can be used with other types of missing data, such as missing completely at random. See the help page of the miss function for further details. We also fixed minor bugs in parameter table generation.
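
The logic behind a line such as 'y1 ~ 0.3 + 0.2*y2 + 0.1*y3' can be illustrated in base R: the right-hand side gives the log-odds that y1 is missing for each case. This is a minimal sketch with made-up data, not simsem's own code. In the commented-out call, the argument name logit is an assumption; check the help page of the miss function.

```r
set.seed(2)
n <- 500
dat <- data.frame(y1 = rnorm(n), y2 = rnorm(n), y3 = rnorm(n))

# 'y1 ~ 0.3 + 0.2*y2 + 0.1*y3': log-odds that y1 is missing for each case
logit <- 0.3 + 0.2 * dat$y2 + 0.1 * dat$y3
pMiss <- plogis(logit)                 # convert log-odds to probabilities

# Draw a missingness indicator per case and blank out y1 where it is TRUE
isMiss <- rbinom(n, size = 1, prob = pMiss) == 1
dat$y1[isMiss] <- NA

mean(is.na(dat$y1))                    # observed missing proportion in y1

# Assumed simsem usage (not run here):
# missObj <- miss(logit = 'y1 ~ 0.3 + 0.2*y2 + 0.1*y3')
```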

Version 0.3-11 09/26/12

This version is available on KRAN only. The getPowerFit and getPowerFitNested functions provide the power of traditional chi-square (absolute or difference) tests. The convergence status is recoded and expanded to account for improper correlation solutions. Most functions in the summary family (e.g., summaryFit or summaryParam) provide an improper argument to include the replications with improper solutions in the summary. The bugs in simulations with varying sample size and in two-method designs have been fixed.

Version 0.3-9 09/24/12

This version is available on KRAN only. The problem of the high rate of nonconvergent results has been resolved. The exportData function is introduced to export datasets generated from the model template into data files that are ready to be analyzed by LISREL or Mplus. This version also classifies the reasons behind nonconvergent results, as can be seen in the summary function. The output format is also slightly modified.

Version 0.3-8 09/24/12

The default of the smart argument in startingVal is changed to FALSE. This should solve the problem of high nonconvergence rates.

Version 0.3-7 09/19/12

This version is available on KRAN only. The sim function has an option, smartStart, to use population values as starting values in data analysis. The default is FALSE (do not use the population values as starting values). From brief testing, using the population values could make the simulation take longer because of the data management involved in parsing population values as starting values. We also found a bug in which equality constraints were not applied when the model had equality constraints but no model misspecification was specified (when model misspecification was specified, the previous version applied them correctly). This bug is now fixed. It affects users of random-parameter data-generation models: if a model has equality constraints, the equality-constrained parameters are random, and model misspecification is not specified, the accuracy of results from previous versions might be compromised. We also fixed the problem in all full SEM models that the population parameters did not match the parameter estimates. In this version, the parameters are reparameterized to match the analysis result (e.g., the residual variances of the data-generation population are changed to 1 to match the fixed values in the analysis model). We also fixed a couple of minor bugs. Finally, we are still updating the documentation. Currently, Example 12 of the new version is posted.

Version 0.3-6 09/13/12

This version fixes the errors in the sim function that occurred when a noninvariance model was run and when the dataOnly and paramOnly arguments were used. This version also allows the number of observations to be unequal across groups.

Version 0.3-5 09/06/12

This version is available on both CRAN and KRAN. We have rewritten the whole package. simsem 0.2-* code cannot be run in the current version. Here is a list of major updates:

  1. All function names are changed so that the package is much more intuitive. The main idea is that users build their target models with the model function. The model is built from matrices and vectors in LISREL notation via the bind and binds functions. Then, the model can be used for data generation by the generate function, for data analysis by the analyze function, and for Monte Carlo simulation by the sim function. This version no longer requires data or model objects.
  2. The multiple group feature is available now. Users can supply a list of multiple matrix or vector objects (e.g., multiple matrices for the LY attribute) to specify a multiple-group model. Click here to see example multiple group code.
  3. Faster. From a brief test, this package is about 30% faster than the 0.2 family.
  4. X-side LISREL notation is no longer supported (e.g., no LX, PH, or TD).
  5. Equality constraints are set by putting the same character as the free elements in the matrix or vector objects. This version does not have the equality constraint object.
  6. Random parameters are specified by a quote of command that users use to draw one random number (such as "runif(1, -0.2, 0.2)" instead of using distribution objects in the previous version). This version no longer has distribution objects.
  7. Model misspecification is set in the bind and binds functions, through the misspec attribute, when making the matrix or vector objects. The misspecification object is no longer supported.
  8. This package does not contain any native code for auxiliary variables in FIML and multiple imputation. Instead, this package calls commands from the semTools package, which is more efficient. NOTE: auxiliary variables in FIML and multiple imputation are still supported; the code has simply been moved.
  9. And much more...
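
Item 6 describes random parameters as a quoted command that draws one random number. A minimal base-R sketch of how such a string could be evaluated once per replication is shown below. The helper drawParam is hypothetical, and this is not necessarily how simsem evaluates the string internally.

```r
# Hypothetical helper: evaluate a command string that must return one number.
drawParam <- function(cmd) {
  value <- eval(parse(text = cmd))
  stopifnot(is.numeric(value), length(value) == 1)
  value
}

set.seed(3)
# One fresh draw per replication, here for five replications
draws <- replicate(5, drawParam("runif(1, -0.2, 0.2)"))
draws
range(draws)   # all draws fall inside the interval given in the command
```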

Please click here for a brief introduction to the changes from the 0.2 family to the 0.3 family. We will gradually update the online wiki pages (currently Examples 1-5 have been fixed) and all example code from our papers and presentations. There are two things that the current version cannot handle but the 0.2 family can:

  1. The flexible order of filling parameters, equality constraints, and adding model misspecification. This current version supports a single order: 1) impose equality constraints, 2) add model misspecification, and 3) fill parameters (such as residual variances when the total variances are specified).
  2. The maximal method and interval method of model misspecification.

We will update the current version with these features soon. For those who are working with the old version of simsem, we still provide it, but its name is changed to simsemClassic. simsemClassic is available on KRAN only. Please let us know if you have any questions, find any bugs, or have any feedback. Thank you!

simsemClassic Version 0.2-9 09/12/12

This package is the updated version of simsem 0.2-8. This is the last update in the 0.2-* series! The change reflects the update in the copula package.

Version 0.2-8 07/06/12

This version is available on both CRAN and KRAN. We have fixed some minor bugs in the program and moved the indProd and residualCovariate functions to the semTools package. Hopefully, this is the last version in the 0.2 family. We are looking forward to a huge update soon. The (new!) simsem homepage can also be accessed directly from

Version 0.2-7 06/22/12

This version is only available on KRAN. We have fixed some minor bugs in the program. We also added four new examples about how to specify trivial misspecification (Example 22) and nested model comparison (Examples 23, 24, and 25).

Version 0.2-6 06/17/12

This version is only available on KRAN. This version allows users to compare nested or nonnested models using the Monte Carlo approach (see the pValueNested and pValueNonNested functions), as well as to compare nonnested models based on the likelihood-ratio test (see the likRatioFit function). The functions to find and plot the fit index cutoffs and power in nonnested model comparison are also available (see the getCutoffNonNested, plotCutoffNonNested, getPowerFitNonNested, and plotPowerFitNonNested functions). The varying parameter feature (e.g., continuous sample size) for nonnested model comparison, however, is not ready yet. We also fixed some minor bugs we found in the program and changed the output styles of some functions.

Version 0.2-5 06/13/12

This version is only available in KRAN. We have fixed some minor bugs we found in the program, including the bug when passing a list of datasets in the simResult command. We also added two more examples about power of rejecting misspecified models when sample size or percent missing are varying within a simulation.

Version 0.2-4 06/11/12

This version is only available on KRAN. This version allows users to find the statistical power to reject a misspecified model with varying parameters (such as varying sample size or percent missing). Please see the documentation of the getPowerFit and plotPowerFit functions for further details. The anova function also accounts for the varying parameters. This version also allows users to find the fit index cutoffs in nested model comparison. For example, users can find the cutoff of the difference in CFI when trivial misspecification is added to the datasets created from the nested model (e.g., if CFI drops less than .004, retain the nested model). Please see the documentation of getCutoffNested and plotCutoffNested for further details. We also added the getPowerFitNested and plotPowerFitNested functions to find the power in nested model comparison. All functions in nested model comparison also account for the varying parameters.

Version 0.2-3 06/06/12

This version is only available in KRAN. We add the summaryFit function to directly obtain the fit indices information that is provided in the summary function. We also fix some minor bugs in the package.

Version 0.2-2 06/05/12

This version is only available on KRAN. We implemented the L'Ecuyer (1999) method of random number generation to make sure that the simulated datasets in each replication are completely nonoverlapping. We also changed the simMissing function to take additional arguments to be passed to Amelia for imputation.

Version 0.2-1 05/28/12

This version is only available on KRAN. We added more functions for simulations with varying parameters (e.g., sample size or percent missing completely at random), such as finding fit index cutoffs, getting the power of each parameter estimate, finding the values of varying parameters (e.g., sample size) that provide a specified power, or plotting the power of each parameter against the varying parameter. We also added Examples 17 to 19, which show how simsem deals with simulations with varying parameters.

Version 0.2-0 05/18/12

This version is the first version uploaded to CRAN. Mostly, this version is similar to the previous version. The vignette is updated by adding a new example. The R documentation is changed so that the amount of time needed to run through all examples is dramatically reduced. All comments in the package are cleaned up.

Version 0.1-6 05/15/12

The internal code of this version is changed in order to deal with distribution objects differently. With the new method, the problem of using distribution objects on HPC can be fixed. This version also allows users to specify parameter values only, and the program will treat nonzero values as free parameters (except 1 in a symmetric matrix object). See the documentation of the simMatrix, symMatrix, and simVector functions.

Version 0.1-5 05/14/12

This version allows users to use the maximal (or minimal) method for the misspecified parameters. That is, users can specify the range of values of the misspecified parameters that is regarded as trivial misspecification. Then, the program will draw multiple sets of misspecified parameters and pick the set that provides the maximum (or minimum) population misfit. See the help page of simResultParam or SimMisspec for further details. A minor bug is also fixed in this version.

Version 0.1-4 05/10/12

This version is just a bug-fix release of the previous version. The package is currently available on KRAN.

Version 0.1-3 05/06/12

This version allows users to calculate population misfit (f0, rmsea, and srmr) and to constrain the specified misspecification by a range of population misfit, such as rmsea from 0.05 to 0.10. In other words, users can create data from parameters whose RMSEA is between .05 and .10. We also made a new class called SimResultParam, which saves all actual and misspecified parameters across replications before generating data. Therefore, this class can be used to investigate the population misfit across replications by the summary or plotMisfit function. See the documentation of popMisfit, simResultParam, and plotMisfit for further details.
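
To make the idea of population misfit concrete, the sketch below computes the standard maximum likelihood discrepancy F0 between a population covariance matrix and a model-implied covariance matrix, and then RMSEA as sqrt(F0 / df). It assumes that means are ignored and that the model-implied matrix is already given. The function name misfitSketch and the example matrices are hypothetical, and simsem's popMisfit function may compute these quantities differently.

```r
# Sketch only: ML population discrepancy between a population covariance matrix
# and a (misspecified) model-implied covariance matrix, means ignored.
misfitSketch <- function(SigmaPop, SigmaModel, df) {
  p <- nrow(SigmaPop)
  f0 <- log(det(SigmaModel)) - log(det(SigmaPop)) +
    sum(diag(SigmaPop %*% solve(SigmaModel))) - p
  f0 <- max(f0, 0)                      # guard against tiny negative rounding error
  c(f0 = f0, rmsea = sqrt(f0 / df))
}

SigmaPop <- matrix(c(1, 0.3, 0.3, 1), nrow = 2)  # population with a correlation of .3
SigmaModel <- diag(2)                             # model that fixes the correlation to 0
misfit <- misfitSketch(SigmaPop, SigmaModel, df = 1)
misfit

# Check whether the misfit falls inside a target RMSEA range, e.g. .05 to .10
inRange <- misfit[["rmsea"]] >= 0.05 && misfit[["rmsea"]] <= 0.10
inRange
```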

Version 0.1-2 05/03/12

We made a minor change in the code. Documentation for almost all private functions has been added.

Version 0.1-1 04/20/12

This version allows users to create simulated data using the Bollen-Stine bootstrap approach which uses the real data distribution as a template to create new data with similar observed distributions. See the simData function or the runFit function documentations for further details.

Version 0.1-0 04/19/12

This update includes several new features:

  1. The code makes it easier to analyze real data. The result from the real data can be used as parameter values to run a simulation, which means the Monte Carlo approach to model evaluation can be done easily. See Examples 13 and 14 in the manual for further details.
  2. Continuous varying sample size and percent missing are added. Users can run a simulation with varying sample size and see the change in power through a logistic curve. Check the continuousPower function for further details.
  3. simsem can set the percent attrition to simulate participants dropping out of a study. See the imposeMissing function for further details.
  4. A new feature called the function object is added, which can transform data after they are generated in the simulation study. See Example 15 in the manual for an example of residual centering for controlling covariates.

Version 0.0-9 03/11/12

The new features are 1) The package can take either correlation/variance inputs or covariance inputs. See Example 1 vs. 2 for a comparison. 2) The anova method is built for comparing two analysis results together (either data analysis or simulation result).

Version 0.0-8 03/05/12

We have added some new features. 1) simsem can account for auxiliary variables by just identifying the column index of the auxiliary variable. The package will build the model that accounts for the auxiliary variable in FIML for you. For MI, the auxiliary variables will be excluded in the analysis part. 2) We have changed the way MAR missingness is imposed to the 'threshold' method. See Example 11 for details. 3) We changed the names of all correlation matrices to begin with 'R', such as PS --> RPS, PH --> RPH, or TE --> RTE. The reason for this change is that we will make the simsem package able to be specified by covariance matrices in the near future.

Version 0.0-7 02/26/12

There are two major updates in this version. 1. An alternative method for data generation is available, which is referred to as the sequential method. The data are created from the exogenous factors first. Then, the data are passed along the arrows in the SEM diagram, adding the generated residuals, until the observed variables are obtained. 2. Nonnormal data generation is available. The nonnormal data can be modeled on the indicator distributions directly or on the factor distributions (and error distributions). The package uses a Gaussian copula for nonnormal data generation. See the details of the Gaussian copula in the provided examples.