Simulation studies in structural equation modeling (SEM) can be
computationally demanding. This tutorial explores practical strategies
for running SEM simulations efficiently in high-performance computing
(HPC) environments using the R package simsem. I was
motivated by a simple question: Is it faster to split jobs into
chunks and run them separately, or to use the built-in
multicore feature in sim()?
To answer this, I developed a new feature and compared it with the
existing functionality in simsem. Since I had already
conducted this comparison, I turned the results into this tutorial to
demonstrate how to use simsem in HPC environments. I
consider three approaches:

1. No parallel processing (one `sim()` run per condition)
2. Parallel processing with `multicore = TRUE` and `numProc`
3. The new `repRun` feature (manual splitting into chunks)

As a quick preview, parallel computation is more efficient than serialization and much easier to implement. Later, I discuss a situation where serialization might still be useful.
This note focuses on how to run simulations efficiently in HPC, rather than on the simulation results themselves.
Let me start with a simple bootstrap example that I will later use for empirical illustration. Consider a two-factor model with seven items per factor, standardized loadings of .7, and a factor correlation of .5:
generatescript <- '
f1 =~ 0.7*x1 + 0.7*x2 + 0.7*x3 + 0.7*x4 + 0.7*x5 + 0.7*x6 + 0.7*x7
f2 =~ 0.7*y1 + 0.7*y2 + 0.7*y3 + 0.7*y4 + 0.7*y5 + 0.7*y6 + 0.7*y7
f1 ~~ 0.5*f2
x1 ~~ 0.51*x1
x2 ~~ 0.51*x2
x3 ~~ 0.51*x3
x4 ~~ 0.51*x4
x5 ~~ 0.51*x5
x6 ~~ 0.51*x6
x7 ~~ 0.51*x7
y1 ~~ 0.51*y1
y2 ~~ 0.51*y2
y3 ~~ 0.51*y3
y4 ~~ 0.51*y4
y5 ~~ 0.51*y5
y6 ~~ 0.51*y6
y7 ~~ 0.51*y7
'
For illustration, I first generate a single dataset and compute the
confidence intervals manually before moving to simsem.
set.seed(123321) is used for data generation.
library(lavaan)
## Warning: package 'lavaan' was built under R version 4.5.2
## This is lavaan 0.6-21
## lavaan is FREE software! Please report any bugs.
set.seed(123321)
dat <- simulateData(generatescript, sample.nobs=500)
Next, I analyze the data using the bootstrap (se="boot"). Note
that iseed = 787221 is specified so that the same bootstrap
samples are drawn across runs.
analyzescript <- '
f1 =~ x1 + x2 + x3 + x4 + x5 + x6 + x7
f2 =~ y1 + y2 + y3 + y4 + y5 + y6 + y7
f1 ~~ cov*f2
'
out <- cfa(analyzescript, data=dat, std.lv=TRUE, se="boot", iseed=787221)
out
## lavaan 0.6-21 ended normally after 15 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 19
##
## Number of observations 500
##
## Model Test User Model:
##
## Test statistic 27.686
## Degrees of freedom 29
## P-value (Chi-square) 0.535
I compare three types of confidence intervals (CI) with a confidence
level of .95 and 1,000 bootstrap samples (the default in
lavaan).
First, the Wald CI: use vcov() to obtain the sampling variance
of the factor-correlation estimate, take the square root to get the
standard error, and form the interval as the estimate plus or minus the
normal critical value times the standard error.
est <- coef(out)["cov"]
sewald <- sqrt(vcov(out)["cov", "cov"])
crit <- qnorm(1 - (1 - 0.95)/2)
waldlower <- est - crit*sewald
waldupper <- est + crit*sewald
c(waldlower, waldupper)
## cov cov
## 0.3426860 0.5098541
Second, the percentile bootstrap CI:
percout <- data.frame(parameterestimates(out, boot.ci.type = "perc", level=0.95))
perclower <- percout[percout$label == "cov", "ci.lower"]
percupper <- percout[percout$label == "cov", "ci.upper"]
c(perclower, percupper)
## [1] 0.3407223 0.5086200
Third, the bias-corrected bootstrap CI:
bcaout <- data.frame(parameterestimates(out, boot.ci.type = "bca.simple",
level=0.95))
bcalower <- bcaout[bcaout$label == "cov", "ci.lower"]
bcaupper <- bcaout[bcaout$label == "cov", "ci.upper"]
c(bcalower, bcaupper)
## [1] 0.3473720 0.5123705
These three approaches will later be compared within the simulation
framework. I then define a function to automatically obtain all types of
CI, which will be used later in simsem.
ciextract <- function(out) {
est <- coef(out)["cov"]
sewald <- sqrt(vcov(out)["cov", "cov"])
crit <- qnorm(1 - (1 - 0.95)/2)
waldlower <- est - crit*sewald
waldupper <- est + crit*sewald
percout <- data.frame(parameterestimates(out, boot.ci.type = "perc", level=0.95))
perclower <- percout[percout$label == "cov", "ci.lower"]
percupper <- percout[percout$label == "cov", "ci.upper"]
bcaout <- data.frame(parameterestimates(out, boot.ci.type = "bca.simple", level=0.95))
bcalower <- bcaout[bcaout$label == "cov", "ci.lower"]
bcaupper <- bcaout[bcaout$label == "cov", "ci.upper"]
c(waldlower=waldlower, waldupper=waldupper,
perclower=perclower, percupper=percupper,
bcalower=bcalower, bcaupper=bcaupper)
}
ciextract(out)
## waldlower.cov waldupper.cov perclower percupper bcalower
## 0.3426860 0.5098541 0.3407223 0.5086200 0.3473720
## bcaupper
## 0.5123705
As usual, sim() is used to run the simulation.
library(simsem)
##
## #################################################################
## This is simsem 0.5.16.914
## simsem is BETA software! Please report any bugs.
## simsem was first developed at the University of Kansas Center for
## Research Methods and Data Analysis, under NSF Grant 1053160.
## #################################################################
##
## Attaching package: 'simsem'
## The following object is masked from 'package:lavaan':
##
## inspect
output_n <- sim(nRep=10, # Set to 10 for illustration
model=analyzescript, # Analysis model
generate=generatescript, # Data-generation model
n=500, # Sample size for each generated dataset
std.lv=TRUE, # Lavaan option to standardize latent variables (factors)
lavaanfun="cfa", # Lavaan function used
se="boot", # Lavaan option to run bootstrap
iseed=787221, # Lavaan option to set seed for lavaan bootstrap (same row index is used)
outfun=ciextract, # Extra output extracted from each lavaan results
seed = 123321 # Set seed for data generation
)
## The RNGkind() was changed from "Mersenne-Twister" to RNGkind("L'Ecuyer-CMRG") in order to enable the multiple streams in the parallel package.
## To set a RNG seed using the previous RNGkind(), you can use
## RNGkind("Mersenne-Twister","Inversion","Rejection")
## or
## set.seed(seed, kind = "Mersenne-Twister")
## Progress: 1 / 10
## Progress: 2 / 10
## Progress: 3 / 10
## Progress: 4 / 10
## Progress: 5 / 10
## Progress: 6 / 10
## Progress: 7 / 10
## Progress: 8 / 10
## Progress: 9 / 10
## Progress: 10 / 10
Use getExtraOutput() to extract the lower and upper
bounds of the CIs.
cibounds_n <- getExtraOutput(output_n)
cibounds_n <- do.call(rbind, cibounds_n)
cibounds_n
## waldlower.cov waldupper.cov perclower percupper bcalower bcaupper
## [1,] 0.4444868 0.6728764 0.4393079 0.6703178 0.4393079 0.6703178
## [2,] 0.4254273 0.6217501 0.4251118 0.6207501 0.4359865 0.6328827
## [3,] 0.4088480 0.5943929 0.4092004 0.5997734 0.4111010 0.6034957
## [4,] 0.3082559 0.5023283 0.3084828 0.5009681 0.3124582 0.5130160
## [5,] 0.3936783 0.5805999 0.3900957 0.5782858 0.3977782 0.5806980
## [6,] 0.3722804 0.5526076 0.3780085 0.5582242 0.3783728 0.5587538
## [7,] 0.3751542 0.5953503 0.3753101 0.5962607 0.3825973 0.6101467
## [8,] 0.3897483 0.5596978 0.3924128 0.5658567 0.3953604 0.5683240
## [9,] 0.3955311 0.5982241 0.3955057 0.5998806 0.4072052 0.6130370
## [10,] 0.2963299 0.4669292 0.2950193 0.4633839 0.2963629 0.4658216
For the parallel approach, specify multicore = TRUE and
numProc = 4.
output_p <- sim(nRep=10, model=analyzescript, generate=generatescript,
n=500, std.lv=TRUE, lavaanfun="cfa", se="boot", iseed=787221,
outfun=ciextract, seed=123321,
multicore=TRUE, # Use parallel processing
numProc=4 # Use four processors
)
## Progress tracker is not available when 'multicore' is TRUE.
Use getExtraOutput() to extract the lower and upper
bounds of the CIs.
cibounds_p <- getExtraOutput(output_p)
cibounds_p <- do.call(rbind, cibounds_p)
cibounds_p
## waldlower.cov waldupper.cov perclower percupper bcalower bcaupper
## [1,] 0.4444868 0.6728764 0.4393079 0.6703178 0.4393079 0.6703178
## [2,] 0.4254273 0.6217501 0.4251118 0.6207501 0.4359865 0.6328827
## [3,] 0.4088480 0.5943929 0.4092004 0.5997734 0.4111010 0.6034957
## [4,] 0.3082559 0.5023283 0.3084828 0.5009681 0.3124582 0.5130160
## [5,] 0.3936783 0.5805999 0.3900957 0.5782858 0.3977782 0.5806980
## [6,] 0.3722804 0.5526076 0.3780085 0.5582242 0.3783728 0.5587538
## [7,] 0.3751542 0.5953503 0.3753101 0.5962607 0.3825973 0.6101467
## [8,] 0.3897483 0.5596978 0.3924128 0.5658567 0.3953604 0.5683240
## [9,] 0.3955311 0.5982241 0.3955057 0.5998806 0.4072052 0.6130370
## [10,] 0.2963299 0.4669292 0.2950193 0.4633839 0.2963629 0.4658216
In simsem version 0.5-17 or higher, users can specify
repRun so that simsem runs only a subset of
replications. For example, I split this simulation into three
sim() calls, with repRun set to 1:3, 4:6, and
7:10.
output_s1 <- sim(nRep=10, model=analyzescript, generate=generatescript,
n=500, std.lv=TRUE, lavaanfun="cfa", se="boot", iseed=787221,
outfun=ciextract, seed=123321,
repRun=1:3 # Run 1st, 2nd, 3rd replications
)
## Progress: 1 / 3
## Progress: 2 / 3
## Progress: 3 / 3
output_s2 <- sim(nRep=10, model=analyzescript, generate=generatescript,
n=500, std.lv=TRUE, lavaanfun="cfa", se="boot", iseed=787221,
outfun=ciextract, seed=123321,
repRun=4:6 # Run 4, 5, 6 replications
)
## Progress: 1 / 3
## Progress: 2 / 3
## Progress: 3 / 3
output_s3 <- sim(nRep=10, model=analyzescript, generate=generatescript,
n=500, std.lv=TRUE, lavaanfun="cfa", se="boot", iseed=787221,
outfun=ciextract, seed=123321,
repRun=7:10 # Run 7, 8, 9, 10 replications
)
## Progress: 1 / 4
## Progress: 2 / 4
## Progress: 3 / 4
## Progress: 4 / 4
output_s <- combineSim(output_s1, output_s2, output_s3)
combineSim() is used to pool results from the manually
split simulations. Use getExtraOutput() to extract the
lower and upper bounds of the CIs.
cibounds_s <- getExtraOutput(output_s)
cibounds_s <- do.call(rbind, cibounds_s)
cibounds_s
## waldlower.cov waldupper.cov perclower percupper bcalower bcaupper
## [1,] 0.4444868 0.6728764 0.4393079 0.6703178 0.4393079 0.6703178
## [2,] 0.4254273 0.6217501 0.4251118 0.6207501 0.4359865 0.6328827
## [3,] 0.4088480 0.5943929 0.4092004 0.5997734 0.4111010 0.6034957
## [4,] 0.3082559 0.5023283 0.3084828 0.5009681 0.3124582 0.5130160
## [5,] 0.3936783 0.5805999 0.3900957 0.5782858 0.3977782 0.5806980
## [6,] 0.3722804 0.5526076 0.3780085 0.5582242 0.3783728 0.5587538
## [7,] 0.3751542 0.5953503 0.3753101 0.5962607 0.3825973 0.6101467
## [8,] 0.3897483 0.5596978 0.3924128 0.5658567 0.3953604 0.5683240
## [9,] 0.3955311 0.5982241 0.3955057 0.5998806 0.4072052 0.6130370
## [10,] 0.2963299 0.4669292 0.2950193 0.4633839 0.2963629 0.4658216
As expected, all CI bounds are identical across the three approaches.
identical(cibounds_n, cibounds_p) && identical(cibounds_n, cibounds_s)
## [1] TRUE
I used the bootstrap example described above to run the mock simulation with the following design conditions: number of items per factor (3 or 7), factor correlation (.2 or .5), standardized loading (.5 or .7), and sample size (500 or 1,000). The total number of conditions is \(2
\times 2 \times 2 \times 2 = 16\). Without parallel processing or
splitting, sixteen separate sim() runs are required. Let me
briefly show how I set up the simulation workflow. This is just my
current style; other approaches would work as well.
Step 1: Define the design conditions using expand.grid(). The
output is a data frame with 16 rows.
p <- c(3, 7)
faccor <- c(0.2, 0.5)
loading <- c(0.5, 0.7)
samplesizes <- c(500, 1000)
conds <- expand.grid(p, faccor, loading, samplesizes) # 16 conditions
colnames(conds) <- c("p", "faccor", "loading", "samplesizes")
Step 2: Define the number of replications.
nRep <- 1000
Step 3: Set the jobid and select the corresponding row of
conditions to pass to sim(). The
jobid ranges from 1 to 16, corresponding to each row of
conditions; cond holds the row corresponding to
the jobid. I will show later how to set jobid automatically in HPC.
jobid <- 1
cond <- conds[jobid, ]
Step 4: Create the data-generation model, analogous to
generatescript described above. String parsing is used to
modify the script according to the number of items per factor, factor
correlation, and factor loading.
xs <- paste0("x", 1:cond$p)
ys <- paste0("y", 1:cond$p)
load_x <- paste(paste(cond$loading, xs, sep="*"), collapse = "+")
load_y <- paste(paste(cond$loading, ys, sep="*"), collapse = "+")
resid_x <- paste0(xs, "~~", (1 - cond$loading^2), "*", xs)
resid_y <- paste0(ys, "~~", (1 - cond$loading^2), "*", ys)
generatescript <- paste(
paste0("f1 =~ ", load_x),
paste0("f2 =~ ", load_y),
paste0("f1 ~~ ", cond$faccor, "*f2"),
paste(resid_x, collapse = "\n"),
paste(resid_y, collapse = "\n"),
sep = "\n"
)
Step 5: Create the analysis model.
load_x2 <- paste(xs, collapse = "+")
load_y2 <- paste(ys, collapse = "+")
analyzescript <- paste(
paste0("f1 =~ ", load_x2),
paste0("f2 =~ ", load_y2),
"f1 ~~ cov*f2",
sep = "\n"
)
Step 6: Define the ciextract() function to obtain extra
outputs (CI bounds from the three methods).
ciextract <- function(out) {
est <- coef(out)["cov"]
sewald <- sqrt(vcov(out)["cov", "cov"])
crit <- qnorm(1 - (1 - 0.95)/2)
waldlower <- est - crit*sewald
waldupper <- est + crit*sewald
percout <- data.frame(parameterestimates(out, boot.ci.type = "perc", level=0.95))
perclower <- percout[percout$label == "cov", "ci.lower"]
percupper <- percout[percout$label == "cov", "ci.upper"]
bcaout <- data.frame(parameterestimates(out, boot.ci.type = "bca.simple", level=0.95))
bcalower <- bcaout[bcaout$label == "cov", "ci.lower"]
bcaupper <- bcaout[bcaout$label == "cov", "ci.upper"]
c(waldlower=waldlower, waldupper=waldupper,
perclower=perclower, percupper=percupper,
bcalower=bcalower, bcaupper=bcaupper)
}
Step 7: Run sim().
output <- sim(
nRep=nRep,
model=analyzescript,
generate=generatescript,
n=cond$samplesizes,
std.lv=TRUE,
lavaanfun="cfa",
se="boot",
iseed=787221,
outfun=ciextract,
seed = 123321
)
Step 8: Name and save the result for each condition.
name <- paste0("simsem_noparallel_", cond$p, "_", as.integer(cond$faccor * 10), "_",
as.integer(cond$loading * 10), "_", cond$samplesizes)
outfile <- paste0(name, ".rds")
saveRDS(output, file = outfile)
For the no-parallel approach, 16 jobs (one per condition) are run on
the HPC system. Suppose the SLURM script, saved as submit_noparallel.sh,
is as follows:
#!/bin/bash -l
#SBATCH -p compute
#SBATCH -A pv911001
#SBATCH -N 1
#SBATCH --cpus-per-task=1
#SBATCH --time=12:00:00
#SBATCH --job-name=simsem_noparallel
#SBATCH --array=1-16
#SBATCH --output=logs/simsem_noparallel_%A_%a.out
module load cray-R/4.4.0
Rscript --vanilla splitsimresult_hpc_noparallel.R
Note that this SLURM script is exactly the one I used on the HPC system
available to me (see the acknowledgement). The R script it runs is splitsimresult_hpc_noparallel.R,
which contains the workflow described above. The only notable change is
in Step 3, which is modified as follows:
jobid <- as.numeric(Sys.getenv("SLURM_ARRAY_TASK_ID"))
cond <- conds[jobid, ]
That is, each HPC job receives its own array ID, which ranges from 1
to 16. Sys.getenv() retrieves the value of the environment
variable "SLURM_ARRAY_TASK_ID", which SLURM sets for each
array job. For example, if the value is 15,
cond will correspond to the 15th row.
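A small convenience (my own habit, not required by SLURM) is to supply a default via the unset argument of Sys.getenv(), so the same script can also be run interactively outside SLURM:

```r
# Hypothetical convenience: default to job 1 when the script is run
# outside SLURM (SLURM_ARRAY_TASK_ID unset), instead of producing NA.
jobid <- as.numeric(Sys.getenv("SLURM_ARRAY_TASK_ID", unset = "1"))
jobid
```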
Once submit_noparallel.sh
and splitsimresult_hpc_noparallel.R
are in the same directory on the HPC system, the sbatch command is
used on the head node to submit the jobs as follows:
sbatch submit_noparallel.sh
Note that different HPC systems may have different routines. Also, in the actual HPC script, I check whether the output file already exists, because some jobs may fail before completion. I can resubmit the SLURM script and the simulation will be rerun for unfinished jobs.
As another note, users may need to check the R version available on
their system. Installing simsem and lavaan
can also be a bit tricky in HPC environments. In practice, users should
load, on the head node, the same R version that will later be used on the
compute nodes; then start R and install simsem and
lavaan in a local directory.
If installation is required, make sure to install the packages in a user-level library, for example:
install.packages("package_name", lib = "~/R/4.4.0")
Feel free to adjust the library path to match your system.
Also note that I explicitly specify the library path at the beginning of the R script:
.libPaths(c("~/R/4.4.0", .libPaths()))
This step ensures that the required packages are available on the compute nodes during job execution.
For the parallel approach, 16 jobs for the 16 conditions are run on the
HPC system, as in the no-parallel condition. Suppose the SLURM script,
saved as submit_parallel.sh,
is as follows:
#!/bin/bash -l
#SBATCH -p compute
#SBATCH -A pv911001
#SBATCH -N 1
#SBATCH --cpus-per-task=8
#SBATCH --time=12:00:00
#SBATCH --job-name=simsem_parallel
#SBATCH --array=1-16
#SBATCH --output=logs/simsem_parallel_%A_%a.out
module load cray-R/4.4.0
Rscript --vanilla splitsimresult_hpc_parallel.R
Note that cpus-per-task is set to 8 to use eight
processors, and the R script is changed to splitsimresult_hpc_parallel.R,
in which two steps are adjusted. First, Step 3:
jobid <- as.numeric(Sys.getenv("SLURM_ARRAY_TASK_ID"))
cond <- conds[jobid, ]
Second, Step 7 is adjusted:
output <- sim(
nRep=nRep,
model=analyzescript,
generate=generatescript,
n=cond$samplesizes,
std.lv=TRUE,
lavaanfun="cfa",
se="boot",
iseed=787221,
outfun=ciextract,
seed = 123321,
multicore = TRUE,
numProc = 8
)
The sim() call is modified to enable parallel
processing, with numProc set to 8 to match the SLURM script.
The sbatch command is then used to submit the parallel
jobs. In this empirical illustration, I also evaluate performance with
16 and 32 processors.
For the new serialization feature, the jobs are manually split into 20 chunks, each running 50 replications. That is, \(16 \times 20 = 320\) jobs are run on the HPC system, each with only one processor:
#!/bin/bash -l
#SBATCH -p compute
#SBATCH -A pv911001
#SBATCH -N 1
#SBATCH --cpus-per-task=1
#SBATCH --time=24:00:00
#SBATCH --job-name=simsem_serial
#SBATCH --array=1-320
#SBATCH --output=logs/simsem_serial_%A_%a.out
module load cray-R/4.4.0
Rscript --vanilla splitsimresult_hpc_serial.R
Note that the array contains 320 jobs. When R reads the array index, it is translated into both the row of conditions and the chunk number. First, Step 2 is modified to account for chunking:
nRep <- 1000
nChunks <- 20
chunkSize <- nRep / nChunks
The number of chunks is set to 20, so the chunk size is \(1000 / 20 = 50\) replications. Next, Step 3 is adjusted as follows:
jobid <- as.numeric(Sys.getenv("SLURM_ARRAY_TASK_ID"))
cond_id <- ((jobid - 1) %/% nChunks) + 1
chunk_id <- ((jobid - 1) %% nChunks) + 1
cond <- conds[cond_id, ]
start <- (chunk_id - 1) * chunkSize + 1
end <- chunk_id * chunkSize
repRun <- start:end
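To sanity-check this index arithmetic, we can enumerate all 320 job IDs and confirm that every condition gets exactly 20 chunks and that every replication from 1 to 1,000 is covered exactly once (a check added for this tutorial; it is not part of the original script):

```r
# Quick check of the jobid -> (condition, chunk) mapping.
nRep <- 1000; nChunks <- 20
chunkSize <- nRep / nChunks
stopifnot(nRep %% nChunks == 0)  # chunking assumes exact divisibility
reps_cond1 <- integer(0)
for (jobid in 1:320) {
  cond_id <- ((jobid - 1) %/% nChunks) + 1   # integer division -> condition
  chunk_id <- ((jobid - 1) %% nChunks) + 1   # remainder -> chunk
  stopifnot(cond_id %in% 1:16, chunk_id %in% 1:20)
  if (cond_id == 1) {
    start <- (chunk_id - 1) * chunkSize + 1
    reps_cond1 <- c(reps_cond1, start:(start + chunkSize - 1))
  }
}
all(sort(reps_cond1) == 1:nRep)  # TRUE: each replication covered exactly once
```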
The jobid determines both the condition
(cond_id) and the chunk (chunk_id): integer
division gives the condition index, and the remainder gives the chunk
index. repRun is then defined from
chunk_id. Step 7 is adjusted by adding
repRun as follows:
output <- sim(
nRep=nRep,
model=analyzescript,
generate=generatescript,
n=cond$samplesizes,
std.lv=TRUE,
lavaanfun="cfa",
se="boot",
iseed=787221,
outfun=ciextract,
seed = 123321,
repRun = repRun
)
Finally, Step 8 is adjusted to save the output for each chunk:
name <- paste0("simsem_serial_", cond$p, "_", as.integer(cond$faccor * 10), "_",
as.integer(cond$loading * 10), "_", cond$samplesizes)
outfile <- sprintf("%s_chunk%02d.rds", name, chunk_id)
saveRDS(output, file = outfile)
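As an aside, the %02d format pads the chunk number to two digits, which keeps the chunk files sorted correctly in directory listings. For example, with a hypothetical base name:

```r
# "%02d" zero-pads the chunk index: chunk 7 -> "07", chunk 20 -> "20".
name <- "simsem_serial_3_2_5_500"   # example base name for one condition
sprintf("%s_chunk%02d.rds", name, 7)
# "simsem_serial_3_2_5_500_chunk07.rds"
```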
In the GitHub repository, you will see the following files used in these simulations:
simsem/gh-pages/vignettes/hpc-simulation-simsem
| Name | Script | SLURM |
|---|---|---|
| No Parallel | splitsimresult_hpc_noparallel.R | submit_noparallel.sh |
| Parallel (8) | splitsimresult_hpc_parallel.R | submit_parallel.sh |
| Parallel (16) | splitsimresult_hpc_parallel16.R | submit_parallel16.sh |
| Parallel (32) | splitsimresult_hpc_parallel32.R | submit_parallel32.sh |
| Serialization | splitsimresult_hpc_serial.R | submit_serial.sh |
| Method | Mean Wall Time | Min | Max | Number of Jobs | Normalized Service Hours |
|---|---|---|---|---|---|
| No Parallel | 7:46:24 | 6:57:31 | 9:18:15 | 16 | 0.97 |
| Parallel (8) | 0:59:34 | 0:50:57 | 1:13:39 | 16 | 1.00 |
| Parallel (16) | 0:30:31 | 0:26:24 | 0:38:17 | 16 | 1.02 |
| Parallel (32) | 0:16:36 | 0:14:24 | 0:21:21 | 16 | 1.11 |
| Serialization | 0:27:51 | 0:20:26 | 0:41:08 | 320 | 1.20 |
Parallelization works as expected: increasing the number of processors decreases runtime, with near-linear improvement up to 32 cores. Splitting into 20 chunks is roughly equivalent to using 20 processors. However, it results in higher service usage, suggesting additional overhead from job management and repeated setup. The no-parallel approach uses the lowest service hours but has the longest wall time. Note that 1 service hour corresponds to 128 processor hours.
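The "Normalized Service Hours" column can be roughly reproduced from the other columns. The formula below is my reconstruction (an assumption, not the center's official accounting), using the stated conversion of 128 processor hours per service hour:

```r
# Back-of-the-envelope check of "Normalized Service Hours":
# service hours ~= jobs * cores * mean wall time (hours) / 128.
hours <- function(h, m, s) h + m / 60 + s / 3600
usage <- function(jobs, cores, wall_h) jobs * cores * wall_h / 128
round(usage(16,  1, hours(7, 46, 24)), 2)  # 0.97  (No Parallel)
round(usage(16,  8, hours(0, 59, 34)), 2)  # 0.99  (table: 1.00)
round(usage(16, 16, hours(0, 30, 31)), 2)  # 1.02  (Parallel 16)
round(usage(16, 32, hours(0, 16, 36)), 2)  # 1.11  (Parallel 32)
round(usage(320, 1, hours(0, 27, 51)), 2)  # 1.16  (table: 1.20)
```

The serialization estimate based on the mean wall time (about 1.16) falls below the reported 1.20, consistent with the extra per-job setup overhead discussed above.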
In other words, if sufficient processors are available, parallel processing is both simpler and more efficient than manual serialization.
If parallel processing is available, it is generally recommended. However, many researchers may not have access to a large number of processors. For example, some HPC systems provide free access for jobs using only a small number of processors (e.g., 4) and limited wall time (e.g., less than 24 hours). Parallel jobs using 4 processors may exceed the wall-time limit. In this case, manually splitting the jobs and running each with 4 processors can be efficient—so serialization is still a reasonable option.
Users might wonder why I developed repRun. Why not
simply use different data-generating seeds for each chunk? The reason is
simple: I want the single run and the split run to produce exactly the
same results—that’s it. If users are not concerned about strict
reproducibility, using different seeds is also a viable option.
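If exact equivalence between the split and unsplit runs is not needed, a hypothetical sketch of the different-seeds alternative (not a simsem feature, just ordinary seed arithmetic) could look like:

```r
# Hypothetical alternative to repRun: derive a distinct seed per chunk.
# Results will differ from a single unsplit run, but remain reproducible
# given the same base seed and chunk_id.
base_seed <- 123321
chunk_id <- 7                       # would come from the SLURM array index
chunk_seed <- base_seed + chunk_id  # e.g., pass as seed = chunk_seed to sim()
chunk_seed
```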
The simulation results are not the main focus of this article. The results from the no-parallel, parallel, and serialization approaches are identical, so readers may skip this section and go directly to the conclusions.
See pooloutputs.R
to check the results. In this file, I first repeat the design
conditions, the number of replications (nRep), and the
number of chunks (nchunks).
p <- c(3, 7)
faccor <- c(0.2, 0.5)
loading <- c(0.5, 0.7)
samplesizes <- c(500, 1000)
nRep <- 1000
nchunks <- 20
conds <- expand.grid(p, faccor, loading, samplesizes)
colnames(conds) <- c("p", "faccor", "loading", "samplesizes")
Next, a for-loop iterates over each row of
conds. For each condition and each method, the saved RDS
file is read (e.g., into output_n). The extra outputs (CI
bounds) are then extracted into temp_cibounds_*, combined
into a matrix using rbind, augmented with the design
conditions as extra columns, and appended to the corresponding
cibounds_* object across iterations.
cibounds_n <- NULL
cibounds_p8 <- NULL
cibounds_p16 <- NULL
cibounds_p32 <- NULL
cibounds_s <- NULL
for(i in 1:nrow(conds)) {
cond <- conds[i,]
cond_expanded <- as.matrix(cond[rep(1, nRep), ])
faccor <- cond$faccor
namerow <- paste0(cond$p, "_", as.integer(cond$faccor * 10), "_",
as.integer(cond$loading * 10), "_", cond$samplesizes)
noparallel <- paste0("outputs//simsem_noparallel_", namerow, ".rds")
output_n <- readRDS(file = noparallel)
temp_cibounds_n <- getExtraOutput(output_n)
temp_cibounds_n <- do.call(rbind, temp_cibounds_n)
temp_cibounds_n <- cbind(temp_cibounds_n, cond_expanded)
cibounds_n <- rbind(cibounds_n, temp_cibounds_n)
parallel8 <- paste0("outputs//simsem_parallel_", namerow, ".rds")
output_p8 <- readRDS(file = parallel8)
temp_cibounds_p8 <- getExtraOutput(output_p8)
temp_cibounds_p8 <- do.call(rbind, temp_cibounds_p8)
temp_cibounds_p8 <- cbind(temp_cibounds_p8, cond_expanded)
cibounds_p8 <- rbind(cibounds_p8, temp_cibounds_p8)
parallel16 <- paste0("outputs//simsem_parallel16_", namerow, ".rds")
output_p16 <- readRDS(file = parallel16)
temp_cibounds_p16 <- getExtraOutput(output_p16)
temp_cibounds_p16 <- do.call(rbind, temp_cibounds_p16)
temp_cibounds_p16 <- cbind(temp_cibounds_p16, cond_expanded)
cibounds_p16 <- rbind(cibounds_p16, temp_cibounds_p16)
parallel32 <- paste0("outputs//simsem_parallel32_", namerow, ".rds")
output_p32 <- readRDS(file = parallel32)
temp_cibounds_p32 <- getExtraOutput(output_p32)
temp_cibounds_p32 <- do.call(rbind, temp_cibounds_p32)
temp_cibounds_p32 <- cbind(temp_cibounds_p32, cond_expanded)
cibounds_p32 <- rbind(cibounds_p32, temp_cibounds_p32)
serial <- paste0("outputs//simsem_serial_", namerow)
temp_cibounds_s <- NULL
for(j in 1:nchunks) {
outfile <- sprintf("%s_chunk%02d.rds", serial, j)
output_s <- readRDS(file = outfile)
temptemp_cibounds_s <- getExtraOutput(output_s)
temp_cibounds_s <- c(temp_cibounds_s, temptemp_cibounds_s)
}
temp_cibounds_s <- do.call(rbind, temp_cibounds_s)
temp_cibounds_s <- cbind(temp_cibounds_s, cond_expanded)
cibounds_s <- rbind(cibounds_s, temp_cibounds_s)
}
Check whether all CI bounds are identical, which they are.
identical(cibounds_n, cibounds_p8) && identical(cibounds_n, cibounds_p16) &&
identical(cibounds_n, cibounds_p32) && identical(cibounds_n, cibounds_s)
## [1] TRUE
Because they are all identical, only cibounds_n is used
to summarize the outputs. Next, compute, for each replication, whether
each type of CI covers the population factor correlation.
cibounds_n <- as.data.frame(cibounds_n)
cibounds_n$coverage_wald <- with(cibounds_n, waldlower.cov < faccor & faccor < waldupper.cov)
cibounds_n$coverage_perc <- with(cibounds_n, perclower < faccor & faccor < percupper)
cibounds_n$coverage_bca <- with(cibounds_n, bcalower < faccor & faccor < bcaupper)
Next, use aggregate() to find the coverage rate for each
design condition.
sumout <- aggregate(cbind(coverage_wald, coverage_perc, coverage_bca)
~ p + faccor + loading + samplesizes, data=cibounds_n,
FUN=mean)
sumout
## p faccor loading samplesizes coverage_wald coverage_perc coverage_bca
## 1 3 0.2 0.5 500 0.956 0.957 0.959
## 2 7 0.2 0.5 500 0.953 0.952 0.952
## 3 3 0.5 0.5 500 0.957 0.959 0.958
## 4 7 0.5 0.5 500 0.956 0.960 0.960
## 5 3 0.2 0.7 500 0.950 0.951 0.952
## 6 7 0.2 0.7 500 0.948 0.948 0.950
## 7 3 0.5 0.7 500 0.954 0.956 0.956
## 8 7 0.5 0.7 500 0.954 0.951 0.951
## 9 3 0.2 0.5 1000 0.957 0.956 0.955
## 10 7 0.2 0.5 1000 0.948 0.948 0.947
## 11 3 0.5 0.5 1000 0.950 0.948 0.950
## 12 7 0.5 0.5 1000 0.950 0.954 0.954
## 13 3 0.2 0.7 1000 0.942 0.944 0.941
## 14 7 0.2 0.7 1000 0.946 0.945 0.944
## 15 3 0.5 0.7 1000 0.947 0.949 0.947
## 16 7 0.5 0.7 1000 0.948 0.946 0.946
I then create a figure illustrating the coverage rate for each condition. Please see the R script for the code used to generate the graph.
Coverage probability across conditions.
I illustrated different parallel processing options in HPC using
simsem. I also introduced the serialization feature
(repRun), which was recently added in version 0.5-17. However,
serialization is generally suboptimal compared to parallel processing
using multicore and numProc. I hope this
tutorial helps researchers save time when running large-scale simulation
studies.
This work used computational resources provided by the NSTDA
Supercomputer Center (ThaiSC), Thailand, under project code
pv911001. I thank the support team for maintaining the
system.