Parallel processing

The flipr package relies on the furrr package for parallel processing. Parallelization must be configured by the user. This section illustrates how to set up asynchronous evaluation, using the future package to set the plan, the parallel package to define a default cluster, and the progressr package to report progress updates.

To show the benefit of parallel processing, we compare the time needed to evaluate a plausibility function on a grid of 50 points, first sequentially and then with parallel workers. First, here is the computation without parallelization.

library(flipr)  # provides PlausibilityFunction and stat_t

set.seed(1234)
x <- rnorm(10, 1, 1)
y <- rnorm(10, 4, 1)

null_spec <- function(y, parameters) {
  purrr::map(y, ~ .x - parameters[1])
}
stat_functions <- list(stat_t)
stat_assignments <- list(delta = 1)

pf <- PlausibilityFunction$new(
  null_spec = null_spec,
  stat_functions = stat_functions,
  stat_assignments = stat_assignments,
  x, y
)

pf$set_point_estimate(mean(y) - mean(x), overwrite = TRUE)
pf$set_parameter_bounds(
  point_estimate = pf$point_estimate, 
  conf_level = pf$max_conf_level
)
pf$set_grid(
  parameters = pf$parameters, 
  npoints = 50L
)

tictoc::tic()
pf$evaluate_grid(grid = pf$grid)
time_without_parallelization <- tictoc::toc()
time_without_parallelization$callback_msg
#> [1] "48.827 sec elapsed"

Computation with parallel processing

The desired number of cores determines how many background R sessions are used to evaluate expressions in parallel. This number is passed to future::plan() to set the multisession plan and to parallel::makeCluster() to create the cluster that parallel::setDefaultCluster() registers as the default. To display evaluation progress, either wrap the code in progressr::with_progress() or, more simply, enable progress reporting for all subsequent code with progressr::handlers(global = TRUE). Once these settings are in place, flipr functions can be used as usual, as the example in this section shows.
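The worker count of 4 used in this section is a fixed choice. As a minimal sketch, and as an assumption that does not come from the flipr documentation, the count could instead be derived from the machine with future::availableCores(), keeping one core free for the main R session:

```r
# Assumption: leaving one core for the main session is a common convention,
# not a flipr requirement. Never go below one worker.
ncores <- max(1L, as.integer(future::availableCores()) - 1L)
ncores
```

The resulting value can then be used wherever ncores appears in the setup code.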

ncores <- 4
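# Qualify the strategy with its namespace so that the call also works
# when the future package is not attached.
future::plan(future::multisession, workers = ncores)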
cl <- parallel::makeCluster(ncores)
parallel::setDefaultCluster(cl)
progressr::handlers(global = TRUE)

set.seed(1234)
x <- rnorm(10, 1, 1)
y <- rnorm(10, 4, 1)

null_spec <- function(y, parameters) {
  purrr::map(y, ~ .x - parameters[1])
}
stat_functions <- list(stat_t)
stat_assignments <- list(delta = 1)

pf <- PlausibilityFunction$new(
  null_spec = null_spec,
  stat_functions = stat_functions,
  stat_assignments = stat_assignments,
  x, y
)

pf$set_point_estimate(mean(y) - mean(x), overwrite = TRUE)
pf$set_parameter_bounds(
  point_estimate = pf$point_estimate, 
  conf_level = pf$max_conf_level
)
pf$set_grid(
  parameters = pf$parameters, 
  npoints = 50L
)

tictoc::tic()
pf$evaluate_grid(grid = pf$grid)
time_with_parallelization <- tictoc::toc()

parallel::stopCluster(cl)

It is good practice to shut down the workers with the parallel::stopCluster() function at the end of the code.

time_with_parallelization$callback_msg
#> [1] "15.525 sec elapsed"

This experiment shows that parallel processing can save substantial computation time: with 4 workers, evaluating the plausibility function took about 15.5 seconds instead of 48.8 seconds, a gain of approximately 33 seconds, or roughly two thirds of the sequential run time.
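The gain can be quantified from the two timings reported in this section. The short calculation below is only an illustration; the variable names are chosen for this sketch, and the figures are those of this particular run, which will vary between machines.

```r
t_seq  <- 48.827  # seconds elapsed without parallelization
t_par  <- 15.525  # seconds elapsed with parallelization
ncores <- 4       # number of workers used in the parallel run

saved      <- t_seq - t_par     # about 33.3 seconds saved
speedup    <- t_seq / t_par     # about 3.15 times faster
efficiency <- speedup / ncores  # about 0.79 of the ideal 4-fold speedup

round(c(saved = saved, speedup = speedup, efficiency = efficiency), 2)
```

An efficiency below 1 is expected, since starting the workers and exchanging data with them carries some overhead.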

Finally, to return to a sequential plan with no progress updates, the following code can be used.

# Qualify the strategy with its namespace, as for the multisession plan.
future::plan(future::sequential)
parallel::setDefaultCluster(NULL)
progressr::handlers(global = FALSE)
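The setup and teardown steps of this section can also be bundled so that the previous state is restored even if the computation fails. What follows is a minimal sketch, assuming a hypothetical helper with_parallel_setup() that is not part of flipr. It relies on future::plan() returning the previous plan when a new one is set, and it uses progressr::with_progress() rather than global handlers.

```r
# Hypothetical helper (not part of flipr): run `fun` under a temporary
# multisession plan and default cluster, then restore the previous state.
with_parallel_setup <- function(fun, ncores = 2L) {
  # plan() returns the previous plan, which is restored on exit.
  old_plan <- future::plan(future::multisession, workers = ncores)
  on.exit(future::plan(old_plan), add = TRUE)

  # Register a default cluster and make sure it is stopped on exit.
  cl <- parallel::makeCluster(ncores)
  parallel::setDefaultCluster(cl)
  on.exit({
    parallel::setDefaultCluster(NULL)
    parallel::stopCluster(cl)
  }, add = TRUE)

  # Report progress only for the code evaluated here.
  progressr::with_progress(fun())
}

# Placeholder computation; a call such as pf$evaluate_grid(grid = pf$grid)
# could be wrapped in the same way.
res <- with_parallel_setup(function() sum(1:10))
```

Because the cleanup is registered with on.exit(), the workers are shut down whether the wrapped function returns normally or raises an error.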