---
title: "Parallel processing"
format: 
  html:
    toc: true
    html-math-method: mathjax
vignette: >
  %\VignetteIndexEntry{Parallel processing}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

# load libraries
library(flipr)

# import data
df_parallelization <- readRDS(
  system.file("vignette-data/parallelization-df.rds", package = "flipr")
)

time_without_parallel <- df_parallelization$time_without_parallel
time_without_future <- df_parallelization$time_without_future
time_with_parallel <- df_parallelization$time_with_parallel
time_with_future <- df_parallelization$time_with_future
```

The [**flipr**](https://permaverse.github.io/flipr/) package uses functions
contained in the [**furrr**](https://future.futureverse.org/index.html) package
for parallel processing. The setting of parallelization has to be done on the
user side. We illustrate here how to achieve asynchronous evaluation. We use the
[**future**](https://future.futureverse.org/index.html) package to set the plan,
the **parallel** package to define a default cluster, and the
[**progressr**](https://progressr.futureverse.org/index.html) package to report
progress updates. 

More precisely, setting a default cluster with **parallel** is useful to allow 
parallel computation of the point estimate if an estimation is needed. On the 
other side, setting the plan with 
[**future**](https://future.futureverse.org/index.html) allows parallel 
computation when evaluating the plausibility function. A comparison of
computation time with sequential and parallel computation for those two cases is
done in the following.

## Computation without parallel processing

To show the benefit of parallel processing, we compare here the processing times
necessary to compute a point estimation and to evaluate the grid for a 
plausibility function. First, here is the computation without parallelization.

```{r, eval=FALSE}
set.seed(1234)
x <- rnorm(10, 1, 1)
y <- rnorm(10, 4, 1)

null_spec <- function(y, parameters) {
  purrr::map(y, ~ .x - parameters[1])
}
stat_functions <- list(stat_t)
stat_assignments <- list(delta = 1)

pf <- PlausibilityFunction$new(
  null_spec = null_spec,
  stat_functions = stat_functions,
  stat_assignments = stat_assignments,
  x, y
)
pf$set_nperms(2000)

tic()
pf$set_point_estimate()
time_without_parallel <- toc()$callback_msg
```

```{r}
time_without_parallel
```

```{r, eval=FALSE}
pf$set_parameter_bounds(
  point_estimate = pf$point_estimate, 
  conf_level = pf$max_conf_level
)
pf$set_grid(
  parameters = pf$parameters, 
  npoints = 100L
)

tictoc::tic()
pf$evaluate_grid(grid = pf$grid)
time_without_future <- tictoc::toc()$callback_msg
```

```{r}
time_without_future
```

## Computation with parallel processing

By setting the desired number of cores, we define the number of background R
sessions that will be used to evaluate expressions in parallel. This number is
used to set the multisession plan with the function `future::plan()` and to
define a default cluster with `parallel::setDefaultCluster()`. Then, to enable
the visualization of evaluation progress, we can put the code in the
`progressr::with_progress()` function, or more simply set it for all the
following code with the `progressr::handlers()` function. After these settings,
[**flipr**](https://permaverse.github.io/flipr/) functions can be used, as shown
in this example.

```{r, eval=FALSE}
ncores <- 3
future::plan(future::multisession, workers = ncores)
cl <- parallel::makeCluster(ncores)
parallel::setDefaultCluster(cl)
progressr::handlers(global = TRUE)

set.seed(1234)
x <- rnorm(10, 1, 1)
y <- rnorm(10, 4, 1)

null_spec <- function(y, parameters) {
  purrr::map(y, ~ .x - parameters[1])
}
stat_functions <- list(stat_t)
stat_assignments <- list(delta = 1)

pf <- PlausibilityFunction$new(
  null_spec = null_spec,
  stat_functions = stat_functions,
  stat_assignments = stat_assignments,
  x, y
)
pf$set_nperms(2000)

tic()
pf$set_point_estimate()
time_with_parallel <- toc()$callback_msg
```

```{r}
time_with_parallel
```

```{r, eval=FALSE}
pf$set_parameter_bounds(
  point_estimate = pf$point_estimate, 
  conf_level = pf$max_conf_level
)
pf$set_grid(
  parameters = pf$parameters, 
  npoints = 100L
)

tictoc::tic()
pf$evaluate_grid(grid = pf$grid)
time_with_future <- tictoc::toc()$callback_msg

parallel::stopCluster(cl)
```

It is good practice to shut down the workers with the `parallel::stopCluster()`
function at the end of the code.

```{r}
time_with_future
```

This experiment proves that we can save a lot of computation time when using
parallel processing. Indeed, by setting 3 cores for each of the parallel 
processing tools, we reduced approximately by 3 times the computation time for 
both computing a point estimation and evaluating the plausibility function.

Finally, to return to a sequential plan with no progress updates, the following
code can be used. It also allows to shut down the workers used in [**future**](https://future.futureverse.org/index.html).

```{r, eval=FALSE}
future::plan(future::sequential)
parallel::setDefaultCluster(NULL)
progressr::handlers(global = FALSE)
```
