Move to "main" for caching instead of "master"
wlandau-lilly committed Oct 13, 2020
1 parent f9ab03a commit e4cfab4
Showing 43 changed files with 153 additions and 129 deletions.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -4,7 +4,7 @@ Development is a community effort, and we encourage participation.

## Code of Conduct

The environment for collaboration should be friendly, inclusive, respectful, and safe for everyone, so all participants must obey [this repository's code of conduct](https://github.com/ropensci/drake/blob/master/CODE_OF_CONDUCT.md).
The environment for collaboration should be friendly, inclusive, respectful, and safe for everyone, so all participants must obey [this repository's code of conduct](https://github.com/ropensci/drake/blob/main/CODE_OF_CONDUCT.md).

## Issues

14 changes: 7 additions & 7 deletions NEWS.md
@@ -318,7 +318,7 @@ These changes invalidate some targets in some workflows, but they are necessary
## Bug fixes

- Remove README.md from CRAN altogether. Also remove all links from the news and vignette. The links trigger too many CRAN notes, which made the automated checks too brittle.
- Serialize formats that need serialization (like "keras") before sending the data from HPC workers to the master process (#989).
- Serialize formats that need serialization (like "keras") before sending the data from HPC workers to the main process (#989).
- Check for custom-formatted files when checking checksums.
- Force fst-formatted targets to plain data frames. Same goes for the new "fst_dt" format.
- Change the meaning and behavior of `max_expand` in `drake_plan()`. `max_expand` is now the maximum number of targets produced by `map()`, `split()`, and `cross()`. For `cross()`, this reduces the number of targets (less cumbersome) and makes the subsample of targets more representative of the complete grid. It also ensures consistent target naming when `.id` is `FALSE` (#1002). Note: `max_expand` is not for production workflows anyway, so this change does not break anything important. Unfortunately, we do lose the speed boost in `drake_plan()` originally due to `max_expand`, but `drake_plan()` is still fast, so that is not so bad.
@@ -329,7 +329,7 @@ These changes invalidate some targets in some workflows, but they are necessary
## New features

- Add a new "fst_dt" format for `fst`-powered saving of `data.table` objects.
- Support a custom "caching" column of the plan to select master vs worker caching for each target individually (#988).
- Support a custom "caching" column of the plan to select main vs worker caching for each target individually (#988).
- Make `transform` a formal argument of `target()` so that users do not have to type "transform =" all the time in `drake_plan()` (#993).
- Migrate the documentation website from `ropensci.github.io/drake` to `docs.ropensci.org/drake`.

@@ -676,8 +676,8 @@ Version 6.2.1 is a hotfix to address the failing automated CRAN checks for 6.2.0
- Add a new `plan_to_code()` function to turn `drake` plans into generic R scripts. New users can use this function to better understand the relationship between plans and code, and unsatisfied customers can use it to disentangle their projects from `drake` altogether. Similarly, `plan_to_notebook()` generates an R notebook from a `drake` plan.
- Add a new `drake_debug()` function to run a target's command in debug mode. Analogous to `drake_build()`.
- Add a `mode` argument to `trigger()` to control how the `condition` trigger factors into the decision to build or skip a target. See the `?trigger` for details.
- Add a new `sleep` argument to `make()` and `drake_config()` to help the master process consume fewer resources during parallel processing.
- Enable the `caching` argument for the `"clustermq"` and `"clustermq_staged"` parallel backends. Now, `make(parallelism = "clustermq", caching = "master")` will do all the caching with the master process, and `make(parallelism = "clustermq", caching = "worker")` will do all the caching with the workers. The same is true for `parallelism = "clustermq_staged"`.
- Add a new `sleep` argument to `make()` and `drake_config()` to help the main process consume fewer resources during parallel processing.
- Enable the `caching` argument for the `"clustermq"` and `"clustermq_staged"` parallel backends. Now, `make(parallelism = "clustermq", caching = "main")` will do all the caching with the main process, and `make(parallelism = "clustermq", caching = "worker")` will do all the caching with the workers. The same is true for `parallelism = "clustermq_staged"`.
- Add a new `append` argument to `gather_plan()`, `gather_by()`, `reduce_plan()`, and `reduce_by()`. The `append` argument controls whether the output includes the original `plan` in addition to the newly generated rows.
- Add new functions `load_main_example()`, `clean_main_example()`, and `clean_mtcars_example()`.
- Add a `filter` argument to `gather_by()` and `reduce_by()` in order to restrict what we gather even when `append` is `TRUE`.
@@ -697,7 +697,7 @@ Version 6.2.1 is a hotfix to address the failing automated CRAN checks for 6.2.0

- Stop earlier in `make_targets()` if all the targets are already up to date.
- Improve the documentation of the `seed` argument in `make()` and `drake_config()`.
- Set the default `caching` argument of `make()` and `drake_config()` to `"master"` rather than `"worker"`. The default option should be the lower-overhead option for small workflows. Users have the option to make a different set of tradeoffs for larger workflows.
- Set the default `caching` argument of `make()` and `drake_config()` to `"main"` rather than `"worker"`. The default option should be the lower-overhead option for small workflows. Users have the option to make a different set of tradeoffs for larger workflows.
- Allow the `condition` trigger to evaluate to non-logical values as long as those values can be coerced to logicals.
- Require that the `condition` trigger evaluate to a vector of length 1.
- Keep non-standard columns in `drake_plan_source()`.
@@ -796,7 +796,7 @@ to tell the user if the command, a dependency, an input file, or an output file

# Version 5.2.0

- Sequester staged parallelism in backends "mclapply_staged" and "parLapply_staged". For the other `lapply`-like backends, `drake` uses persistent workers and a master process. In the case of `"future_lapply"` parallelism, the master process is a separate background process called by `Rscript`.
- Sequester staged parallelism in backends "mclapply_staged" and "parLapply_staged". For the other `lapply`-like backends, `drake` uses persistent workers and a main process. In the case of `"future_lapply"` parallelism, the main process is a separate background process called by `Rscript`.
- Remove the appearance of staged parallelism from single-job `make()`'s.
(Previously, there were "check" messages and a call to `staged_parallelism()`.)
- Remove some remnants of staged parallelism internals.
@@ -835,7 +835,7 @@ to tell the user if the command, a dependency, an input file, or an output file
- Fix an elusive `R CMD check` error from building the pdf manual with LaTeX.
- In `drake_plan()`, allow users to customize target-level columns using `target()` inside the commands.
- Add a new `bind_plans()` function to concatenate the rows of drake plans and then sanitize the aggregate plan.
- Add an optional `session` argument to tell `make()` to build targets in a separate, isolated master R session. For example, `make(session = callr::r_vanilla)`.
- Add an optional `session` argument to tell `make()` to build targets in a separate, isolated main R session. For example, `make(session = callr::r_vanilla)`.

# Version 5.1.0

22 changes: 11 additions & 11 deletions R/backend_clustermq.R
@@ -4,7 +4,7 @@ drake_backend_clustermq <- function(config) {
config = config,
jobs = config$settings$jobs_preprocess
)
cmq_local_master(config)
cmq_local_main(config)
if (config$queue$empty()) {
return()
}
@@ -17,17 +17,17 @@ drake_backend_clustermq <- function(config) {
suppressWarnings(cmq_set_common_data(config))
config$counter <- new.env(parent = emptyenv())
config$counter$remaining <- config$queue$size()
cmq_master(config)
cmq_main(config)
}

cmq_local_master <- function(config) {
cmq_local_main <- function(config) {
continue <- TRUE
while (!config$queue$empty() && continue) {
continue <- cmq_local_master_iter(config)
continue <- cmq_local_main_iter(config)
}
}

cmq_local_master_iter <- function(config) {
cmq_local_main_iter <- function(config) {
target <- config$queue$peek0()
if (no_hpc(target, config)) {
config$queue$pop0()
@@ -60,18 +60,18 @@ cmq_set_common_data <- function(config) {
)
}

cmq_master <- function(config) {
cmq_main <- function(config) {
on.exit(config$workers$finalize())
config$logger$disk("begin scheduling targets")
while (config$counter$remaining > 0) {
cmq_master_iter(config)
cmq_main_iter(config)
}
if (config$workers$cleanup()) {
on.exit()
}
}
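
The `cmq_main()` hunk above keeps the existing `on.exit()` pattern: a fallback `finalize()` is registered first and then cancelled only if `cleanup()` reports success. A minimal base-R sketch of that pattern follows. The names `new_mock_workers()` and `run_scheduler()` are hypothetical stand-ins, not part of drake or clustermq, and the loop body is a placeholder for one scheduling iteration.

```r
# Hypothetical stand-in for config$workers; not the real clustermq API.
new_mock_workers <- function(cleanup_ok) {
  state <- new.env(parent = emptyenv())
  state$finalized <- FALSE
  list(
    cleanup = function() {
      cleanup_ok
    },
    finalize = function() {
      state$finalized <- TRUE
    },
    finalized = function() {
      state$finalized
    }
  )
}

run_scheduler <- function(workers, remaining) {
  # Register a fallback in case the loop errors or cleanup fails.
  on.exit(workers$finalize())
  while (remaining > 0) {
    remaining <- remaining - 1 # Placeholder for one scheduling iteration.
  }
  if (workers$cleanup()) {
    on.exit() # Clean shutdown succeeded, so cancel the fallback.
  }
  invisible(remaining)
}

clean <- new_mock_workers(cleanup_ok = TRUE)
run_scheduler(clean, remaining = 3)
clean$finalized() # FALSE: the fallback was cancelled.

messy <- new_mock_workers(cleanup_ok = FALSE)
run_scheduler(messy, remaining = 3)
messy$finalized() # TRUE: the fallback ran on exit.
```

Calling `on.exit()` with no arguments clears the registered handler, which is why the rename in this commit does not need to touch the shutdown logic.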

cmq_master_iter <- function(config) {
cmq_main_iter <- function(config) {
msg <- config$workers$receive_data()
cmq_conclude_build(msg = msg, config = config)
if (!identical(msg$token, "set_common_data_token")) {
@@ -151,7 +151,7 @@ cmq_send_target <- function(target, config) {
announce_build(target = target, config = config)
caching <- hpc_caching(target, config)
deps <- NULL
if (identical(caching, "master")) {
if (identical(caching, "main")) {
manage_memory(target = target, config = config, jobs = 1)
deps <- cmq_deps_list(target, config)
}
@@ -226,13 +226,13 @@ cmq_build <- function(target, meta, deps, spec, config_tmp, config) {
config <- restore_hpc_config_tmp(config_tmp, config)
do_prework(config = config, verbose_packages = FALSE)
caching <- hpc_caching(target, config)
if (identical(caching, "master")) {
if (identical(caching, "main")) {
cmq_assign_deps(deps, config)
} else {
manage_memory(target = target, config = config, jobs = 1)
}
build <- try_build(target = target, meta = meta, config = config)
if (identical(caching, "master")) {
if (identical(caching, "main")) {
build$checksum <- get_outfile_checksum(target, build$value, config)
build <- classify_build(build, config)
build <- serialize_build(build)
6 changes: 3 additions & 3 deletions R/backend_future.R
@@ -80,7 +80,7 @@ ft_decide_worker <- function(target, protect, config) {

ft_launch_worker <- function(target, meta, protect, config) {
caching <- hpc_caching(target, config)
if (identical(caching, "master")) {
if (identical(caching, "main")) {
manage_memory(target = target, config = config, downstream = protect)
}
DRAKE_GLOBALS__ <- NULL # Avoid name conflicts with other globals.
@@ -176,7 +176,7 @@ future_build <- function(
}
do_prework(config = config, verbose_packages = FALSE)
build <- try_build(target = target, meta = meta, config = config)
if (identical(caching, "master")) {
if (identical(caching, "main")) {
build$checksum <- get_outfile_checksum(target, build$value, config)
build <- classify_build(build, config)
build <- serialize_build(build)
@@ -281,7 +281,7 @@ resolve_worker_value <- function(worker, config) {
config = config
)
}
# For `caching = "master"`, we need to conclude the build
# For `caching = "main"`, we need to conclude the build
# and store the value and metadata.
list(
target = target,
6 changes: 3 additions & 3 deletions R/deprecated.R
@@ -91,7 +91,7 @@ build_drake_graph <- function(
#' @param init_common_values Logical, whether to set the initial `drake`
#' version in the cache and other common values.
#' Not always a thread safe operation, so should only be `TRUE`
#' on the master process
#' on the main process
configure_cache <- function(
cache = drake::get_cache(verbose = verbose),
short_hash_algo = drake::default_short_hash_algo(cache = cache),
@@ -526,7 +526,7 @@ deprecate_targets_only <- function(targets_only) {
#' @title Load the main example.
#' `r lifecycle::badge("deprecated")`
#' @description The main example lives at
#' <https://github.com/wlandau/drake-examples/tree/master/main>.
#' <https://github.com/wlandau/drake-examples/tree/main/main>.
#' Use `drake_example("main")` to download its code.
#' This function also writes/overwrites
#' the files `report.Rmd` and `raw_data.xlsx`.
@@ -1291,7 +1291,7 @@ dataset_wildcard <- function() {
#' `drake:::store_outputs()`.
#' @param target Character scalar, name of the target
#' to get metadata.
#' @param config Master internal configuration list produced
#' @param config Top-level internal configuration list produced
#' by [drake_config()].
drake_meta <- function(target, config) {
.Deprecated(
21 changes: 14 additions & 7 deletions R/drake_config.R
@@ -231,9 +231,9 @@
#' To reset the random number generator seed for a project,
#' use `clean(destroy = TRUE)`.
#'
#' @param caching Character string, either `"master"` or `"worker"`.
#' - `"master"`: Targets are built by remote workers and sent back to
#' the master process. Then, the master process saves them to the
#' @param caching Character string, either `"main"` or `"worker"`.
#' - `"main"`: Targets are built by remote workers and sent back to
#' the main process. Then, the main process saves them to the
#' cache (`config$cache`, usually a file system `storr`).
#' Appropriate if remote workers do not have access to the file system
#' of the calling R session. Targets are cached one at a time,
@@ -345,12 +345,12 @@
#' except from loaded packages.
#'
#' For parallel processing, `drake` uses
#' a central master process to check what the parallel
#' a central main process to check what the parallel
#' workers are doing, and for the affected high-performance
#' computing workflows, wait for data to arrive over a network.
#' In between loop iterations, the master process sleeps to avoid throttling.
#' In between loop iterations, the main process sleeps to avoid throttling.
#' The `sleep` argument to `make()` and `drake_config()`
#' allows you to customize how much time the master process spends
#' allows you to customize how much time the main process spends
#' sleeping.
#'
#' The `sleep` argument is a function that takes an argument
@@ -537,7 +537,7 @@ drake_config <- function(
session_info = NULL,
cache_log_file = NULL,
seed = NULL,
caching = c("master", "worker"),
caching = c("main", "master", "worker"),
keep_going = FALSE,
session = NULL,
pruning_strategy = NULL,
@@ -582,6 +582,13 @@ drake_config <- function(
deprecate_arg(makefile_path, "makefile_path")
deprecate_arg(layout, "layout", "spec") # 2019-12-15
deprecate_arg(console_log_file, "console_log_file", "log_make") # 2020-02-08
if (identical(caching, "master")) {
caching <- "main"
warn0(
"caching = \"master\" is deprecated. ",
"Use caching = \"main\" instead."
)
}
# 2020-03-21
if (!is.character(parallelism)) {
warn0("Custom parallel backends in drake are deprecated. Using \"loop\".")
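
In the `drake_config()` hunks above, the default becomes `caching = c("main", "master", "worker")` and the deprecation check uses `identical(caching, "master")`. The sketch below illustrates why the untouched default does not trigger the warning: a length-3 vector is never identical to the single string `"master"`. The function name `pick_caching()` is hypothetical, `warning()` stands in for drake's internal `warn0()`, and the closing `match.arg()` call is an assumption about code this diff does not show.

```r
# Hypothetical function; only the `identical()` check comes from the diff.
pick_caching <- function(caching = c("main", "master", "worker")) {
  if (identical(caching, "master")) {
    warning(
      "caching = \"master\" is deprecated. Use caching = \"main\" instead.",
      call. = FALSE
    )
    caching <- "main"
  }
  # Assumption: a later match.arg() reduces the default vector to "main".
  match.arg(caching)
}

pick_caching()                           # "main", with no warning
pick_caching("worker")                   # "worker"
suppressWarnings(pick_caching("master")) # "main", after a deprecation warning
```

Keeping `"master"` among the accepted choices lets existing scripts keep running while they migrate to the new name.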
2 changes: 1 addition & 1 deletion R/drake_plan.R
@@ -42,7 +42,7 @@
#' for details.
#' - `caching`: overrides the `caching` argument of [make()] for each target
#' individually. Possible values:
#' - "master": tell the master process to store the target in the cache.
#' - "main": tell the main process to store the target in the cache.
#' - "worker": tell the HPC worker to store the target in the cache.
#' - NA: default to the `caching` argument of [make()].
#' - `elapsed` and `cpu`: number of seconds to wait for the target to build
9 changes: 8 additions & 1 deletion R/hpc.R
@@ -92,7 +92,14 @@ hpc_caching <- function(target, config) {
if (is.null(out) || is.na(out)) {
out <- config$caching
}
match.arg(out, choices = c("master", "worker"))
if (identical(out, "master")) {
warn0(
"caching = \"master\" is deprecated. ",
"Use caching = \"main\" instead."
)
out <- "main"
}
match.arg(out, choices = c("main", "worker"))
}
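
The `hpc_caching()` change above resolves the caching mode per target: fall back to the global setting when the plan gives none, map the deprecated `"master"` to `"main"` with a warning, then validate. A self-contained sketch of that logic is below. `resolve_caching()` and its two arguments are hypothetical names, since the diff does not show how `out` is first read from the plan, and `warning()` again stands in for `warn0()`.

```r
# Hypothetical helper mirroring the logic visible in the hpc_caching() hunk.
resolve_caching <- function(target_caching, default_caching) {
  out <- target_caching
  if (is.null(out) || is.na(out)) {
    out <- default_caching # No per-target value, so use the global setting.
  }
  if (identical(out, "master")) {
    warning(
      "caching = \"master\" is deprecated. Use caching = \"main\" instead.",
      call. = FALSE
    )
    out <- "main"
  }
  match.arg(out, choices = c("main", "worker"))
}

resolve_caching(NA_character_, "worker")              # "worker"
resolve_caching("main", "worker")                     # "main"
suppressWarnings(resolve_caching("master", "worker")) # "main"
```

Because the translation happens before `match.arg()`, the set of valid choices can drop `"master"` entirely without breaking old plans.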

hpc_config <- function(config) {
2 changes: 1 addition & 1 deletion R/logger.R
@@ -9,7 +9,7 @@ logger <- function(verbose, file = NULL) {
# nocov start
} else if (verbose == 2L) {
# Covered if we run tests without the progress package.
# Part of https://github.com/ropensci/drake/blob/master/inst/testing/cran-checklist.md # nolint
# Part of https://github.com/ropensci/drake/blob/main/inst/testing/cran-checklist.md # nolint
cli_msg(
"Install the progress package to see a progress bar when verbose = 2."
)
8 changes: 4 additions & 4 deletions R/make.R
@@ -17,11 +17,11 @@
#' from getting invalidated unexpectedly.
#'
#' A serious drake workflow should be consistent and reliable,
#' ideally with the help of a master R script.
#' ideally with the help of a main R script.
#' This script should begin in a fresh R session,
#' load your packages and functions in a dependable manner,
#' and then run `make()`. Example:
#' <https://github.com/wlandau/drake-examples/tree/master/gsp>.
#' <https://github.com/wlandau/drake-examples/tree/main/gsp>.
#' Batch mode, especially within a container, is particularly helpful.
#'
#' Interactive R sessions are still useful,
@@ -155,7 +155,7 @@ make <- function(
session_info = NULL,
cache_log_file = NULL,
seed = NULL,
caching = "master",
caching = "main",
keep_going = FALSE,
session = NULL,
pruning_strategy = NULL,
@@ -392,7 +392,7 @@ drake_set_session_info <- function(
#' isolate_example("Quarantine side effects.", {
#' if (suppressWarnings(require("knitr"))) {
#' load_mtcars_example() # Get the code with drake_example("mtcars").
#' # Create a master internal configuration list with prework.
#' # Create a main internal configuration list with prework.
#' con <- drake_config(my_plan, prework = c("library(knitr)", "x <- 1"))
#' # Do the prework. Usually done at the beginning of `make()`,
#' # and for distributed computing backends like "future_lapply",
2 changes: 1 addition & 1 deletion R/outdated.R
@@ -50,7 +50,7 @@
#' # rename targets without having to build them again.
#' # For an example, see
#' # the "Reproducible data recovery and renaming" section of
#' # https://github.com/ropensci/drake/blob/master/README.md.
#' # https://github.com/ropensci/drake/blob/main/README.md.
#' }
#' })
#' }
2 changes: 1 addition & 1 deletion R/r_make.R
@@ -19,7 +19,7 @@
#' all the same arguments as [make()] (e.g. `plan` and `targets`).
#' 3. In that same session, run [outdated()]
#' with the `config` argument from step 2.
#' 4. Return the result back to master process
#' 4. Return the result back to main process
#' (e.g. your interactive R session).
#' @export
#' @seealso [make()]
2 changes: 1 addition & 1 deletion R/use_drake.R
@@ -4,7 +4,7 @@
#' in your data analysis project. For details, read
#' <https://books.ropensci.org/drake/projects.html>
#' @details Files written:
#' 1. `make.R`: a suggested master R script for batch mode.
#' 1. `make.R`: a suggested main R script for batch mode.
#' 2. `_drake.R`: a configuration R script for
#' the [`r_*()`](https://docs.ropensci.org/drake/reference/r_make.html) functions documented at # nolint
#' <https://books.ropensci.org/drake/projects.html#safer-interactivity>. # nolint
2 changes: 1 addition & 1 deletion R/walk_code.R
@@ -357,7 +357,7 @@ get_tangled_frags <- function(file) {
})
}

# From https://github.com/duncantl/CodeDepends/blob/master/R/sweave.R#L15
# From https://github.com/duncantl/CodeDepends/blob/3e2e53f5794eea169117bd1b2f96801b813b22fd/R/sweave.R#L15 # nolint
get_tangled_text <- function(doc) {
assert_pkg("knitr")
id <- make.names(tempfile(), unique = FALSE, allow_ = TRUE)