Hello sleuth team,
I am attempting to re-analyze another group's data, an unbalanced set of 21 samples split between two groups. Attempts to run gene aggregation fail with the following traceback:
Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) :
long vectors not supported yet: fork.c:376
Calls: make_sleuth_object ... <Anonymous> -> <Anonymous> -> lapply -> FUN -> sendMaster
No traceback available
Error: is(kal, "kallisto") is not TRUE
In addition: Warning message:
In parallel::mclapply(seq_along(obj_mod$kal), function(i) { :
scheduled core 1 encountered error in user code, all values of the job will be affected
No traceback available
summarizing results
Error in is(obj, "sleuth") : object 'gene.so' not found
Calls: summarize_sleuth_results -> sleuth_results_modified -> stopifnot -> is
11: stop(sprintf(ngettext(length(r), "%s is not TRUE", "%s are not all TRUE"),
ch), call. = FALSE, domain = NA)
10: stopifnot(is(kal, "kallisto"))
9: summarize_bootstrap(obj$kal[[i]], col, transform)
8: mutate_(.data, .dots = lazyeval::lazy_dots(...))
7: dplyr::mutate(summarize_bootstrap(obj$kal[[i]], col, transform),
sample = cur_samp)
6: FUN(X[[i]], ...)
5: lapply(seq_along(obj$kal), function(i) {
cur_samp <- obj$sample_to_covariates$sample[i]
dplyr::mutate(summarize_bootstrap(obj$kal[[i]], col, transform),
sample = cur_samp)
})
4: sleuth_summarize_bootstrap_col(obj_mod, "scaled_reads_per_base",
transform)
3: sleuth:::gene_summary(ret, aggregation_column, function(x) log2(x +
0.5))
2: sleuth_prep(sample_to_covariates, full_model, target_mapping = target_mapping,
norm_fun_counts = norm_function, norm_fun_tpm = norm_function,
aggregation_column = aggregate_column)
I'm not that familiar with the innards of mclapply, but my understanding is that the job is split among several child processes, and each child sends its portion of the results back to the master process using sendMaster. To do this, the result is serialized into a raw vector, and because that vector cannot be a long vector, only objects whose serialized form is under about 2 GB can be sent back this way. Because I have so many samples, I suspect the final aggregation is too big, causing the error seen above: long vectors not supported yet: fork.c:376. See a discussion here about this issue.
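For context, here is a minimal, self-contained sketch (toy data, not sleuth code) of the call shape involved: whatever a forked child returns is serialized into a single raw vector and shipped back through sendMaster, which is where the size limit bites.

```r
library(parallel)

# stand-ins for per-sample bootstrap data; in the real failure each element
# is far larger, so a child's serialized return value crosses the ~2 GB limit
samples <- lapply(1:21, function(i) matrix(rnorm(1e4), ncol = 10))

res <- mclapply(seq_along(samples), function(i) {
  # whatever is returned here is serialized into a raw vector and sent to
  # the master via sendMaster(); a payload needing a long vector fails
  colMeans(samples[[i]])
}, mc.cores = 2)
```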
I've modified the code in gene_summary to switch the level at which mclapply is applied (instead of applying it across obj_mod$kal, apply it within each kal entry's set of bootstraps), and the issue went away. I think this solution can scale. If you're interested, I'll send you a pull request with the modified code (after following the suggested steps in your contributing guidelines).
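To show the shape of the change, here is a rough sketch of the reordering; the names (obj$kal, sample_to_covariates, summarizing via col/transform, and the bootstrap accessor) are inferred from the traceback above and are not the actual sleuth internals:

```r
# Sketch only: outer loop over samples is now a plain lapply, and the
# parallelism is pushed down to one sample's bootstraps, so each mclapply
# child returns a single small summary instead of a whole sample's aggregation.
summarize_bootstraps_per_sample <- function(obj, col, transform, cores = 2) {
  lapply(seq_along(obj$kal), function(i) {
    cur_samp <- obj$sample_to_covariates$sample[i]
    boots <- obj$kal[[i]]$bootstrap  # hypothetical accessor for one sample's bootstraps
    # each child serializes only one bootstrap summary for sendMaster()
    per_boot <- parallel::mclapply(boots, function(b) transform(b[[col]]),
                                   mc.cores = cores)
    summary_df <- as.data.frame(do.call(rbind, per_boot))
    dplyr::mutate(summary_df, sample = cur_samp)
  })
}
```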
Here is a blog post from r-bloggers discussing ways to reduce the memory footprint of mclapply: link. Strategy number 2 there seems worth considering as well: put the bootstraps sent to mclapply in their own environment to minimize copying.
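I haven't worked through the post's exact code, but one way to read strategy 2 is roughly this (all names hypothetical, toy data): give the worker function an environment that holds only the bootstrap list, so nothing else from the calling frame is carried into the children.

```r
library(parallel)

# toy data standing in for one sample's bootstraps
boots <- lapply(1:8, function(i) matrix(rnorm(1e4), ncol = 10))

make_worker <- function(bootstraps) {
  # keep only the bootstrap list in the closure's environment
  env <- new.env(parent = globalenv())
  env$bootstraps <- bootstraps
  worker <- function(i) colMeans(bootstraps[[i]])  # placeholder summary
  environment(worker) <- env
  worker
}

res <- mclapply(seq_along(boots), make_worker(boots), mc.cores = 2)
```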