∆ use of `mclapply` to reduce memory footprint; add "num_cores" option #93

warrenmcg · 2017-03-13T13:51:46Z

Hi,

This is to address issues #80 and #81 that I opened.

For issue #80, gene_summary uses a nested lapply, with the outer call using mclapply. This meant that every full bootstrap object was copied to each core. With more than a few samples, this led to a very large memory footprint (a run with 20 samples failed on a machine with 120+ GB of RAM). Using the blog post I mentioned in #80 as a starting point (link), I moved mclapply to the inner lapply, so that only individual bootstraps are sent out to each core. This significantly reduces the memory footprint.
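To illustrate the restructuring, here is a minimal sketch on toy data. It is not the actual gene_summary code: `samples`, `summarize_bootstrap`, and the matrix sizes are hypothetical stand-ins, and only `lapply` and `parallel::mclapply` are real functions. The outer loop over samples runs serially, and only the inner loop over one sample's bootstraps is parallelized.

```r
library(parallel)

# Hypothetical stand-in for the bootstrap data: a list of samples,
# each holding a list of bootstrap matrices.
set.seed(1)
samples <- lapply(1:3, function(i) {
  lapply(1:4, function(b) matrix(rnorm(20), nrow = 5))
})

# mclapply relies on forking, which is unavailable on Windows,
# so fall back to a single core there.
num_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Hypothetical per-bootstrap summary (illustration only).
summarize_bootstrap <- function(bs) colMeans(bs)

# Outer loop over samples is serial; only the inner loop over the
# bootstraps of one sample is handed to mclapply.
res_inner <- lapply(samples, function(boots) {
  mclapply(boots, summarize_bootstrap, mc.cores = num_cores)
})

# Fully serial reference, used to check that the results agree.
res_serial <- lapply(samples, function(boots) {
  lapply(boots, summarize_bootstrap)
})
```

The nesting of the result is the same either way (one list per sample, one entry per bootstrap), so code that consumes the output should not need to change.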

For issue #81, I wanted to take full advantage of machines with more than 2 cores, so I added a num_cores option to gene_summary and to sleuth_prep and updated the documentation for sleuth_prep. I also added check conditions to make sure the choice of num_cores is reasonable, throwing an informative error if it is not.
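As a rough sketch of what such a check could look like (the helper name `check_num_cores`, its `max_cores` argument, and the exact conditions are assumptions for illustration, not necessarily what this PR implements):

```r
# Hypothetical validator for a num_cores argument (illustration only).
# parallel::detectCores() is a real function; it can return NA, so the
# upper-bound check is skipped in that case.
check_num_cores <- function(num_cores, max_cores = parallel::detectCores()) {
  if (!is.numeric(num_cores) || length(num_cores) != 1 ||
      !is.finite(num_cores) || num_cores < 1 ||
      num_cores != floor(num_cores)) {
    stop("'num_cores' must be a single whole number greater than or equal to 1")
  }
  if (!is.na(max_cores) && num_cores > max_cores) {
    stop("'num_cores' (", num_cores, ") exceeds the number of available cores (",
         max_cores, ")")
  }
  as.integer(num_cores)
}

# Usage: a valid value is returned as an integer; an invalid one stops
# with a message that names the offending argument.
ok <- check_num_cores(1, max_cores = 4)
msg <- tryCatch(check_num_cores(0, max_cores = 4),
                error = function(e) conditionMessage(e))
```

Validating up front means a bad value fails immediately with a clear message instead of surfacing later inside mclapply.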

Finally, I followed your contribution guidelines to make the code lintr clean. In the process of making sure my updated code passed all of your tests, I found that one of your tests ("give a design matrix") had a bug, so I fixed it to make sure it worked.

Hope this helps!
Warren

warrenmcg added 7 commits March 13, 2017 07:31

modified mclapply usage in gene_summary to reduce memory footprint; added sleuth_prep option to select number of cores for mclapply

932fc47

corrected default value for num_cores, and strengthened check conditions for num_cores to throw informative error

90b5b99

fixed 'give a design matrix' test, which failed because the design matrix did not have dim names that matched the formula or the sample ids

161d231

now lintr clean

66cea3a

clean a few lints that I missed

c321f90

added code from pull request pachterlab#71, which drops unused levels of s2c

59ff84c

add in code from pull request pachterlab#92, which adds in 'drop = F' option into 'spread_abundance_by' to prevent downstream error when preparing just one sample

62cfab4

warrenmcg closed this Mar 14, 2017

warrenmcg mentioned this pull request Mar 14, 2017

Issues #80 and #81: reduce 'mclapply' memory footprint; add "num_cores" option #94

Closed
