Joss paper #17
base: main
Conversation
paper.md
```python
>>> metadata = ps.build_metadata("vina", dict(exhaustiveness=32))
```
We may also utilize other docking engines in the AutoDock Vina family by specifying the `software` for Vina-type metadata. Here, we use the accelerated optimization routine of QVina for faster docking. Note that we also support `software` values of `"smina"` and `"psovina"` in addition to `"vina"` and `"qvina"`.
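For instance, a minimal sketch of selecting QVina through the same options dictionary, assuming the `software` key is accepted alongside other Vina-type parameters such as `exhaustiveness`:

```python
>>> # "qvina" selects the accelerated QVina engine; "smina" and "psovina"
>>> # are the other supported Vina-family values mentioned above
>>> metadata = ps.build_metadata("vina", dict(software="qvina", exhaustiveness=32))
```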
Could you add links/references to the different software packages listed here?
cc openjournals/joss-reviews#3950
@mikemhenry references have been added!
In contrast, the multi-node setup exhibits less ideal scaling (\autoref{fig:dist}), with a measured speedup approximately 55% that of perfect scaling. We attribute this scaling behavior to hardware-dependent network communication overhead. Distributing a `sleep(5)` function, allocated 6 CPU cores per task (to mimic a fairly quick docking simulation), in parallel over differing hardware setups led to an approximate 2.5% increase in wall-time relative to the single-node setup each time the number of nodes in the setup was doubled while keeping the total number of CPU cores the same. Such a trend is consistent with network communication being detrimental to scaling behavior. This test also communicated the absolute minimum amount of data over the network, as there were no function arguments or return values. When communicating `CalculationData` objects (approximately 600 bytes in serialized form) over the network, as in `pyscreener`, the drop increased to 6% for each doubling of the total number of nodes. Minimizing the total size of `CalculationData` objects was therefore an explicit implementation goal. Future development will seek to further reduce network communication overhead costs to bring `pyscreener` scaling closer to ideal scaling.
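For reference, a sketch of the kind of timing probe described above (not the exact benchmark script), assuming a running `ray` cluster is reachable via `address="auto"`; the task count of 1,000 is arbitrary:

```python
import time
import ray

ray.init(address="auto")  # connect to the existing single- or multi-node cluster

@ray.remote(num_cpus=6)
def mock_docking():
    """Stand-in for a fairly quick docking simulation: no arguments, no return value."""
    time.sleep(5)

start = time.time()
# no function arguments or return values, so only scheduling traffic crosses the network
ray.get([mock_docking.remote() for _ in range(1000)])
print(f"wall time: {time.time() - start:.1f} s")
```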
This is quite an interesting observation. Could you also add the configuration of the cluster?
https://www.rc.fas.harvard.edu/about/cluster-architecture/ contains the cluster architecture information. I'm not completely sure what you're asking for, though: what details would be necessary to include here?
![Wall-time of the computational docking of all 1,615 FDA-approved drugs against 5WIU using QVina2 over six CPU cores for a single-node setup with the specified number of CPU cores. (Left) calculated speedup. (Right) wall time in minutes. Bars reflect mean $\pm$ standard deviation over three runs.\label{fig:local}](figures/timing-local.png)
To handle task distribution, `pyscreener` relies on the `ray` library [@moritz_ray_2018] for distributed computation. For multithreaded docking software, `pyscreener` allows a user to specify how many CPU cores to run each individual docking simulation over, running as many docking simulations in parallel as possible for a given number of total CPU cores in the `ray` cluster. To examine the scaling behavior of `pyscreener`, we docked all 1,615 FDA-approved drugs into the active site of the D4 dopamine receptor (PDB ID 5WIU [@wang_d4_2017]) with QVina2 running over 6 CPU cores. We tested both single-node hardware setups, scaling the total number of CPU cores on one machine, and multi-node setups, scaling the total number of machines. In the single-node case, `pyscreener` exhibited essentially perfect scaling (\autoref{fig:local}) as we scaled the size of the `ray` cluster from 6 to 48 CPU cores, with each QVina simulation running over 6 CPU cores.
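As a rough sketch of the resource accounting (not the actual `pyscreener` internals), each docking task reserves 6 CPU cores from the `ray` cluster, so a 48-core cluster can run at most 48 / 6 = 8 simulations at once; the `dock_one` helper and the `qvina2` command line below are hypothetical placeholders:

```python
import subprocess
import ray

ray.init()  # on a 48-core cluster, at most 48 / 6 = 8 of these tasks run concurrently

@ray.remote(num_cpus=6)
def dock_one(ligand_file: str) -> str:
    """Hypothetical wrapper around a single 6-core QVina2 docking simulation."""
    # binary name and flags follow the standard Vina CLI but may differ by installation
    subprocess.run(
        ["qvina2", "--config", "dock.conf", "--ligand", ligand_file, "--cpu", "6"],
        check=True,
    )
    return ligand_file

ligands = ["ZINC000001.pdbqt", "ZINC000002.pdbqt"]  # placeholder inputs
done = ray.get([dock_one.remote(lig) for lig in ligands])
```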
I'm sure that I'm doing something wrong here (maybe using the wrong flags or backend?), but on a cluster when I run `pyscreener --config integration-tests/configs/test_vina.ini -nc 32` it takes 0h 36m 5.93s, whereas when I run `pyscreener --config integration-tests/configs/test_vina.ini -nc 64` it takes 1h 12m 9.79s. This is with the vina backend, and my hunch is that I'm using the flags incorrectly: instead of parallelizing jobs across the CPUs, I'm trying to give vina `N` CPUs for each job. I've attached the full output below:
https://gist.github.com/mikemhenry/e0075ee01466145c116f2b8ebeda35b7
What does the timing look like for vina with 64 vs. 32 CPUs on your computer? I'm unfamiliar with any firm vina scaling benchmarks, but in my experience I get severely diminishing returns beyond 8 CPUs. If you get no benefit from doubling the CPUs per job from 32 to 64, then all that would serve to do is limit the number of parallel jobs you're able to run (2 simultaneous jobs in the 32 CPUs/job case vs. only 1 in the 64 CPUs/job case).
Okay! I was able to verify the performance claim with a small test:
Total time to dock 200 ligands: 0h 24m 17.38s (7.29s/ligand) # 32 CPU
Total time to dock 200 ligands: 0h 13m 5.39s (3.93s/ligand) # 64 CPU