ASV benchmarks for MDAnalysis #11
Comments
Thanks @orbeckst for the interest. We're still building the infrastructure, and it'll take some weeks before we can get this working. But it's very useful to know you're interested. I'll probably be contacting you for feedback while we decide what hardware to get, estimate how much usage the infrastructure will see, and work out the exact commands you want to run.
An update on this. We didn't get the funding to build this, and I don't think anyone will spend the amount of time this requires as a volunteer (including me; I was planning to implement it with the grant I requested). So, I don't see this happening in the short term.
@datapythonista is there anything related to this proposal worth sharing? We recently had a discussion at the NumPy developer sprint where we touched on this same topic, and we are considering banding together with a few projects and paying for the hardware that's needed and the related devops work. The most pressing needs we had are (a) a machine that's guaranteed to have AVX-512, and (b) macOS M1. I imagine quite a few projects have a need for macOS M1 at least. Dealing with TravisCI for [...] Nothing is decided, but at least we'd like to map out the needs across related projects and investigate whether it makes sense to participate and put some $ in.
This might also be worth a separate discussion about infrastructure & CI. If I'm not misremembering, there was some mention from @datapythonista in the last NF infrastructure meeting about considering applying for CI resources via AWS?
There are similar discussions happening within the scientific-python project. We (NetworkX) are building a benchmark suite + CI/CD setup for running these benchmarks "on-demand" on either cloud resources or a machine-in-the-basement.
Sorry for the delay, I was waiting to hear from OVH to be able to provide an answer. We're finalizing an agreement where OVH will provide 10,000 EUR worth of infrastructure for a year to pandas. We had a discussion with them, and besides their public offer, they seem to have a decent number of machines with less common architectures available that we could use. For pandas, at this point I think we're happy to just have reliable and consistent (x86_64) infrastructure to run benchmarks over time. At the moment we've got something working on a machine at someone's house, and nobody seems to know exactly how it works. So, what I'd like to try is to set up an OVH dedicated server with Cirun triggering builds from our CI, and see if this is enough to detect regressions. But it would be nice to coordinate, and OVH is also interested in supporting the NumFOCUS ecosystem more broadly. They wanted to start small with pandas and also MyBinder and see how it goes, but from what they said, they should be happy to support other projects too. So, if you're not in a rush, I'd suggest waiting a few weeks until we've got a first version of the basic stuff we want to build for pandas and can see how everything works and what it costs on the OVH platform. Then we can discuss whether it makes sense to extend the same setup to more projects and more architectures, or whatever the best way forward seems to be once we've got the feedback from this. What do you think?
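As a rough illustration of what "detect regressions" could mean on a dedicated benchmark machine, here is a minimal sketch that compares per-benchmark timings from two runs and flags anything that slowed down by more than a chosen factor. The function name `find_regressions`, the dictionary layout, and the 1.1 threshold are all assumptions made for this example; they are not taken from pandas, Cirun, or ASV.

```python
from statistics import median


def find_regressions(baseline, current, factor=1.1):
    """Return {benchmark name: slowdown ratio} for benchmarks that got slower.

    `baseline` and `current` map a benchmark name to a list of timings
    in seconds (hypothetical layout, chosen for this sketch only).
    """
    regressions = {}
    for name, base_times in baseline.items():
        new_times = current.get(name)
        if not base_times or not new_times:
            continue  # benchmark missing or empty in one of the two runs
        base = median(base_times)
        if base <= 0:
            continue  # avoid dividing by a zero or negative timing
        ratio = median(new_times) / base
        if ratio > factor:
            regressions[name] = ratio
    return regressions


# Made-up timings: bench_a is unchanged, bench_b is 1.5x slower.
baseline = {"bench_a": [1.00, 1.02, 0.98], "bench_b": [2.0, 2.1, 1.9]}
current = {"bench_a": [1.01, 1.00, 1.03], "bench_b": [3.0, 3.1, 2.9]}
print(find_regressions(baseline, current))  # {'bench_b': 1.5}
```

Using the median of repeated timings rather than a single measurement makes the comparison less sensitive to one-off noise, which matters most when the hardware is shared or not fully controlled.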
@orbeckst what is the current status here? I'm putting together a budget for the NF board and can add this if I know a few more details. I know @datapythonista was working with OVH but I think that all goes through pandas these days and not the NF Infra committee.
From the MDAnalysis side of things, what I wrote originally is still all true. We haven't made any specific plans or budgets. The minimal requirement is an x86_64 machine with a modest number of cores (8 is plenty for our ASV benchmarks) with shell access.
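The minimal requirement above (x86_64, about 8 cores) can be checked from a shell session with a few lines of Python. This is only a sketch: the helper name `meets_minimum` and its default values are assumptions made for illustration, not an existing MDAnalysis or ASV tool.

```python
import os
import platform


def meets_minimum(machine, cores, required_machine="x86_64", required_cores=8):
    """Check an architecture string and core count against the stated minimum."""
    normalized = machine.lower()
    if normalized == "amd64":
        # Windows reports x86_64 hardware as "AMD64"
        normalized = "x86_64"
    return normalized == required_machine and cores >= required_cores


# os.cpu_count() can return None when the count cannot be determined.
detected_cores = os.cpu_count() or 0
print(platform.machine(), detected_cores,
      meets_minimum(platform.machine(), detected_cores))
```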
Do you need bare metal, or will a cloud server like AWS EC2 or an OVH server work?
My primary concern would be that the underlying specs don't change so that it is clear when changes in performance are due to changes in code (and not due to changes in hardware). As long as cloud can provide such platform stability, cloud sounds fine to me.
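One way to make a change in the underlying specs visible is to store a fingerprint of the machine alongside each set of benchmark results and compare it between runs. The sketch below uses only the Python standard library; the choice of fields and the names `collect_specs` and `fingerprint` are assumptions made for illustration, and a real setup would probably also want CPU model and memory details that the standard library does not expose portably.

```python
import hashlib
import json
import os
import platform


def collect_specs():
    """Gather a small, portable description of the current machine."""
    return {
        "machine": platform.machine(),      # e.g. "x86_64"
        "processor": platform.processor(),  # may be an empty string
        "system": platform.system(),
        "release": platform.release(),      # changes on OS or kernel updates
        "cpu_count": os.cpu_count(),
    }


def fingerprint(specs):
    """Return a stable SHA-256 hex digest of a specs dictionary."""
    payload = json.dumps(specs, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


specs = collect_specs()
print(fingerprint(specs))
```

If the fingerprint stored with the previous run differs from the current one, a performance change between the two runs cannot be attributed to code alone.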
From the recent project meeting, I understand that NumFOCUS is "hiring DevOps engineer to create an infrastructure platform for projects and manage various NF and project resources". Would that engineer be able to set this up and maintain it for projects? We might be able to pay for the compute time until there is a sponsor for that. I think SciPy's needs are similar to MDAnalysis here. We don't need anything very fancy - just a stable machine with enough memory. We also have old results at https://pv.github.io/scipy-bench/, but it looks like that hasn't been running for the past two years. I'm not sure if we need to combine the results, but maybe we'd like to do a one-time run of old commits to cover the missing time.
MDAnalysis has been running ASV benchmarks for the last 6 years and publishes them at https://www.mdanalysis.org/benchmarks/. The data are held in https://github.com/MDAnalysis/benchmarks. These benchmarks have been running on a local machine. We would like to run the set of benchmarks on NumFOCUS infrastructure.
Key considerations for us
We would also like to upload our historical data (https://github.com/MDAnalysis/benchmarks) for comparison. This would be a one-time process.
cc: @MDAnalysis/coredevs