Parallelism & chunking at convert step #68

Open
ml-evs opened this issue Nov 21, 2024 · 0 comments

Comments

Collaborator

ml-evs commented Nov 21, 2024

It might be nice to allow the convert step to run in parallel over chunks of entries, with the chunks combined at the end.

This should reduce peak memory usage (currently all entries need to fit in memory in the OPTIMADE format) and would also give us better control over concurrency: at the moment it seems that, for example, the pymatgen CIF reader will happily use all cores and can lock up a system.
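A rough sketch of what chunked conversion with a capped worker pool could look like, using only the standard library. Everything named here is hypothetical: `convert_entry` is a stand-in for the real per-entry conversion, and `convert_parallel`, `chunk_size`, `max_workers` and the `part-*.jsonl` file names are illustrative, not existing options.

```python
import json
import tempfile
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import islice
from pathlib import Path


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def convert_entry(raw):
    """Hypothetical stand-in for the real per-entry conversion."""
    return {"id": str(raw), "type": "structures", "attributes": {}}


def convert_chunk(index, chunk, workdir):
    """Convert one chunk and write it to its own partial JSONL file."""
    part = Path(workdir) / f"part-{index:06d}.jsonl"
    with part.open("w") as fh:
        for raw in chunk:
            fh.write(json.dumps(convert_entry(raw)) + "\n")
    return str(part)


def convert_parallel(raw_entries, out_path, chunk_size=1000, max_workers=2,
                     use_processes=True):
    """Convert chunks in a pool capped at `max_workers`, then concatenate."""
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with tempfile.TemporaryDirectory() as workdir, \
            pool_cls(max_workers=max_workers) as pool:
        futures = [
            pool.submit(convert_chunk, i, chunk, workdir)
            for i, chunk in enumerate(chunked(raw_entries, chunk_size))
        ]
        with open(out_path, "w") as out:
            # Iterating in submission order keeps the original entry order.
            for future in futures:
                with open(future.result()) as part:
                    for line in part:
                        out.write(line)
    return out_path
```

With this shape, only about `max_workers * chunk_size` converted entries are held in memory at any time, since each worker writes its chunk to disk before returning. Note that capping the pool size does not by itself stop a library from spawning its own threads inside each worker, so that would still need to be handled separately.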

The only difficulty here will be how the properties are then assigned to each structure. We could consider changing this to a two-step process: first write a bare optimade.jsonl containing only the structures, then loop through that file and add properties where appropriate, writing the results out to a new file.
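The two-step idea could be sketched roughly as follows. This assumes the properties can be looked up by structure `id` from an in-memory mapping (`properties_by_id`), which is an assumption and may not match how properties are actually parsed; the function names and the `_exmpl_band_gap` field in the usage note are purely illustrative.

```python
import json


def write_bare_structures(structures, path):
    """Step 1: write one structure entry per line, with no extra properties."""
    with open(path, "w") as fh:
        for structure in structures:
            fh.write(json.dumps(structure) + "\n")


def attach_properties(bare_path, out_path, properties_by_id):
    """Step 2: stream the bare file and merge in properties where they exist.

    Only one entry is held in memory at a time; entries without a match
    in `properties_by_id` are copied through unchanged.
    """
    with open(bare_path) as src, open(out_path, "w") as dst:
        for line in src:
            entry = json.loads(line)
            props = properties_by_id.get(entry["id"])
            if props:
                entry.setdefault("attributes", {}).update(props)
            dst.write(json.dumps(entry) + "\n")
```

For example, `attach_properties("bare.jsonl", "optimade.jsonl", {"s1": {"_exmpl_band_gap": 1.1}})` would add the property to the entry with `id` `"s1"` and leave the rest as they are. If the properties themselves are too large to hold in memory, the mapping would need to be replaced with something disk-backed.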
