Parallelism & chunking at convert step #68

ml-evs · 2024-11-21T13:59:14Z

It might be nice to allow the convert step to run in parallel over chunks of entries, with the chunks combined at the end.

This should reduce peak memory usage (currently all entries need to fit in memory in the OPTIMADE format) and would also give us better control of concurrency; for now it seems that, e.g., the pymatgen CIF reader will happily use all cores and lock up a system.
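As a rough illustration of the idea, here is a minimal sketch of a chunked convert with a capped worker pool. All names here (`chunked`, `convert_chunk`, `convert_in_chunks`, the `convert_one` callable) are hypothetical and not part of the existing codebase; it assumes each input can be converted independently into a JSON-serialisable dict, and that `convert_one` is picklable when a process pool is used.

```python
import json
import tempfile
from concurrent.futures import ProcessPoolExecutor
from itertools import islice
from pathlib import Path
from typing import Callable, Iterable, Iterator


def chunked(items: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def convert_chunk(job: tuple) -> str:
    """Convert one chunk and write it to its own part file (runs in a worker)."""
    convert_one, chunk, part_path = job
    with open(part_path, "w") as f:
        for item in chunk:
            # Entries are written one at a time, so a worker never holds
            # more than a single converted entry in memory.
            f.write(json.dumps(convert_one(item)) + "\n")
    return part_path


def convert_in_chunks(
    items: Iterable,
    out_path: str,
    convert_one: Callable,  # hypothetical per-entry converter
    chunk_size: int = 100,
    max_workers: int = 2,
    executor_cls=ProcessPoolExecutor,
) -> int:
    """Convert `items` chunk by chunk in a bounded pool, then combine the parts.

    Returns the number of chunks processed.
    """
    with tempfile.TemporaryDirectory() as tmp:
        jobs = [
            (convert_one, chunk, str(Path(tmp) / f"part-{i:05d}.jsonl"))
            for i, chunk in enumerate(chunked(items, chunk_size))
        ]
        # max_workers caps how many chunks are converted concurrently.
        with executor_cls(max_workers=max_workers) as pool:
            # map() returns results in submission order, so the combined
            # file keeps the original ordering of the inputs.
            part_paths = list(pool.map(convert_chunk, jobs))
        # Combine step: stream each part file into the final output.
        with open(out_path, "w") as out:
            for part in part_paths:
                with open(part) as f:
                    for line in f:
                        out.write(line)
    return len(part_paths)
```

Note that `max_workers` only limits the number of workers started by this function; whether that is enough to stop a reader from using additional cores internally would need checking separately.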

The only difficulty here will be how the properties are then assigned to a structure. We could consider changing this to a two-step process: first a bare optimade.jsonl is written containing only the structures, and then we loop through that file and add properties where appropriate, writing the results out to a new file.
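A minimal sketch of what the two-step process could look like, under some assumptions: the function names are hypothetical, every line of the bare file is taken to be a single structure entry with a top-level `id` and an `attributes` dict, and the properties are assumed to be available as an in-memory mapping from structure `id` to a dict of extra attributes.

```python
import json
from typing import Iterable, Mapping


def write_bare_structures(structures: Iterable[dict], path: str) -> None:
    """Step 1: write one structure entry per line, with no extra properties."""
    with open(path, "w") as f:
        for entry in structures:
            f.write(json.dumps(entry) + "\n")


def add_properties(
    bare_path: str,
    out_path: str,
    properties: Mapping[str, dict],  # assumed layout: {structure id: {name: value}}
) -> int:
    """Step 2: stream the bare file, merge in properties, write a new file.

    Returns the number of entries that received properties.
    """
    updated = 0
    with open(bare_path) as src, open(out_path, "w") as dst:
        for line in src:
            # Only one entry is held in memory at a time.
            entry = json.loads(line)
            extra = properties.get(entry["id"])
            if extra:
                entry.setdefault("attributes", {}).update(extra)
                updated += 1
            dst.write(json.dumps(entry) + "\n")
    return updated
```

With this layout only the property mapping has to fit in memory, not the structures themselves; if the properties are also too large, they would need to be streamed or indexed on disk instead.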
