# Read synapse parameters in a collective-safe manner (#85)
## Context
When using `WholeCell` load balancing, the access pattern when reading
parameters during synapse creation is extremely poor. It is the main
reason we see long (10+ minute) periods of severe performance
degradation of our parallel filesystem when running slightly larger
simulations on BB5.

Using Darshan and several PoCs, we established that the time required to
read these parameters can be reduced by more than 8x, and IOps by over
1000x, when using collective MPI-IO. Moreover, the "waiters" were
reduced substantially as well. See BBPBGLIB-1070.

Following those findings, we concluded that neurodamus would need to use
collective MPI-IO in the future.
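
For orientation, the snippet below is a minimal, hypothetical sketch of what a
collective HDF5 read looks like with h5py and mpi4py; it is not libsonata's or
neurodamus' implementation, and the file/dataset names are made up.

```python
# Sketch of a collective MPI-IO read with h5py + mpi4py (illustrative only).
# Assumes a parallel (MPI-enabled) build of HDF5/h5py; "edges.h5" and the
# dataset path are hypothetical.
from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD

# Every rank opens the same file with the MPI-IO driver.
with h5py.File("edges.h5", "r", driver="mpio", comm=comm) as f:
    dset = f["edges/default/0/conductance"]

    # Simple block decomposition: each rank reads a contiguous slice.
    n = dset.shape[0]
    start = comm.rank * n // comm.size
    stop = (comm.rank + 1) * n // comm.size

    # The collective context turns the read into a collective operation,
    # letting MPI-IO aggregate many small requests into few large ones.
    with dset.collective:
        local_values = dset[start:stop]
```

Collective calls like this only pay off if every rank issues the same sequence
of operations, which is why the read order discussed below matters.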

We've implemented most of the required changes directly in libsonata,
allowing others to benefit from the same optimizations should the need
arise. See:
BlueBrain/libsonata#309
BlueBrain/libsonata#307

and preparatory work:
BlueBrain/libsonata#315
BlueBrain/libsonata#314
BlueBrain/libsonata#298 

By instrumenting two simulations (SSCX and a reduced MMB) we concluded
that neurodamus was already almost collective. However, certain attributes
were read in a different order on different MPI ranks, possibly because
hashes are salted differently on different MPI ranks.
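
As a sketch of the suspected mechanism (illustrative, not neurodamus code):
unless `PYTHONHASHSEED` is fixed, each Python process salts `str` hashes
differently, so iterating a set of attribute names can produce a different
order on each rank, whereas `sorted()` is deterministic.

```python
# Illustrative only: set iteration order depends on the per-process hash
# seed, so two MPI ranks may visit these (hypothetical) attribute names in
# different orders. Collective I/O needs every rank to issue the same reads
# in the same order; sorting restores that.
attribute_names = {"conductance", "delay", "u_syn", "depression_time"}

for name in attribute_names:
    print("unsorted:", name)   # order may differ from rank to rank

for name in sorted(attribute_names):
    print("sorted:  ", name)   # identical order on every rank
```

This is what the `sorted()` calls in the diff below address.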

## Scope
This PR enables neurodamus to use collective IO for the simulation
described above.

## Testing
We successfully ran the reduced MMB simulation, but since SSCX hasn't
been converted to SONATA, we can't run that simulation.

## Review
* [x] PR description is complete
* [x] Coding style (imports, function length, new functions, classes or
files) is good
* [ ] Unit/Scientific test added
* [ ] Updated Readme, in-code, developer documentation

---------

Co-authored-by: Luc Grosheintz <[email protected]>
1uc and Luc Grosheintz authored Jan 29, 2024
1 parent a866ae4 commit 9bff654
Showing 1 changed file with 11 additions and 4 deletions: `neurodamus/io/synapse_reader.py`
@@ -232,7 +232,13 @@ class SonataReader(SynapseReader):
     }
 
     def _open_file(self, src, population, _):
-        storage = libsonata.EdgeStorage(src)
+        try:
+            from mpi4py import MPI
+            hdf5_reader = libsonata.make_collective_reader(MPI.COMM_WORLD, False, True)
+        except ModuleNotFoundError:
+            hdf5_reader = libsonata.Hdf5Reader()
+
+        storage = libsonata.EdgeStorage(src, hdf5_reader=hdf5_reader)
         if not population:
             assert len(storage.population_names) == 1
             population = next(iter(storage.population_names))
@@ -295,13 +301,13 @@ def _read(attribute, optional=False):
         _populate("tgid", self._population.target_nodes(needed_edge_ids) + 1)
 
         # Make synapse index in the file explicit
-        for name in self.SYNAPSE_INDEX_NAMES:
+        for name in sorted(self.SYNAPSE_INDEX_NAMES):
             _populate(name, needed_edge_ids.flatten())
 
         # Generic synapse parameters
         fields_load_sonata = self.Parameters.fields(exclude=self.custom_parameters | compute_fields,
                                                     with_translation=self.parameter_mapping)
-        for (field, sonata_attr, is_optional) in fields_load_sonata:
+        for (field, sonata_attr, is_optional) in sorted(fields_load_sonata):
             _populate(field, _read(sonata_attr, is_optional))
 
         if self.custom_parameters:
@@ -311,7 +317,8 @@ def _read(attribute, optional=False):
             # This has to work for when we call preload() a second/third time
             # so we are unsure about which gids were loaded what properties
             # We nevertheless can skip any base fields
-            for field in set(self._extra_fields) - (self.Parameters.all_fields | compute_fields):
+            extra_fields = set(self._extra_fields) - (self.Parameters.all_fields | compute_fields)
+            for field in sorted(extra_fields):
                 now_needed_ids = sorted(set(gid for gid in ids if field not in self._data[gid]))
                 if needed_ids != now_needed_ids:
                     needed_ids = now_needed_ids
