
Outline memory required for tedana to run #267

Closed
jbteves opened this issue Apr 22, 2019 · 8 comments
Labels: documentation (issues related to improving documentation for the project), hackathon (issues to tackle in the NIH hackathon), testing (issues related to improving testing in the project)


jbteves commented Apr 22, 2019

Summary

In several issues and in informal conversations, users have reported large RAM usage. With no formal guidelines, it is difficult for a user to know what to expect for their data. We should outline memory requirements for various datasets, most notably high-resolution data.

Additional Detail

In issues #254 and #144, as well as in informal conversation, users have noted large RAM usage. While @tsalo has done refactoring that should ameliorate this problem, and we should have a release that reduces usage, it would be good to have guidelines for how RAM usage scales with a dataset's:

  1. Spatial Resolution
  2. Temporal Resolution
  3. Echo Number

We should bear in mind that peak usage can be especially problematic: once tedana consumes all available RAM, the system starts thrashing, as described in #254 (the operating system spends all of its time swapping memory to and from disk because RAM is exhausted, and the problem cascades to every running program as the system struggles with the I/O load).

Next Steps

  • Run datasets of varying resolution and length and record their memory usage at different steps
  • Create plots that demonstrate the memory usage over time, dependent on the parameters (like this); one way to collect and plot those numbers is sketched below
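
A minimal sketch of the recording step: sample RAM while a workflow runs and plot the samples. memory-profiler, the file paths, and the echo times below are assumptions for illustration, not tedana's own tooling.

```python
# Sketch: record tedana's RAM usage over time and plot it.
# Assumes `pip install memory-profiler matplotlib`; paths and echo times are placeholders.
import matplotlib.pyplot as plt
from memory_profiler import memory_usage
from tedana.workflows import tedana_workflow

def run():
    tedana_workflow(
        data=["echo1.nii.gz", "echo2.nii.gz", "echo3.nii.gz"],
        tes=[14.5, 38.5, 62.5],
    )

interval = 0.5  # seconds between samples
samples = memory_usage((run, (), {}), interval=interval)  # list of MiB values
plt.plot([i * interval for i in range(len(samples))], samples)
plt.xlabel("time (s)")
plt.ylabel("RAM (MiB)")
plt.savefig("tedana_memuse.png")
```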
@jbteves added the documentation label on Apr 22, 2019

jbteves commented Apr 22, 2019

@dowdlelt ran a quick test and indicated that memory usage matches expectations, with increased voxel count causing rapid memory consumption and scaling with echo number that appears faster than linear (see the figure below).
[Image: faster_than_linear_with_echo_memuse — memory usage vs. echo count]


jbteves commented Apr 22, 2019

According to @dowdlelt, a user-supplied mask can drastically reduce the number of voxels processed and thus drastically reduce memory requirements; he notes that an AFNI EPI mask reduced a 3 mm (resampled to 2.5 mm) isotropic dataset from 540k voxels to 120k voxels, an ~80% reduction in required memory. This should be ameliorated in the next release thanks to @tsalo's contributions in #226, especially given that a comparison between the nilearn mask and the AFNI mask yielded only a ~5k voxel difference.
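
For illustration, a minimal sketch of the masking step using nilearn; the file path is a placeholder, and compute_epi_mask stands in for whichever mask a user supplies:

```python
# Sketch: a restrictive EPI mask shrinks the working array from the full
# grid to only in-mask voxels, cutting memory roughly proportionally.
from nilearn.masking import compute_epi_mask, apply_mask

func_img = "echo1.nii.gz"  # placeholder path
mask_img = compute_epi_mask(func_img)  # or load an AFNI-generated mask instead

# apply_mask returns an array of shape (n_timepoints, n_voxels_in_mask)
masked = apply_mask(func_img, mask_img)
print(masked.shape, f"~{masked.nbytes / 1e9:.2f} GB for this copy")
```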

@jbteves added this to the documentation milestone on May 24, 2019
@jbteves added the hackathon and testing labels on Oct 29, 2019

jbteves commented Oct 29, 2019

With more datasets coming in, this should be more easily testable.

dowdlelt commented

As it seems very deterministic (some function of n_voxels x n_echoes), it may be possible to add a runtime check comparing estimated RAM usage to available RAM and to warn the user. Or even just print the estimated RAM usage, so that if things don't work, they can scroll through the output and see it as a potential problem, you know, for user friendliness.


jbteves commented Oct 31, 2019

That's a great idea! It's probably not too hard to estimate the required memory and add 10% as a buffer.
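
A minimal sketch of such a check, assuming psutil is available (it is not a tedana dependency) and that the byte estimate is supplied by the caller; the helper name and the 10% buffer are illustrative:

```python
# Sketch: warn at startup if the estimated RAM need exceeds available memory.
import warnings
import psutil  # assumption: psutil is installed

def check_memory(estimated_bytes: int, buffer: float = 0.10) -> None:
    """Compare an estimate (plus a 10% buffer) against available RAM."""
    needed = estimated_bytes * (1 + buffer)
    available = psutil.virtual_memory().available
    if needed > available:
        warnings.warn(
            f"Estimated memory use ({needed / 1e9:.1f} GB) exceeds available "
            f"RAM ({available / 1e9:.1f} GB); tedana may slow to a crawl."
        )
```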

rmarkello commented

I believe the data are being loaded into memory as float32 (assuming NIfTI-1), which means that the number of bytes used will be:

nbytes = 4 * x * y * z * t * e

where x, y, and z are the spatial matrix dimensions, t is the number of timepoints, and e is the number of echoes.

If the data are being loaded as float64 then substitute 8 for the 4 in the equation.
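
For example, that lower bound could be read straight from a NIfTI header; a sketch, assuming nibabel and placeholder file names:

```python
# Sketch: estimate bytes for one float32 copy of the multi-echo data.
import numpy as np
import nibabel as nib

echo_files = ["echo1.nii.gz", "echo2.nii.gz", "echo3.nii.gz"]  # placeholders
x, y, z, t = nib.load(echo_files[0]).shape  # assumes a 4D image per echo
e = len(echo_files)

nbytes = np.dtype(np.float32).itemsize * x * y * z * t * e  # 4 * x*y*z*t*e
print(f"one copy of the data: {nbytes / 1e9:.2f} GB")
```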

The trickier part is determining how many copies of the data are being made in memory during computations... You could use memory-profiler to try and make some estimates, but as a lower bound that's a good start.
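
memory-profiler's line-by-line mode can help count those copies; a sketch with made-up dimensions:

```python
# Sketch: line-by-line memory report showing where array copies appear.
# The @profile decorator prints per-line memory increments when demo() runs.
import numpy as np
from memory_profiler import profile

@profile
def demo(x=64, y=64, z=32, t=200, e=3):
    data = np.zeros((x * y * z, t, e), dtype=np.float32)  # first copy
    centered = data - data.mean(axis=1, keepdims=True)    # second copy
    return centered

if __name__ == "__main__":
    demo()
```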


jbteves commented Oct 31, 2019

Yeah, it's also hard to measure memory usage instantaneously. We'll add this to the things to look out for under Testing & Validation. I'll try to familiarize myself with this tool; thanks @rmarkello.

@tsalo changed the title from "Outline Memory Required for Tedana to run" to "Outline memory required for tedana to run" on Nov 18, 2019

stale bot commented Feb 16, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions to tedana! :tada:

stale bot added the stale label on Feb 16, 2020
stale bot closed this as completed on Feb 23, 2020