Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH, REF] Reduce memory requirements for metric calculation and PCA #345

Merged
merged 5 commits into from
Jul 10, 2019

Conversation

tsalo
Copy link
Member

@tsalo tsalo commented Jun 21, 2019

References #269. I very much doubt that these changes will solve the problem on their own, and I'm not even sure if they'll help much.

Changes proposed in this pull request:

  • Add --lowmem argument to trigger use of IncrementalPCA. Should also trigger other low-memory steps we might implement in the future.
  • Clean up arrays in dependence_metrics when they're no longer used.
  • Operate on masks arrays as much as possible within dependence_metrics.
  • Make mask argument for computefeats2 optional.

@codecov
Copy link

codecov bot commented Jun 21, 2019

Codecov Report

Merging #345 into master will decrease coverage by 0.56%.
The diff coverage is 5.88%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #345      +/-   ##
==========================================
- Coverage   49.22%   48.66%   -0.57%     
==========================================
  Files          39       39              
  Lines        2139     2166      +27     
==========================================
+ Hits         1053     1054       +1     
- Misses       1086     1112      +26
Impacted Files Coverage Δ
tedana/metrics/kundu_fit.py 29.05% <0%> (-1.01%) ⬇️
tedana/utils.py 56.19% <0%> (-1.1%) ⬇️
tedana/workflows/tedana.py 12.16% <0%> (-0.27%) ⬇️
tedana/io.py 48.42% <0%> (ø) ⬆️
tedana/stats.py 76.27% <50%> (-2.68%) ⬇️
tedana/decomposition/pca.py 15.29% <6.66%> (-1.38%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 74be0f7...cf8d0c4. Read the comment docs.

@emdupre
Copy link
Member

emdupre commented Jun 24, 2019

Can we also set copy=False as the default for regular PCA ?

@tsalo
Copy link
Member Author

tsalo commented Jun 24, 2019

Absolutely. Done!

Copy link
Member

@emdupre emdupre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small questions but excited to see this in ! From my re-reading I still think copy=False is safe for our use case, but would appreciate someone else (re-) confirming !

F_R2_clmaps[:, i_comp] = utils.threshold_map(
ccimg, min_cluster_size=csize, threshold=fmin, mask=mask,
binarize=True)
binarize=True).astype(bool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to explicitly cast to boolean after binarize=True ? Should that casting instead be performed inside utils.threshold_map ?

Copy link
Member Author

@tsalo tsalo Jul 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. It's true that binarized data should be boolean, but nibabel is super-mean and for some reason doesn't work with boolean arrays, so there's an argument to be made for using integers as the default for binarized arrays in threshold_map.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking through I think the function is only used in these five instances you're updating, so I think we should just update the function, probably ! If we use it in other ways, we can re-consider the default. WDYT of that idea ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Should be good now.

X1 = mumask.T # Model 1
X2 = np.tile(tes, (1, n_data_voxels)) * mumask.T / t2smask.T # Model 2
X1 = mu.T # Model 1
X2 = np.tile(tes, (1, n_voxels)) * mu.T / t2s.T # Model 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation for the name change here (i.e., n_data_voxels to n_voxels) ? Did we change the meaning or just the name ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n_data_voxels and n_voxels originally corresponded to t2s != 0 and mask, respectively. With the replacement of mask with t2s != 0, they became duplicates of one another.

Copy link
Member

@emdupre emdupre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @tsalo !

Copy link
Collaborator

@jbteves jbteves left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! My only question is whether we have an idea of how much less memory the lowmem uses than the original, and how much memory preventing that copy saves? That way we can report back to users who were having trouble.

@tsalo
Copy link
Member Author

tsalo commented Jul 9, 2019

I have no clue how much (or even if) these changes help. I don't know how to do proper memory profiling to find out either.

@jbteves
Copy link
Collaborator

jbteves commented Jul 9, 2019

There aren't really builtins other than manually running pdb and noting the consumption, unfortunately. There are some packages to do it but there's no reason to run the profile before merging; we may want to take a while to think of a good approach before we publish memory consumption guidelines anyway.

@tsalo tsalo merged commit 54857b0 into ME-ICA:master Jul 10, 2019
@tsalo tsalo deleted the low-mem branch July 10, 2019 14:54
handwerkerd pushed a commit to handwerkerd/tedana that referenced this pull request Sep 11, 2019
…E-ICA#345)

* Add low_mem option and remove unused variables in dependence_metrics.

* Fix IncrementalPCA call.

* Fix verbose file generation.

* Set copy to False in PCA.

* Change how binarize operates in threshold_map.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants