[ENH, REF] Reduce memory requirements for metric calculation and PCA #345
Codecov Report
@@            Coverage Diff             @@
##           master     #345      +/-   ##
==========================================
- Coverage   49.22%   48.66%   -0.57%
==========================================
  Files          39       39
  Lines        2139     2166      +27
==========================================
+ Hits         1053     1054       +1
- Misses       1086     1112      +26
Can we also set
Absolutely. Done!
Small questions, but excited to see this in! From my re-reading I still think copy=False is safe for our use case, but would appreciate someone else (re-)confirming!
tedana/metrics/kundu_fit.py
  F_R2_clmaps[:, i_comp] = utils.threshold_map(
      ccimg, min_cluster_size=csize, threshold=fmin, mask=mask,
-     binarize=True)
+     binarize=True).astype(bool)
Why do we need to explicitly cast to boolean after binarize=True? Should that casting instead be performed inside utils.threshold_map?
I'm not sure. It's true that binarized data should be boolean, but nibabel is super-mean and for some reason doesn't work with boolean arrays, so there's an argument to be made for using integers as the default for binarized arrays in threshold_map.
Looking through, I think the function is only used in these five instances you're updating, so we should probably just update the function! If we use it in other ways, we can reconsider the default. WDYT of that idea?
Sounds good. Should be good now.
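To make the trade-off in this thread concrete, here is a minimal sketch of why a boolean result is cheaper to hold in memory than an integer one, and how a caller could still cast to an integer type at the point where a library needs it. `binarize_map` is a hypothetical stand-in, not the actual `utils.threshold_map`, and the array sizes are made up for illustration.

```python
import numpy as np


def binarize_map(data, threshold):
    """Hypothetical helper: True wherever data exceeds threshold."""
    return data > threshold


rng = np.random.default_rng(0)
stat_map = rng.normal(size=(1000, 50))  # voxels x components (illustrative)

as_bool = binarize_map(stat_map, 1.0)   # 1 byte per element
as_int = as_bool.astype(np.int64)       # 8 bytes per element

# Cast to a small integer type only where non-boolean data is required,
# for example right before handing the array to an image-writing library.
for_saving = as_bool.astype(np.uint8)

print(as_bool.nbytes, as_int.nbytes, for_saving.nbytes)
```

Keeping the boolean form internally and casting late means the larger integer copy only exists briefly, if at all.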
- X1 = mumask.T  # Model 1
- X2 = np.tile(tes, (1, n_data_voxels)) * mumask.T / t2smask.T  # Model 2
+ X1 = mu.T  # Model 1
+ X2 = np.tile(tes, (1, n_voxels)) * mu.T / t2s.T  # Model 2
What's the motivation for the name change here (i.e., n_data_voxels to n_voxels)? Did we change the meaning or just the name?
n_data_voxels and n_voxels originally corresponded to t2s != 0 and mask, respectively. With the replacement of mask with t2s != 0, they became duplicates of one another.
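As a rough illustration of the explanation above, the sketch below builds the two design matrices from toy data once `t2s != 0` is the only mask in play, so a single voxel count is enough. The array shapes (`tes` as a column vector, `mu` as voxels by echoes, `t2s` as a column vector) and all values are assumptions chosen for the example, not taken from the tedana source.

```python
import numpy as np

n_echos, n_vox_total = 3, 6
tes = np.array([[15.0], [39.0], [63.0]])  # (n_echos, 1), illustrative echo times
t2s_full = np.array([0.0, 30.0, 45.0, 0.0, 50.0, 60.0])  # 0 = no estimate
mu_full = np.arange(1.0, n_vox_total * n_echos + 1).reshape(n_vox_total, n_echos)

# With a single mask, "data voxels" and "masked voxels" are the same set.
mask = t2s_full != 0
n_voxels = int(mask.sum())

mu = mu_full[mask]                   # (n_voxels, n_echos)
t2s = t2s_full[mask][:, np.newaxis]  # (n_voxels, 1)

X1 = mu.T  # Model 1
X2 = np.tile(tes, (1, n_voxels)) * mu.T / t2s.T  # Model 2

print(X1.shape, X2.shape)
```

Both matrices end up with one column per retained voxel, which is why a second counter would be redundant.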
Thanks, @tsalo!
LGTM! My only question is whether we have an idea of how much less memory the lowmem option uses than the original, and how much memory preventing that copy saves. That way we can report back to users who were having trouble.
I have no clue how much (or even if) these changes help. I don't know how to do proper memory profiling to find out either.
There aren't really builtins other than manually running
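One way to get a first estimate of what avoiding a copy saves is the standard library's `tracemalloc` module. The sketch below is not a profile of tedana itself: it compares a toy function that copies its input with one that modifies it in place, using a `bytearray` as a stand-in for a data array. The helper names (`peak_bytes`, `with_copy`, `in_place`) are hypothetical.

```python
import tracemalloc


def peak_bytes(func, *args):
    """Return the peak number of bytes allocated while func(*args) runs."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def with_copy(buf):
    out = bytearray(buf)  # allocates a second buffer of the same size
    out[0] = 1
    return out


def in_place(buf):
    buf[0] = 1  # overwrites the caller's buffer, no new allocation
    return buf


n = 1_000_000
data = bytearray(n)  # allocated before tracing starts, so not counted

peak_copy = peak_bytes(with_copy, data)
peak_in_place = peak_bytes(in_place, data)
print(f"copy: {peak_copy} bytes, in place: {peak_in_place} bytes")
```

The same `peak_bytes` wrapper could in principle be pointed at a real fitting call with and without the copy, though the numbers would depend on the data and on which allocations the libraries involved report to `tracemalloc`.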
…E-ICA#345)
* Add low_mem option and remove unused variables in dependence_metrics.
* Fix IncrementalPCA call.
* Fix verbose file generation.
* Set copy to False in PCA.
* Change how binarize operates in threshold_map.
References #269. I very much doubt that these changes will solve the problem on their own, and I'm not even sure if they'll help much.
Changes proposed in this pull request:
* Add --lowmem argument to trigger use of IncrementalPCA. Should also trigger other low-memory steps we might implement in the future.
* […] dependence_metrics when they're no longer used.
* Remove unused variables in dependence_metrics.
* Make mask argument for computefeats2 optional.
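The first item in the list above can be sketched as a command-line flag that switches the decomposition class. This is an assumption-laden illustration, not the workflow code from this pull request: `parse_args` and `build_pca` are hypothetical names, and scikit-learn is only imported when `build_pca` is actually called.

```python
import argparse


def parse_args(argv=None):
    """Parse a minimal command line with a low-memory switch."""
    parser = argparse.ArgumentParser(description="Low-memory flag sketch")
    parser.add_argument(
        "--lowmem",
        dest="low_mem",
        action="store_true",
        default=False,
        help="Use memory-saving options such as IncrementalPCA.",
    )
    return parser.parse_args(argv)


def build_pca(low_mem, n_components=None):
    """Pick a PCA estimator based on the flag (requires scikit-learn)."""
    if low_mem:
        from sklearn.decomposition import IncrementalPCA

        # Fits the data in batches instead of all at once.
        return IncrementalPCA(n_components=n_components)
    from sklearn.decomposition import PCA

    # copy=False lets fit() overwrite the input rather than duplicating it.
    return PCA(n_components=n_components, copy=False)


options = parse_args(["--lowmem"])
print(options.low_mem)
```

With scikit-learn installed, `build_pca(options.low_mem)` would return the batch-wise estimator when the flag is set. Note that `copy=False` means the input array may be modified during fitting, which is the safety question raised earlier in the review.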