[ENH, REF] Reduce memory requirements for metric calculation and PCA #345

tsalo · 2019-06-21T17:55:08Z

References #269. I very much doubt that these changes will solve the problem on their own, and I'm not even sure if they'll help much.

Changes proposed in this pull request:

* Add a --lowmem argument that triggers use of IncrementalPCA. It should also trigger any other low-memory steps we implement in the future.
* Clean up arrays in dependence_metrics once they're no longer used.
* Operate on masked arrays as much as possible within dependence_metrics.
* Make the mask argument for computefeats2 optional.
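A minimal sketch of how a low-memory flag could switch between scikit-learn's `PCA` and `IncrementalPCA`. The helper name `fit_pca` and its `low_mem`/`batch_size` parameters are hypothetical and are not tedana's actual signature. `IncrementalPCA` fits the decomposition in mini-batches of rows, which bounds the size of each SVD; in this sketch the full input array is still held in memory.

```python
import numpy as np
from sklearn.decomposition import PCA, IncrementalPCA


def fit_pca(data, n_components, low_mem=False, batch_size=None):
    """Hypothetical helper: pick a PCA implementation based on a low-memory flag.

    data : (n_samples, n_features) array
    """
    if low_mem:
        # IncrementalPCA processes `batch_size` rows at a time; if batch_size
        # is None, scikit-learn uses 5 * n_features.
        model = IncrementalPCA(n_components=n_components, batch_size=batch_size)
    else:
        # Standard PCA decomposes the whole (centered) array at once.
        model = PCA(n_components=n_components)
    transformed = model.fit_transform(data)
    return transformed, model
```

Both branches return component scores with the same shape, so downstream code does not need to know which implementation ran.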

codecov · 2019-06-21T18:00:11Z

Codecov Report

Merging #345 into master will decrease coverage by 0.56%.
The diff coverage is 5.88%.

```diff
@@            Coverage Diff             @@
##           master     #345      +/-   ##
==========================================
- Coverage   49.22%   48.66%   -0.57%     
==========================================
  Files          39       39              
  Lines        2139     2166      +27     
==========================================
+ Hits         1053     1054       +1     
- Misses       1086     1112      +26
```

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| tedana/metrics/kundu_fit.py | `29.05% <0%> (-1.01%)` | ⬇️ |
| tedana/utils.py | `56.19% <0%> (-1.1%)` | ⬇️ |
| tedana/workflows/tedana.py | `12.16% <0%> (-0.27%)` | ⬇️ |
| tedana/io.py | `48.42% <0%> (ø)` | ⬆️ |
| tedana/stats.py | `76.27% <50%> (-2.68%)` | ⬇️ |
| tedana/decomposition/pca.py | `15.29% <6.66%> (-1.38%)` | ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 74be0f7...cf8d0c4.

emdupre · 2019-06-24T11:44:13Z

Can we also set copy=False as the default for regular PCA?

tsalo · 2019-06-24T11:49:54Z

Absolutely. Done!

emdupre

Small questions, but excited to see this in! From my re-reading, I still think copy=False is safe for our use case, but I would appreciate someone else (re-)confirming!
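For reference, scikit-learn documents that with `copy=False` the data passed to `fit` may be overwritten, so `fit(X).transform(X)` will not give the expected result and `fit_transform(X)` should be used instead. The sketch below assumes that is the only condition that matters: `copy=False` is safe as long as the caller uses `fit_transform` and does not reuse the input array afterwards. The helper name `pca_without_copy` is hypothetical, and `svd_solver="full"` is chosen only to keep the comparison deterministic.

```python
import numpy as np
from sklearn.decomposition import PCA


def pca_without_copy(data, n_components):
    """Hypothetical helper: fit PCA without an internal copy of `data`.

    Assumes the caller will not reuse `data` afterwards, because
    scikit-learn may center it in place when copy=False.
    """
    pca = PCA(n_components=n_components, copy=False, svd_solver="full")
    # fit_transform (rather than fit followed by transform) is the documented
    # way to get correct scores when copy=False.
    scores = pca.fit_transform(data)
    return scores, pca
```

A quick check is to compare against a default (`copy=True`) fit run on a separate copy of the same data; the scores should match.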

emdupre · 2019-07-08T14:26:57Z

tedana/metrics/kundu_fit.py

```diff
            F_R2_clmaps[:, i_comp] = utils.threshold_map(
                ccimg, min_cluster_size=csize, threshold=fmin, mask=mask,
-                binarize=True)
+                binarize=True).astype(bool)
```


Why do we need to explicitly cast to boolean after binarize=True? Should that casting be performed inside utils.threshold_map instead?

I'm not sure. It's true that binarized data should be boolean, but nibabel ~~is super-mean and for some reason~~ doesn't work with boolean arrays, so there's an argument to be made for using integers as the default for binarized arrays in threshold_map.

Looking through, I think the function is only used in the five instances you're updating, so we should probably just update the function! If we use it in other ways later, we can reconsider the default. WDYT?

Sounds good. Should be good now.
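To illustrate the behavior being discussed, here is a simplified sketch of a cluster-size-thresholded map whose `binarize=True` branch returns a boolean array directly, so callers no longer need `.astype(bool)`. It is not tedana's `threshold_map` (which takes an image and a mask); the name `threshold_map_sketch`, its arguments, and the face-connectivity clustering from `scipy.ndimage.label` are assumptions. Per the thread, a caller writing the result with nibabel would still need to cast it to an integer dtype first.

```python
import numpy as np
from scipy import ndimage


def threshold_map_sketch(img, threshold, min_cluster_size, binarize=True):
    """Hypothetical sketch: keep suprathreshold clusters of at least
    `min_cluster_size` voxels in a 3D array.

    Returns a boolean mask if binarize is True, else the surviving values.
    """
    above = img > threshold
    # Label connected components (default structure: face connectivity).
    labels, _ = ndimage.label(above)
    sizes = np.bincount(labels.ravel())
    keep = sizes >= min_cluster_size
    keep[0] = False  # label 0 is background, never a cluster
    clmap = keep[labels]  # boolean array with the same shape as img
    if binarize:
        return clmap
    return np.where(clmap, img, 0)
```

With this design the boolean dtype is the default, and only the I/O layer decides how to encode it on disk.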

emdupre · 2019-07-08T14:28:44Z

tedana/metrics/kundu_fit.py

```diff
-    X1 = mumask.T  # Model 1
-    X2 = np.tile(tes, (1, n_data_voxels)) * mumask.T / t2smask.T  # Model 2
+    X1 = mu.T  # Model 1
+    X2 = np.tile(tes, (1, n_voxels)) * mu.T / t2s.T  # Model 2
```


What's the motivation for the name change here (i.e., n_data_voxels to n_voxels)? Did we change the meaning or just the name?

n_data_voxels and n_voxels originally counted the voxels in t2s != 0 and in mask, respectively. Once mask was replaced with t2s != 0, the two became duplicates of one another.
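For context, X1 and X2 are per-echo regressors for the two TE-dependence models: Model 1 (S0) predicts component estimates proportional to the mean signal `mu`, and Model 2 (R2*) scales `mu` by TE / T2*. The sketch below fits each single-regressor, no-intercept model per voxel and returns an F-like ratio with `n_echos - 1` degrees of freedom. The function name, array shapes, and F formula are assumptions for illustration, not tedana's `dependence_metrics` code. Because voxels are assumed pre-masked with `t2s != 0`, a single `n_voxels` describes every array.

```python
import numpy as np


def te_dependence_fstats(betas, mu, t2s, tes):
    """Hypothetical sketch of per-voxel S0 and R2* model fits.

    betas : (n_echos, n_voxels) component parameter estimates per echo
    mu    : (n_voxels, n_echos) mean signal per echo
    t2s   : (n_voxels,) T2* estimates, assumed already masked to t2s != 0
    tes   : (n_echos,) echo times
    """
    n_echos, n_voxels = betas.shape
    tes = np.asarray(tes, dtype=float)[:, np.newaxis]  # (n_echos, 1)
    X1 = mu.T  # Model 1 (S0): proportional to the mean signal
    X2 = np.tile(tes, (1, n_voxels)) * mu.T / t2s  # Model 2 (R2*): scaled by TE / T2*

    alpha = (betas ** 2).sum(axis=0)  # total sum of squares (no intercept)
    fstats = []
    for X in (X1, X2):
        # Closed-form least squares for one regressor per voxel.
        coef = (betas * X).sum(axis=0) / (X ** 2).sum(axis=0)
        sse = ((betas - X * coef) ** 2).sum(axis=0)
        fstats.append((alpha - sse) * (n_echos - 1) / sse)
    return fstats[0], fstats[1]  # (F_S0, F_R2)
```

If the component betas follow the R2* model closely, the R2* fit leaves almost no residual and its F-value dominates the S0 one.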

emdupre

Thanks, @tsalo!

jbteves

LGTM! My only question is whether we have an idea of how much less memory the low-memory mode uses than the original, and how much memory is saved by preventing that copy. That way we can report back to users who were having trouble.

tsalo · 2019-07-09T14:52:58Z

I have no clue how much (or even if) these changes help. I don't know how to do proper memory profiling to find out either.

jbteves · 2019-07-09T14:57:17Z

There aren't really built-in tools for this, unfortunately, other than manually stepping through with pdb and noting memory consumption. There are some packages that can do it, but there's no reason to run a profile before merging; we may want to take a while to think of a good approach before we publish memory-consumption guidelines anyway.
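One possible starting point, offered as an assumption rather than what the project adopted, is Python's standard-library `tracemalloc` module, which records peak traced allocations; NumPy reports its array buffers to it. The sketch compares centering an array out of place versus in place, the same kind of saving `copy=False` aims for. The helper names are hypothetical, and this does not measure tedana itself.

```python
import tracemalloc

import numpy as np


def peak_traced_bytes(func, *args, **kwargs):
    """Run func and return the peak memory traced by tracemalloc during the call."""
    tracemalloc.start()
    try:
        func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def center_copy(data):
    # Allocates a new array the same size as `data`.
    return data - data.mean(axis=0)


def center_inplace(data):
    # Reuses the buffer of `data` (analogous to copy=False).
    data -= data.mean(axis=0)
    return data
```

Arguments are created before tracing starts, so only allocations made inside the call are counted; the out-of-place version should peak at roughly one extra copy of the input.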

…E-ICA#345)

* Add low_mem option and remove unused variables in dependence_metrics.
* Fix IncrementalPCA call.
* Fix verbose file generation.
* Set copy to False in PCA.
* Change how binarize operates in threshold_map.

tsalo added 2 commits June 21, 2019 13:45

Add low_mem option and remove unused variables in dependence_metrics.

292d03d

Fix IncrementalPCA call.

f1a8802

Fix verbose file generation.

8925a6f

Set copy to False in PCA.

3cef791

emdupre reviewed Jul 8, 2019


Change how binarize operates in threshold_map.

cf8d0c4

emdupre approved these changes Jul 8, 2019


jbteves approved these changes Jul 9, 2019


tsalo merged commit 54857b0 into ME-ICA:master Jul 10, 2019

tsalo deleted the low-mem branch July 10, 2019 14:54
