[ENH] Speed up cluster-extent thresholding function #239

tsalo · 2019-03-21T16:40:46Z

Closes None.

Changes proposed in this pull request:

Use numpy.unique to cluster-extent threshold and binarize maps with spatclust.
Cluster positive and negative values separately.
Rename spatclust to threshold_map and move from model.fit to utils.
Add arguments for binarization (default is yes) and sided-ness of the thresholding (default is to treat positive and negative values the same).

This also clusters positive and negative values separately.

codecov · 2019-03-21T16:56:46Z

Codecov Report

Merging #239 into master will decrease coverage by 0.36%.
The diff coverage is 5.76%.

@@            Coverage Diff             @@
##           master     #239      +/-   ##
==========================================
- Coverage   46.39%   46.02%   -0.37%     
==========================================
  Files          33       33              
  Lines        2037     2049      +12     
==========================================
- Hits          945      943       -2     
- Misses       1092     1106      +14

Impacted Files	Coverage Δ
tedana/model/__init__.py	`100% <ø> (ø)`	⬆️
tedana/model/fit.py	`30.43% <0%> (+2.39%)`	⬆️
tedana/utils.py	`59.63% <6.97%> (-34.4%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e1bc0b6...134a1ee. Read the comment docs.

dowdlelt · 2019-03-21T19:01:30Z

I have a brief thought about the separation of positive and negative clustering - it absolutely makes sense if we are looking for clusters of BOLD activity, as then you expect a nice smooth pattern, all in one direction. For things like GRAPPA artifacts, particularly those associated with movement, the artifacts sometimes lead to alternating positive and negative bands or grid like manifestations (likely due to their Fourier origin) - and I worry that this would reduce the rho metric for those.

Perhaps I am misunderstanding the change, however. I also don't have a figure handy for that, so I'm going off of memory. I'll see if I can dig up something tomorrow.

tsalo · 2019-03-21T19:14:06Z

That is an extremely interesting point that relates to my concerns about how we should treat betas in calculating our metrics. When you have a chance to take a look at that issue, please make sure to bring up that point. In this case, though, we just use spatclust to cluster-extent threshold and binarize our F-statistic, Z-statistic, and beta maps, mostly to compare thresholded maps and count "signal" (within-cluster) and "noise" (no cluster) voxels. Although we do look at the S0 F-statistic and beta maps, that won't impact Rho calculation.

In the case of banding, then I don't think that this proposed change will hide that (unless the bands of alternative positive and negative betas are extremely small), since our minimum cluster size is almost always 20 voxels. At most, this will peel off the outlying couple of voxels that might glom onto a larger cluster of a different sign. The resulting differences in the component table are minimal.

tsalo · 2019-03-21T19:41:30Z

I could also update the function to support different kinds of thresholding. We could use 3dClustSim's terminology. To paraphrase 3dClustSim:

3dClustSim does 3 different types of thresholding:
1-sided: thresholded on the positive side
2-sided: positive and negative values above the threshold are included, and then clustered together
bi-sided: where positive values and negative values above the threshold are clustered SEPARATELY

We could then discuss what is most appropriate on a map-by-map basis.

emdupre · 2019-03-22T00:51:25Z

I'll add this to the agenda for tomorrow, at least as a non-verbal update, to get folks to weigh in !

tsalo · 2019-03-22T18:05:01Z

I changed the default to be two-sided (but with the options to use one-sided or bi-sided) and renamed the function to threshold_map. The name spatclust always seemed to imply that it was doing more than just cluster-extent thresholding and binarizing a statistical map (at least to me).

tsalo · 2019-03-28T13:13:18Z

At this point the spatial clustering should be equivalent to the old version (but much faster and with control over binarization and thresholding method). Does anyone have any concerns/thoughts about the method I'm using?

dowdlelt · 2019-03-30T22:25:42Z

Sounds excellent to me! - how much time was spatial clustering contributing to the whole? I know you've done some work with clocking tedana as a whole, but I'm now curious how much each step takes.

tsalo · 2019-03-31T15:57:48Z

I haven't done proper profiling, but based on the runlogs when running with MLE on the 5-echo test dataset on my laptop, with the old method it took ~2:50 to threshold the relevant maps for 159 components, and with these changes it took ~6 seconds. And all of the clmaps pass numpy.array_equal, so that's good.

dowdlelt · 2019-03-31T17:06:41Z

That's a pretty substantial speed up! Awesome!

* Speed up spatclust function. This also clusters positive and negative values separately. * Rename spatclust to threshold_map and add binarize/sided arguments. * Replace manually generated binary structure with function-made one.

Speed up spatclust function.

769fb62

This also clusters positive and negative values separately.

tsalo changed the title ~~[ENH] Speed up spatclust function~~ [ENH] Speed up cluster-extent thresholding function Mar 21, 2019

tsalo requested a review from emdupre March 21, 2019 17:53

Rename spatclust to threshold_map and add binarize/sided arguments.

125718c

Merge branch 'master' into speedup-spatclust

02685d7

tsalo added 2 commits March 30, 2019 14:00

Replace manually generated binary structure with function-made one.

b5d948c

Fix.

134a1ee

tsalo merged commit 3c59b50 into ME-ICA:master Apr 3, 2019

tsalo deleted the speedup-spatclust branch April 4, 2019 00:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ENH] Speed up cluster-extent thresholding function #239

[ENH] Speed up cluster-extent thresholding function #239

tsalo commented Mar 21, 2019 •

edited

Loading

codecov bot commented Mar 21, 2019 •

edited

Loading

dowdlelt commented Mar 21, 2019

tsalo commented Mar 21, 2019

tsalo commented Mar 21, 2019

emdupre commented Mar 22, 2019

tsalo commented Mar 22, 2019 •

edited

Loading

tsalo commented Mar 28, 2019

dowdlelt commented Mar 30, 2019 •

edited

Loading

tsalo commented Mar 31, 2019 •

edited

Loading

dowdlelt commented Mar 31, 2019

[ENH] Speed up cluster-extent thresholding function #239

[ENH] Speed up cluster-extent thresholding function #239

Conversation

tsalo commented Mar 21, 2019 • edited Loading

codecov bot commented Mar 21, 2019 • edited Loading

Codecov Report

dowdlelt commented Mar 21, 2019

tsalo commented Mar 21, 2019

tsalo commented Mar 21, 2019

emdupre commented Mar 22, 2019

tsalo commented Mar 22, 2019 • edited Loading

tsalo commented Mar 28, 2019

dowdlelt commented Mar 30, 2019 • edited Loading

tsalo commented Mar 31, 2019 • edited Loading

dowdlelt commented Mar 31, 2019

tsalo commented Mar 21, 2019 •

edited

Loading

codecov bot commented Mar 21, 2019 •

edited

Loading

tsalo commented Mar 22, 2019 •

edited

Loading

dowdlelt commented Mar 30, 2019 •

edited

Loading

tsalo commented Mar 31, 2019 •

edited

Loading