Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Speed up cluster-extent thresholding function #239

Merged
merged 5 commits into from
Apr 3, 2019

Conversation

tsalo
Copy link
Member

@tsalo tsalo commented Mar 21, 2019

Closes None.

Changes proposed in this pull request:

  • Use numpy.unique to cluster-extent threshold and binarize maps with spatclust.
  • Cluster positive and negative values separately.
  • Rename spatclust to threshold_map and move from model.fit to utils.
  • Add arguments for binarization (default is yes) and sided-ness of the thresholding (default is to treat positive and negative values the same).

This also clusters positive and negative values separately.
@codecov
Copy link

codecov bot commented Mar 21, 2019

Codecov Report

Merging #239 into master will decrease coverage by 0.36%.
The diff coverage is 5.76%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #239      +/-   ##
==========================================
- Coverage   46.39%   46.02%   -0.37%     
==========================================
  Files          33       33              
  Lines        2037     2049      +12     
==========================================
- Hits          945      943       -2     
- Misses       1092     1106      +14
Impacted Files Coverage Δ
tedana/model/__init__.py 100% <ø> (ø) ⬆️
tedana/model/fit.py 30.43% <0%> (+2.39%) ⬆️
tedana/utils.py 59.63% <6.97%> (-34.4%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e1bc0b6...134a1ee. Read the comment docs.

@tsalo tsalo changed the title [ENH] Speed up spatclust function [ENH] Speed up cluster-extent thresholding function Mar 21, 2019
@tsalo tsalo requested a review from emdupre March 21, 2019 17:53
@dowdlelt
Copy link
Collaborator

I have a brief thought about the separation of positive and negative clustering - it absolutely makes sense if we are looking for clusters of BOLD activity, as then you expect a nice smooth pattern, all in one direction. For things like GRAPPA artifacts, particularly those associated with movement, the artifacts sometimes lead to alternating positive and negative bands or grid like manifestations (likely due to their Fourier origin) - and I worry that this would reduce the rho metric for those.

Perhaps I am misunderstanding the change, however. I also don't have a figure handy for that, so I'm going off of memory. I'll see if I can dig up something tomorrow.

@tsalo
Copy link
Member Author

tsalo commented Mar 21, 2019

That is an extremely interesting point that relates to my concerns about how we should treat betas in calculating our metrics. When you have a chance to take a look at that issue, please make sure to bring up that point. In this case, though, we just use spatclust to cluster-extent threshold and binarize our F-statistic, Z-statistic, and beta maps, mostly to compare thresholded maps and count "signal" (within-cluster) and "noise" (no cluster) voxels. Although we do look at the S0 F-statistic and beta maps, that won't impact Rho calculation.

In the case of banding, then I don't think that this proposed change will hide that (unless the bands of alternative positive and negative betas are extremely small), since our minimum cluster size is almost always 20 voxels. At most, this will peel off the outlying couple of voxels that might glom onto a larger cluster of a different sign. The resulting differences in the component table are minimal.

@tsalo
Copy link
Member Author

tsalo commented Mar 21, 2019

I could also update the function to support different kinds of thresholding. We could use 3dClustSim's terminology. To paraphrase 3dClustSim:

3dClustSim does 3 different types of thresholding:
1-sided: thresholded on the positive side
2-sided: positive and negative values above the threshold are included, and then clustered together
bi-sided: where positive values and negative values above the threshold are clustered SEPARATELY

We could then discuss what is most appropriate on a map-by-map basis.

@emdupre
Copy link
Member

emdupre commented Mar 22, 2019

I'll add this to the agenda for tomorrow, at least as a non-verbal update, to get folks to weigh in !

@tsalo
Copy link
Member Author

tsalo commented Mar 22, 2019

I changed the default to be two-sided (but with the options to use one-sided or bi-sided) and renamed the function to threshold_map. The name spatclust always seemed to imply that it was doing more than just cluster-extent thresholding and binarizing a statistical map (at least to me).

@tsalo
Copy link
Member Author

tsalo commented Mar 28, 2019

At this point the spatial clustering should be equivalent to the old version (but much faster and with control over binarization and thresholding method). Does anyone have any concerns/thoughts about the method I'm using?

@dowdlelt
Copy link
Collaborator

dowdlelt commented Mar 30, 2019

Sounds excellent to me! - how much time was spatial clustering contributing to the whole? I know you've done some work with clocking tedana as a whole, but I'm now curious how much each step takes.

@tsalo
Copy link
Member Author

tsalo commented Mar 31, 2019

I haven't done proper profiling, but based on the runlogs when running with MLE on the 5-echo test dataset on my laptop, with the old method it took ~2:50 to threshold the relevant maps for 159 components, and with these changes it took ~6 seconds. And all of the clmaps pass numpy.array_equal, so that's good.

@dowdlelt
Copy link
Collaborator

That's a pretty substantial speed up! Awesome!

@tsalo tsalo merged commit 3c59b50 into ME-ICA:master Apr 3, 2019
emdupre pushed a commit that referenced this pull request Apr 3, 2019
* Speed up spatclust function.

This also clusters positive and negative values separately.

* Rename spatclust to threshold_map and add binarize/sided arguments.

* Replace manually generated binary structure with function-made one.
@tsalo tsalo deleted the speedup-spatclust branch April 4, 2019 00:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants