Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address divide by zero when getelbow called with only one component #808

Closed
handwerkerd opened this issue Sep 22, 2021 · 0 comments · Fixed by #756
Closed

Address divide by zero when getelbow called with only one component #808

handwerkerd opened this issue Sep 22, 2021 · 0 comments · Fixed by #756
Labels
bug issues describing a bug or error found in the project effort: low Theoretically less than a day's work impact: medium Improves code/documentation functionality for some users priority: medium Should get addressed soon

Comments

@handwerkerd
Copy link
Member

handwerkerd commented Sep 22, 2021

Summary

This is a continuation of issue #752 and PR #760. That discussion focused on removing a really uninformative error when getelbow was called on an empty array. I now have a situation where getelbow is called on only one component and it runs, but has a divide by zero warning.

The basic solution to this is extremely easy: The error goes away if getelbow is only run with a minimum of 2 components. I'm creating an issue rather than a PR because there may be some relevant discussion here. One can calculate an elbow of a curve with only two values, but that's essentially meaningless. If we're setting a threshold for a number of components than than which we shouldn't calculate an elbow, what should it be? 3 is mathematically plausible, but still non-ideal. I'm leaning towards "5" while acknowledging that's arbitrary.

Additional Detail

The block of code where I get the divide by zero error is:

kappas_nonsig = comptable.loc[comptable["kappa"] < f01, "kappa"]
if not kappas_nonsig.size:
LGR.warning(
"No nonsignificant kappa values detected. "
"Only using elbow calculated from all kappa values."
)
kappas_nonsig_elbow = np.nan
else:
kappas_nonsig_elbow = getelbow(kappas_nonsig, return_val=True)

The error goes away by replacing if not kappas_nonsig.size with if kappas_nonsig.size<=1 but the question is whether it should be something like <=5 instead.

The last PR also added a check and warning in getelbow. We can adjust that too to throw a warning if the number of components are under a threshold rather than just empty:

if not arr.size:
raise ValueError(
"Empty array detected during elbow calculation. "
"This error happens when getelbow is incorrectly called on no components. "
"If you see this message, please open an issue at "
"https://github.com/ME-ICA/tedana/issues with the full traceback and any data "
"necessary to reproduce this error, so that we create additional data checks to "
"prevent this from happening."
)

As far as I can tell, getelbow is called in two situations. (1) On most or all of the components. In that situation, this will never be an issue because, if there are only 5 total components, then tedana will have bigger problems. (2) A subset of components where that second elbow threshold is used in conjunction with a threshold on all components. The current approach was to just ignore that second threshold when there are zero components.

Another approach is that the second threshold on non-significant components can only lower the kappa elbow and result in accepting more components and T2* signal. Excluding this option if there are less than X components means that denoising may be slightly more aggressive.

Next Steps

  • Decide on the appropriate threshold for setting an elbow to nan rather than using just a few components to calculate an elbow.
    Options:

    1. If there's only one component, set that value as the elbow
    2. Set the minimum to 2, which means the code will run as is and the elbow won't actually be an elbow, but will be somewhere around those two value
    3. Set a minimum closer to 5 where an elbow calculation has a chance of being meaningful

    I originally was leaning towards iii, but I might be overthinking this, and ii might be simplest.

  • Once a decision is made, implement this.

@handwerkerd handwerkerd added bug issues describing a bug or error found in the project priority: medium Should get addressed soon effort: low Theoretically less than a day's work impact: medium Improves code/documentation functionality for some users labels Sep 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug issues describing a bug or error found in the project effort: low Theoretically less than a day's work impact: medium Improves code/documentation functionality for some users priority: medium Should get addressed soon
Projects
None yet
1 participant