Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decision tree step incorrectly assumes a sorted component table #923

Closed
handwerkerd opened this issue Jan 5, 2023 · 2 comments · Fixed by #924
Closed

Decision tree step incorrectly assumes a sorted component table #923

handwerkerd opened this issue Jan 5, 2023 · 2 comments · Fixed by #924
Labels
bug issues describing a bug or error found in the project effort: low Theoretically less than a day's work impact: medium Improves code/documentation functionality for some users priority: medium Should get addressed soon

Comments

@handwerkerd
Copy link
Member

Summary

There is a step in the decision tree where the first X remaining components should be used for something where the components are sorted by variance. MEICA sorted components by variance but tedana does not. That means tedana is incorrectly using a semi-random subset of components instead of the highest variance components.

Additional Detail

These are the key lines

# Find components to ignore
# Ignore high variance explained, poor decision tree scored components
new_varex_lower = stats.scoreatpercentile(
comptable.loc[unclf[:num_acc_guess], "variance explained"], LOW_PERC
)

This will be fixed in #756 but we might want to also fix in the current code so that pre and post decision-tree- modularization results will perfectly match.

This is a bug, but it will not affect the denoised time series. The new_varex_lower threshold is used to decide if components are accepted or ignored With the current version, it's possible for more components than intended to end up ignored but they'll still be retained in the denoised time series.

Next Steps

  • Confirm this is a real bug
  • I'm fairly sure MEICA sorted components with the highest variance being component 0 so the highest variance components should be retained (the comment in the code might be wrong). Look back at some MEICA outputs to confirm this correct
  • Decide if we want to fix this in the current Main so that it will match the modularized output.
@handwerkerd handwerkerd added bug issues describing a bug or error found in the project priority: medium Should get addressed soon effort: low Theoretically less than a day's work impact: medium Improves code/documentation functionality for some users labels Jan 5, 2023
@tsalo
Copy link
Member

tsalo commented Jan 5, 2023

This sounded familiar, so I searched around and found #295. Do you know if that PR fixed the issue?

@handwerkerd
Copy link
Member Author

I did also remember having this issue before. Thank you for connecting these two. You solved the same issue in a different part of the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug issues describing a bug or error found in the project effort: low Theoretically less than a day's work impact: medium Improves code/documentation functionality for some users priority: medium Should get addressed soon
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants