-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not zero-pad indices in BIDSPath entities anymore; do not warn about prepending missing dots for extensions #1215
Conversation
Hmmm it looks like this was by design that integers were excluded, there's even a test (failing in this case) that |
Given that it's the |
Also if you pass |
I agree:
|
Nice, okay, I thought I was alone in thinking this was a bit silly and that everyone else labeled their subjects this way for a minute. If the consensus is that this is that the labels should be less constrained, I'm happy to add that to the PR. |
@alexrockhill We better double-check check the BIDS specs before we proceed here |
Note that in BIDS runs are indices (i.e must be a digit) and subjects, sessions (and most other entities) are labels and can be alphanumeric. https://bids-specification.readthedocs.io/en/latest/appendices/entities.html#sub https://bids-specification.readthedocs.io/en/latest/appendices/entities.html#run This partly explains why you can assume: run-01 --> 1 but not sub-01 --> 1 because you could also have sub-1 Hope this helps a bit. |
I think @Remi-Gau is on the right track. It's why we disabled integers so that users are explicit and don't mix |
@alexrockhill I think if you pass strings, no alterations will take place. So this would be the easiest workaround I suppose? |
I understand the principle but I don't understand the use case. Who labels their subjects My preference would be to allow integers and cast them to strings since The |
I'd actually go further if it was entirely my design choice and I would match That would also mean Lastly, I'd add we could store them all per BIDS recommendations with the two digit zero fill which I wouldn't mind at all. i.e. If you pass The only issue is that it wouldn't work for datasets with |
I went to see how it would be implementing and saw this
To me, this is crazy. Why are you raising an error on a user who passes |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1215 +/- ##
==========================================
- Coverage 97.61% 97.60% -0.02%
==========================================
Files 40 40
Lines 8685 8682 -3
==========================================
- Hits 8478 8474 -4
- Misses 207 208 +1 ☔ View full report in Codecov by Sentry. |
Just wanted to add, I'm not trying to force this PR through, this is just my vision and addresses what I perceive to be pain points in the use case. Happy to discuss or close if people don't want to discuss and to just keep things how they are. |
I agree our API is hard to use. |
I was just trying to explain what might have been a possible reason. It's important not to cause a regression bug. I'll let you figure out the best path forward with @hoechenberger ! |
Allowing different ways to be valid means adding more magic to the code. You need to multiply test to support 1 or str 01 and also tests of if both are subjects etc. You create more corner cases in the code and complexity is exponential with the number of options. Any bad default you point out was maybe a natural default for someone with a different prior and this person did not complain here. I am not trying to force the status quo here but be mindful of these arguments.
|
Totally, I asked why it was that way so the explanation is very helpful! I think this would be backward compatible except if someone upgrades in the middle of converting a dataset which they aren't supposed to do. It will silently return |
@agramfort agreed but there are only two cases so it seems manageable. And we already do this for |
I'm not decided whether I want to change anything :) I just wanted to say that I think our API is not very easy to use :) |
I agree that it's important to keep in mind what @agramfort said here: #1215 (comment) If it were just me, I'd maybe improve the documentation about this, but I wouldn't start adding more "magic conversion" behind the scenes. The present behavior has been in place for a very long time and we rarely get issues about it (we did however, admittedly, get one about it not too long ago: https://mne.discourse.group/t/remove-automatic-zero-padding-when-specifing-run-in-bidspath/7865/4?u=sappelhoff). |
Also we kept on removing "magic" over the past years, because it would cause all sorts of problems |
My point was just that it makes sense to be consistent and have this magic everywhere ( Again, per the discourse use case comment/issue, people try and pass So, if it's about removing magic, then maybe just a straight cast to string for integers with no modification. I think people are going to pass integers though and throwing them an error doesn't seem helpful since the information is fundamentally there. EDIT: I think no zero-padding is a totally reasonable solution and we can adjust |
So now all the magic is removed from To do (in follow-up PRs):
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this PR break any working code? I see you touching tests. Can you see if it breaks eg the mne-bids-pipeline by running the tests locally for on this branch?
I'm getting a bunch of
|
Could someone with a working version of |
An easy way if you don't want to set it up yourself would be to open a PR to mne-bids-pipeline where you change https://github.com/mne-tools/mne-bids-pipeline/blob/main/pyproject.toml#L49 Something like |
Looks like it doesn't break anything in |
LGTM? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as I said in #1215 (comment), I am not convinced that we should make this change. So @hoechenberger or @agramfort should perhaps make a verdict on this, as they seemed more open for it.
Unfortunately I currently don't have the time to really think through the implications of this change, so I'd like to withhold my vote for now |
Oh I was under the impression we talked it through and came to an intermediate compromise everyone was happy with. I.e. remove magic zero padding for all fields (it was on run before) and allow int for all fields but just cast it to string. That includes no magic and changes current behavior in a small way: i.e. run=1 just goes to run='1' not run='01'. |
Also worth noting, I'm really not opinionated as to whether |
I'll take some time and think through this within the next days, sorry for causing some frustration here. |
No worries, take your time, not frustrated by you, just a bit annoyed by how |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have gone over the proposed changes and adjusted the changelog and code as such:
- we will no longer zero-pad indices ... so run=1 will be run=1 instead of run=01
- we will now silently auto-prepend missing dots from extensions
I decided against accepting "int" for all possible inputs to BIDSPath. Having looked at it, we have a policy of accepting ints for BIDS entity-<index>
entities, and strs for BIDS entity-<label>
entities -- and I think that's clear and reasonable, and I would like to keep it as such.
The present changes solve some issues (like documented in: https://mne.discourse.group/t/remove-automatic-zero-padding-when-specifing-run-in-bidspath/7865?u=sappelhoff) without causing too much of a code base disruption.
This is good to go from my side as soon as tests pass. Please feel free to merge @hoechenberger
Thanks for getting back @sappelhoff , the changes look good and fix most of the pain points I was running into. I understand the index/label distinction and the motivation between trying to have nice categories. However, in practice, I have observed that even for labels, people tend to use integers more than not (e.g. I'll defer to your judgment though, thanks for discussing. Maybe it's an issue better taken upstream in the BIDS specification that that there even be this distinction in the first place since I feel it's a bit more aspirational than actually adopted, if I really care that much. |
that's true, and that's also okay, because labels may be "alphanumeric" ... whereas indices are "non-negative integers". (https://bids-specification.readthedocs.io/en/stable/common-principles.html#definitions) You are right that people use Gonna merge now! :-) Thanks for the discussion, I think the changes here improve the user experience! |
Thanks everyone!! |
Sure, I think that's a good place to leave the discussion for later but I'll just add a last comment that because task=1 makes no sense and is never used and sub=1 is the most common use case (note 1 not '01' from what I've seen from the community at large, BIDS core devs tend to use '01' which I think is a bit of a disconnect with the broader community) so it doesn't make sense to me that those are the same category although I agree alphanumeric is a fitting umbrella category, I wouldn't call sub a label in the way that task is since sub tends to be an index in practice. Which is to say it's a totally understandable and workable difference in opinions and maybe I'll just save this thread for when this viewpoint wins out in 5 years :) |
PR Description
I often have subject ids like
1
and2
and I don't always store them as strings. I think it would be nice ifmne_bids
just handled this to allow for integer subject ids, runs, sessions, acquisitions etc.Merge checklist
Maintainer, please confirm the following before merging.
If applicable: