Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not zero-pad indices in BIDSPath entities anymore; do not warn about prepending missing dots for extensions #1215

Merged
merged 14 commits into from
Feb 22, 2024

Conversation

alexrockhill
Copy link
Contributor

@alexrockhill alexrockhill commented Jan 10, 2024

PR Description

I often have subject ids like 1 and 2 and I don't always store them as strings. I think it would be nice if mne_bids just handled this to allow for integer subject ids, runs, sessions, acquisitions etc.

Merge checklist

Maintainer, please confirm the following before merging.
If applicable:

  • All comments are resolved
  • This is not your own PR
  • All CIs are happy
  • PR title starts with [MRG]
  • whats_new.rst is updated
  • New contributors have been added to CITATION.cff
  • PR description includes phrase "closes <#issue-number>"

@alexrockhill
Copy link
Contributor Author

Hmmm it looks like this was by design that integers were excluded, there's even a test (failing in this case) that subject=1 fails. I find this a bit annoying because I don't understand the need for IDs like "01" on modern computing systems, it seems like a design choice that some but not all people make that can lead to Y2K phenomena if for instance you end up collecting more than 100 or 1000 subjects when you started with a "00" or "000" system. Is there a rationale for blocking users from doing this? To me it seems overly pedantic.

@alexrockhill
Copy link
Contributor Author

image

Given that it's the "01" way on the homepage of BIDS, I guess this is fairly used but I don't recall seeing an explanation of why you need to format them this way. With the key-value tags in BIDS, this seems superfluous and from an era when strings were hardcoded to a certain length which is a brittle design choice for long identifiers..

@alexrockhill
Copy link
Contributor Author

Also if you pass run=1 it automatically gets mapped to "01" which seems very odd and unexpected.

@hoechenberger
Copy link
Member

I agree:

  • we should accept integers and simply convert them to strings
  • we should not prepend zeros (zfill)
  • if a user wants to zfill, they can do this manually and pass the resulting string

@alexrockhill
Copy link
Contributor Author

Nice, okay, I thought I was alone in thinking this was a bit silly and that everyone else labeled their subjects this way for a minute. If the consensus is that this is that the labels should be less constrained, I'm happy to add that to the PR.

@hoechenberger
Copy link
Member

@alexrockhill We better double-check check the BIDS specs before we proceed here
Otherwise I'm +1

@alexrockhill
Copy link
Contributor Author

@Remi-Gau
Copy link

Note that in BIDS runs are indices (i.e must be a digit) and subjects, sessions (and most other entities) are labels and can be alphanumeric.

https://bids-specification.readthedocs.io/en/latest/appendices/entities.html#sub

https://bids-specification.readthedocs.io/en/latest/appendices/entities.html#run

This partly explains why you can assume:

run-01 --> 1

but not

sub-01 --> 1 because you could also have sub-1

Hope this helps a bit.

@jasmainak
Copy link
Member

I think @Remi-Gau is on the right track. It's why we disabled integers so that users are explicit and don't mix sub-01 and sub-1

@hoechenberger
Copy link
Member

@alexrockhill I think if you pass strings, no alterations will take place. So this would be the easiest workaround I suppose?

@alexrockhill
Copy link
Contributor Author

alexrockhill commented Jan 11, 2024

I think @Remi-Gau is on the right track. It's why we disabled integers so that users are explicit and don't mix sub-01 and sub-1

I understand the principle but I don't understand the use case. Who labels their subjects 1 and 01?

My preference would be to allow integers and cast them to strings since subject=1 and subject="1" seem like reasonable subject identifiers to me and would mean the same subject. Since it's just annoying in code to cast everything e.g. subject=str(subject) and so it seems convenient to do this within mne_bids to make it as usable as possible. In general, I will say the annoying thing about BIDS is that it can sometimes feel like death by a million paper cuts where when you try and use it you get caught up on a bunch of little things like this and each one alone is rather silly to complain about but the end result is that people think it's more of a pain than it is. For example, in this context, figuring out if each tag is an index or a label (i.e. accepts int or string), having to remember if extension=".edf" is right or extension="edf" is right or having to use extension and suffix even though if you pass datatype to read_raw_bids so the file type is able to be inferred (I know internally it's this way because BIDSPath isn't just used for read_raw_bids but also modified to find sidecars but that doesn't really matter from an end-user perspective). Anyway, the point being, this is one of several pain points that I think, especially for write_raw_bids and read_raw_bids is pretty mission-critical for adoption so that people don't run into an error that isn't one that is essential to ensuring that the dataset contains all the necessary information and give up. If something is only recommended by BIDS, I don't think you should throw an error in mne_bids. A warning maybe but an error is overly pedantic in my opinion.

The run one is just a bug, that seems pretty simple. You should either not be allowed to pass integers or run=1 should be cast to run="1" but definitely not run="01".

@alexrockhill
Copy link
Contributor Author

alexrockhill commented Jan 11, 2024

I'd actually go further if it was entirely my design choice and I would match subject=1 to subject="01" and only throw an error at run time if there are both subject="01" and subject="1". To me, it seems way more practical and common of a use case to have incrementing subject indices and I think you'd actually want an error when you try and make subject="1" and subject="01" that seems like a situation that should not be allowed, it's too confusing, so actually it would be better to throw the error during writing.

That would also mean run="01" or run="1" or run="1" would be irrelevant, they would be detected as incrementing indices and mapped to integers and tested for uniqueness. A bit of a reiteration, but I really think run="01" and run="1' both being allowed does not make a lot of sense; different experimenters might have these different opinions on zero filling and enter them differently in a free fill entry box and cause issues which seems like a very real use case.

Lastly, I'd add we could store them all per BIDS recommendations with the two digit zero fill which I wouldn't mind at all. i.e. If you pass subject=1 it gets stored as subject="01" but then when you go to read with subject=1 you still get your subject. I think that's actually very maintainable since there are two cases 1) alpha-numeric and 2) numeric only so in 1) you just leave it and in 2) everything gets cast to two digit zero padded. I really like that solution.

The only issue is that it wouldn't work for datasets with subject="01" and subject="1" but I think that's such an implausible case that as a user, I'd like an error for a dataset like that so I can check what is going on.

@alexrockhill
Copy link
Contributor Author

I went to see how it would be implementing and saw this

warn(
    f'extension should start with a period ".", but got: '
    f'"{extension}". Prepending "." to form: ".{extension}". '
    f"This will raise an exception starting with MNE-BIDS 0.12.",
    category=FutureWarning,
)
kwargs["extension"] = f".{extension}"
# Uncomment in 0.12, and remove above code:
#
# raise ValueError(
#     f'Extension must start wie a period ".", but got: '
#     f'{extension}'
# )

To me, this is crazy. Why are you raising an error on a user who passes extension="edf"? This almost seems antagonistic to a well-intentioned person who wants to use BIDS. I don't mean to be overly critical or criticize design choices on a personal level, mne_bids is great and a lot of other people have put some fantastic work into this, there's just a few choices like this that I really don't understand from the perspective of the basic approach. I thought the ethos of BIDS was to standardize the essential things and then encourage wide adoption by being flexible otherwise and that ethos would carry down to generally trying to be as not pedantic as possible.

Copy link

codecov bot commented Jan 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (87eea28) 97.61% compared to head (6c48c6d) 97.60%.
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1215      +/-   ##
==========================================
- Coverage   97.61%   97.60%   -0.02%     
==========================================
  Files          40       40              
  Lines        8685     8682       -3     
==========================================
- Hits         8478     8474       -4     
- Misses        207      208       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@alexrockhill
Copy link
Contributor Author

Just wanted to add, I'm not trying to force this PR through, this is just my vision and addresses what I perceive to be pain points in the use case. Happy to discuss or close if people don't want to discuss and to just keep things how they are.

@hoechenberger
Copy link
Member

I agree our API is hard to use.

@jasmainak
Copy link
Member

I was just trying to explain what might have been a possible reason. It's important not to cause a regression bug. I'll let you figure out the best path forward with @hoechenberger !

@agramfort
Copy link
Member

agramfort commented Jan 11, 2024 via email

@alexrockhill
Copy link
Contributor Author

Totally, I asked why it was that way so the explanation is very helpful!

I think this would be backward compatible except if someone upgrades in the middle of converting a dataset which they aren't supposed to do.

It will silently return subject="01" if you have a dataset with subject="01" and subject="1" which I think is reasonable, again I hope this isn't a use case. But otherwise, this PR just takes integers and casts them to the BIDS recommendation which is the best of both worlds, I think (i.e. I just say subject=1 but that gets translated to subject="01" per BIDS recommendation). This obviates the need for the user to even think about zero padding which I would argue they probably don't want to and just want things to work and give readable results.

@alexrockhill
Copy link
Contributor Author

alexrockhill commented Jan 11, 2024

@agramfort agreed but there are only two cases so it seems manageable.

And we already do this for run which causes unexpected results but if it were used everywhere it would be expected.

@hoechenberger
Copy link
Member

I'm not decided whether I want to change anything :) I just wanted to say that I think our API is not very easy to use :)

@sappelhoff
Copy link
Member

sappelhoff commented Jan 12, 2024

I agree that it's important to keep in mind what @agramfort said here: #1215 (comment)

If it were just me, I'd maybe improve the documentation about this, but I wouldn't start adding more "magic conversion" behind the scenes. The present behavior has been in place for a very long time and we rarely get issues about it (we did however, admittedly, get one about it not too long ago: https://mne.discourse.group/t/remove-automatic-zero-padding-when-specifing-run-in-bidspath/7865/4?u=sappelhoff).

@hoechenberger
Copy link
Member

Also we kept on removing "magic" over the past years, because it would cause all sorts of problems

@alexrockhill
Copy link
Contributor Author

alexrockhill commented Jan 12, 2024

My point was just that it makes sense to be consistent and have this magic everywhere (subject as well as run) if we want to enforce the recommendation or to have it nowhere.

Again, per the discourse use case comment/issue, people try and pass ints and by not handling it consistently, we're making them dive into these kinds of discussions zero padding etc. that are going to make people think BIDS is too much of a work-in-progress for them to use currently.

So, if it's about removing magic, then maybe just a straight cast to string for integers with no modification.

I think people are going to pass integers though and throwing them an error doesn't seem helpful since the information is fundamentally there.

EDIT: I think no zero-padding is a totally reasonable solution and we can adjust print_dir_tree etc. as well to order subjects, runs etc. more sensibly with s = int(s) if s.isdigit() else s.

@alexrockhill
Copy link
Contributor Author

alexrockhill commented Jan 12, 2024

So now all the magic is removed from run and split as well which solves the discourse issue but will be slightly backward incompatible because if someone has run=1 in their code, they'll have to change it to run="01" to access an already-converted dataset.

To do (in follow-up PRs):

  • Remove matches to sidecars in read_raw_bids so that it doesn't have to be over-specified
  • Order print_dir_tree / others ? more sensibly for non-zero-padded entries

Copy link
Member

@agramfort agramfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this PR break any working code? I see you touching tests. Can you see if it breaks eg the mne-bids-pipeline by running the tests locally for on this branch?

mne_bids/path.py Show resolved Hide resolved
@alexrockhill
Copy link
Contributor Author

alexrockhill commented Jan 23, 2024

I'm getting a bunch of pydantic import errors when I went to test mne_bids_pipeline...

Traceback (most recent call last):
  File "/home/alexrockhill/software/anaconda3/envs/mne/bin/pytest", line 8, in <module>
    sys.exit(console_main())
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/_pytest/config/__init__.py", line 190, in console_main
    code = main()
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/_pytest/config/__init__.py", line 148, in main
    config = _prepareconfig(args, plugins)
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/_pytest/config/__init__.py", line 329, in _prepareconfig
    config = pluginmanager.hook.pytest_cmdline_parse(
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/pluggy/_hooks.py", line 265, in __call__
    return self._hookexec(self.name, self.get_hookimpls(), kwargs, firstresult)
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/pluggy/_manager.py", line 80, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/pluggy/_callers.py", line 55, in _multicall
    gen.send(outcome)
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/_pytest/helpconfig.py", line 103, in pytest_cmdline_parse
    config: Config = outcome.get_result()
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/pluggy/_result.py", line 60, in get_result
    raise ex[1].with_traceback(ex[2])
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/pluggy/_callers.py", line 39, in _multicall
    res = hook_impl.function(*args)
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/_pytest/config/__init__.py", line 1060, in pytest_cmdline_parse
    self.parse(args)
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/_pytest/config/__init__.py", line 1348, in parse
    self._preparse(args, addopts=addopts)
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/_pytest/config/__init__.py", line 1231, in _preparse
    self.pluginmanager.load_setuptools_entrypoints("pytest11")
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/pluggy/_manager.py", line 287, in load_setuptools_entrypoints
    plugin = ep.load()
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
    module = import_module(match.group('module'))
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/_pytest/assertion/rewrite.py", line 168, in exec_module
    exec(co, module.__dict__)
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/napari/utils/__init__.py", line 2, in <module>
    from napari.utils.colormaps import Colormap
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/_pytest/assertion/rewrite.py", line 168, in exec_module
    exec(co, module.__dict__)
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/napari/utils/colormaps/__init__.py", line 2, in <module>
    from napari.utils.colormaps.colormap import Colormap
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/_pytest/assertion/rewrite.py", line 168, in exec_module
    exec(co, module.__dict__)
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/napari/utils/colormaps/colormap.py", line 9, in <module>
    from napari.utils.events import EventedModel
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/_pytest/assertion/rewrite.py", line 168, in exec_module
    exec(co, module.__dict__)
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/napari/utils/events/__init__.py", line 17, in <module>
    from napari.utils.events.evented_model import EventedModel
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/_pytest/assertion/rewrite.py", line 168, in exec_module
    exec(co, module.__dict__)
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/napari/utils/events/evented_model.py", line 66, in <module>
    class EventedMetaclass(main.ModelMetaclass):
  File "/home/alexrockhill/software/anaconda3/envs/mne/lib/python3.10/site-packages/pydantic/_migration.py", line 306, in wrapper
    raise AttributeError(f'module {module!r} has no attribute {name!r}')
AttributeError: module 'pydantic.main' has no attribute 'ModelMetaclass'

@alexrockhill
Copy link
Contributor Author

Could someone with a working version of mne-bids-pipeline test this and confirm that it doesn't break anything please?

@larsoner
Copy link
Member

larsoner commented Feb 1, 2024

An easy way if you don't want to set it up yourself would be to open a PR to mne-bids-pipeline where you change pyproject.toml to tell it to install your branch of mne-bids:

https://github.com/mne-tools/mne-bids-pipeline/blob/main/pyproject.toml#L49

Something like mne-bids[full] @ git+https://github.com/alexrockhill/mne-bids@int rather than just mne-bids[full] should do it.

@alexrockhill
Copy link
Contributor Author

Looks like it doesn't break anything in mne-bids-pipeline so looks good to go possibly

@alexrockhill
Copy link
Contributor Author

LGTM?

Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as I said in #1215 (comment), I am not convinced that we should make this change. So @hoechenberger or @agramfort should perhaps make a verdict on this, as they seemed more open for it.

@hoechenberger
Copy link
Member

hoechenberger commented Feb 12, 2024

Unfortunately I currently don't have the time to really think through the implications of this change, so I'd like to withhold my vote for now

@alexrockhill
Copy link
Contributor Author

Oh I was under the impression we talked it through and came to an intermediate compromise everyone was happy with. I.e. remove magic zero padding for all fields (it was on run before) and allow int for all fields but just cast it to string. That includes no magic and changes current behavior in a small way: i.e. run=1 just goes to run='1' not run='01'.

@alexrockhill
Copy link
Contributor Author

alexrockhill commented Feb 12, 2024

Also worth noting, I'm really not opinionated as to whether subject/run=1 goes to subject/run="01" or subject/run='1', I think both are fine but I am opinionated that they be the same. Also, I made this PR because I find mne_bids quite annoying that it doesn't accept subject=1 (and extension='edf' not being allowed is also very annoying).

@sappelhoff
Copy link
Member

I'll take some time and think through this within the next days, sorry for causing some frustration here.

@alexrockhill
Copy link
Contributor Author

No worries, take your time, not frustrated by you, just a bit annoyed by how mne_bids works but that's what PRs are for!

Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have gone over the proposed changes and adjusted the changelog and code as such:

  • we will no longer zero-pad indices ... so run=1 will be run=1 instead of run=01
  • we will now silently auto-prepend missing dots from extensions

I decided against accepting "int" for all possible inputs to BIDSPath. Having looked at it, we have a policy of accepting ints for BIDS entity-<index> entities, and strs for BIDS entity-<label> entities -- and I think that's clear and reasonable, and I would like to keep it as such.

The present changes solve some issues (like documented in: https://mne.discourse.group/t/remove-automatic-zero-padding-when-specifing-run-in-bidspath/7865?u=sappelhoff) without causing too much of a code base disruption.

This is good to go from my side as soon as tests pass. Please feel free to merge @hoechenberger

@sappelhoff sappelhoff changed the title [BUG] Allow int subject etc Do not zero-pad indices in BIDSPath entities anymore; do not warn about prepending missing dots for extensions Feb 22, 2024
@sappelhoff sappelhoff added this to the 0.15 milestone Feb 22, 2024
@alexrockhill
Copy link
Contributor Author

Thanks for getting back @sappelhoff , the changes look good and fix most of the pain points I was running into.

I understand the index/label distinction and the motivation between trying to have nice categories. However, in practice, I have observed that even for labels, people tend to use integers more than not (e.g. sub-1, ses-1). I feel like if people actually did use non-index labels, this wouldn't even be up for discussion because it would be very clear that of course you call your subject A, B, C etc. but that just isn't how the community has spontaneously organized, people picked indices.

I'll defer to your judgment though, thanks for discussing. Maybe it's an issue better taken upstream in the BIDS specification that that there even be this distinction in the first place since I feel it's a bit more aspirational than actually adopted, if I really care that much.

@sappelhoff
Copy link
Member

However, in practice, I have observed that even for labels, people tend to use integers more than not

that's true, and that's also okay, because labels may be "alphanumeric" ... whereas indices are "non-negative integers".

(https://bids-specification.readthedocs.io/en/stable/common-principles.html#definitions)

You are right that people use sub-<label> as sub-01 often. But I have rarely seen task-<label> to be task-01 --> I think we don't really have a problem here. We could make a big PR and add the option to have int for each "label-entity", but I classify this as a high effort, low impact task.

Gonna merge now! :-)

Thanks for the discussion, I think the changes here improve the user experience!

@sappelhoff sappelhoff merged commit e41b517 into mne-tools:main Feb 22, 2024
17 of 18 checks passed
@hoechenberger
Copy link
Member

Thanks everyone!!

@alexrockhill
Copy link
Contributor Author

Sure, I think that's a good place to leave the discussion for later but I'll just add a last comment that because task=1 makes no sense and is never used and sub=1 is the most common use case (note 1 not '01' from what I've seen from the community at large, BIDS core devs tend to use '01' which I think is a bit of a disconnect with the broader community) so it doesn't make sense to me that those are the same category although I agree alphanumeric is a fitting umbrella category, I wouldn't call sub a label in the way that task is since sub tends to be an index in practice. Which is to say it's a totally understandable and workable difference in opinions and maybe I'll just save this thread for when this viewpoint wins out in 5 years :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants