Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added pathlib support to datasets/utils.py #8200

Merged
merged 15 commits into from
Jan 16, 2024

Conversation

ahmadsharif1
Copy link
Contributor

More work related to #8120 -- this time for all of datasets/utils.py.

Copy link

pytorch-bot bot commented Jan 9, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/8200

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 09119e3 with merge base a00a72b (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@ahmadsharif1 ahmadsharif1 requested review from pmeier and NicolasHug and removed request for pmeier January 9, 2024 17:02
from_path: Union[str, pathlib.Path],
to_path: Optional[Union[str, pathlib.Path]] = None,
remove_finished: bool = False,
) -> pathlib.Path:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, forcing the return type to be a Path when it previously was a str is a BC-breaking change. It doesn't really matter for _decompress because it's private (leading underscore), but it's an issue here with extract_archive which is technically public.

(Side note: we didn't intend for any of the stuff in datasets/utils.py to ever be public, but that's another story).

We have 2 options here:

  • always return a str. This is sub-ideal but simpler. And since we never intended those utils to be public anyway, it's kinda OK - we should deprecate them and make them private eventually, but that's for another day.
  • always return the same type as from_path. This is the best UX, and requires a bit more logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point. I have done the second option (though technically speaking that also changes behavior from the public API).

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ahmadsharif1 , one minor comment below, I'll let you decide before merging

torchvision/datasets/utils.py Show resolved Hide resolved
@adamjstewart adamjstewart mentioned this pull request Feb 27, 2024
facebook-github-bot pushed a commit that referenced this pull request Mar 19, 2024
Reviewed By: vmoens

Differential Revision: D55062771

fbshipit-source-id: a336ddeb8c1984fdb857f7b4583b0b464999a9dc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants