Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pathlib support #8120

Closed
adamjstewart opened this issue Nov 18, 2023 · 11 comments
Closed

Pathlib support #8120

adamjstewart opened this issue Nov 18, 2023 · 11 comments

Comments

@adamjstewart
Copy link
Contributor

adamjstewart commented Nov 18, 2023

🚀 The feature

Add support for pathlib.Path in addition to str for any functions taking files or directories as input.

According to PEP-519, the type hints would look like:

Union[str, bytes, os.PathLike[str], os.PathLike[bytes]]

Motivation, pitch

TorchGeo is planning on adding support for this. However, we use a lot of utility functions like download_url that only accept str at the moment.

This was also requested in #7721 for a more limited scope, so users are clearly trying to use this.

It seems that most of the code in torchvision/prototype already uses pathlib? Maybe it's just a matter of waiting until those prototypes become the default. But I don't see any progress yet for porting the utility functions we use.

Alternatives

Our workaround for now is to convert all Paths to strings before calling download_url, but that kind of defeats the purpose of supporting Paths if we always immediately convert them to strings.

Additional context

Cross linking to microsoft/torchgeo#1616 and microsoft/torchgeo#1731

@pioneerHitesh

P.S. We would be willing to contribute this for the utilities we use in TorchGeo, but probably don't have the time/energy to contribute this for all of torchvision. Let me know what you think.

cc @pmeier

@NicolasHug
Copy link
Member

Thanks for the feature request @adamjstewart . This sounds reasonable, we could add support for pathlib.Path where it matters.

we use a lot of utility functions like download_url

Just a heads up on this: while our download utils are effectively public, we really don't recommend that you rely on them in your library: those were introduced a long time ago as public but they should have been private from the beginning. We might deprecate and make them private in the future, so please migrate away from them if you can.

I acknowledge that there's a lack of canonical download utils in pytorch in general, and this has led to each domain library implementing their own solution for this... But torchvision isn't the place for such a util and our stuff shouldn't be considered a reference. (There are related discussions in pytorch/pytorch#68320 (comment)).

@adamjstewart
Copy link
Contributor Author

Yeah, I liked the idea behind a shared resource like torchdata for download/extraction, but it seems like torchdata is dead. We might just add a new dependency on a non-torch library if we need it. We need support for a lot of weird extraction utilities like RAR and zip deflate so I'd like to avoid maintaining that code.

Let me poke around and see what I can find. If there isn't anything, maybe I'll make my own library.

@adamjstewart
Copy link
Contributor Author

I found:

Still looking for a good checksum tool. But it's a shame to have to add 2–3 new deps that I also need to track.

@mlkimmins
Copy link

mlkimmins commented Feb 26, 2024

Hi~ New to wanting to contribute to pytorch, I was working on my project and found myself wanting this feature too and found this issue. Seems relatively straightforward. If so inclined, @adamjstewart do you mind sharing what you had in mind after your Nov 2023 comment? Maybe I can take a stab at it if this isn't being actively worked on.

@adamjstewart
Copy link
Contributor Author

@NicolasHug is this already implemented by #8196 and #8200?

@mlkimmins
Copy link

mlkimmins commented Feb 27, 2024

oh fancy. so i'll see it soon in new versions :D should we close this one?
-edit: ok so this is at least in download utils, guess the next is seeing if we can push this into other functions that may use paths

@adamjstewart
Copy link
Contributor Author

FWIW, I personally only care about the download utils. So happy to close this if it's complete, or happy to leave it open if @NicolasHug wants to track pathlib adoption for the entire repo.

@NicolasHug
Copy link
Member

@mlkimmins thank you for your interest.

The main places where we still need to add Pathlib support are:

  • the public io stuff in particular read_file, read_image, write_*. You can leave out the video stuff though.
  • All of the datasets' root parameters.

Feel free to submit a PR for any of these! Preferably in small-ish chunks of 3-4 APIs at a time, to make it easier to review. Thank you!

@NicolasHug
Copy link
Member

hi @mlkimmins , I'm curious if you had a chance to submit a PR for this yet? If not, I might have to take care of it myself if you don't mind: our branch cut is on Monday and we'd need to address this before then so that it can make it into the next release

@mlkimmins
Copy link

Hi @NicolasHug ! I didn't see the previous message 2 wks ago, so no code, so you should go ahead and do it!

@NicolasHug
Copy link
Member

Closed by #8314 and #8120, thanks for opening the issue @adamjstewart

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants