
Expose on Hub the public methods of the registration API #6364

Merged
merged 4 commits into from
Aug 8, 2022

Conversation

datumbox
Contributor

@datumbox datumbox commented Aug 3, 2022

Expose the new get_model, get_model_weights and list_models functions to hubconf.py so that they can be used via TorchHub.

Member

@NicolasHug NicolasHug left a comment


I'm not sure about exposing get_model() or list_models(), considering some of these models aren't actually compatible with torchhub.

Torchhub already has hub.list(), which will list the available (and compatible) models. Similarly, get_model() is somewhat redundant with hub.load().

@NicolasHug
Member

NicolasHug commented Aug 3, 2022

I'm also not keen on exposing Beta APIs through torchhub, considering there is no programmatic way for users to know about that classification status.

(EDIT: sorry I misclicked, didn't mean to close)

@NicolasHug NicolasHug closed this Aug 3, 2022
@NicolasHug NicolasHug reopened this Aug 3, 2022
@datumbox
Contributor Author

datumbox commented Aug 3, 2022

Thanks for the input. I think there are lots of different ideas here:

  • get_model() and list_models() are indeed quite similar to the native hub.list() and hub.load(), so I agree they're potentially less interesting.
  • Beta APIs have already been exposed on TorchHub for years, so this shouldn't be a blocker. For example, all models exposed under torchvision.models.segmentation are in beta.

I've updated the PR to remove get_model() and list_models() but kept get_model_weights() as it can be quite useful for listing all the available weights of a specific model. For example, it allows users to fetch all available weights programmatically, something that the existing get_weight(), which is already exposed, doesn't allow:

import torch

weight_enum = torch.hub.load("pytorch/vision", "get_model_weights", name="resnet50")
print([weight for weight in weight_enum])
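As a sketch of what iterating such an enum yields (the ResNet50_Weights class below is a hypothetical stand-in modelled on torchvision's weight enums, not the real implementation):

```python
from enum import Enum

# Hypothetical stand-in for a torchvision weights enum: iterating the
# enum yields every registered weight entry for the model.
class ResNet50_Weights(Enum):
    IMAGENET1K_V1 = "ImageNet-1K, original training recipe"
    IMAGENET1K_V2 = "ImageNet-1K, improved training recipe"

print([w.name for w in ResNet50_Weights])
# → ['IMAGENET1K_V1', 'IMAGENET1K_V2']
```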

Member

@NicolasHug NicolasHug left a comment


Beta APIs have already been exposed on TorchHub for years, so this shouldn't be a blocker. For example, all models exposed under torchvision.models.segmentation are in beta.

I agree that the original goal was to release torchvision.models.segmentation as Beta. But very little was done to properly raise awareness about the status of these APIs; this is true for models.detection as well. We have always treated them as a stable API so far, with very limited BC breaking changes. So I'm not sure this is directly comparable to the current situation.

@@ -115,7 +115,7 @@ def get_weight(name: str) -> WeightsEnum:
W = TypeVar("W", bound=WeightsEnum)


def get_model_weights(model: Union[Callable, str]) -> W:
def get_model_weights(name: Union[Callable, str]) -> W:
Member


Curious what the reason for this change is? name is a less accurate description of this parameter IMO, since it can also be a callable / model builder. For example,

from torchvision.models import resnet50
weights = get_model_weights(name=resnet50)

looks unnatural.

Contributor Author


The load() method already has a model parameter. Keeping it as model leads to a TypeError:

TypeError: load() got multiple values for argument 'model'

Given that the specific parameter can be either a string or a callable, I think it's OK. If you have a better name, let me know.
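The collision can be reproduced without torchvision. The sketch below uses a simplified stand-in for torch.hub.load (an assumption for illustration, not the real implementation): its second positional parameter is named model, and remaining kwargs are forwarded to the entrypoint.

```python
def load(repo_or_dir, model, **kwargs):
    # Simplified stand-in: torch.hub.load binds the entrypoint name to a
    # positional parameter called 'model' and forwards **kwargs to the
    # entrypoint itself.
    entrypoints = {
        "get_model_weights": lambda name: f"weights for {name}",
    }
    return entrypoints[model](**kwargs)

try:
    # A kwarg called 'model' collides with load()'s own parameter:
    load("pytorch/vision", "get_model_weights", model="resnet50")
except TypeError as exc:
    print(exc)  # load() got multiple values for argument 'model'

# With the entrypoint's parameter renamed to 'name', the call goes through:
print(load("pytorch/vision", "get_model_weights", name="resnet50"))
# → weights for resnet50
```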

@datumbox
Contributor Author

datumbox commented Aug 3, 2022

@NicolasHug I need this specific method exposed on TorchHub to offer users the ability to list the available weights directly. Currently I'm working on a blogpost that covers all these details, and my intention is to quickly collect feedback from the community and promote the API if all is good.

@NicolasHug
Member

NicolasHug commented Aug 3, 2022

I'm concerned about breaking users' code without them being notified in advance that this may happen.

Thinking about this more though: even if get_model_weights() was stable, we'd still be able to change it before the next release. And this may still impact users because by default torchhub will download the main branch, not a stable release. So in that regard, whether it's stable or beta doesn't make much difference for hub users, UNTIL the next release. Things change once the next release is published though, because at this point we would be committed to BC for stable APIs.

As a result, I'm OK to merge this PR as long as you commit to removing get_model_weights() from this file before the next release, if it's still in Beta stage by then.

@datumbox
Contributor Author

datumbox commented Aug 3, 2022

My plan is to write a blogpost that exposes the users to the new API and get feedback. I also plan to reach out to downstream frameworks and get them to adopt it or get their thoughts. So this is the reason why I don't want to rush something to Stable yet. Of course, if the feedback is positive and no major issues are highlighted I would love to promote it to Stable. But this is something to be seen as there are still rough edges (see @nairbv's input on naming at #6330 (comment) or the necessary parameter name change discussed at #6364 (comment)). I agree we have enough time to test it and that's why I'm introducing it as early as possible to allow time for testing.

Concerning committing to remove get_model_weights() prior to the release: that's unfortunately something I can't agree to upfront. It depends on the nature of the changes we intend to make, the feedback we receive and the rate of adoption of the API. This is a decision to be made when we are close to the release and after having the users' feedback at hand; it's not a commitment I'm willing to make upfront without any information. We should cross that bridge when we get there; I hope that is reasonable.

@jdsgomes
Contributor

jdsgomes commented Aug 3, 2022

I'm concerned about breaking users' code without them being notified in advance that this may happen.

Thinking about this more though: even if get_model_weights() was stable, we'd still be able to change it before the next release. And this may still impact users because by default torchhub will download the main branch, not a stable release. So in that regard, whether it's stable or beta doesn't make much difference for hub users, UNTIL the next release. Things change once the next release is published though, because at this point we would be committed to BC for stable APIs.

As a result, I'm OK to merge this PR as long as you commit to removing get_model_weights() from this file before the next release, if it's still in Beta stage by then.

Just my two cents. If torchhub relies on main, then by definition our API stability commitments need to be more relaxed, as you pointed out. But I would go even further and say that this applies even after the release. In that case, given community feedback this can change (or not) before the next release, but at the same time, if the API is not stable by then I don't see the need for a strong commitment to removing it. I might have misinterpreted the problem; happy to hear your thoughts.

@NicolasHug
Member

NicolasHug commented Aug 3, 2022

My plan is to write a blogpost that exposes the users to the new API and get feedback. I also plan to reach out to downstream frameworks and get them to adopt it or get their thoughts. So this is the reason why I don't want to rush something to Stable yet. Of course, if the feedback is positive and no major issues are highlighted I would love to promote it to Stable. But this is something to be seen as there are still rough edges (see @nairbv's input on naming at #6330 (comment) or the necessary parameter name change discussed at #6364 (comment)). I agree we have enough time to test it and that's why I'm introducing it as early as possible to allow time for testing.

I'm sorry, but I do not wish to re-hash this discussion. You are aware of my position about all these points, and still you decided to disregard it. It's not immediately relevant for this PR, so I'd rather not go back to this.

@jdsgomes sorry I'm not sure I completely understand your position, and why you believe having this in torchhub as beta after a release is the same as before the release? In my mind, BC breaking changes prior to a stable release are expected, but BC breaking changes after a release are not.

@jdsgomes
Contributor

jdsgomes commented Aug 3, 2022

My plan is to write a blogpost that exposes the users to the new API and get feedback. I also plan to reach out to downstream frameworks and get them to adopt it or get their thoughts. So this is the reason why I don't want to rush something to Stable yet. Of course, if the feedback is positive and no major issues are highlighted I would love to promote it to Stable. But this is something to be seen as there are still rough edges (see @nairbv's input on naming at #6330 (comment) or the necessary parameter name change discussed at #6364 (comment)). I agree we have enough time to test it and that's why I'm introducing it as early as possible to allow time for testing.

I'm sorry, but I do not wish to re-hash this discussion. You are aware of my position about all these points, and still you decided to disregard it. It's not immediately relevant for this PR, so I'd rather not go back to this.

@jdsgomes sorry I'm not sure I completely understand your position, and why you believe having this in torchhub as beta after a release is the same as before the release? In my mind, BC breaking changes prior to a stable release are expected, but BC breaking changes after a release are not.

I agree that BC-breaking changes after a release are not expected (for the entire library). I just think that the API exposed to torchhub is a special case, since torchhub does not depend on a specific version but on main.

@NicolasHug
Member

It's a bit special indeed. If we were to be super restrictive, we would only add stable APIs to hubconf.py very close to a release, when we're sure we won't change them. That's a bit restrictive though; I wouldn't go that far (but I wouldn't fight against it either).

The next best thing we can do to prevent breaking users' code without their consent is not to release beta APIs in hub. This is my preferred way. But from my rambling in #6364 (comment), I'm forced to admit, out of intellectual honesty, that what we already do is akin to including an API as Beta temporarily and promoting it to stable right before a release.

So, in order to continue honouring the strong BC contract we've always had with users, I'm OK with merging the PR if we either:

  • promote get_model_weights() to stable before the release
  • remove get_model_weights() from hubconf.py before the release (I would much prefer not to, though)
  • add a removable warning indicating that get_model_weights() is beta and subject to change

@datumbox
Contributor Author

datumbox commented Aug 3, 2022

I think it's important to remember that these discussions happen in public. I am not happy to make a commitment beforehand on how we will progress with an API prior to getting input from the community. I also find it troublesome to agree to merging the PR while conditioning the approval on a public commitment to specific directions. As I mentioned earlier, I want to keep the options open and act based on the feedback of the users. When I approach downstream libraries/frameworks to get their input and ask them to put in the work to adopt the API, I need to be honest with them and willing to make drastic changes to the API if that's what they need. That does not align with committing to one direction or another at this point. What I propose is to communicate this API as Beta, listen to the feedback from the users, and release it as stable when we are in a good position to avoid BC breaks.

@NicolasHug
Member

I also find it troublesome to agree to merging the PR while conditioning the approval on a public commitment to specific directions

I'm sure there's a misunderstanding that we'll be able to clarify eventually, but I'm surprised by this because this is something we've always been doing, implicitly or explicitly. Implicitly because it's always understood that we can merge partial PRs of a more global work as long as we commit to finishing that work prior to a release (e.g. when submitting different PRs for model arch, model tests, model weights, etc.). And explicitly e.g. in this past interaction between the both of us, where we publicly acknowledged that we would have to revert something if we couldn't align on some (user-facing) decision prior to a release: #5001 (comment) and #5001 (comment)

Re: getting feedback from community and downstream users. I think this is great and you've proven to do an amazing job at this with the multi-weight API. In this instance, this is perfectly achievable without keeping these APIs as Beta. Or, more relevant to this PR: without publishing get_model_weights() to torchhub.

@jdsgomes
Contributor

jdsgomes commented Aug 4, 2022

How can we resolve this?

To summarise my opinion, regarding this PR only (I think it will help to focus on this particular PR to narrow down the discussion): we are exposing a beta API via torchhub. I think this is OK since torchhub depends on the main branch, and not on a stable release. Therefore I am happy to approve. I am not against the idea of adding a warning, but I wouldn't block the PR for that, as we don't do this in other beta APIs.

@datumbox
Contributor Author

datumbox commented Aug 4, 2022

Right now this is soft-blocking the progress of my work. I am not sure why this PR is not approved already, given that there are no technical concerns. I feel very uncomfortable that I'm being asked to commit to a course of action I believe is incorrect without the feedback of the community, and I believe this is a very untimely discussion. Adding warnings, discussing changes of policies over what constitutes Beta, and whether, going forward, beta methods can be exposed via TorchHub are not discussions that should happen in this PR, nor should they block it. I think it's worth reviewing our code review processes; let's do this offline.

@NicolasHug
Member

This PR, like its parent #6333, may seem anecdotal from an outside point of view. But I do believe that if they are to be merged as-is [1], they would set a milestone and a precedent for the future direction of this project. In particular when it comes to our policy regarding breaking changes.

The pytorch ecosystem has a long-standing history of preventing any kind of BC-breaking change. This is part of what made it successful, and very senior engineers have strongly advocated for essentially never doing any kind of BC breakage.

Personally, I do believe that breaking changes are necessary for a project to move forward, and avoid crippling technical debt - which started being an issue in torchvision. This is why, ever since I joined this project, I've been advocating for a clear deprecation policy. Eventually, pytorch core did most of that work for us, and @datumbox and I pushed to adopt the same deprecation policy in torchvision (see our previous release notes).

While breaking changes are necessary, they're always inconvenient for users though, in the short term at least. This is why we need BC breaks to always be expected and as unsurprising as possible.

When we release something as Beta, it's often because we expect to make some BC-breaking changes in the future (not only, but this is what is relevant in our case here). The problem is: there's no reasonable way for users to know which APIs are Beta. Up until recently, we didn't even document this at all. I was the one pushing for documenting it (with #6165 and various internal posts), but I believe that documenting Beta APIs is not even the bare minimum. We need programmatic ways of letting users know that their code might break; otherwise we're not fulfilling our contract of least-surprising behaviour. This has been a long-standing feature request from pytorch core BTW, so it's not just my opinion: pytorch/pytorch#72317.

For these reasons, I believe we should mark APIs as Beta extremely sparingly. It should only be done in well-defined, exceptional cases. Otherwise, we would be opening the door to frequent, unexpected breaking changes, which directly contradicts the original (and current!) philosophy of the torch ecosystem [2].

In the specific case of #6333, I do not believe that it qualifies as Beta: there is no clearly expected BC breakage, and the potential changes that have already been identified can be made either before the next release, or while preserving BC. #6333 sets the precedent that we're OK to carry out breaking changes without deprecation and without a strong, obvious reason. Because it has strong implications for the future of this project, this is something I believe deserves a concerted discussion with the whole team.

For this specific PR now: I'm just trying to stop the bleeding. I don't believe releasing these as Beta is a good thing, and releasing them on torchhub makes it worse, because torchhub users have even less of a chance of being aware that these APIs are Beta (the probability that they'll check the docs is even lower, considering the point of torchhub is to spare users the hassle of installing the package).

I hope it's clear that this PR and #6333 go far, far beyond any sort of technical concern we may have. They're about the project's future regarding policies, and mitigating users expectations w.r.t. breaking changes.

I'm happy to discuss this and code-review process more offline.

Footnotes

  [1] https://github.com/pytorch/vision/pull/6333 has already been merged

  [2] I also understand that the constraints we operate in can still be problematic sometimes, especially for us in the domains where we also have to address some internal needs. I can see this causing extra stress for engineers, especially project leads, and I've been raising awareness about this issue internally.

@datumbox
Contributor Author

datumbox commented Aug 4, 2022

@NicolasHug I'll save you from the lecture on the importance of BC and having a clear deprecation policy. You were here when we both advocated for that and we took that journey together. The discussion you want to have is interesting and worth having. We should call a meeting, discuss pros/cons and consider adopting it.

What's wrong here is soft-blocking a PR with sound features and improvements on main because it might bias the discussion for or against the policy you would like to introduce. What you should do is approve the PR and we can discuss those matters separately on our weekly meetings.

@NicolasHug
Member

What's wrong here is soft-blocking a PR with sound features and improvements on main because it might bias the discussion for or against the policy you would like to introduce

Sorry, I don't really understand the meaning of this sentence. It seems like you believe that I have a hidden agenda, or that there's a new policy I want to introduce: I don't. I'm just trying to prevent this project from becoming too liberal about breaking changes. I hope it's clear that there's nothing personal here.

I'll save you from the lecture on the importance of BC and having a clear deprecation policy

My comment above wasn't just meant for you, so I apologise if it sounded patronizing. It was also for @jdsgomes with whom I had chatted briefly internally, for clarifying my position to @nairbv, and to provide more context to whoever might be reading this discussion.

@datumbox
Contributor Author

datumbox commented Aug 4, 2022

Sorry, I don't really understand the meaning this sentence.

Always assume positive intent if there is a lack of clarity. I'm not accusing you of having a hidden agenda. We have already discussed this proposal of yours of adding warnings to all the Beta APIs; you have a draft PR at #6173.

My comment above wasn't just meant for you, so I apologise if it sounded patronizing.

Nothing to apologise for. My point was that I understand the importance of BC and of having a deprecation policy, and that we both advocated and pushed for that previously.

My point, again, for this whole situation is that the concerns are valid and, if there are policies that should be adopted, we should discuss them. But this doesn't justify blocking work that aligns with existing practices. This PR doesn't violate any existing rules. If our current policies are not good enough, we should discuss that, but blocking ongoing work without having agreed on new policies sets a very bad precedent.

@nairbv
Contributor

nairbv commented Aug 4, 2022

Let's discuss this offline and try to reach consensus internally on how liberally we want to use our beta classification, so we don't confuse users, and can communicate this clearly.

For reference, the classification is documented here for TorchVision https://pytorch.org/vision/main/, or equivalent description in PyTorch here https://pytorch.org/blog/pytorch-feature-classification-changes/.

I'm not sure if the discussion blocks this particular PR, but regardless of whether it does or not, the question seems to be broader than this specific change.

I'm confident there's some solution we can come up with that would satisfy all stakeholders -- one that keeps things moving and neither overcommits nor undercommits on the level of stability we communicate to users, with tradeoffs between what we put in nightlies, in the release, and in torchhub.

@datumbox
Contributor Author

datumbox commented Aug 4, 2022

I think there are two points under discussion here: should this feature be Beta, and should we expose Beta features via TorchHub?

Why is this feature Beta?

From the two references provided by @nairbv:

[PyTorch policy] In the case of a Beta level feature, the value add, similar to a Stable feature, has been proven and the feature generally works and is documented. This feature is tagged as Beta because the API may change based on user feedback, because the performance needs to improve or because coverage across operators is not yet complete.

[TorchVision policy] Features are tagged as Beta because the API may change based on user feedback, because the performance needs to improve, or because coverage across operators is not yet complete. For Beta features, we are committing to seeing the feature through to the Stable classification. We are not, however, committing to backwards compatibility.

The new Model Registration API is believed to be stable enough and works relatively well, but the API can change depending on user feedback. Some rough edges exist, and that's why it's marked as Beta in #6333. This state needs to be communicated to our users and aligns with our current process. As mentioned earlier, the intention is to collect feedback ASAP, make any necessary changes, and try to release it in v0.14.

Exposing it via TorchHub

The default behaviour of TorchHub is to use the main branch which contains in-development features. As @jdsgomes put it:

we are exposing a beta API via torchhub. I think this is ok since torchhub depends on the main branch, and not on a stable release.

In addition to that, exposing Beta APIs via TorchHub is an established process:

vision/hubconf.py

Lines 56 to 63 in 6ca9c76

from torchvision.models.segmentation import (
deeplabv3_mobilenet_v3_large,
deeplabv3_resnet101,
deeplabv3_resnet50,
fcn_resnet101,
fcn_resnet50,
lraspp_mobilenet_v3_large,
)

From the above, I believe it's unreasonable to block this PR, as it aligns perfectly with the existing policies of TorchVision and PyTorch. A decision on how this feature should be released would be premature and should be made once we collect feedback. I also feel that the precedent of blocking PRs that align with already established processes is extremely dangerous. We need to hold ourselves to the specific standards and agreements already made; if we think those standards are not high enough we should revise the policies, but we shouldn't block ongoing work prior to agreeing on such a policy update.

Contributor

@jdsgomes jdsgomes left a comment


Thanks!

Just as a note: this PR started a very important discussion on what should be in Beta, what should be exposed in torchhub, etc. We will follow up on these discussions, as they are very important in shaping the way TV develops.

@datumbox
Contributor Author

datumbox commented Aug 8, 2022

Agreed. There are definitely valid points raised by @NicolasHug that we should discuss and figure out. Let's keep this moving and stay committed to making our policies crisper.

Thanks, everyone, for your input.

@datumbox datumbox merged commit c72b284 into pytorch:main Aug 8, 2022
@datumbox datumbox deleted the hub/expose_functions branch August 8, 2022 15:01
@github-actions

github-actions bot commented Aug 8, 2022

Hey @datumbox!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

@yil8

yil8 commented Aug 16, 2022

Can't use torch.hub.load with torchvision==0.13.0, since hubconf.py on the main branch does

from torchvision.models import get_model_weights, get_weight

which differs from the hubconf.py shipped with torchvision==0.13.0.

Error:

    model = torch.hub.load("pytorch/vision", self.model_name, **self.kwargs)
/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/hub.py:540: in load
    model = _load_local(repo_or_dir, model, *args, **kwargs)
/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/hub.py:566: in _load_local
    hub_module = _import_module(MODULE_HUBCONF, hubconf_path)
/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/hub.py:89: in _import_module
    spec.loader.exec_module(module)
<frozen importlib._bootstrap_external>:850: in exec_module
    ???
<frozen importlib._bootstrap>:228: in _call_with_frames_removed
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    dependencies = ["torch"]
    
>   from torchvision.models import get_model_weights, get_weight
E   ImportError: cannot import name 'get_model_weights' from 'torchvision.models' (/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torchvision/models/__init__.py)
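A common workaround for this kind of version skew (an assumption based on torch.hub's documented `owner/repo:ref` convention, not something stated in this thread) is to pin the hub call to the release tag matching the installed torchvision, so the hubconf.py from that tag is used instead of main:

```python
# torch.hub accepts "owner/repo:ref" to pin a branch or tag; pinning to
# v0.13.0 keeps hubconf.py consistent with an installed torchvision 0.13.0.
repo = "pytorch/vision:v0.13.0"
owner_repo, _, ref = repo.partition(":")
print(owner_repo, ref)
# → pytorch/vision v0.13.0

# The actual call requires network access, so it is only shown here:
# import torch
# model = torch.hub.load(repo, "resnet50", weights=None)
```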

facebook-github-bot pushed a commit that referenced this pull request Aug 23, 2022

Summary:
* Expose on Hub the public methods of the registration API

* Limit methods and update docs.

Reviewed By: datumbox

Differential Revision: D38824222

fbshipit-source-id: 33e6980010457d10995fff6e915ced52bfdb0a81