Allow passing numpy.ndarray to SetFitHead.predict_proba #207
Conversation
Hi @jegork,
Thank you so much for fixing this issue. I left some suggestions; the core idea is moving the full conversion logic to `predict_proba`. They are just suggestions, so I'm happy to hear your feedback and discuss a better solution.
To add on to @blakechi's comments, I think this PR is a good place to discuss the snippet at lines 308 to 314 in fa1021d.
Another related issue that I had noted is the type hinting for these methods, so the current type hints should be updated accordingly. Would love to hear your thoughts.
|
Now I'm a fan of @tomaarsen's proposed solution! 😂 Great proposal! Yes, by doing so it's much cleaner and it makes more sense in terms of integration. Just to add on a bit: we may also add a check to see which head we are using right now to determine whether `True` or `False` should be passed for `SentenceTransformer.encode`'s `convert_to_numpy`, since the sklearn head can't accept a `torch.Tensor`. It may depend on which side we prefer more, but considering the consistency of the outputs, we may want to always convert the outputs to one type (either `numpy.ndarray` or `torch.Tensor`). And lastly, it would be great if you (@jegork) could fix the typing issue in this PR as @tomaarsen mentioned :) |
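A minimal sketch of the head check suggested above might look like the following. Note that the `wants_numpy` helper and its duck-typing heuristic are hypothetical illustrations, not part of the actual SetFit codebase:

```python
def wants_numpy(model_head) -> bool:
    """Hypothetical helper: decide whether the body's embeddings should be
    converted to numpy before being handed to this head."""
    # Heuristic (assumption): torch modules expose .parameters(), while
    # sklearn estimators such as LogisticRegression do not.
    return not hasattr(model_head, "parameters")


# The flag could then drive the encode() call, e.g.:
# embeddings = model_body.encode(x, convert_to_numpy=wants_numpy(model_head))
```

The heuristic could equally be an explicit `isinstance` check against the known head classes; the duck-typing version merely avoids importing both libraries just to branch on the head type.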
Hey @blakechi, @tomaarsen, thanks for your input! This indeed seems like a more coherent solution. Do you want me to implement your idea as part of this PR? @blakechi, the typing issue that was mentioned appears in |
If we want to go with the approach of removing |
@tomaarsen I have implemented your suggestion; it would be nice if you could take a look. However, it seems that the method signature that you suggested is incorrect (
I decided to keep it as is for now; let me know if you think it is worthwhile to also add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for fixing these methods and their types @jegork and welcome to the 🤗 community!
The PR looks great as it is, and we can later include support for predicting on a single string if there's a strong request from the community.
I've left one nit, and you'll need to merge the latest changes from `main` to resolve the merge conflicts.
Also thank you to @blakechi and @tomaarsen for fantastic feedback on this PR 🔥
Thanks for your reply! @lewtun |
Tests are failing for me locally. I get 8 errors like so:

```
FAILED tests/test_trainer.py::SetFitTrainerTest::test_raise_when_metric_value_is_invalid - TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
```
This is caused by a detail that @blakechi already mentioned earlier in #207 (comment):

> Just add on a bit, we may also add a check to see which head we are using right now to determine it's `True` or `False` for `SentenceTransformer.encode`'s `convert_to_numpy`, since the `sklearn` head can't accept a `torch.Tensor`.
In short, the issue originates here now:

```python
embeddings = self.model_body.encode(
    x_test, normalize_embeddings=self.normalize_embeddings, convert_to_tensor=True
)
outputs = self.model_head.predict(embeddings)
```
The `self.model_body` can produce GPU tensors, while the current `self.model_head` cannot accept a torch Tensor if it is a simple `LogisticRegression`. Well, it can: it will internally perform an `np.asarray(...)` call, which happens to work for a tensor on the CPU, but not if the tensor is on the GPU. In the latter case, Torch will tell you to call `.cpu()` or `.detach().cpu()` first.
This all indicates that we need a much larger overhaul of the two head implementations: there are way too many calls to `isinstance` that should not be needed. I'll try to plan out a design.
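As an illustration of the device issue being described, a conversion helper along these lines would sidestep the CUDA failure. This is a sketch only; the name `to_numpy` is hypothetical and not part of SetFit:

```python
import numpy as np

def to_numpy(x):
    """Hypothetical sketch: return x as a numpy array, whether it is already
    an ndarray or a torch.Tensor (possibly living on the GPU)."""
    if isinstance(x, np.ndarray):
        return x
    # torch.Tensor path: np.asarray() raises on CUDA tensors, so move the
    # tensor to host memory (and off the autograd graph) first.
    return x.detach().cpu().numpy()
```

Routing every head input through such a helper would make the sklearn head indifferent to which device the body's embeddings were computed on.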
Hi folks, just coming back to this after my vacation - apologies for the delay! @tomaarsen are you still getting errors on the tests locally? I just checked out this branch and it seems to run fine :) If it's green on your end, I think we can merge this! |
Welcome back! I'm afraid that the test failures persist. See my review for details on the bug. The CI doesn't catch it because it doesn't have a GPU. |
Thanks for the clarification @tomaarsen ! Let's try to sort out this messy two-head business and see if there's a clean way we can deal with these types of inconsistencies between |
Previously, the embeddings were always Tensors, which doesn't work with the Logistic Regression head, in particular if the tensors are on CUDA.
I've merged `main` into this PR to update it. Furthermore, I've resolved the aforementioned type conversion issues. I still believe that we would benefit from refactoring, but I recognize that we should push out this fix before a potential refactor is ready. I believe this PR is quite important, as several issues have been opened reporting the bug first mentioned in #207 (comment). I'd like to merge this ASAP. |
It would be nice if we could have this merged; thank you for your assistance @tomaarsen ! |
I'll merge it if you think it looks good now! @jegork Others are also welcome to review it :) |
@tomaarsen looks good to me! |
Thanks for this @jegork! Apologies for the delay in tackling this. |
Hello!
Currently, calling `SetFitModel.predict_proba("test")` leads to an error.
This happens because `SentenceTransformer.encode` returns a `numpy.ndarray` rather than tensors, so I have moved the `numpy.ndarray` to `torch.Tensor` conversion from `SetFitHead.predict` to `.predict_proba` (as `predict` calls `predict_proba` internally).
Feel free to suggest better solutions, as this one has to call `isinstance(x_test, torch.Tensor)` in both `.predict` and `.predict_proba` (so that `.predict` returns the same datatype as its parameter).
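To illustrate the pattern described above (the input conversion living in `predict_proba`, with `predict` mirroring the caller's input type), here is a hedged, self-contained mock. None of these class or method bodies come from the actual SetFit code; the weights and softmax are stand-ins:

```python
import numpy as np

class HeadSketch:
    """Hypothetical mock of the pattern under discussion: predict_proba owns
    the input conversion, and predict mirrors the caller's input type."""

    def predict_proba(self, x):
        x = np.asarray(x, dtype=np.float32)  # convert once, here
        # Stand-in for the real head: fixed weights mapping features -> 2 classes.
        w = np.arange(x.shape[1] * 2, dtype=np.float32).reshape(x.shape[1], 2)
        logits = x @ w
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    def predict(self, x):
        probs = self.predict_proba(x)
        preds = probs.argmax(axis=1)
        # Mirror the caller's type, as discussed in the thread.
        return preds.tolist() if isinstance(x, list) else preds
```

With this shape, only one `isinstance` check remains (in `predict`, to mirror the input type), rather than conversion logic duplicated across both methods.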