[Bug]: Difference in embedding values for Alibaba-NLP/gte-Qwen2-1.5B-instruct between vLLM and huggingface methods #11801
Comments
I'm pretty sure this is the reason for the discrepancy, since vLLM doesn't apply the prompt template by default. You have to figure out what the template is in Sentence-Transformers and apply it manually before passing the prompt to vLLM.
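For illustration, a minimal sketch of doing that manually — reading the template that Sentence-Transformers would otherwise apply and prepending it yourself (the `prompts` lookup and the example text are assumptions, not code from this thread):

```python
from sentence_transformers import SentenceTransformer

# Load the model once just to read the prompt templates that
# Sentence-Transformers would normally prepend for you.
st_model = SentenceTransformer("Alibaba-NLP/gte-Qwen2-1.5B-instruct")
query_prompt = st_model.prompts["query"]  # the instruct prefix for queries

# vLLM does not apply this template, so prepend it manually before
# sending the text to the vLLM server.
text_for_vllm = query_prompt + "what is the capital of China?"
```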
Thanks, and I agree that it will have an impact on the user query. However, I tried to encode the word …
I think there is something different in the way you use Sentence-Transformers. This example script yields consistent results.
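A sketch of that kind of side-by-side check (not the exact script from the comment; the offline `LLM.embed` usage and the sample texts are assumptions):

```python
import torch
from sentence_transformers import SentenceTransformer
from vllm import LLM

MODEL = "Alibaba-NLP/gte-Qwen2-1.5B-instruct"
texts = ["hello", "how is the weather today?"]  # placeholder inputs

# Sentence-Transformers path; no prompt template, to match the raw
# inputs that vLLM sees.
st_embeds = SentenceTransformer(MODEL).encode(texts)

# vLLM offline embedding path.
llm = LLM(model=MODEL, task="embed", enforce_eager=True)
vllm_embeds = [o.outputs.embedding for o in llm.embed(texts)]

# The two should line up almost exactly (cosine similarity ~ 1.0).
for st_e, v_e in zip(st_embeds, vllm_embeds):
    sim = torch.nn.functional.cosine_similarity(
        torch.tensor(st_e), torch.tensor(v_e), dim=0
    )
    print(f"cosine similarity: {sim.item():.6f}")
```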
Thank you for the valuable clarifications 😀 It seems that removing … Very strange, though. What is …?
Oh, I just remembered this issue: huggingface/transformers#34882. You should pass the same value of …
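The exact argument referenced above was lost from this page; purely as an illustration of passing the same setting on both sides, this is how one could align, e.g., the dtype in Sentence-Transformers with what vLLM runs (the parameter shown is an example, not the one from the thread):

```python
import torch
from sentence_transformers import SentenceTransformer

# Example only: load the HF side with the same settings vLLM uses,
# e.g. the same dtype, so both backends compute matching embeddings.
st_model = SentenceTransformer(
    "Alibaba-NLP/gte-Qwen2-1.5B-instruct",
    model_kwargs={"torch_dtype": torch.float16},
)
```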
Thanks for the solution! It is now consistent across both methods. 😀
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
When I serve the Alibaba-NLP/gte-Qwen2-1.5B-instruct model using vLLM and compare the embedding values with Hugging Face, the resulting similarity scores are very different.
First I serve the model using:

```shell
python -m vllm.entrypoints.openai.api_server \
    --dtype auto --tensor-parallel-size 1 --enforce-eager \
    --model gte-Qwen2-1.5B-instruct/snapshots/c6c1b92f4a3e1b92b326ad29dd3c8433457df8dd \
    --gpu-memory-utilization 0.85 --task embed
```
Then on the client side, after initializing, I do the following:
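The original snippet is not shown above; a minimal sketch of such a request through the OpenAI client (the base URL, API key, and sample texts are assumptions):

```python
from openai import OpenAI

# Assumed defaults for a local vLLM server; adjust to your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

queries = ["what is the capital of China?"]        # placeholder texts
documents = ["Beijing is the capital of China."]

# The served model name defaults to the value passed via --model.
resp = client.embeddings.create(
    model="gte-Qwen2-1.5B-instruct/snapshots/c6c1b92f4a3e1b92b326ad29dd3c8433457df8dd",
    input=queries + documents,
)
embeddings = [d.embedding for d in resp.data]
```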
When I follow the Hugging Face method using Sentence-Transformers, it looks like this:
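Again, the snippet itself is missing; a sketch of the Sentence-Transformers side as described, with `prompt_name="query"` applied to the queries (sample texts are placeholders, and `similarity()` assumes sentence-transformers v3+):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Alibaba-NLP/gte-Qwen2-1.5B-instruct")

queries = ["what is the capital of China?"]        # placeholder texts
documents = ["Beijing is the capital of China."]

# prompt_name="query" makes Sentence-Transformers prepend the instruct
# prefix from the model's prompt config to each query.
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

scores = model.similarity(query_embeddings, document_embeddings)
print(scores)
```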
You can clearly see that the similarity scores are different. What is the solution to this problem?
Also notice that the query embeddings have `prompt_name="query"` enabled during encoding. How can I do the same for vLLM? Thanks for your feedback in advance! :)