[Bug]: Can't Use RunAI Model Streamer When Streaming Into More Than 1 GPU - Pickling Error #11819
Status: Closed
Labels: bug
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
Here is what I encountered when trying to load a model onto 2 GPUs on my EC2 instance through vLLM. It failed with:

Can't pickle <class 'botocore.client.S3'>: attribute lookup S3 on botocore.client failed
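For context, botocore generates client classes such as S3 dynamically at runtime, so pickle cannot resolve them by attribute lookup on botocore.client, which is exactly the message above. A minimal sketch reproducing the failure outside vLLM (it assumes only that boto3 is installed; no real credentials are needed to construct the client):

```python
import pickle

import boto3

# Constructing the client needs no credentials; the region is arbitrary here.
client = boto3.client("s3", region_name="us-east-1")

try:
    pickle.dumps(client)
except pickle.PicklingError as exc:
    # botocore builds the S3 class on the fly, so pickle's attribute lookup
    # on botocore.client fails with the same message seen in the traceback.
    print(exc)
```

Tensor parallelism spawns worker processes, and anything sent to them must survive exactly this pickling step, which is presumably why the error only appears with --tensor-parallel-size 2.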
The setup followed this guide: https://docs.vllm.ai/en/stable/serving/runai_model_streamer.html
AWS credentials were set through the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN.
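For completeness, an equivalent way to set these from a Python launcher script before invoking `vllm serve` (all values below are placeholders, not real credentials):

```python
import os

# Placeholders only; equivalent to `export VAR=...` in the shell before
# running the `vllm serve` command shown below.
os.environ["AWS_ACCESS_KEY_ID"] = "<access-key-id>"
os.environ["AWS_SECRET_ACCESS_KEY"] = "<secret-access-key>"
os.environ["AWS_SESSION_TOKEN"] = "<session-token>"
```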
Command line used:
vllm serve s3://llama/llama-3.1-8B --load-format runai_streamer --tensor-parallel-size 2 --model-loader-extra-config '{"concurrency":2}'
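For what it's worth, the usual remedy for this class of error is to construct the boto3/botocore client lazily inside each worker process instead of creating it once in the parent and pickling it across the process boundary that tensor parallelism introduces. A sketch of that general pattern follows; it illustrates the technique only and is not necessarily what the fix in #11825 (see the comments below) actually does, and the bucket/key names are illustrative:

```python
import boto3
from multiprocessing import Pool

_client = None  # per-process cache; never crosses a process boundary


def get_s3_client():
    """Create the botocore client on first use inside the current process."""
    global _client
    if _client is None:
        _client = boto3.client("s3", region_name="us-east-1")
    return _client


def fetch_object_size(key: str) -> int:
    # Only the picklable string `key` is shipped to the worker; the
    # unpicklable client is built lazily inside the worker itself.
    resp = get_s3_client().head_object(Bucket="llama", Key=key)
    return resp["ContentLength"]


if __name__ == "__main__":
    with Pool(processes=2) as pool:  # two workers, mirroring TP=2
        print(pool.map(fetch_object_size, ["llama-3.1-8B/config.json"]))
```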
Comments

I have created a fix #11825, can you try it out?

Looking into it.

Pickling issue is fixed, but there is a new error.

Created a new issue for this new error: #11858