-v $MODEL_PATH:$MODEL_PATH \
-e DEVICE=cuda:1 \
-e NCCL_DEBUG=INFO \
docker.io/vectorchai/scalellm:latest --logtostderr --model_path=$MODEL_PATH --model_id=$MODEL_ID --model_type=Yi
I20231129 08:13:34.992501 7 main.cpp:135] Using devices: cuda:1
W20231129 08:13:34.993809 7 args_overrider.cpp:132] Overwriting model_type from llama to Yi
I20231129 08:13:34.993916 7 engine.cpp:91] Initializing model from: /data4/candowu/modelscope/01ai/Yi-34B-Chat-4bits
W20231129 08:13:34.993944 7 model_loader.cpp:162] Failed to find tokenizer.json, use tokenizer.model instead. Please consider using fast tokenizer for better performance.
I20231129 08:13:35.245934 7 engine.cpp:98] Initializing model with dtype: Half
I20231129 08:13:35.245993 7 engine.cpp:107] Initializing model with ModelArgs: [model_type: Yi, dtype: float16, hidden_size: 7168, hidden_act: silu, intermediate_size: 20480, n_layers: 60, n_heads: 56, n_kv_heads: 8, vocab_size: 64000, rms_norm_eps: 1e-05, layer_norm_eps: 0, rotary_dim: 0, rope_theta: 5e+06, rope_scaling: 1, rotary_pct: 1, max_position_embeddings: 4096, bos_token_id: 1, eos_token_id: 2, use_parallel_residual: 0, attn_qkv_clip: 0, attn_qk_ln: 0, attn_alibi: 0, alibi_bias_max: 0, no_bias: 0, residual_post_layernorm: 0], QuantArgs: [quant_method: awq, bits: 4, group_size: 128, desc_act: 0, true_sequential: 0]
terminate called after throwing an instance of 'c10::Error'
what(): The NVIDIA driver on your system is too old (found version 11080). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.
Exception raised from device_count_impl at ../c10/cuda/CUDAFunctions.cpp:53 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7f2c0dc6e38b in /app/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xbf (0x7f2c0dc68f3f in /app/lib/libc10.so)
frame #2: c10::cuda::device_count_ensure_non_zero() + 0x18c (0x7f2c0e0535dc in /app/lib/libc10_cuda.so)
Thank you for reporting this issue. It appears that an upgrade of your NVIDIA driver to version 525.* is necessary. Our image was built with PyTorch 2.* and CUDA 12.1, which requires a minimum driver version of 525.*.
Please note that the CUDA version installed on your host is not a concern in this case, as the Docker image bundles its own CUDA libraries. Upgrading your NVIDIA driver should resolve the issue.
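As a quick sanity check before pulling an image, you can compare the driver version reported by nvidia-smi against the minimum the image's CUDA build needs. A minimal sketch, assuming the minimums discussed in this thread (525.* for the CUDA 12.1 image, 520.61.05 for CUDA 11.8; the exact table entries are NVIDIA's, not part of ScaleLLM):

```python
# Minimal sketch: does the host NVIDIA driver meet the minimum required
# by a given CUDA build? Minimum versions below are assumptions taken
# from this thread / NVIDIA's compatibility tables.
MIN_DRIVER = {
    "12.1": (525, 60, 13),  # cu121 image needs a 525.* or newer driver
    "11.8": (520, 61, 5),   # cu118 image works with 520.61.05
}

def parse_version(s: str) -> tuple:
    """Turn '520.61.05' into (520, 61, 5) for tuple comparison."""
    return tuple(int(p) for p in s.split("."))

def driver_supports(driver: str, cuda: str) -> bool:
    return parse_version(driver) >= MIN_DRIVER[cuda]

print(driver_supports("520.61.05", "11.8"))  # True
print(driver_supports("520.61.05", "12.1"))  # False
```

With the driver from the follow-up comment (520.61.05), this confirms the cu118 image should run while the cu121 image will fail with the error above.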
We are thrilled to share that ScaleLLM has expanded its compatibility to include both CUDA 11.8 and CUDA 12.1. I've just released a new version specifically for this purpose. You can check it out here: New Release for CUDA 11.8 Support.
NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8
$ export MODEL_PATH=Yi-34B-Chat-4bits
$ export MODEL_ID=01-ai/Yi-34B-Chat-4bits
$ docker run -it --gpus=all --net=host --shm-size=1g \