[Bug] Multi-GPU evaluation does not work #1828
Comments
When TP is enabled, the log does say ON GPU0, ... 7, but checking resource usage shows that only one GPU is actually running.
You can configure the sharding logic directly in the config file and then launch it from outside.
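A minimal sketch of what such config-side sharding might look like, based on the `NumWorkerPartitioner` mentioned later in this thread (the import paths and the `LocalRunner`/`OpenICLInferTask` names are assumptions following common OpenCompass conventions, not taken from this report):

```python
# Hedged sketch: data-parallel sharding declared in the eval config itself,
# rather than via command-line flags. Import paths are assumptions.
from opencompass.partitioners import NumWorkerPartitioner
from opencompass.runners import LocalRunner
from opencompass.tasks import OpenICLInferTask

infer = dict(
    # Split the datasets into 8 shards, one per worker.
    partitioner=dict(type=NumWorkerPartitioner, num_worker=8),
    runner=dict(
        type=LocalRunner,
        max_num_workers=8,  # run up to 8 inference tasks in parallel
        task=dict(type=OpenICLInferTask),
    ),
)
```

Note this is data parallelism: each worker holds a full model copy and evaluates a shard of the data, which is distinct from the TP question raised below.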
Isn't this just DP set to 8? If my model is 70B or 33B, how do I enable TP? Adding --max-num-workers 8 --hf-num-gpus 8 to python run.py does not seem to do anything.
I added the following to the config file
TP is set in the model's config; for reference, see
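A hedged sketch of a per-model TP setting, using the `run_cfg=dict(num_gpus=...)` field mentioned in the reply below (the model class name, `abbr`, `path`, and other fields are illustrative assumptions, not the maintainer's exact reference):

```python
# Hedged sketch: tensor parallelism requested per model via run_cfg,
# so one worker spans several GPUs instead of one worker per GPU.
from opencompass.models import HuggingFacewithChatTemplate  # name is an assumption

models = [
    dict(
        type=HuggingFacewithChatTemplate,
        abbr='llama-3-70b-instruct-hf',          # hypothetical
        path='meta-llama/Meta-Llama-3-70B-Instruct',  # hypothetical
        max_out_len=1024,
        batch_size=8,
        run_cfg=dict(num_gpus=8),  # one task sharded across 8 devices
    )
]
```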
Do you mean changing run_cfg=dict(num_gpus)? I changed that before and it made no difference. Also, the NumWorkerPartitioner from your previous reply raises an error for me.
Prerequisite
Type
I'm evaluating with the officially supported tasks/models/datasets.
Environment
The environment is an Ascend NPU.
{'CUDA available': False,
'GCC': 'gcc (GCC) 7.3.0',
'MMEngine': '0.9.1',
'OpenCV': '4.8.0',
'PyTorch': '2.1.0',
'PyTorch compiling details': 'PyTorch built with:\n'
' - GCC 10.2\n'
' - C++ Version: 201703\n'
' - Intel(R) MKL-DNN v3.1.1 (Git Hash '
'64f6bcbcbab628e96f33a62c3e975f8535a7bde4)\n'
' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
' - LAPACK is enabled (usually provided by '
'MKL)\n'
' - NNPACK is enabled\n'
' - CPU capability usage: NO AVX\n'
' - Build settings: BLAS_INFO=open, '
'BUILD_TYPE=Release, '
'CXX_COMPILER=/opt/rh/devtoolset-10/root/usr/bin/c++, '
'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
'-fabi-version=11 -fvisibility-inlines-hidden '
'-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
'-DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER '
'-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK '
'-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
'-O2 -fPIC -Wall -Wextra -Werror=return-type '
'-Werror=non-virtual-dtor -Werror=bool-operation '
'-Wnarrowing -Wno-missing-field-initializers '
'-Wno-type-limits -Wno-array-bounds '
'-Wno-unknown-pragmas -Wno-unused-parameter '
'-Wno-unused-function -Wno-unused-result '
'-Wno-strict-overflow -Wno-strict-aliasing '
'-Wno-stringop-overflow -Wno-psabi '
'-Wno-error=pedantic -Wno-error=old-style-cast '
'-Wno-invalid-partial-specialization '
'-Wno-unused-private-field '
'-Wno-aligned-allocation-unavailable '
'-Wno-missing-braces -fdiagnostics-color=always '
'-faligned-new -Wno-unused-but-set-variable '
'-Wno-maybe-uninitialized -fno-math-errno '
'-fno-trapping-math -Werror=format '
'-Werror=cast-function-type '
'-Wno-stringop-overflow, LAPACK_INFO=open, '
'TORCH_DISABLE_GPU_ASSERTS=ON, '
'TORCH_VERSION=2.1.0, USE_CUDA=OFF, '
'USE_CUDNN=OFF, USE_EIGEN_FOR_BLAS=ON, '
'USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, '
'USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=ON, '
'USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, '
'USE_OPENMP=ON, USE_ROCM=OFF, \n',
'Python': '3.9.10 | packaged by conda-forge | (main, Feb 1 2022, 21:53:27) '
'[GCC 9.4.0]',
'TorchVision': '0.16.0',
'lmdeploy': "not installed:No module named 'lmdeploy'",
'numpy_random_seed': 2147483648,
'opencompass': '0.3.9+',
'sys.platform': 'linux',
'transformers': '4.43.2'}
Reproduces the problem - code/configuration sample
from mmengine.config import read_base

with read_base():
    # dataset and model config imports (elided in the original report)
    ...

work_dir = "outputs/Llama3_1_and_Qwen2_5-7B-Instruct"
datasets = sum((v for k, v in locals().items() if k.endswith('_datasets')), [])
models = sum([v for k, v in locals().items() if k.endswith('_model')], [])
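As an aside on the aggregation idiom used above: `sum(..., [])` with an empty-list start value concatenates every list whose variable name matches the given suffix. A self-contained illustration with hypothetical variable names:

```python
# Two hypothetical dataset lists, standing in for what a config's
# read_base() imports would bring into scope.
gsm8k_datasets = [{'abbr': 'gsm8k'}]
humaneval_datasets = [{'abbr': 'humaneval'}]

# Snapshot locals() with list() before iterating, so assigning the
# result does not mutate the namespace mid-iteration.
datasets = sum(
    (v for k, v in list(locals().items()) if k.endswith('_datasets')),
    [],
)
print([d['abbr'] for d in datasets])  # concatenation of both lists
```

The `datasets` result variable itself does not end with `_datasets`, so the idiom never accidentally includes its own output on a re-run.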
Reproduces the problem - command or script
python run.py configs/eval_OC15_llama3.1_qwen2_custom_gen.py --max-num-workers 8 --num-gpus 8
Reproduces the problem - error message
Other information
I understand that a 7B model only needs a single GPU, but why does only one GPU actually run when I specify distributing across 8 GPUs? Also, setting DP to 8 seems to have no effect either.