
Unable to interrupt multi-GPU training in the webui #3978

Closed
1 task done
injet-zhou opened this issue May 30, 2024 · 0 comments · Fixed by #3987
Labels
solved This problem has been already solved

Comments

@injet-zhou
Contributor

Reminder

  • I have read the README and searched the existing issues.

Reproduction

  1. Launch the webui with: CUDA_VISIBLE_DEVICES='0,1' llamafactory-cli webui
  2. Fill in the model name or model path
  3. Click the Start button


Expected behavior

Training terminates successfully when interrupted.
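For context on why interrupting multi-GPU runs is harder than single-GPU ones: with CUDA_VISIBLE_DEVICES='0,1', the CLI typically spawns a distributed launcher that in turn spawns one worker per GPU, so signalling only the direct child can leave the per-GPU workers running. A common remedy (a minimal sketch only, not necessarily what PR #3987 does) is to start the trainer as the leader of its own process group and signal the whole group:

```python
import os
import signal
import subprocess

# Hypothetical sketch: run the training command as the leader of a new
# process group, so an interrupt reaches every descendant worker process.
# The sleeping child below stands in for a multi-process trainer.
proc = subprocess.Popen(
    ["python", "-c", "import time; time.sleep(60)"],
    start_new_session=True,  # child becomes leader of a new session/group
)

# To interrupt training, signal the entire group rather than proc.pid
# alone; killing only the parent would orphan per-GPU worker processes.
os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
proc.wait(timeout=10)

# A negative return code means the child died from that signal.
print(proc.returncode)
```

This relies on POSIX process-group semantics, so it applies to the Linux setup reported below but not to Windows.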

System Info

  • current llama-factory revision (commit id): 97346c1
  • OS: Ubuntu 22.04
  • transformers version: 4.41.1
  • Platform: Linux-5.15.0-43-generic-x86_64-with-glibc2.35
  • Python version: 3.10.14
  • Huggingface_hub version: 0.23.2
  • Safetensors version: 0.4.3
  • Accelerate version: 0.30.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: Yes

Others

No response

@hiyouga hiyouga added the pending This problem is yet to be addressed label May 30, 2024
@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Jun 3, 2024