Describe the bug
Flash attention cannot be integrated with LLaMA pipeline-parallelism training in either of its two implementations: both the original flash-attn package and torch.nn.functional.scaled_dot_product_attention from PyTorch 2.0 fail.
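For reference, the two implementations in question can be invoked as in this minimal sketch (tensor shapes and the flash-attn 2.x API here are assumptions, not code from this project):

```python
# Minimal sketch of the two flash-attention entry points being compared.
import torch
import torch.nn.functional as F

q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)  # (batch, heads, seq, head_dim)
k, v = torch.randn_like(q), torch.randn_like(q)

# 1) PyTorch 2.0 built-in; dispatches to a fused flash kernel when eligible.
out_sdpa = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# 2) Original flash-attn package (API as of flash-attn >= 2.0);
#    it expects (batch, seq, heads, head_dim) layout instead.
from flash_attn import flash_attn_func
out_fa = flash_attn_func(
    q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), causal=True
).transpose(1, 2)
```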
ds_report output
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
No CUDA runtime is found, using CUDA_HOME='/cm/shared/apps/cuda11.6/toolkit/11.6.0'
DeepSpeed general environment info:
torch install path ............... ['/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/torch']
torch version .................... 2.0.0+cu117
deepspeed install path ........... ['/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.9.5, unknown, unknown
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.6
deepspeed wheel compiled w. ...... torch 2.0, cuda 11.7

nvidia-smi (Processes) on the same node:

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    1880877      C   python3                         2525MiB |
|    1   N/A  N/A    1879942      C   ...s/pytorch_scse/bin/python   40461MiB |
|    2   N/A  N/A    1030504      C   ...envs/wespeaker/bin/python   31351MiB |
|    4   N/A  N/A       1663      C   ...s/pytorch_scse/bin/python   40461MiB |
|    5   N/A  N/A    1030505      C   ...envs/wespeaker/bin/python   31351MiB |
|    6   N/A  N/A    2442536      C   ...s/pytorch_scse/bin/python   40461MiB |
+-----------------------------------------------------------------------------+
Screenshots
The error information is shown in the attached screenshot.
System info (please complete the following information):
See the ds_report output above: Python 3.9 (conda), torch 2.0.0+cu117, DeepSpeed 0.9.5, nvcc 11.6.
Launcher context
deepspeed launcher
The code implementing flash attention in my own project is as follows:
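A minimal sketch of this style of integration, assuming a LLaMA-style attention module (class and parameter names are illustrative, and rotary position embeddings are omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LlamaSdpaAttention(nn.Module):
    """Illustrative LLaMA-style attention layer that routes through
    torch.nn.functional.scaled_dot_product_attention. Rotary position
    embeddings are omitted to keep the sketch short."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.q_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.k_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.v_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.o_proj = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        bsz, seq_len, _ = hidden_states.shape
        # Project and reshape to (batch, heads, seq, head_dim), the layout SDPA expects.
        q = self.q_proj(hidden_states).view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(hidden_states).view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(hidden_states).view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        # is_causal=True applies the causal mask inside the fused kernel,
        # so no explicit attention-mask tensor needs to be passed in.
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn = attn.transpose(1, 2).reshape(bsz, seq_len, -1)
        return self.o_proj(attn)
```

Avoiding an explicit mask matters here because DeepSpeed's pipeline engine only passes tensors (or tuples of tensors) between stages, so any attention mask would have to be threaded through every stage's inputs.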