CUDA error with INT 8 inference #1787
Unanswered
gsujankumar asked this question in Q&A
Replies: 0 comments
I am trying to get started with implementing INT8 inference on DeepSpeed, but I am running into a few CUDA errors that I find hard to debug.
Code:
I am interested in implementing INT8 inference with GPT2-style models; the code I am running is the following:
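(The original snippet was lost in extraction; the following is a hedged sketch of the kind of script involved, assuming the public `deepspeed.init_inference` API and a Hugging Face GPT-2 checkpoint. All names are illustrative, not the poster's actual code.)

```python
# Illustrative only -- not the original gpt_example.py.
# Assumes deepspeed, transformers, and torch are installed; imports are
# kept inside the function so the sketch can be read without them.

def build_int8_engine(model_name="gpt2"):
    """Wrap a GPT-2 checkpoint with DeepSpeed's INT8 inference kernels."""
    import torch
    import deepspeed
    from transformers import GPT2LMHeadModel

    model = GPT2LMHeadModel.from_pretrained(model_name)
    # dtype=torch.int8 selects the weight-quantization path that the
    # source changes below touch (weight_quantizer.py etc.).
    return deepspeed.init_inference(
        model,
        mp_size=1,
        dtype=torch.int8,
        replace_with_kernel_inject=True,
    )
```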
Setup:
I am running this with
I noticed a few bugs blocking INT8 inference and made the following changes to the source code:
- `deepspeed/runtime/weight_quantizer.py`, as `is_mlp` was not defined
- `deepspeed/runtime/weight_quantizer.py`
- `deepspeed/ops/inference/transformer_inference.py`
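For context on what the quantizer is doing, here is a small self-contained sketch of symmetric per-tensor INT8 quantization, the general scheme DeepSpeed's weight quantizer implements (illustrative Python, not the actual `weight_quantizer.py` code):

```python
# Symmetric per-tensor INT8 quantization sketch (illustrative only).

def quantize_int8(weights):
    """Map floats to int8 values with a single per-tensor scale."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0  # symmetric range [-127, 127]
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; error is at most half a step (scale/2)."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.02]
q, s = quantize_int8(w)
# q == [50, -127, 2]; s is approximately 0.01
# dequantize(q, s) approximates w within one quantization step
```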
The output:
While the code runs error-free with `dtype=torch.float` and `dtype=torch.half`, I am running into errors with `dtype=torch.int8`.
Running
`CUDA_VISIBLE_DEVICES=1 CUDA_LAUNCH_BLOCKING=1 deepspeed gpt_example.py`
results in the following output: