Support bfloat16 LoRA Adapters #403
Comments
You can replace `converted_weights = torch.concatenate(converted_weights, dim=0).unsqueeze(0).to(dtype=str_dtype_to_torch(dtype)).cpu().numpy()` with `converted_weights = torch_to_numpy(torch.concatenate(converted_weights, dim=0).unsqueeze(0).to(dtype=str_dtype_to_torch(dtype)).cpu())` and import `torch_to_numpy`, as shown in the sketch below.
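A minimal sketch of that replacement, assuming `torch_to_numpy` and `str_dtype_to_torch` come from `tensorrt_llm._utils` (the import location is an assumption, not stated in this thread):

```python
import torch
# Assumption: both helpers live in tensorrt_llm._utils.
from tensorrt_llm._utils import str_dtype_to_torch, torch_to_numpy

dtype = "bfloat16"
converted_weights = [torch.randn(8, 16), torch.randn(8, 16)]  # stand-in data

# Before (fails for bfloat16, because plain .numpy() needs a NumPy dtype):
# converted_weights = torch.concatenate(
#     converted_weights,
#     dim=0).unsqueeze(0).to(dtype=str_dtype_to_torch(dtype)).cpu().numpy()

# After (torch_to_numpy can represent bfloat16 tensors):
converted_weights = torch_to_numpy(
    torch.concatenate(
        converted_weights,
        dim=0).unsqueeze(0).to(dtype=str_dtype_to_torch(dtype)).cpu())
```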
How can I pass bfloat16 adapter weights to the backend? How could I build the output tensors in "preprocessing", for example, to be of bfloat16 datatype? What updates do I need to make in config.pbtxt?
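For the config.pbtxt side, a hedged sketch of how the adapter-weight input might be declared; the tensor name `lora_weights`, the dims, and the `optional` flag are assumptions modeled on the usual tensorrtllm_backend LoRA inputs, not confirmed in this thread:

```
# Hypothetical fragment of the tensorrt_llm model's config.pbtxt.
input [
  {
    name: "lora_weights"
    data_type: TYPE_BF16
    dims: [ -1, -1 ]
    optional: true
  }
]
```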
The issue of converting the LoRA adapter is fixed in the latest main branch. For running bfloat16 on the backend, you should only change the …
I would like to manage loading the LoRA weights on the first call to that adapter in my preprocessing model.py. I am not sure how to package the weights as bfloat16 in order to send them to the backend. It seems like the Python backend cannot handle this datatype.
If you hope to send bf16 data between two models and think Triton does not support it, please ask in the Triton repo. They are more familiar with this part.
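One possible way to hand bfloat16 tensors out of a Python backend model without going through NumPy is DLPack; a minimal sketch, where the output name `lora_weights` and the dummy shape are assumptions for illustration, not taken from this thread:

```python
import torch
from torch.utils.dlpack import to_dlpack
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    """Sketch of a model.py that returns a bfloat16 tensor via DLPack."""

    def execute(self, requests):
        responses = []
        for request in requests:
            # Assume the adapter weights were loaded elsewhere as bfloat16.
            weights = torch.zeros((1, 8, 16), dtype=torch.bfloat16)
            # DLPack avoids the NumPy round-trip, which cannot express bfloat16.
            out = pb_utils.Tensor.from_dlpack("lora_weights", to_dlpack(weights))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```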
I have a Mistral 7B model with fine-tuned LoRA weights in bfloat16.
I ran into issues when attempting to use my adapters, which were compiled for bfloat16.
Running the conversion command to produce the `.npy` format, which allows me to follow the example, results in an error, likely a limitation of NumPy not natively supporting bfloat16.
I went ahead and converted to float32 instead, just to continue testing (and hoping that precision was maintained).
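A minimal sketch of that float32 fallback (the shape and file name are made up for illustration); since bfloat16 is a truncated float32, the upcast itself is lossless:

```python
import numpy as np
import torch

w = torch.randn(8, 16, dtype=torch.bfloat16)  # stand-in adapter weight

# Plain NumPy has no bfloat16 dtype, so w.numpy() raises a TypeError.
# Upcasting to float32 keeps the values bit-exact and allows saving as .npy.
w_np = w.to(torch.float32).cpu().numpy()
np.save("adapter_weight.npy", w_np)
```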
....
I had some hope that in Triton I would still be able to use bfloat16, because I see it listed as a supported datatype, BF16.
When I load the models and configs and send them to the backend (which was compiled for bfloat16) and call it through triton-inference-server, I get the following error:
[TensorRT-LLM][ERROR] Assertion failed: Expected lora weights to be the same data type as base model (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/loraUtils.cpp:66)
Is there any way for me to pass in the adapter weights as bfloat16?
I noticed that TYPE_BF16 is listed here, but it does not seem like `pb_utils.triton_string_to_numpy("TYPE_BF16")` can handle it (since NumPy does not have bfloat16).
Expected behavior
Documented example of using bfloat16 models with LoRA adapters.
Actual behavior
Examples cover FP16 only.