Intel(R) Core(TM) Ultra 5 125H NPU so slow? #1084
Very slow when I use the NPU.
Hi @mnlife, note that when you measure time end to end, you also include the model compilation time on NPU, so the overall number may be higher than for CPU and GPU. Please wait for the new OpenVINO release and driver update; I hope @TolyaTalamanov will help you get better results.
Looking forward to your reply.
@dmatveev Does LLM performance on NPU rely on the "remote tensors" feature? I also observed that performance on NPU is worse than on CPU.
Hi, has this issue been resolved?
Hi @mnlife, OpenVINO 2024.5 was released recently with much improved performance for LLMs on NPU. Could you try again? Please follow the recommendations in the documentation at https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/genai-guide-npu.html, in particular: update the NPU driver, make sure to export the model with symmetric quantization, and add … This is an example for exporting the model:
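For readers unfamiliar with the term: symmetric quantization stores each group of weights as signed integers with a single scale and no zero point, so 0.0 always maps to code 0. The plain-Python sketch below illustrates group-wise symmetric INT4 quantization. The group size, the [-8, 7] code range, and the function names are assumptions chosen for illustration; this is not how OpenVINO or NNCF implement it internally.

```python
# Illustrative sketch of group-wise symmetric INT4 weight quantization.
# Assumptions: flat list of weights, small group size, codes in [-8, 7].


def quantize_sym_int4(weights, group_size=4):
    """Quantize a flat list of floats group by group.

    Each group gets one scale and no zero point (symmetric), so 0.0 maps to 0.
    Returns (codes, scales) with every code in the signed range [-8, 7].
    """
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7, keeping the range centred on 0.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            q = round(w / scale)
            codes.append(max(-8, min(7, q)))  # clamp to the INT4 range
    return codes, scales


def dequantize_sym_int4(codes, scales, group_size=4):
    """Reconstruct approximate weights: w ~= code * scale of its group."""
    return [q * scales[i // group_size] for i, q in enumerate(codes)]


if __name__ == "__main__":
    w = [0.7, -0.3, 0.0, 0.12, 1.4, -1.4, 0.2, 0.0]
    codes, scales = quantize_sym_int4(w)
    print("codes:", codes)
    print("scales:", scales)
    print("restored:", dequantize_sym_int4(codes, scales))
```

Because there is no zero point, dequantization is a single multiply per weight, which is one reason symmetric schemes are often preferred for accelerator-friendly kernels.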
Also note the comment above about including compilation time. To compare inference performance specifically, you can measure time before and after inference, e.g.
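A minimal sketch of separating the two phases, assuming the pipeline is created by one callable and a single generation is run by another. The helper name `time_phases` and the stand-in functions are hypothetical; the commented OpenVINO GenAI lines show how it might be wired up (with `model_path` as a placeholder) and are not executed here.

```python
import time

# Hypothetical wiring with OpenVINO GenAI (not executed; model_path is a placeholder):
#   import openvino_genai as ov_genai
#   load = lambda: ov_genai.LLMPipeline(model_path, "NPU")
#   infer = lambda pipe: pipe.generate("What is OpenVINO?", max_new_tokens=100)


def time_phases(load_pipeline, run_inference, runs=3):
    """Measure pipeline creation and inference separately.

    load_pipeline: zero-argument callable that builds the pipeline; on NPU
        this step includes model compilation.
    run_inference: callable that takes the pipeline and runs one generation.
    runs: number of inference repetitions, so a first (warm-up) run can be
        compared with later ones.
    Returns (load_seconds, list of per-run inference seconds).
    """
    start = time.perf_counter()
    pipe = load_pipeline()
    load_seconds = time.perf_counter() - start

    inference_seconds = []
    for _ in range(runs):
        t0 = time.perf_counter()
        run_inference(pipe)
        inference_seconds.append(time.perf_counter() - t0)
    return load_seconds, inference_seconds


if __name__ == "__main__":
    # Stand-ins that only sleep: a slow "compile" and a fast "generate".
    def fake_load():
        time.sleep(0.2)
        return object()

    def fake_infer(pipe):
        time.sleep(0.02)

    load_s, infer_s = time_phases(fake_load, fake_infer)
    print(f"load/compile: {load_s:.3f} s")
    print("inference:", ", ".join(f"{t:.3f} s" for t in infer_s))
```

Reporting the load time separately makes it clear how much of an end-to-end number is one-off compilation rather than per-request inference cost.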
Using model caching as explained in the docs will speed up model compilation and therefore also improve the overall duration of the script.
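To illustrate why caching shortens later runs, here is a hypothetical, OpenVINO-free sketch: the first call pays for compilation and writes the result to disk, and subsequent calls with the same model just read it back. The function `compile_with_cache`, the SHA-256 cache key, and the `.blob` file layout are all assumptions for illustration; in OpenVINO itself caching is enabled through a cache-directory option described in the linked guide.

```python
import hashlib
import os
import tempfile


def compile_with_cache(model_bytes, cache_dir, compile_fn):
    """Return a compiled blob, reusing a cached copy when one exists.

    Hypothetical illustration: the cache key is a hash of the model contents,
    and compile_fn stands in for the expensive device compilation step.
    Returns (blob, from_cache).
    """
    key = hashlib.sha256(model_bytes).hexdigest()
    path = os.path.join(cache_dir, key + ".blob")
    if os.path.exists(path):
        # Cache hit: skip compilation entirely.
        with open(path, "rb") as f:
            return f.read(), True
    # Cache miss: compile once and store the result for later runs.
    blob = compile_fn(model_bytes)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as f:
        f.write(blob)
    return blob, False


if __name__ == "__main__":
    def fake_compile(model):
        # Stand-in for a slow compiler; just reverses the bytes.
        return model[::-1]

    with tempfile.TemporaryDirectory() as cache:
        for run in range(2):
            _, hit = compile_with_cache(b"model weights", cache, fake_compile)
            print(f"run {run}: {'loaded from cache' if hit else 'compiled'}")
```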
When I test on an Intel(R) Core(TM) Ultra 5 125H, the NPU is very slow. Why?
This is the benchmark result: