posts/intel-pytorch-extension-tutorial/native-ubuntu/ #38
The Mamba part doesn't work for me; it does not recognize `mamba`.
Hi @Danyal-sab, Thanks for pointing that out. I forgot to include the line that initializes Mamba after running the Mambaforge install script. I updated the post with the missing line. You can run the following commands to initialize Mamba and relaunch the current bash shell to apply the changes:

```bash
~/mambaforge/bin/mamba init
bash
```

I saw you posted another comment earlier, but it got deleted before I could respond. Did you resolve your previous issue?
Hi @cj-mills, Thanks for your speedy reply, and thanks for your help. Yes, I posted about a minor issue with executing the commands in the "Install oneAPI Base Toolkit" section, where the second line gave a typing error. I resolved it by simply removing the backslashes (`\`) at the end of the lines, after which it worked correctly. That's why I deleted the post. Thanks again for your great help.
Hi @cj-mills, Is there any tool for monitoring the Arc GPU's memory usage (like `nvidia-smi` for NVIDIA)? I checked a few tools, such as `intel_gpu_top`, Intel VTune, and Intel GPA, but they either weren't compatible with Ubuntu 23.04 or don't offer GPU memory usage monitoring. Are there other tools we could possibly use?
Hi @Danyal-sab, The only one I know of is the `sysmon` tool. Unfortunately, you would need to compile the tool from the source code. Also, it does not seem fully functional on my system, as it does not show any running processes:

```text
$ sudo sysmon
=====================================================================================
GPU 0: Intel(R) Arc(TM) A770 Graphics    PCI Bus: 0000:03:00.0
Vendor: Intel(R) Corporation    Driver Version: 1.3.26241    Subdevices: 0
EU Count: 512    Threads Per EU: 8    EU SIMD Width: 8    Total Memory(MB): 15473.6
Core Frequency(MHz): 2000.0 of 2400.0    Core Temperature(C): unknown
=====================================================================================
Running Processes: unknown
=====================================================================================
GPU 1: Intel(R) UHD Graphics 750    PCI Bus: 0000:00:02.0
Vendor: Intel(R) Corporation    Driver Version: 1.3.26241    Subdevices: 0
EU Count: 32    Threads Per EU: 7    EU SIMD Width: 8    Total Memory(MB): 25360.9
Core Frequency(MHz): 350.0 of 1300.0    Core Temperature(C): unknown
=====================================================================================
Running Processes: unknown
```
Great tutorial that helped me a lot in setting up an environment for ML/DL with an Arc GPU. It really saved my life, and I hope to read more excellent material of this kind. Thanks again, much appreciated.
I really appreciate your work! I am trying to set the GPU up for the scikit-learn monkey-patch (https://github.com/intel/scikit-learn-intelex), but I am struggling to go beyond CPU acceleration. I have no idea how to 1. list the devices and 2. point to a specific device. Do you have any experience with that?
Hi @psmgeelen, I have not tried Intel's scikit-learn extension, so I don't know if it even supports Arc GPUs. The DPC++ compiler runtime does support Arc GPUs, meaning it should work in theory. Have you tried the example code for performing computations on the GPU in the extension's documentation? Based on the example code, the Arc GPU should be the `"gpu:0"` device, assuming it is the only discrete GPU installed on the system. The integrated graphics should be the `"gpu:1"` device.
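For anyone who lands here, a minimal sketch of both steps, assuming `dpctl` and `scikit-learn-intelex` are installed and following the patching/offload pattern from the extension's docs (the `KMeans` workload is just a placeholder):

```python
import dpctl
import numpy as np
from sklearnex import patch_sklearn, config_context

# 1. List the SYCL devices the DPC++ runtime can see.
for device in dpctl.get_devices():
    print(device)

# 2. Apply the monkey-patch, then point a computation at a specific device.
patch_sklearn()
from sklearn.cluster import KMeans  # import after patching

X = np.random.rand(1000, 8).astype(np.float32)
with config_context(target_offload="gpu:0"):  # the discrete Arc GPU, per above
    KMeans(n_clusters=4, n_init=10).fit(X)
```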
Hi @cj-mills, Then I started running heavier code close to the limits of the A770, and the crashes stopped for the day.
@cj-mills, I have, and it's not finding the device for whatever reason. I created a ticket at intelex here: uxlfoundation/scikit-learn-intelex#1357 (comment)
Hi @cj-mills,
@Danyal-sab I have not tested the YOLOX training code with the previous extension version because the code requires torchvision 0.15+ (which requires PyTorch 2.0+). I updated the tutorial because everything I tested that worked with the previous extension version still works with the new version, and the current Ubuntu LTS now ships with a kernel that supports Arc GPUs.
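For anyone following along, a quick way to confirm an environment meets those version requirements:

```python
import torch
import torchvision

# The YOLOX training code needs PyTorch 2.0+ and torchvision 0.15+.
print(f"torch: {torch.__version__}, torchvision: {torchvision.__version__}")
```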
@cj-mills, `from torchtnt.utils import get_module_summary` Thanks again for your great help.
@Danyal-sab
@cj-mills,
@Danyal-sab It sounds like a performance difference similar to not having the `IPEX_XPU_ONEDNN_LAYOUT` environment variable set. I don't know whether that's related to your issue, but maybe try setting that environment variable to 0 and then 1 to see if it impacts performance. It might also just be a bad driver update. Can you roll back to the previous driver version?
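A minimal sketch of that comparison, assuming the variable needs to be set before `intel_extension_for_pytorch` is imported:

```python
import os

# Set to "1" for one run and "0" for another to compare training performance.
os.environ["IPEX_XPU_ONEDNN_LAYOUT"] = "1"

import torch
import intel_extension_for_pytorch as ipex  # reads the variable at import time
```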
@cj-mills,
@Danyal-sab, I briefly swapped in the Arc card a couple of weeks ago, and the training notebooks that worked with the previous versions no longer produced usable models. It was the same issue I described here, but it occurred even with the baseline image classification notebook. I believe I tried Python 3.9, 3.10, and 3.11, and I had the same issue with all of them. I did not have time to investigate, so I held off on making a post about it.
@cj-mills,
@Danyal-sab,
@cj-mills,
@cj-mills,
Hi @cj-mills and everyone,
But it does not make things better. Thanks in advance!
Hi @contryboy, I have not had a chance to investigate the source of the issue, but I plan to give it another shot when the next release comes out.
Hi @cj-mills, [1] https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/examples.html#float32
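For context, the float32 example linked in [1] follows this general pattern (a minimal sketch; the model and optimizer here are placeholders):

```python
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Linear(128, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

model = model.to("xpu")
model.train()
# ipex.optimize applies operator and layout optimizations for the XPU device.
model, optimizer = ipex.optimize(model, optimizer=optimizer)
```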
@contryboy Nice! It would certainly be more convenient for me if they resolved the issue in the next release.
Hi @cj-mills,
@Danyal-sab, I will when I have enough time to swap my Arc card into my desktop and test the latest version. I've been too busy with work projects lately to swap out my NVIDIA card.
Many thanks @cj-mills,
Hello, it's great to see that the PyTorch 2.5 preview has been released with native XPU support.
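A minimal sketch of what the native support looks like with the 2.5 preview (no `intel_extension_for_pytorch` import required, assuming an XPU-enabled build):

```python
import torch

# torch.xpu mirrors the familiar torch.cuda device API.
if torch.xpu.is_available():
    x = torch.rand(1024, 1024, device="xpu")
    print(x.device, torch.xpu.get_device_name(0))
```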
Hi @cj-mills,
Have you encountered the 4GB issue noted in intel/intel-extension-for-pytorch#325? I cannot use the bigger models due to this issue, even though the Arc A770 has 16 GB.
I have not encountered the 4GB issue, but that might just be a matter of which models I've tested. I have not tried to replicate the issue on my card. I can try when I have some time.
Hey @vampireLibrarianMonk, I finally had time to set up a separate computer with the A770 and run the test case in the GitHub issue you linked to.

Running the test case using Intel's PyTorch extension produces the following:

```python
x = torch.rand(46000, 46000, dtype=torch.float32, device='xpu')
```

Using the preview version:

```python
x = torch.rand(46000, 46000, dtype=torch.float32, device='xpu')
x = torch.rand(33000, 32600, dtype=torch.float32, device='xpu')
```

```text
---------------------------------------------------------------------------
OutOfMemoryError                          Traceback (most recent call last)
Cell In[4], line 1
----> 1 x = torch.rand(33000, 32600, dtype=torch.float32, device='xpu')

OutOfMemoryError: XPU out of memory. Tried to allocate 4.01 GiB. GPU 0 has a total capacity of 15.11 GiB. Of the allocated memory 0 bytes is allocated by PyTorch, and 0 bytes is reserved by PyTorch but unallocated. Please use `empty_cache` to release all unoccupied cached memory.
```
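As the error message suggests, cached allocations can be released between attempts. A small sketch (this frees cached blocks, but it would not lift a per-allocation ceiling if that is the underlying bug):

```python
import torch

# Release unoccupied cached memory held by the XPU allocator.
torch.xpu.empty_cache()
```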
This issue is currently being worked on via intel/intel-extension-for-pytorch#325. Updates to the discussion are happening at least weekly, but there is no concrete solution yet.
Christian Mills - Getting Started with Intel’s PyTorch Extension for Arc GPUs on Ubuntu
This tutorial provides a step-by-step guide to setting up Intel’s PyTorch extension on Ubuntu to train models with Arc GPUs.
https://christianjmills.com/posts/intel-pytorch-extension-tutorial/native-ubuntu/