Training a model on MBP M1 extremely slow #7308

lesleypotters · 2022-04-06T08:15:22Z

Search before asking

I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

Training

Bug

Hi all,

I am working on an M1 MacBook Pro in a PyTorch environment with torchvision 0.12.0 and torch 1.11.0 (as recommended). I am trying out YOLOv5 with a wheat detection dataset. When running:
python train.py --img 1024 --batch 8 --epochs 100 --data wheat.yaml --cfg models/yolov5s.yaml --name wm
it does start training for 100 epochs, but the estimated time is about an hour per epoch. I find it suspicious that no GPU memory is allocated (gpu_mem shows 0G), although I should say I am a newbie to both YOLOv5 and the M1 MacBook Pro.

This is a screenshot:

What could I change to improve this? I have tried --device cpu, but to no avail. Are there any other options? Thanks!

This is my python detect.py output (which, if I am correct, is ok):

Environment

YOLOv5 🚀 v6.1-105-gd257c75 torch 1.11.0 CPU
Setup complete ✅ (8 CPUs, 16.0 GB RAM, 156.2/926.4 GB disk)
Python 3.8.13
torch 1.11.0
torchvision 0.12.0
OS: macOS Monterey 12.3

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

Yes I'd like to help by submitting a PR!


github-actions · 2022-04-06T08:16:06Z

👋 Hello @lesleypotters, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Google Colab and Kaggle notebooks with free GPU
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher · 2022-04-06T09:03:03Z

@lesleypotters the base M1 is pretty slow; the Pro and Max are much faster, but still not as fast as a CUDA GPU. Yes, your times look right. This is essentially just fast CPU training: the Neural Engine is not used by PyTorch, though it is used for CoreML exported models.

gpu_mem displays CUDA memory usage only.

See my Reddit post here: https://www.reddit.com/r/MachineLearning/comments/tbj4lf/comment/i083o5s/?utm_source=share&utm_medium=web2x&context=3
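To see which backend PyTorch can actually use on a given machine (and hence why gpu_mem stays at 0G without CUDA), a minimal sketch along these lines may help. The function name describe_backend is hypothetical; the sketch assumes only that torch.cuda.is_available() exists and that torch.backends.mps is present on PyTorch 1.12 or newer, and it degrades gracefully otherwise.

```python
def describe_backend():
    """Return the name of the accelerator backend PyTorch could use here.

    Hypothetical helper for illustration; not part of YOLOv5.
    """
    try:
        import torch
    except ImportError:
        return "torch not installed"

    # CUDA is the only backend whose memory the gpu_mem column reports.
    if torch.cuda.is_available():
        return "cuda"

    # torch.backends.mps only exists on newer PyTorch builds (1.12+),
    # so look it up defensively instead of assuming it is there.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


print(describe_backend())
```

On the torch 1.11.0 setup from the original post, the MPS lookup finds nothing, so the helper would report cpu, which matches the "torch 1.11.0 CPU" line in the environment output.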

lesleypotters · 2022-04-06T15:11:15Z

@glenn-jocher Many thanks for your answer, very helpful.
Can we expect PyTorch to use the Neural Engine at some point? (I should ask them.)

Can I add --include coreml in the training code that I provided? Thanks again!

glenn-jocher · 2022-04-06T15:13:48Z

@lesleypotters yes the PyTorch team is working on M1 support, no current timeline available though.

Export is something you do after training has completed. See TFLite, ONNX, CoreML, TensorRT Export tutorial for details.

YOLOv5 Tutorials

Train Custom Data 🚀 RECOMMENDED
Tips for Best Training Results ☘️ RECOMMENDED
Weights & Biases Logging 🌟 NEW
Roboflow for Datasets, Labeling, and Active Learning 🌟 NEW
Multi-GPU Training
PyTorch Hub ⭐ NEW
TFLite, ONNX, CoreML, TensorRT Export 🚀
Test-Time Augmentation (TTA)
Model Ensembling
Model Pruning/Sparsity
Hyperparameter Evolution
Transfer Learning with Frozen Layers ⭐ NEW
Architecture Summary ⭐ NEW

Good luck 🍀 and let us know if you have any other questions!

lesleypotters · 2022-04-06T15:17:04Z

Ok great, thanks for the instant support! I will close this thread.

sphrak · 2022-08-07T21:06:15Z

@glenn-jocher is this the stuff we need to get faster training speed with yolov5? https://pytorch.org/blog/introducing-accelerated-pytorch-training-on-mac/

Also, do you think it would require a lot of work to get YOLOv5 running with this, or would it be enough to bump PyTorch to v1.12?

https://github.com/pytorch/pytorch/releases/tag/v1.12.0

glenn-jocher · 2022-08-08T14:24:50Z

@sphrak use python universal2 installer for ARM devices and torch nightly if you expect to use MPS

sphrak · 2022-08-09T17:08:36Z

@glenn-jocher thanks, ~~but is that for training or is it only detection phase?~~

I think I got it: I just pass --device mps to either detect.py or train.py on nightly, and then it seems to be using the MPS backend.

glenn-jocher · 2022-08-09T18:00:30Z

@sphrak yes both, but full MPS support is not working yet due to unsupported pytorch aten ops. Regardless running YOLOv5 with an ARM python version will significantly speed up performance on M1/M2 devices vs Intel CPU speeds (but not as much as full MPS support).
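Whether the interpreter is an ARM build can be checked from Python itself. The following is a small sketch using the standard library's platform module; the function name python_arch_report and its message strings are made up for illustration. It relies on platform.machine() returning arm64 for a native Apple Silicon build and x86_64 for an Intel build, including one running under Rosetta 2.

```python
import platform


def python_arch_report(machine=None, system=None):
    """Describe the architecture the running Python was built for.

    Both arguments default to the live values from the platform module;
    they are parameters only so the logic can be exercised on any host.
    """
    machine = machine or platform.machine()
    system = system or platform.system()

    if system != "Darwin":
        return f"{system}/{machine}: not macOS, check does not apply"
    if machine == "arm64":
        return "native arm64 Python"
    # An x86_64 interpreter on an M1/M2 Mac runs through Rosetta 2 translation.
    return f"{machine} Python on macOS (runs under Rosetta 2 if the Mac is Apple Silicon)"


print(python_arch_report())
```

If this reports an x86_64 interpreter on an M1/M2 machine, reinstalling Python from an ARM-capable installer is the first thing to try before looking at MPS at all.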

suveerudayashankara · 2023-01-11T04:07:31Z

But if I use --device mps for train.py, it shows a not implemented error, while it works fine for detect.py.
Is there something I am missing?
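If the error is PyTorch's NotImplementedError for an operator that has no MPS kernel yet, PyTorch offers the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1, which routes such operators to the CPU. It should be set before the training process starts. Below is a minimal sketch of launching train.py that way; the helper name build_train_invocation is hypothetical, and whether the fallback is enough for a full YOLOv5 training run is not confirmed anywhere in this thread.

```python
import os
import sys


def build_train_invocation(extra_args, base_env=None):
    """Return (command, environment) for running train.py on MPS with CPU fallback.

    extra_args: list of additional train.py arguments, e.g. ["--img", "640"].
    base_env:   environment to extend; defaults to a copy of os.environ.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Ask PyTorch to fall back to the CPU for ops not implemented on MPS.
    env["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
    cmd = [sys.executable, "train.py", "--device", "mps", *extra_args]
    return cmd, env


cmd, env = build_train_invocation(["--img", "640", "--batch", "8"])
print(" ".join(cmd[1:]))
# To actually launch training from the yolov5 directory (not executed here):
#   import subprocess
#   subprocess.run(cmd, env=env, check=True)
```

Operators that fall back run on the CPU, so training can be noticeably slower than a run with full MPS support.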

jasonrichdarmawan · 2023-01-31T07:44:00Z

> @glenn-jocher thanks, ~~but is that for training or is it only detection phase?~~
>
> I think I got it: I just pass --device mps to either detect.py or train.py on nightly, and then it seems to be using the MPS backend.

Running python detect.py --device mps --weights yolov7-e6e.pt --img-size 1280 --source 0 throws an error.

(pytorch) jason@Jasons-Mac-mini yolov7 % python detect.py --device mps --weights yolov7-e6e.pt --img-size 1280 --source 0
Namespace(weights=['yolov7-e6e.pt'], source='0', img_size=1280, conf_thres=0.25, iou_thres=0.45, device='mps', view_img=False, save_txt=False, save_conf=False, nosave=False, classes=None, agnostic_nms=False, augment=False, update=False, project='runs/detect', name='exp', exist_ok=False, no_trace=False)
Traceback (most recent call last):
  File "/Volumes/T7Touch/Learn/yolov7/detect.py", line 196, in <module>
    detect()
  File "/Volumes/T7Touch/Learn/yolov7/detect.py", line 30, in detect
    device = select_device(opt.device)
  File "/Volumes/T7Touch/Learn/yolov7/utils/torch_utils.py", line 71, in select_device
    assert torch.cuda.is_available(), f'CUDA unavailable, invalid device {device} requested'  # check availability
AssertionError: CUDA unavailable, invalid device mps requested

OliveCHU · 2023-02-22T12:37:03Z

Hi, did you solve the problem? I get the same error when running python train.py --device mps --data vehicle/data.yaml --cfg cfg/training/yolov7.yaml --weights '' --name yolov7 --hyp data/hyp.scratch.p5.yaml
I use a MacBook Pro with an M1 chip and am trying to train YOLOv7 on custom data.
My torch version is 2.0.0.dev20230131.
I checked MPS with torch.backends.mps.is_available() and it returns True.

Mopele · 2023-03-05T10:36:49Z

Same here! I am also trying to train YOLOv7 with MPS and have confirmed that MPS is available, but I get the same error.

il-nietos · 2023-03-13T18:51:04Z

When I compare the yolov7/utils/torch_utils.py (on the left) and yolov5/utils/torch_utils.py (on the right), the v7 doesn't seem to have an option for mps. Has anyone had any luck configuring these?
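To illustrate the difference, the traceback above shows the v7 select_device asserting torch.cuda.is_available() for any non-CPU request, so "mps" is rejected even when MPS is available. Here is a rough sketch of device-string handling that accepts mps. It is not the code from either repository: the function name resolve_device, its parameters, and the choice to use MPS only when explicitly requested are all assumptions made for illustration, and the availability flags are passed in so the logic can be tried without PyTorch installed.

```python
def resolve_device(requested, cuda_available, mps_available):
    """Map a --device string ('', 'cpu', 'mps', '0', '0,1', 'cuda:0') to a device name.

    Hypothetical helper: in real code the two flags would come from
    torch.cuda.is_available() and torch.backends.mps.is_available().
    Raises ValueError if the requested backend is not available.
    """
    req = str(requested).strip().lower().replace("cuda:", "")

    if req == "cpu":
        return "cpu"

    if req == "mps":
        # The branch that the v7 traceback shows is missing: check MPS, not CUDA.
        if not mps_available:
            raise ValueError("MPS unavailable, invalid device mps requested")
        return "mps"

    if req == "":
        # No preference given: use CUDA when present, otherwise the CPU.
        return "cuda:0" if cuda_available else "cpu"

    # Anything else is treated as a CUDA index list such as "0" or "0,1".
    if not cuda_available:
        raise ValueError(f"CUDA unavailable, invalid device {requested} requested")
    return f"cuda:{req.split(',')[0]}"


print(resolve_device("mps", cuda_available=False, mps_available=True))
```

Accepting the device string is only the first step; code that still assumes CUDA elsewhere can fail later in the run.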

Crear12 · 2023-05-20T03:21:39Z

> When I compare the yolov7/utils/torch_utils.py (on the left) and yolov5/utils/torch_utils.py (on the right), the v7 doesn't seem to have an option for mps. Has anyone had any luck configuring these?

I tried modifying it and eventually got “mps” activated, but to be honest it's not worth it, because there are further incompatibility issues, such as the older code using float64, which MPS doesn't support. It's endless. I would suggest just using the latest versions.

glenn-jocher · 2023-05-20T07:05:49Z

@Crear12 thank you for sharing your experience with modifying the torch_utils.py file to activate MPS on YOLOv7. It's good to know that while you were able to modify the file to activate MPS, you have also encountered incompatibility issues which made the whole process not worth the effort. For those who want to use MPS, it is recommended to use the latest versions and updates of YOLOv5 and PyTorch. Thank you again for sharing your experience!

ez4bk · 2023-06-11T13:15:50Z

Using the PyTorch Nightly version and adding the flag device=mps could solve this problem:
yolo train data=data.yaml epochs=100 batch=64 device=mps model=yolov8n.pt
Everything is working fine for me, and the M1 Pro GPU significantly speeds up the training process.

glenn-jocher · 2023-11-15T01:50:38Z

@ez4bk That's great news! Thank you for sharing your solution with us. It's good to hear that using the PyTorch Nightly version with the device=mps flag has resolved the issue and significantly sped up the training process on your M1Pro GPU. Your input will undoubtedly be helpful to others facing similar challenges. Thank you for your contribution!

lesleypotters added the bug (Something isn't working) label Apr 6, 2022

lesleypotters closed this as completed Apr 6, 2022
