TRANSFER LEARNING EXAMPLE #106

glenn-jocher · 2019-02-22T14:51:34Z

This guide explains how to train YOLOv3 on your own data using transfer learning. Transfer learning can be a useful way to quickly retrain YOLOv3 on new data without retraining the entire network. We accomplish this by starting from the official YOLOv3 weights and setting .requires_grad to False for every layer whose gradients we do not want to compute and optimize.
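Here is a minimal sketch of the freezing step. It assumes the layers to keep trainable are the 3 output convolutions, and that you can recognise them because their first dimension matches the filters value from step 2, (4 + 1 + n) * 3. This is one possible selection rule, not necessarily the one train.py uses. Any other layer that happened to have the same channel count would also stay trainable. The helper name is hypothetical. It works on any objects with .shape and .requires_grad, such as the tensors yielded by model.parameters() in PyTorch.

```python
def freeze_all_but_outputs(parameters, out_channels):
    """Leave gradients on only for tensors whose first dimension equals out_channels.

    `parameters` is any iterable of objects with .shape and .requires_grad,
    e.g. model.parameters() in PyTorch. Returns the number of tensors left trainable.
    """
    trainable = 0
    for p in parameters:
        keep = len(p.shape) > 0 and p.shape[0] == out_channels
        p.requires_grad = keep  # False freezes the tensor: no gradient is computed for it
        trainable += int(keep)
    return trainable
```

For the default 80-class model you would call freeze_all_but_outputs(model.parameters(), 255). With yolov3.cfg we would expect 6 trainable tensors (a weight and a bias for each of the 3 output convolutions), assuming no other layer has 255 channels.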

Before You Start

Update your environment (Python >= 3.7, PyTorch >= 1.3, etc.) and install the requirements.txt dependencies.
Clone repo: git clone https://github.com/ultralytics/yolov3
Download COCO: bash yolov3/data/get_coco2017.sh

Transfer Learning

1. Download the pretrained weights you want to transfer-learn from (available in our Google Drive folder) and place them in yolov3/weights/.

2. Update the *.cfg file (optional). Each YOLO layer has 255 outputs: 85 outputs per anchor [4 box coordinates + 1 object confidence + 80 class confidences], times 3 anchors. If you use fewer classes, reduce the filters to filters=[4 + 1 + n] * 3, where n is your class count. Make this change in the layer immediately preceding each of the 3 YOLO layers. Also change classes=80 to classes=n in each YOLO layer.

3. Train.

python3 train.py --data coco1cls.data --cfg yolov3-spp-1cls.cfg --weights weights/yolov3-spp.pt --transfer

Run the command above to transfer learn on COCO, or specify your own data with --data data/custom.data (see https://github.com/ultralytics/yolov3/wiki/Train-Custom-Data).

If you created a custom *.cfg file, specify it as --cfg custom.cfg.

You can see in the Model Summary (using model_info(model, report='full') in train.py) that only the 3 YOLO layers now have gradients enabled; all other layers are frozen for the duration of training.
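For the --data data/custom.data option in step 3, the *.data file is a short key=value text file. Here is a minimal sketch that builds one. It assumes the Darknet-style keys classes, train, valid and names; check the Train Custom Data wiki page for the exact format this repo expects. The helper names and file paths are placeholders.

```python
def data_file_text(class_names, train_list, valid_list, names_path):
    """Build a Darknet-style *.data file: one key=value pair per line."""
    pairs = {
        "classes": len(class_names),
        "train": train_list,  # text file listing training image paths
        "valid": valid_list,  # text file listing validation image paths
        "names": names_path,  # *.names file, one class name per line
    }
    return "".join(f"{k}={v}\n" for k, v in pairs.items())


def names_file_text(class_names):
    """Build the matching *.names file; line order defines the class ids 0..n-1."""
    return "".join(f"{name}\n" for name in class_names)
```

Writing both strings to disk gives you a data/custom.data and data/custom.names pair to pass to train.py.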

Reproduce Our Environment

To access an up-to-date working environment (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled), consider a:

GCP Deep Learning VM with $300 free credit offer: See our GCP Quickstart Guide
Google Colab Notebook with 12 hours of free GPU time: Google Colab Notebook
Docker Image from https://hub.docker.com/r/ultralytics/yolov3. See Docker Quickstart Guide

jw-pyo · 2019-02-27T06:00:10Z

Hi @glenn-jocher, I have a question about this. I want to change the configuration of the YOLO layers (remove some layers, change the number of filters, etc.) and apply transfer learning. In this case, is it possible to use transfer learning with the official weights? If so, could you describe how, or just give me a keyword to search for?

glenn-jocher · 2019-02-27T12:59:38Z

@jw-pyo you can make any changes you want, but you have to implement them yourself; we can't "give you a way". We recommend you visit our tutorials to get started, and the PyTorch tutorials for more general customization questions.

https://docs.ultralytics.com/yolov5/tutorials/train_custom_data
https://github.com/ultralytics/yolov3/wiki/Example:-Transfer-Learning
https://pytorch.org/tutorials/

hac135 · 2019-03-23T11:34:00Z

I have a problem: I want to train some new classes and images using transfer learning, but my number of classes is 7, so using darknet53.conv.74 as the pretrained model doesn't work!
What should I do?

jw-pyo · 2019-03-23T11:43:57Z

@hac135 If you want to use a pretrained model for transfer learning but your own model has a different shape, the only approach I know is to copy the weights whose shapes match the pretrained model, and to manually initialize the layers whose shapes differ.
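The approach above can be sketched as a dict filter: copy only the weights whose names and shapes match, and leave the rest at their fresh initialization. The sketch assumes both checkpoints are PyTorch-style state dicts (name to tensor, each with .shape). The helper name is hypothetical.

```python
def matching_weights(pretrained_state, model_state):
    """Keep pretrained entries whose key exists in the new model with the same shape."""
    return {
        name: tensor
        for name, tensor in pretrained_state.items()
        if name in model_state and tuple(tensor.shape) == tuple(model_state[name].shape)
    }
```

In PyTorch you would then call model.load_state_dict(matching_weights(pretrained, model.state_dict()), strict=False). The strict=False argument lets the skipped layers keep their initial values, for example the output convolutions, which have 255 filters in the COCO model but 36 for 7 classes.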

glenn-jocher · 2019-03-23T12:40:47Z

@hac135 most people don't realize this, and it's not the recommended method to go about things, but you can technically use the existing YOLOv3 architecture (and hence the pretrained yolov3.pt) to train any model with n<=80 classes with no changes. The unused conf outputs will learn to simply default to zero, and the rest of the unused outputs (the box and class conf associated with those unused classes) will no longer matter.

For example, our single class tutorial operates just as well with no modifications to the cfg file:
https://github.com/ultralytics/yolov3/wiki/Example:-Train-Single-Class

It's not clean and it's not optimal, but it works.

hac135 · 2019-03-24T08:03:54Z

Thank you! It works!

hac135 · 2019-03-24T08:06:13Z

That's a good suggestion, thanks.

glenn-jocher · 2019-05-14T20:37:54Z

@shahidammer try training from scratch, and observe your training results in results.txt.

glenn-jocher · 2019-05-14T20:54:28Z

@shahidammer please note that most technical problems are due to:

Your changes to the default repository. If your issue is not reproducible in a fresh git clone version of this repository we can not debug it. Before going further run this code and ensure your issue persists:

sudo rm -rf yolov3  # remove existing repo
git clone https://github.com/ultralytics/yolov3 && cd yolov3 # git clone latest
python3 detect.py  # verify detection
python3 train.py  # verify training (a few batches only)
# CODE TO REPRODUCE YOUR ISSUE HERE

Your custom data. If your issue is not reproducible with COCO data we can not debug it. Visit our Custom Training Tutorial for exact details on how to format your custom data. Examine train_batch0.jpg and test_batch0.jpg for a sanity check of training and testing data.
Your environment. If your issue is not reproducible in a GCP Quickstart Guide VM we can not debug it. Ensure you meet the requirements specified in the README: Unix, macOS, or Windows with Python >= 3.7, PyTorch >= 1.0, etc.

If none of these apply to you, we suggest you close this issue and raise a new one using the Bug Report template, providing screenshots and minimum viable code to reproduce your issue. Thank you!

parul19 · 2019-05-15T07:09:41Z

I want to keep the existing classes and add a new class, i.e. a total of 80 + 1 = 81 classes with the COCO dataset. Please tell me how to do this using transfer learning.

glenn-jocher · 2019-05-18T10:20:48Z

@parul19 you create a new 81 class cfg. Follow the directions in the example above.
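Creating that cfg follows step 2 of the tutorial at the top. You set filters=(4 + 1 + n) * 3 in the layer before each YOLO layer and classes=n in each YOLO layer, so 81 classes need filters=258. Below is a sketch of scripting the edit. The helper names are hypothetical. It assumes a Darknet-style cfg in which the section immediately before each [yolo] is that [convolutional] layer, with 3 anchors per YOLO layer as in yolov3.cfg.

```python
def yolo_filters(num_classes: int, num_anchors: int = 3) -> int:
    """Outputs of the conv before a YOLO layer: (4 box + 1 obj + n classes) per anchor."""
    return (4 + 1 + num_classes) * num_anchors


def patch_cfg(cfg_text: str, num_classes: int) -> str:
    """Return cfg_text with filters/classes updated for num_classes (hypothetical helper)."""
    lines = cfg_text.splitlines()
    section, filters_idx = None, None  # current section header, index of its filters= line
    for i, raw in enumerate(lines):
        line = "".join(raw.split())  # drop all whitespace, e.g. "classes = 80"
        if line.startswith("["):
            # Leaving a [convolutional] for a [yolo]: that conv is an output layer.
            if line == "[yolo]" and section == "[convolutional]" and filters_idx is not None:
                lines[filters_idx] = f"filters={yolo_filters(num_classes)}"
            section, filters_idx = line, None
        elif section == "[convolutional]" and line.startswith("filters="):
            filters_idx = i
        elif section == "[yolo]" and line.startswith("classes="):
            lines[i] = f"classes={num_classes}"
    return "\n".join(lines) + "\n"
```

For example, patch_cfg(open('cfg/yolov3.cfg').read(), 81) would return the text of an 81-class cfg to save under a new name (the path is illustrative). Convolutions that do not feed a YOLO layer are left untouched.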

sooonism · 2019-05-29T08:28:32Z

Do we still need the COCO dataset if we only do transfer learning?

glenn-jocher · 2019-05-29T14:05:27Z

@sooonism you need whatever dataset you want to train on.

Santhosh1509 · 2019-08-26T07:34:17Z

@glenn-jocher
I am interested in detecting vehicles on the road, so my classes of interest are motorbike, bicycle, bus, car and truck.

I have a vehicle that is not a truck but is being detected as a truck. I have collected new data for this vehicle in COCO format, and I want to add it as a new class to the existing pretrained network.

I plan to:

1. load the final-layer weights of the truck class into this new class
2. alter the cfg file accordingly and start training

My question is: how do I do this?

glenn-jocher · 2019-08-26T11:29:17Z

@Santhosh1509 well I would start by reviewing the examples in the wiki, such as the custom training tutorial:
https://github.com/ultralytics/yolov3/wiki

Santhosh1509 · 2019-08-26T14:10:14Z

@glenn-jocher I need your opinion on this. I just saw a transfer learning tutorial for SSD using Keras.

It's mentioned in the second paragraph:

Option 1: Just ignore the fact that we need only 8 classes

This would work, and it wouldn't even be a terrible option. Since only 8 out of the 80 classes would get trained, the model might get gradually worse at predicting the other 72 classes.

So I feel that even if I could somehow train a particular new class as I described above, the predictions for the other classes might be affected.

Is my approach right? Is there an alternative way to preserve the predictions for the other classes while introducing this new class in the same neural network? I feel it would then need to be trained from scratch. What do you think?

glenn-jocher · 2019-08-26T14:35:12Z

@Santhosh1509 training normally will produce the best results. Transfer learning produces mediocre results quickly.

Santhosh1509 · 2019-08-28T12:55:59Z

@glenn-jocher How do I get the training loss, training accuracy, validation loss and validation accuracy?

All I get during training is this:

How do I tune my hyperparameters with the data displayed here?

I could increase the batch size, as I have more memory on the GPU.

I do not understand the comments on these lines:

parser.add_argument('--epochs', type=int, default=273)  # 500200 batches at bs 64, 117263 images = 273 epochs

parser.add_argument('--batch-size', type=int, default=32)  # effective bs = batch_size * accumulate = 32 * 2 = 64

parser.add_argument('--accumulate', type=int, default=2, help='batches to accumulate before optimizing')

PS: latest training image

The obj and cls values are decreasing; is that good for this training?

glenn-jocher · 2019-08-28T14:10:42Z

@Santhosh1509 all of the information you mention is recorded in results.txt. You can plot it with from utils.utils import *; plot_results(). You should use --batch-size 64 --accumulate 1 if possible; if not, compensate with smaller batch sizes and larger accumulation counts, e.g. --batch-size 32 --accumulate 2.
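The arithmetic behind these defaults can be shown as a small worked example. The effective batch size is batch_size * accumulate, so 64 * 1, 32 * 2 and 16 * 4 all give 64. 500200 batches of 64 images over the 117263 COCO training images come to about 273 epochs; with 16 images per batch, the same number of batches would be only about 68 epochs. The function names are illustrative only.

```python
def effective_batch_size(batch_size: int, accumulate: int) -> int:
    """Images contributing to each optimizer step when gradients are accumulated."""
    return batch_size * accumulate


def epochs_for(total_batches: int, batch_size: int, n_images: int) -> float:
    """Number of passes over the dataset that total_batches of batch_size images amount to."""
    return total_batches * batch_size / n_images


for bs, acc in [(64, 1), (32, 2), (16, 4)]:
    print(f"--batch-size {bs} --accumulate {acc} -> effective {effective_batch_size(bs, acc)}")
print(f"{epochs_for(500200, 64, 117263):.1f} epochs")
```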

obj and cls are training losses, they are supposed to decrease during training. See #392 for hyperparameter evolution, and explore the open issues for answers to your questions.

Santhosh1509 · 2019-08-29T05:20:28Z

@glenn-jocher This is what is stored in results.txt

I am confused as to how obj, cls, total and targets relate to training loss, training accuracy, validation loss and validation accuracy.

Don't we have a graph that is easier to visualize, rather than just numbers?

Something like this:

We can now even use TensorBoard support inside PyTorch to visualize the values.

As its name says, HYPERPARAMETER EVOLUTION is for plotting hyperparameters, not how training loss, training accuracy, validation loss and validation accuracy change per epoch.

glenn-jocher · 2019-08-29T10:04:11Z

@Santhosh1509 TensorBoard logs automatically in this repo if you have it installed. See #435

Santhosh1509 · 2019-08-30T08:59:19Z

@glenn-jocher Please explain how the obj, cls, total and targets values displayed here relate to training loss, training accuracy, validation loss and validation accuracy.

I can only relate these terms:
P -> Precision
R -> Recall
mAP -> mean Average Precision
F1 ->F1 score

glenn-jocher · 2019-08-31T13:01:18Z

@Santhosh1509 accuracy is a classification metric; it is not used here. The metrics displayed during training are the training losses and the number of targets per batch.

Santhosh1509 · 2019-08-31T14:28:45Z

@glenn-jocher which of obj and cls is the training loss, and what do the other terms mean? Both of them decrease during training.

glenn-jocher · 2019-08-31T15:00:19Z

obj is the object loss and cls is the class loss. The training loss is the total of all training losses.

glenn-jocher · 2020-05-03T20:11:30Z

@joel5638 no, you call it once before training to convert your last.pt into a backbone.pt file ready to be used as pretrained weights for future trainings:

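From yolov3/utils/utils.py, lines 607 to 619 in 5d42cc1:

import torch

def create_backbone(f='weights/last.pt'):  # from utils.utils import *; create_backbone()
    # create a backbone from a *.pt file
    x = torch.load(f, map_location=torch.device('cpu'))
    x['optimizer'] = None  # discard optimizer state
    x['training_results'] = None  # discard stored training results
    x['epoch'] = -1  # reset the epoch counter so a new training run starts from the beginning
    for p in x['model'].values():
        try:
            p.requires_grad = True  # make sure no weights remain frozen
        except Exception:  # some entries (e.g. integer buffers) cannot require gradients
            pass
    torch.save(x, 'weights/backbone.pt')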

joel5638 · 2020-05-03T20:35:04Z

@glenn-jocher Perfect, will do that. Thank you so much.

glenn-jocher · 2020-05-04T19:02:11Z

@joel5638 you should update your repo; image plotting has been updated to show predictions and ground truth jpgs. See #1114

If objects are not labelled correctly in your ground truth jpg then you have a labelling problem.

glenn-jocher · 2020-05-04T19:58:10Z

@joel5638 can you paste your test_gt.jpg and test_pred.jpg here?

glenn-jocher · 2020-05-04T21:16:44Z

@joel5638 ah, it looks like it's working well! Remember there is NMS, so if the person and the face are largely occupying the same region one or the other may be suppressed. You could try it on zidane.jpg to compare, as in that photo the faces and the persons do not occupy similar areas, the way Bush does above.
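The NMS point can be illustrated with a minimal greedy NMS sketch in pure Python. It assumes boxes given as (x1, y1, x2, y2) pixels and an IoU threshold of 0.5. With agnostic=True, any lower-scoring box that overlaps a kept box is dropped regardless of class. This is not the repo's NMS implementation, and the box coordinates are made up.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_thres=0.5, agnostic=True):
    """Greedy NMS over (box, score, label) tuples, highest score first."""
    kept = []
    for box, score, label in sorted(detections, key=lambda d: d[1], reverse=True):
        rivals = [k for k in kept if agnostic or k[2] == label]
        if all(iou(box, k[0]) <= iou_thres for k in rivals):
            kept.append((box, score, label))
    return kept


# A face filling most of a person box (close-up) vs. a small face on a full-body person.
close_up = [((0, 0, 100, 100), 0.9, "person"), ((10, 0, 90, 90), 0.8, "face")]
full_body = [((0, 0, 100, 200), 0.9, "person"), ((30, 0, 70, 40), 0.8, "face")]
print(len(nms(close_up)), len(nms(full_body)), len(nms(close_up, agnostic=False)))
```

In the close-up the two boxes have IoU 0.72, so class-agnostic NMS keeps only the person. In the full-body case the IoU is 0.08 and both survive, which matches the zidane.jpg suggestion. Per-class NMS (agnostic=False) would keep both in either case.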

joel5638 · 2020-05-04T21:25:23Z

@glenn-jocher perfect.
Will try it on zidane.jpg once the training is complete.

glenn-jocher · 2020-05-05T18:46:41Z

@joel5638 the --transfer flag is deprecated; you may have been using an older version of the repo. You no longer need it. Simply train normally, specifying the --weights you want to start from (but make a backbone from them first!).

Your command will technically work, but it is not recommended, as hyperparameters with schedules, such as the learning rate, will assume 273/373 epochs are complete, which is not the case. So just create a backbone first, and then use a normal training command:

python train.py --data ... --weights weights/backbone.pt --cfg ...
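This is why the backbone step matters. A checkpoint saved at the end of training carries its epoch counter, and a schedule keyed on the epoch treats a resumed run as nearly finished. The sketch assumes training resumes at the saved epoch + 1 and uses a plain cosine learning-rate schedule. The lr0 value and the function names are hypothetical, not the repo's exact hyperparameters.

```python
import math


def start_epoch(checkpoint: dict) -> int:
    """Assumed resume rule: continue at the epoch after the one saved."""
    return checkpoint["epoch"] + 1


def cosine_lr(epoch: int, epochs: int, lr0: float = 0.01) -> float:
    """Hypothetical cosine schedule: lr0 at epoch 0, decaying to ~0 at the final epoch."""
    return lr0 * (1 + math.cos(math.pi * epoch / epochs)) / 2


epochs = 273
last = {"epoch": 272}     # e.g. last.pt saved after the final epoch
backbone = {"epoch": -1}  # after create_backbone() sets epoch = -1
print(cosine_lr(start_epoch(last), epochs))      # ~0: the schedule thinks training is done
print(cosine_lr(start_epoch(backbone), epochs))  # lr0: a fresh schedule
```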

glenn-jocher · 2020-05-05T18:56:01Z

@joel5638 just the same as before. For example, to create weights/backbone.pt from weights/last.pt (see the earlier comment in #106):

from utils.utils import *; create_backbone(f='weights/last.pt')

glenn-jocher · 2020-05-05T20:15:09Z

Works fine for me. Remember this is a Python command, so you run it from a Python console. If you are trying to run it from a bash prompt, you need to wrap the command in quotes appropriately.

glenn-jocher · 2020-05-05T20:38:09Z

@joel5638 from the ubuntu terminal you run the same command, but as python -c "", so this should work:

python -c "from utils.utils import *; create_backbone(f='weights/last.pt')"
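The quoting problem is a shell issue. The Python code contains single quotes, so bash needs it wrapped in double quotes (or escaped). If you build the command from Python, shlex.quote from the standard library does the escaping for you. This minimal sketch only constructs and checks the string; it does not run it.

```python
import shlex

code = "from utils.utils import *; create_backbone(f='weights/last.pt')"
cmd = "python3 -c " + shlex.quote(code)
print(cmd)

# Splitting the command the way a POSIX shell would gives back the exact Python code.
assert shlex.split(cmd) == ["python3", "-c", code]
```

Alternatively, subprocess.run(["python3", "-c", code]) passes the code as a single argument and bypasses the shell entirely, so no quoting is needed.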

glenn-jocher · 2020-05-05T20:46:45Z

@joel5638 I'm not sure. The command works fine for me. You could try omitting the argument, as it's the default anyway. Maybe the single quotes are causing problems.

joel5638 · 2020-05-05T21:29:13Z

@glenn-jocher Got it, it's the quotes. Thank you.

joel5638 · 2020-05-06T16:09:10Z

@emmbertelen

Try this command; it works:

python3 -c "from utils.utils import *; create_backbone(f='weights/last.pt')"

glenn-jocher · 2020-05-06T20:05:52Z

@joel5638 that's odd. I scanned my screen with iDetection and all people are picked up fine. Maybe your dataset is too small, or, if you are using tiny, you should switch to the default yolov3-spp.cfg with the default pretrained weights. Are you training with all default settings? You should also post your results.png.

glenn-jocher · 2020-05-06T23:24:02Z

@joel5638 looks fine. If you want to add a class, like face, be aware you need to train all the existing classes plus your new class. If you're already doing this, then you may just need a larger dataset or longer training.

renosatyaadrian · 2020-05-18T03:05:57Z

Hi @glenn-jocher, I want to ask you about transfer learning. Is it possible to train on a new dataset to improve a model's predictions? The model I am using is YOLOv3. For example, can I improve predictions for the motorbike class by adding a dataset of a few motorbike images?

I have tried training with 44 new images (34 training, 10 validation), but the total number of detections decreased. Is there anything wrong with my training steps or my dataset?
I'm using this command:
!python train.py --data data/coco16.data --cfg cfg/yolov3.cfg --weight weights/yolov3.pt --epoch 300 --batch-size 8

This is the result using yolov3.weights. There are 47 detections in total:

And this is the result using yolov3-transf.weights (after training). The total decreased slightly, to 37:

glenn-jocher · 2020-05-18T18:48:27Z

@renosatyaadrian I very seriously doubt you can improve upon a model trained on perhaps thousands of motorbike images by training it on 34 images and expecting it to generalize better.

If you come back with a dataset of 3400 images then perhaps.

github-actions · 2020-08-01T00:23:58Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

AymenKermiche · 2020-08-12T09:15:06Z

@glenn-jocher I want to apply transfer learning with fine-tuning (adding 2 or 3 layers to the YOLOv3 architecture and training it on my custom dataset) in Darknet. Any help would be appreciated, thanks a lot.

RadoslawDebosz · 2020-10-21T18:09:40Z

Hi,
I have a problem with test results. I've trained YOLOv3-SPP with default settings from weights pretrained on the COCO dataset (transfer learning).
My dataset:
1000 images of city traffic from camera views
about 12k objects (cars, persons, trucks, buses)
train/valid split 0.8/0.2

and I froze only the last layers.

The results of training after 100 epochs:

And then I ran test.py on the validation set and got these test results:

And on the training set:

I have 2 questions:

1. Do you know why the test results differ from (are worse than) the training results on the charts (Precision, Recall, F1, mAP)?
2. I don't know why, e.g., the recall on the training chart starts at ~0.7, drops to near 0 after the first epochs, and then rises again to ~0.7 during training. It looks like the model ignores the pretrained weights and learns from scratch.

dbkalaria · 2021-07-26T07:06:44Z

@kairavpatel the transfer learning tutorial is not recommended. We do not recommend freezing any layers, as this will nearly always result in poorer performance. To train from pretrained weights (recommended) use python3 train.py --weights ...

@glenn-jocher I'm a bit confused between transfer learning and training from pretrained weights. Aren't they the same? In transfer learning we use pretrained model weights, freeze some layers' weights, and fine-tune the model. So when training from a pretrained model, are you just using the weights without freezing any layers, and updating all of them as training progresses?

glenn-jocher · 2021-07-26T13:21:23Z

@Dhruv312 it's just wording.

Basically freezing will always lead to worse results on large datasets.

Utsabab · 2024-03-21T20:35:40Z

@kairavpatel I understand your confusion, because I was in the same position. Let me explain:

When we say using pre-trained weights, the assumption is that our new custom data has images with labels that are also available in COCO. There are no new class labels in our new dataset, therefore, we can use pre-trained weights from COCO for our desired class.

Transfer learning is done when our custom dataset contains a new class. The COCO pretrained model will not be able to detect it. Therefore, we have to train the model on COCO + the custom dataset with updated class labels (0-79 for COCO labels, 80-n for the new labels from the custom data). This is transfer learning, where pretrained weights are used to make predictions on a custom dataset with new class labels. Layer freezing can be used to shorten training time; however, it reduces mAP almost every time.
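Following that numbering (COCO ids 0-79, new classes from 80 up), you need to shift the labels of a custom dataset whose own ids start at 0 before combining the two datasets. Here is a minimal sketch. It assumes the YOLO label format of one "class x_center y_center width height" line per object. The helper is hypothetical.

```python
def shift_labels(label_text: str, offset: int = 80) -> str:
    """Add offset to the class id on every non-empty line of a YOLO label file."""
    out = []
    for line in label_text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        parts[0] = str(int(parts[0]) + offset)  # class id is the first field
        out.append(" ".join(parts))  # box coordinates are kept exactly as written
    return "\n".join(out) + ("\n" if out else "")
```

The combined model then needs classes = 80 + (number of new classes) in its cfg and *.data files. For one new class, that means filters = (4 + 1 + 81) * 3 = 258 before each YOLO layer.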

Hope this explanation helps!

glenn-jocher · 2024-03-22T03:02:43Z

@Utsabab Great question! In the context of using pretrained weights, the idea is to leverage what the model has already learned without specifically freezing any layers. This means we update the weights across all layers based on the new data, which usually gives a better performance because the model can adapt more flexibly to the new task.

Transfer learning, as often discussed, involves modifying or extending the existing model architecture to better fit new data, which can include freezing certain layers to not update during training. Essentially, using pretrained weights and not freezing layers allows the whole model to adjust and learn from the new data, while freezing layers in transfer learning is more about fine-tuning or adapting the model to new, possibly related tasks.

So, when training with pretrained weights the command is simple:

python3 train.py --weights yolov3.pt --data yourdata.yaml

And there's no need to explicitly freeze layers unless you have a very specific case where you believe it's necessary. 🤓 Hope that clarifies things!

glenn-jocher self-assigned this Feb 22, 2019

glenn-jocher added the tutorial label Mar 29, 2019

glenn-jocher pinned this issue May 29, 2019

glenn-jocher unpinned this issue Jul 30, 2019

glenn-jocher pinned this issue Aug 7, 2019

github-actions bot added the Stale Stale and schedule for closing soon label Aug 1, 2020

github-actions bot closed this as completed Aug 6, 2020

shayanalibhatti mentioned this issue Sep 16, 2020

Guidance regarding transfer learning with COCO to add additional classes ultralytics/yolov5#980

Closed

glenn-jocher reopened this Oct 8, 2020

github-actions bot closed this as completed Oct 14, 2020

glenn-jocher mentioned this issue Nov 3, 2020

Why not freeze layers for finetuning? ultralytics/yolov5#1264

Closed
