TRANSFER LEARNING EXAMPLE #106

Closed
glenn-jocher opened this issue Feb 22, 2019 · 82 comments

@glenn-jocher
Member

glenn-jocher commented Feb 22, 2019

This guide explains how to train your own data with YOLOv3 using transfer learning. Transfer learning can be a useful way to quickly retrain YOLOv3 on new data without needing to retrain the entire network. We accomplish this by starting from the official YOLOv3 weights and setting .requires_grad to False on every layer that we do not want to compute gradients for or optimize.

Before You Start

  1. Update (Python >= 3.7, PyTorch >= 1.3, etc.) and install requirements.txt dependencies.
  2. Clone repo: git clone https://github.com/ultralytics/yolov3
  3. Download COCO: bash yolov3/data/get_coco2017.sh

Transfer Learning

1. Download pretrained weights from our Google Drive folder that you want to use to transfer learn, and place them in yolov3/weights/.

2. Update *.cfg file (optional). Each YOLO layer has 255 outputs: 85 outputs per anchor [4 box coordinates + 1 object confidence + 80 class confidences], times 3 anchors. If you use fewer classes, reduce filters to filters=[4 + 1 + n] * 3, where n is your class count. This modification should be made to the layer preceding each of the 3 YOLO layers. Also modify classes=80 to classes=n in each YOLO layer, where n is your class count.
screenshot 2019-02-21 at 19 40 01
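
The filter count from step 2 can be checked with a few lines of Python. This is only a sketch of the arithmetic stated above; the function name `yolo_filters` is made up for illustration and is not part of the repo.

```python
def yolo_filters(num_classes: int, anchors_per_layer: int = 3) -> int:
    """Output channels for the conv layer that precedes a YOLO layer."""
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    # 4 box coordinates + 1 object confidence + one confidence per class
    return (4 + 1 + num_classes) * anchors_per_layer


for n in (80, 1, 7):
    print(f"classes={n} -> filters={yolo_filters(n)}")
```

For the default 80 classes this gives 255, matching the value in the stock cfg; a single-class cfg would use 18.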

3. Train.

python3 train.py --data coco1cls.data --cfg yolov3-spp-1cls.cfg --weights weights/yolov3-spp.pt --transfer

Run the above code to transfer learn on COCO, or specify your own data as --data data/custom.data (See https://github.com/ultralytics/yolov3/wiki/Train-Custom-Data).

If you created a custom *.cfg file, specify it as --cfg custom.cfg.

You can observe in the Model Summary (using model_info(model, report='full') in train.py) that only the 3 YOLO layers now have their gradients activated (all other layers are frozen for the duration of training):

Screenshot 2019-09-12 at 12 25 22
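
As a rough illustration of the freezing idea (not the repo's actual implementation), the sketch below sets `requires_grad` by name prefix. The helper `freeze_except` and the parameter names are hypothetical; with PyTorch one would presumably pass `model.named_parameters()` instead of the stand-in objects used here.

```python
from types import SimpleNamespace


def freeze_except(named_params, trainable_prefixes):
    """Set requires_grad=False on every parameter whose name does not start
    with one of trainable_prefixes; return the names left trainable."""
    prefixes = tuple(trainable_prefixes)
    trainable = []
    for name, param in named_params:
        param.requires_grad = name.startswith(prefixes)
        if param.requires_grad:
            trainable.append(name)
    return trainable


# Stand-in parameters so the sketch runs without PyTorch installed.
# The names below are hypothetical, not the repo's real layer names.
params = {
    "backbone.0.weight": SimpleNamespace(requires_grad=True),
    "backbone.1.weight": SimpleNamespace(requires_grad=True),
    "head.0.weight": SimpleNamespace(requires_grad=True),
}
print(freeze_except(params.items(), ["head."]))
```

Only the parameters left trainable receive gradient updates, which is why frozen training runs faster but adapts less.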

Reproduce Our Environment

To access an up-to-date working environment (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled), consider a:

@glenn-jocher glenn-jocher self-assigned this Feb 22, 2019
@jw-pyo

jw-pyo commented Feb 27, 2019

Hi @glenn-jocher, I have a question about this. I want to change the configuration of the YOLO layers (remove some layers, change the number of filters, etc.) and apply transfer learning. In this case, is it possible to use transfer learning with the official weights? If so, could you point me to a way to do it, or just a keyword to search for?

@glenn-jocher
Member Author

glenn-jocher commented Feb 27, 2019

@jw-pyo you can do anything you want, but you have to do it, we can't "give you a way". Recommend you visit our tutorials to get started, and the PyTorch tutorials for more general customization questions.

https://docs.ultralytics.com/yolov5/tutorials/train_custom_data
https://github.com/ultralytics/yolov3/wiki/Example:-Transfer-Learning
https://pytorch.org/tutorials/

@hac135

hac135 commented Mar 23, 2019

I have a problem: I want to train on some new classes and pictures using transfer learning, but my number of classes is 7, so if I use darknet53.conv.74 as the pretrained model, it doesn't work! What should I do?

@jw-pyo

jw-pyo commented Mar 23, 2019

@hac135 If you want to use a pretrained model for transfer learning but your own model has a different shape, the approach I know is to copy only the weights whose shapes match the pretrained model, and to manually initialize the layers whose shapes differ.
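
A minimal sketch of that shape-matching idea, assuming both sets of weights are available as name-to-tensor mappings (as a PyTorch state_dict would be). The helper `matching_entries` and the layer names are hypothetical, and the stand-in objects only carry a `.shape` so the example runs without PyTorch.

```python
from types import SimpleNamespace


def matching_entries(pretrained, target):
    """Return the pretrained entries whose key exists in target with the
    same shape; everything else keeps the target's own initialization."""
    return {
        key: value
        for key, value in pretrained.items()
        if key in target and tuple(value.shape) == tuple(target[key].shape)
    }


# Stand-ins with a .shape attribute; hypothetical layer names and shapes.
pretrained = {
    "conv1.weight": SimpleNamespace(shape=(32, 3, 3, 3)),
    "head.weight": SimpleNamespace(shape=(255, 1024, 1, 1)),  # 80 classes
}
target = {
    "conv1.weight": SimpleNamespace(shape=(32, 3, 3, 3)),
    "head.weight": SimpleNamespace(shape=(36, 1024, 1, 1)),  # 7 classes
}
print(sorted(matching_entries(pretrained, target)))
```

With real PyTorch models the filtered mapping could then be passed to `model.load_state_dict(..., strict=False)`, leaving the mismatched layers at their fresh initialization.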

@glenn-jocher
Member Author

glenn-jocher commented Mar 23, 2019

@hac135 most people don't realize this, and it's not the recommended method to go about things, but you can technically use the existing YOLOv3 architecture (and hence the pretrained yolov3.pt) to train any model with n<=80 classes with no changes. The unused conf outputs will learn to simply default to zero, and the rest of the unused outputs (the box and class conf associated with those unused classes) will no longer matter.

For example, our single class tutorial operates just as well with no modifications to the cfg file:
https://github.com/ultralytics/yolov3/wiki/Example:-Train-Single-Class

It's not clean and it's not optimal, but it works.

@hac135

hac135 commented Mar 24, 2019

> @hac135 most people don't realize this, and it's not the recommended method to go about things, but you can technically use the existing YOLOv3 architecture (and hence the pretrained yolov3.pt) to train any model with n<=80 classes with no changes. The unused conf outputs will learn to simply default to zero, and the rest of the unused outputs (the box and class conf associated with those unused classes) will no longer matter.
>
> For example, our single class tutorial operates just as well with no modifications to the cfg file:
> https://github.com/ultralytics/yolov3/wiki/Example:-Train-Single-Class
>
> It's not clean and it's not optimal, but it works.

Thank you! It did work!

@hac135

hac135 commented Mar 24, 2019

> @hac135 If you want to use a pretrained model for transfer learning but your own model has a different shape, the approach I know is to copy only the weights whose shapes match the pretrained model, and to manually initialize the layers whose shapes differ.

That's a good suggestion, thanks.

@glenn-jocher
Member Author

@shahidammer try training from scratch, and observe your training results in results.txt.

@glenn-jocher
Member Author

glenn-jocher commented May 14, 2019

@shahidammer please note that most technical problems are due to:

  • Your changes to the default repository. If your issue is not reproducible in a fresh git clone version of this repository we can not debug it. Before going further run this code and ensure your issue persists:
sudo rm -rf yolov3  # remove existing repo
git clone https://github.com/ultralytics/yolov3 && cd yolov3 # git clone latest
python3 detect.py  # verify detection
python3 train.py  # verify training (a few batches only)
# CODE TO REPRODUCE YOUR ISSUE HERE
  • Your custom data. If your issue is not reproducible with COCO data we can not debug it. Visit our Custom Training Tutorial for exact details on how to format your custom data. Examine train_batch0.jpg and test_batch0.jpg for a sanity check of training and testing data.
  • Your environment. If your issue is not reproducible in a GCP Quickstart Guide VM we cannot debug it. Ensure you meet the requirements specified in the README: Unix, MacOS, or Windows with Python >= 3.7, PyTorch >= 1.0, etc.

If none of these apply to you, we suggest you close this issue and raise a new one using the Bug Report template, providing screenshots and minimum viable code to reproduce your issue. Thank you!

@parul19

parul19 commented May 15, 2019

> @hac135 most people don't realize this, and it's not the recommended method to go about things, but you can technically use the existing YOLOv3 architecture (and hence the pretrained yolov3.pt) to train any model with n<=80 classes with no changes. The unused conf outputs will learn to simply default to zero, and the rest of the unused outputs (the box and class conf associated with those unused classes) will no longer matter.
> For example, our single class tutorial operates just as well with no modifications to the cfg file:
> https://github.com/ultralytics/yolov3/wiki/Example:-Train-Single-Class
> It's not clean and it's not optimal, but it works.

> Thank you! It did work!

I want to retain the existing classes and add a new class, i.e. a total of 80+1=81 classes on the COCO dataset. Please tell me how to do this using transfer learning.

@glenn-jocher
Member Author

@parul19 you create a new 81 class cfg. Follow the directions in the example above.

@sooonism

Do we still need COCO dataset if we only do transfer-learning?

@glenn-jocher glenn-jocher pinned this issue May 29, 2019
@glenn-jocher
Member Author

@sooonism you need whatever dataset you want to train on.

@glenn-jocher glenn-jocher unpinned this issue Jul 30, 2019
@glenn-jocher glenn-jocher pinned this issue Aug 7, 2019
@Santhosh1509

Santhosh1509 commented Aug 26, 2019

@glenn-jocher
I am interested in extracting the vehicles on the road, so the classes I care about are motorbike, bicycle, bus, car and truck.

I have a vehicle that is not a truck but is being detected as a truck. I have collected new data for this vehicle in COCO format, and I want to add it as a new class to the existing pretrained network.

I am planning to:

  • load the final layer weights of the truck class into this new class
  • alter the conf file accordingly and start training

My question is: how do I do it?

@glenn-jocher
Member Author

@Santhosh1509 well I would start by reviewing the examples in the wiki, such as the custom training tutorial:
https://github.com/ultralytics/yolov3/wiki

@Santhosh1509

Santhosh1509 commented Aug 26, 2019

@glenn-jocher I need your opinion on this. I just saw a post, a transfer learning tutorial for SSD using Keras.

Its second paragraph says:

Option 1: Just ignore the fact that we need only 8 classes

This would work, and it wouldn't even be a terrible option. Since only 8 out of the 80 classes would get trained, the model might get gradually worse at predicting the other 72 classes.

So I feel that even if I could somehow train a particular new class as I described above, the predictions for the other classes might be affected.

Is my approach right? Is there an alternative way to preserve the predictions for the other classes while introducing this new class in the same neural network? I feel it would need to be trained from scratch then. What do you think?

@glenn-jocher
Member Author

@Santhosh1509 training normally will produce the best results. Transfer learning produces mediocre results quickly.

@Santhosh1509

Santhosh1509 commented Aug 28, 2019

@glenn-jocher How do I find the training loss, training accuracy, validation loss and validation accuracy?

All I get during training is this:
63856968-71b37180-c9c0-11e9-82f5-b6e8683f6f43

Please advise how I should tune my hyperparameters using the data displayed here.

I could increase the batch size, since I have more memory available on the GPU:
Untitled

I do not understand the comments on these lines:

parser.add_argument('--epochs', type=int, default=273) # 500200 batches at bs 16, 117263 images = 273 epochs

parser.add_argument('--batch-size', type=int, default=32) # effective bs = batch_size * accumulate = 16 * 4 = 64

parser.add_argument('--accumulate', type=int, default=2, help='batches to accumulate before optimizing')

PS: latest training image

obj and cls values decreasing, is it good for this training?

63859125-6104fa80-c9c4-11e9-9b46-7399c031a9f2

@glenn-jocher
Member Author

glenn-jocher commented Aug 28, 2019

@Santhosh1509 all of the information you mention is recorded in results.txt. You can plot this with from utils.utils import *; plot_results(). You should use batch_size 64 accumulate 1 if possible; if not, compensate with a smaller batch size and a larger accumulation count, e.g. batch_size 32 accumulate 2.

obj and cls are training losses, they are supposed to decrease during training. See #392 for hyperparameter evolution, and explore the open issues for answers to your questions.
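
The batch-size arithmetic behind those settings can be sketched as follows. The function names are made up for illustration. Note one assumption: the 273-epoch figure in the quoted argparse comment is reproduced when each of the 500200 batches holds 64 images; with 16 images per batch the same number of batches would cover only about 68 epochs.

```python
def effective_batch_size(batch_size: int, accumulate: int) -> int:
    """Images contributing to each optimizer step."""
    return batch_size * accumulate


def epochs_for(batches: int, images_per_batch: int, dataset_size: int) -> float:
    """Passes over the dataset made by a given number of batches."""
    return batches * images_per_batch / dataset_size


print(effective_batch_size(64, 1))
print(effective_batch_size(32, 2))
# Assumes 64 images per batch and the 117263 training images quoted above.
print(round(epochs_for(500200, 64, 117263)))
```

Both suggested settings give the same effective batch size of 64, so they should behave similarly apart from memory use.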

@Santhosh1509

Santhosh1509 commented Aug 29, 2019

@glenn-jocher This is what is stored in results.txt

image

It shows obj, cls, total and targets. I am confused as to how these relate to training loss, training accuracy, validation loss and validation accuracy.

Don't we have a graph, which is easier to visualize than just numbers?

Something like this:

image

We can now even use the TensorBoard support inside PyTorch to visualize the values.

As the name suggests, hyperparameter evolution is about plotting hyperparameters, not how these values (training loss, training accuracy, validation loss and validation accuracy) changed per epoch.

@glenn-jocher
Member Author

@Santhosh1509 TensorBoard logs automatically in this repo if you have it installed. See #435

@Santhosh1509

@glenn-jocher Please explain how the obj, cls, total and targets values displayed here relate to training loss, training accuracy, validation loss and validation accuracy.

I can only relate these terms:
P -> Precision
R -> Recall
mAP -> mean Average Precision
F1 -> F1 score

@glenn-jocher
Member Author

@Santhosh1509 accuracy is a classification metric; it is not used here. The metrics displayed during training are training losses and the number of targets per batch.

@Santhosh1509

@glenn-jocher Which one of obj or cls is the training loss, and what do the other terms mean? Both of them decrease during training.

@glenn-jocher
Member Author

obj is the object loss and cls is the class loss. The training loss is the total of all training losses.

@glenn-jocher
Member Author

@joel5638 no, you call it once before training to convert your last.pt into a backbone.pt file ready to be used as pretrained weights for future trainings:

yolov3/utils/utils.py

Lines 607 to 619 in 5d42cc1

def create_backbone(f='weights/last.pt'):  # from utils.utils import *; create_backbone()
    # create a backbone from a *.pt file
    x = torch.load(f, map_location=torch.device('cpu'))
    x['optimizer'] = None
    x['training_results'] = None
    x['epoch'] = -1
    for p in x['model'].values():
        try:
            p.requires_grad = True
        except:
            pass
    torch.save(x, 'weights/backbone.pt')

@joel5638

joel5638 commented May 3, 2020

@glenn-jocher perfect will do that. Thank u so much

@glenn-jocher
Member Author

glenn-jocher commented May 4, 2020

@joel5638 you should update your repo, image plotting has been updated to show predictions and ground truth jpgs. See #1114

If objects are not labelled correctly in your ground truth jpg then you have a labelling problem.

@glenn-jocher
Member Author

@joel5638 can you paste your test_gt.jpg and test_pred.jpg here?

@glenn-jocher
Member Author

@joel5638 ah, it looks like it's working well! Remember there is NMS, so if the person and the face are largely occupying the same region one or the other may be suppressed. You could try it on zidane.jpg to compare, as in that photo the faces and the persons do not occupy similar areas, the way Bush does above.
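
To illustrate the suppression effect described above, here is a small class-agnostic greedy NMS sketch in plain Python. It is not the repo's NMS code, which may treat classes differently; the boxes, scores and the 0.5 IoU threshold are hypothetical values chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(detections, iou_threshold=0.5):
    """detections: list of (label, score, box). Keeps the highest-scoring
    detection and drops any later one overlapping a kept box too much."""
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(det[2], k[2]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


# Hypothetical close-up portrait: face and person boxes nearly coincide.
close_up = [("person", 0.90, (10, 10, 110, 110)), ("face", 0.80, (15, 15, 105, 105))]
# Hypothetical full-body shot: the face is a small part of the person box.
full_body = [("person", 0.90, (10, 10, 110, 310)), ("face", 0.80, (40, 20, 80, 70))]

print([d[0] for d in greedy_nms(close_up)])
print([d[0] for d in greedy_nms(full_body)])
```

In the close-up case the face box overlaps the person box with an IoU of 0.81 and is dropped; in the full-body case the overlap is small and both survive.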

@joel5638

joel5638 commented May 4, 2020

@glenn-jocher perfect.
Will try it on zidane.jpg once the training is complete.

@glenn-jocher
Member Author

@joel5638 --transfer flag is deprecated, you may have been using an older version of the repo before. Basically you no longer need it. Simply train normally, specifying the --weights you want to start from (but making a backbone from them first!).

Your command will technically work, but it is not recommended, as hyperparameters with schedules, like the learning rate, will assume 273/373 epochs are complete, which is not the case. So just create a backbone first, and then use a normal training command:

python train.py --data ... --weights weights/backbone.pt --cfg ...

@glenn-jocher
Member Author

glenn-jocher commented May 5, 2020

@joel5638 just the same as before. For example, to create weights/backbone.pt from weights/last.pt (see #106 (comment)):

from utils.utils import *; create_backbone(f='weights/last.pt')

@glenn-jocher
Member Author

Works fine for me. Remember this is a Python command, so you run it from a Python console. If you are trying to run it from a bash prompt, you need to wrap the command in quotes appropriately.

Screen Shot 2020-05-05 at 1 13 53 PM

@glenn-jocher
Member Author

@joel5638 from the ubuntu terminal you run the same command, but as python -c "", so this should work:

python -c "from utils.utils import *; create_backbone(f='weights/last.pt')"

@glenn-jocher
Member Author

glenn-jocher commented May 5, 2020

@joel5638 I'm not sure. The command works fine for me. You could try omitting the argument, as it's the default argument anyway. Maybe the single quotes are causing problems.

Screen Shot 2020-05-05 at 1 46 12 PM

EDIT: updated image

@joel5638

joel5638 commented May 5, 2020

@glenn-jocher got it. It was the quotes. Thank you

@joel5638

joel5638 commented May 6, 2020

@emmbertelen

Try this on the command line. This works:

python3 -c "from utils.utils import *; create_backbone(f='weights/last.pt')"

@glenn-jocher
Member Author

@joel5638 that's odd. I scanned my screen with iDetection and all people are picked up fine. Maybe your dataset is too small, or if you are using tiny you should switch to the default yolov3-spp.cfg with the default pretrained weights. Are you training with all default settings? You should also post your results.png.

IMG_A52038F68DA2-1

@glenn-jocher
Member Author

@joel5638 looks fine. If you want to add a class, like face, be aware you need to train all the existing classes plus your new class. If you're already doing this, then you may just need a larger dataset or longer training.

@renosatyaadrian

Hi @glenn-jocher, I want to ask you about transfer learning. Is it possible to train on a new dataset to improve a model's predictions? The model I am using here is YOLOv3. For example, I would like to improve predictions for the motorbike class by adding a dataset that consists of a few motorbike images.

I have tried training with 44 new images (34 training, 10 validation), but the total number of detections decreased. Is there anything wrong with my training step or my dataset?
I'm using this command:
!python train.py --data data/coco16.data --cfg cfg/yolov3.cfg --weight weights/yolov3.pt --epoch 300 --batch-size 8

This is the result using yolov3.weights. There are 47 detections in total:
Pict1

And this is the result using yolov3-transf.weights (after training). The total has decreased slightly, to 37:
Pict2

@glenn-jocher
Member Author

@renosatyaadrian I very seriously doubt that you can improve upon a model trained on perhaps thousands of images of motorbikes by training it on 34 images and expecting it to generalize better.

If you come back with a dataset of 3400 images then perhaps.

@github-actions

github-actions bot commented Aug 1, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the Stale Stale and schedule for closing soon label Aug 1, 2020
@github-actions github-actions bot closed this as completed Aug 6, 2020
@AymenKermiche

@glenn-jocher I want to apply transfer learning fine-tuning (add 2 or 3 layers to the YOLOv3 architecture and train it on my custom dataset) on Darknet. Any help, bro? Thanks a lot.

@RadoslawDebosz

Hi,
I have a problem with my test results. I trained YOLOv3-SPP with default settings from weights pretrained on the COCO dataset (transfer learning).
My dataset:
1000 images of city traffic from camera views
about 12k objects (cars, persons, trucks, buses)
train/valid split 0.8/0.2

I froze only the last layers.

The results of training after 100 epochs:
image

Then I ran test.py on the validation set and got these test results:
image

And on the training set:
image

I have 2 questions :

  1. Do you know why the test results are different from (worse than) the training results on the charts (Precision, Recall, F1, mAP)?
  2. I don't know why, for example, the recall on the training chart starts from a value of ~0.7, decreases to near 0 after the first epochs, and then increases again to ~0.7 during training. It looks like the model ignores the pretrained data and learns from scratch.

@dbkalaria

> @kairavpatel the transfer learning tutorial is not recommended. We do not recommend freezing any layers, as this will nearly always result in poorer performance. To train from pretrained weights (recommended) use python3 train.py --weights ...

@glenn-jocher I'm a bit confused about the difference between transfer learning and training from pretrained weights. Aren't they the same? In transfer learning we use pretrained model weights, freeze some layers' weights, and fine-tune the model. So when training from a pretrained model, are you just using the weights without freezing any layers, and updating all the weights as training progresses?

@glenn-jocher
Member Author

@Dhruv312 it's just wording.

Basically freezing will always lead to worse results on large datasets.

@Utsabab

Utsabab commented Mar 21, 2024

> @kairavpatel the transfer learning tutorial is not recommended. We do not recommend freezing any layers, as this will nearly always result in poorer performance. To train from pretrained weights (recommended) use python3 train.py --weights ...

> @glenn-jocher I'm a bit confused about the difference between transfer learning and training from pretrained weights. Aren't they the same? In transfer learning we use pretrained model weights, freeze some layers' weights, and fine-tune the model. So when training from a pretrained model, are you just using the weights without freezing any layers, and updating all the weights as training progresses?

@kairavpatel I understand the confusion because I was in the same position. Let me explain:

When we say we are using pretrained weights, the assumption is that our new custom data contains images with labels that are also available in COCO. There are no new class labels in our new dataset, so we can use the pretrained COCO weights for our desired classes.

Transfer learning is done when we have a new class in our custom dataset, which the COCO pretrained model will not be able to detect. We therefore have to train the model on COCO plus the custom dataset with updated class labels (0-79 for the COCO labels, 80-n for the new labels from the custom data). This is transfer learning, where pretrained weights are used to make predictions on a custom dataset with new class labels. Layers can be frozen to shorten training time, but mAP is reduced almost every time.

Hope this explanation helps!
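
The relabeling step mentioned above (custom classes placed after COCO's 0-79) could look roughly like this. It is only a sketch, assuming each label line has the form "class x y w h" and that the custom classes are numbered from 0; the helper name `offset_class_ids` is hypothetical.

```python
def offset_class_ids(label_lines, offset=80):
    """Shift the leading class index of each 'class x y w h' label line."""
    shifted = []
    for line in label_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        parts[0] = str(int(parts[0]) + offset)
        shifted.append(" ".join(parts))
    return shifted


# Hypothetical custom labels for two new classes (0 and 1).
custom = ["0 0.50 0.50 0.20 0.30", "1 0.25 0.40 0.10 0.10"]
print(offset_class_ids(custom))
```

After the shift the two custom classes become 80 and 81, so they no longer collide with the COCO class indices.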

@glenn-jocher
Member Author

@Utsabab Great question! In the context of using pretrained weights, the idea is to leverage what the model has already learned without specifically freezing any layers. This means we update the weights across all layers based on the new data, which usually gives a better performance because the model can adapt more flexibly to the new task.

Transfer learning, as often discussed, involves modifying or extending the existing model architecture to better fit new data, which can include freezing certain layers to not update during training. Essentially, using pretrained weights and not freezing layers allows the whole model to adjust and learn from the new data, while freezing layers in transfer learning is more about fine-tuning or adapting the model to new, possibly related tasks.

So, when training with pretrained weights the command is simple:

python3 train.py --weights yolov3.pt --data yourdata.yaml

And there's no need to explicitly freeze layers unless you have a very specific case where you believe it's necessary. 🤓 Hope that clarifies things!
