small lesson about problems during train my own dataset #128
Comments
Thanks for the update. I will add these to the README; a lot of people are having the NaN loss problem. |
@zzh8829 I've provided another detailed explanation here of the NaN loss; the problem shows up in almost every NaN loss issue here. Hopefully you can add it too. |
I added a section to the README with @LongxingTan's insight. @AnaRhisT94, is it possible for you to make a pull request on the README file with your detailed explanation? I am not sure which one specifically you are referring to. I really appreciate you helping other people solve their training problems. It would be great if we could share that knowledge with everyone else too. |
Zihao - do you want me to make a pull request about integrating your change
for adding more than 80 classes?
On Wed, 18 Dec 2019 at 11:24, Zihao Zhang wrote:
I added a section in read me with @LongxingTan 's insight, @AnaRhisT94 is it possible for you to make a pull
request on the readme file with your detailed explanation? I am not sure
which one specifically you are referring to. I really appreciate you
helping other people to solve their training problems. It would be great if
we can share that knowledge to everyone else too.
|
@antongisli for sure that would be amazing |
I compiled a full tutorial on custom training at https://github.com/zzh8829/yolov3-tf2/blob/master/docs/training_voc.md. You are welcome to add what you have learned to it. |
@LongxingTan @zzh8829 I applied this idea to the code as follows and it did converge a lot faster. And we can see
conf_focal = tf.squeeze(tf.pow(true_obj-pred_obj, 2), -1)
obj_loss = binary_crossentropy(true_obj, pred_obj) * conf_focal
Can you shed some light on it? If I'm not mistaken, this is the focal loss for gamma=2. |
@zzh8829 I'm just wondering why you chose |
@nicolefinnie
You are right that this is the focal loss for gamma=2. |
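As a rough illustration of why the `(true_obj - pred_obj)^2` multiplier acts like a focal term with gamma=2, here is a minimal sketch in plain Python (no TensorFlow, so it runs anywhere). The function names and the `eps` clipping value are assumptions made for this example, not code from the repo. For a binary label, `|y - p|` equals `1 - p_t`, where `p_t` is the probability assigned to the correct class, so the multiplier shrinks the loss of confident, correct predictions and leaves hard examples almost untouched.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip the prediction so log() never sees 0 or 1 (eps is an assumed value).
    y_pred = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(y_pred) + (1.0 - y_true) * math.log(1.0 - y_pred))

def focal_obj_loss(y_true, y_pred, gamma=2.0):
    # |y - p| ** gamma is the focal modulating factor (1 - p_t) ** gamma
    # when y_true is 0 or 1; gamma=2 matches tf.pow(true_obj - pred_obj, 2).
    modulator = abs(y_true - y_pred) ** gamma
    return modulator * binary_crossentropy(y_true, y_pred)

# Easy positive (p=0.9): the loss is scaled by roughly 0.01.
easy_ratio = focal_obj_loss(1.0, 0.9) / binary_crossentropy(1.0, 0.9)
# Hard positive (p=0.1): the loss is scaled by roughly 0.81.
hard_ratio = focal_obj_loss(1.0, 0.1) / binary_crossentropy(1.0, 0.1)
```

This sketch omits the alpha balancing factor that some focal loss variants add.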
Oops, @LongxingTan thanks! I overlooked your reply. Adding But if we add the focal loss, the first term will get more weight and get penalized more, so adding the focal loss actually changes the
Reference:
// calculating the best IOU, skip...
// calculate the loss of false positives, ignore true positives when it crosses the ignore threshold, equivalent to `(1-obj_mask)*ignore_mask*obj_loss` in this repo
l.delta[obj_index] = 0 - l.output[obj_index];
if (best_iou > l.ignore_thresh) {
    l.delta[obj_index] = 0;
}
// calculate the loss of the true positives above the threshold, equivalent to `obj_mask*obj_loss` in this repo
if (best_iou > l.truth_thresh) {
    l.delta[obj_index] = 1 - l.output[obj_index];
} |
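The two masked terms named in the comments above can be combined per cell as in the following sketch. It is plain Python on scalars for readability. The function name `masked_obj_loss`, the default `ignore_thresh=0.5`, and the definition of `ignore_mask` as "best IOU below the threshold" are assumptions made for illustration; only the two expressions `obj_mask*obj_loss` and `(1-obj_mask)*ignore_mask*obj_loss` come from the comment itself.

```python
def masked_obj_loss(obj_mask, best_iou, obj_loss, ignore_thresh=0.5):
    # Assumed definition: a cell without an object still counts as a negative
    # only when its best IOU with any ground-truth box is below the threshold.
    ignore_mask = 1.0 if best_iou < ignore_thresh else 0.0
    positive_term = obj_mask * obj_loss
    negative_term = (1.0 - obj_mask) * ignore_mask * obj_loss
    return positive_term + negative_term

# Cell responsible for an object: the loss always counts.
loss_positive = masked_obj_loss(obj_mask=1.0, best_iou=0.9, obj_loss=2.0)
# No object assigned, but the prediction overlaps a ground truth well: ignored.
loss_ignored = masked_obj_loss(obj_mask=0.0, best_iou=0.9, obj_loss=2.0)
# No object and low overlap: counted as a plain negative.
loss_negative = masked_obj_loss(obj_mask=0.0, best_iou=0.1, obj_loss=2.0)
```

In a tensor implementation the same arithmetic would be applied elementwise over the grid.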
@LongxingTan Is |
The validation losses are zero from the first iteration to the last. When I run detection with the model trained for 7 epochs (yolov3_train_7.tf), output.jpg contains no boxes. |
There are so many different codes and formats out there and most are just to get a PoC.
What does that mean? Which one?
Or as the FAQ is saying, is it what I may understand:
? |
Got it, it's the second one |
@LongxingTan Hi, I am experiencing the problem of the validation loss exploding and then not converging properly on my custom dataset. In the first few epochs, validation loss is reasonable. Then it explodes to some large number like 2000000. You mentioned that you made changes to the backbone? Can you guide me with making those changes to see if they can help my problem? Thank you. Edit: I would also like to add that I successfully trained the yolov3 tiny model on my custom dataset. This problem only seems to be happening with yolov3. |
yeah, it looks like the same phenomenon as my situation.
|
@LongxingTan Thank you so much for the code. I will look into it and try to locate the error. Also, if you have time, can you check the new issue I created #206 . It discusses the problem I am experiencing and gives a log of my training output for further inspection. Edit: Quick question, did you get the backbone code from a single repository or did you combine them from multiple? Thank you for your help! |
@LongxingTan where exactly is class_prob_loss located, and how do I remove the sigmoid operator?
|
Maybe like this: when parsing/decoding the network output to calculate the loss, we have to convert the anchor-relative values to real coordinates. So the decode function applies a sigmoid to rescale the values. In this repo, you can find the yolo_boxes function in models.py
and the YoloLoss function in models.py.
The last item, class_loss, is the class_prob_loss. |
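To make the role of the sigmoid concrete, here is a small sketch in plain Python. The helper names are hypothetical and do not come from models.py. It shows that cross-entropy computed on `sigmoid(logit)` matches the numerically stable cross-entropy computed directly from the logit, which suggests the sigmoid should be applied exactly once on the way to the loss, either in the decode step or inside the loss, but not both.

```python
import math

def sigmoid(x):
    # Squash a raw network output (logit) into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def bce_from_prob(y_true, p):
    # Cross-entropy when the input is already a probability.
    return -(y_true * math.log(p) + (1.0 - y_true) * math.log(1.0 - p))

def bce_from_logit(y_true, x):
    # Stable cross-entropy on a raw logit: max(x, 0) - x*y + log(1 + exp(-|x|)).
    return max(x, 0.0) - x * y_true + math.log1p(math.exp(-abs(x)))

logit = 2.0
once = bce_from_prob(1.0, sigmoid(logit))            # sigmoid applied once
direct = bce_from_logit(1.0, logit)                  # same value, no explicit sigmoid
twice = bce_from_prob(1.0, sigmoid(sigmoid(logit)))  # sigmoid applied twice: different loss
```

Whether the class loss in a given implementation expects probabilities or logits has to be checked in that code; this sketch only shows the arithmetic.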
Hi @LongxingTan, may I know what hyperparameters you used to train the model? I'm using: In my implementation, I have to lower the confidence threshold to 0.1. Many thanks in advance! |
Actually, I have tried this code to deal with the class imbalance problem, and it works well.
Then update the ignore_mask using the negative count:
The final obj_loss is:
|
Hi @yyccR |
@yjwong1999 |
Thanks zzh8829 for sharing the code; it is really nicely written and I like it.
When I tried training on my own dataset I ran into some problems; this is how I solved them.
I hope this saves others some time.
conf_focal=tf.pow(true_obj-pred_obj , 2)
as a multiplier in confidence_loss