Structure decoder with beam size 3 not working #12
Comments
Hmm, I did not see anything obviously wrong, so I can only make some guesses. First, I did not use a pre-trained ResNet, because I think a model pre-trained on ImageNet would not help much with tables; I trained the whole ResNet from scratch. Second, when you pre-processed the images into h5 files, what resolution did you use? Another thing worth trying is to pass training images into your model and see what you get. If it does not even work on training samples (where your training log shows high accuracy), I think there may be something wrong in your inference code, or maybe your training batches do not iterate through your data properly.
Hi @zhxgj, thank you for your quick response. Let me fine-tune all the layers of the ResNet and give it a try. Did you also fine-tune the decoder embedding layer? I'll also pass a training image to the inference code and check the result. Will update here.
Hi @zhxgj, I tried using resnet101 without the pre-trained weights and fine-tuning all the layers of resnet101, as below,
Fine-tuning,
After the first epoch I see the below,
Second epoch,
I still feel I'm doing something wrong. As suggested, I tried running inference on a training image, but I am not getting the correct output. Actual output, I have not changed anything in caption.py from https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning/blob/master/caption.py. Regarding training, I have not made any major code changes either. I tried Flickr8k with my code base and it works as expected, but I am not getting the correct output for table structure. Please help.
I'm also trying to run the training with top-1 word accuracy, which I think is nothing but the greedy method. One more question: did you use any transform for normalization? I'm using the below,
@zhxgj I tried running inference while training the model itself. The output from the model is correct in this case, where teacher forcing is enabled. But if I use the trained model without teacher forcing, the output is bad. PS: After removing the normalize function, my model has improved a little bit, but it is still not working well. Looking forward to your response.
Hi @Sharathmk99, have you solved the problem yet?
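The gap described above between teacher-forced output and free-running output can be illustrated with a toy example. This is only a sketch under loud assumptions: `BIGRAM`, `predict_next`, `teacher_forced_accuracy` and `greedy_decode` are hypothetical names, and the "decoder" is a lookup table that ignores the image entirely. It is not the model from this thread; it only shows how a decoder that leans on the previous token can score well when fed ground-truth tokens and still loop, and return the same sequence for every image, once it is fed its own predictions.

```python
# Toy stand-in for a decoder (hypothetical, not the model discussed here):
# it predicts the next token from the previous token only and ignores the image.
BIGRAM = {
    "<start>": "<tr>",
    "<tr>": "<td>",
    "<td>": "</td>",
    "</td>": "<td>",   # always opens another cell, never closes the row
    "</tr>": "<end>",
}


def predict_next(prev_token, image=None):
    """Return the most likely next token; the image argument is unused on purpose."""
    return BIGRAM.get(prev_token, "<end>")


def teacher_forced_accuracy(target, image=None):
    """Top-1 accuracy when every step is fed the ground-truth previous token."""
    inputs = ["<start>"] + target[:-1]
    correct = sum(predict_next(p, image) == t for p, t in zip(inputs, target))
    return correct / len(target)


def greedy_decode(image=None, max_len=10):
    """Free-running decoding: every step is fed the model's own previous output."""
    output, prev = [], "<start>"
    for _ in range(max_len):
        prev = predict_next(prev, image)
        if prev == "<end>":
            break
        output.append(prev)
    return output


target = ["<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<end>"]
accuracy = teacher_forced_accuracy(target)   # 6 of 7 steps are correct
decoded_a = greedy_decode(image="image_a")   # keeps emitting cells until max_len
decoded_b = greedy_decode(image="image_b")   # identical, since the image is ignored
print(accuracy, decoded_a == decoded_b, decoded_a)
```

In this toy case the single teacher-forced mistake (never predicting `</tr>`) is exactly the step that matters at inference time, so a high per-token accuracy during training says little about the quality of the decoded sequence.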
Hi @zhxgj
I went through your paper; it's an amazing paper. Thank you :)
Initially I thought of training only for the structure task.
I started with the amazing tutorial https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning
I downloaded the PubTabNet 2.0 dataset and preprocessed it with the below filters,
I took only 100k examples from train and 5k from validation. I got a word map of size 32 (including start, end, pad and unk):
{"<thead>": 1, "<tr>": 2, "<td>": 3, "</td>": 4, "</tr>": 5, "</thead>": 6, "<tbody>": 7, "</tbody>": 8, "<td": 9, " colspan=\"5\"": 10, ">": 11, " colspan=\"2\"": 12, " colspan=\"3\"": 13, " rowspan=\"2\"": 14, " colspan=\"4\"": 15, " colspan=\"6\"": 16, " rowspan=\"3\"": 17, " colspan=\"9\"": 18, " colspan=\"10\"": 19, " colspan=\"7\"": 20, " rowspan=\"4\"": 21, " rowspan=\"5\"": 22, " rowspan=\"9\"": 23, " colspan=\"8\"": 24, " rowspan=\"8\"": 25, " rowspan=\"6\"": 26, " rowspan=\"7\"": 27, " rowspan=\"10\"": 28, "<unk>": 29, "<start>": 30, "<end>": 31, "<pad>": 0}
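As a rough illustration of how a word map like this is typically applied, the sketch below encodes one structure token sequence into padded ids. The `encode` helper and the `max_len` value are assumptions made for this example and are not taken from the post or the tutorial; the token ids are copied from the word map above, restricted to the tokens the example uses.

```python
# Subset of the word map from the post (ids copied verbatim).
word_map = {
    "<pad>": 0, "<thead>": 1, "<tr>": 2, "<td>": 3, "</td>": 4, "</tr>": 5,
    "</thead>": 6, "<td": 9, ">": 11, ' colspan="2"': 12,
    "<unk>": 29, "<start>": 30, "<end>": 31,
}


def encode(tokens, word_map, max_len):
    """Hypothetical helper: wrap tokens in <start>/<end>, map to ids, pad to max_len.

    Returns the padded id list and the unpadded length (including <start>/<end>).
    """
    ids = [word_map["<start>"]]
    ids += [word_map.get(token, word_map["<unk>"]) for token in tokens]
    ids.append(word_map["<end>"])
    if len(ids) > max_len:
        raise ValueError("sequence is longer than max_len")
    length = len(ids)
    ids += [word_map["<pad>"]] * (max_len - length)
    return ids, length


# A header row with one cell spanning two columns.
tokens = ["<thead>", "<tr>", "<td", ' colspan="2"', ">", "</td>", "</tr>", "</thead>"]
ids, length = encode(tokens, word_map, max_len=12)
print(ids, length)
```

Decoding a few encoded training samples back into tokens is a cheap way to confirm that the span attributes (`<td`, ` colspan="2"`, `>`) survive pre-processing as separate tokens.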
Finally, I started training with the below configuration,
In my first epoch, accuracy went up to 95%, which does not look right. I must be doing something wrong.
Example snapshot,
After 3 epochs, I took the best model and ran inference with a beam size of 3,
If I increase the beam size to 10, I get the below output,
But the above output doesn't change if I pass a different image.
Can you help point out what I'm doing wrong here? I could have trained for some more epochs, but I felt the accuracy and loss were not looking correct.
Please help.