
multimodal: initialize hidden state of encoder + transformer #9

Open
LinuxBeginner opened this issue Jun 24, 2020 · 2 comments

LinuxBeginner commented Jun 24, 2020

Hi, could you please explain how using the image as additional data to initialise the encoder hidden states (Calixto et al., 2017) works when implemented with the Transformer model? (I have sketched my understanding of the RNN case at the end of this comment.)

python train_mm.py -data dataset/bpe -save_model model/IMGE_ADAM -gpuid 0 -path_to_train_img_feats image_feat/train_vgg19_bn_cnn_features.hdf5 -path_to_valid_img_feats image_feat/valid_vgg19_bn_cnn_features.hdf5 -enc_layers 6 -dec_layers 6 -encoder_type transformer -decoder_type transformer -position_encoding -epochs 300 -dropout 0.1 -batch_size 128 -batch_type tokens -optim adam -learning_rate 0.01 --multimodal_model_type imge

When running the above command, does the system ignore the image features and simply train a text-only Transformer model? @Eurus-Holmes
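
For context, my understanding of the IMG_E mechanism in the paper is roughly the following. This is only a minimal PyTorch sketch under my own assumptions (the module names, sizes, and single-layer GRU are illustrative, not the actual train_mm.py code):

```python
import torch
import torch.nn as nn

class ImageInitEncoder(nn.Module):
    """Sketch of IMG_E (Calixto et al., 2017): project a global image
    feature vector and use it as the initial hidden state of a recurrent
    encoder. All names and sizes here are illustrative assumptions."""

    def __init__(self, vocab_size=10000, emb_size=512, hidden_size=512,
                 img_feat_size=4096):  # e.g. 4096-dim global VGG19 features
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.rnn = nn.GRU(emb_size, hidden_size, batch_first=True)
        # Maps the image feature vector into the encoder's hidden space
        self.img_proj = nn.Linear(img_feat_size, hidden_size)

    def forward(self, src_tokens, img_feats):
        # src_tokens: (batch, src_len); img_feats: (batch, img_feat_size)
        h0 = torch.tanh(self.img_proj(img_feats)).unsqueeze(0)  # (1, batch, hidden)
        return self.rnn(self.embed(src_tokens), h0)
```

The crux of my question is that a Transformer encoder has no recurrent hidden state to initialize in this way, so it is unclear to me where the image vector would enter.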

Eurus-Holmes (Owner) commented

@LinuxBeginner
Please refer to README.md.

To train a multi-modal NMT model, use the train_mm.py script. In addition to the parameters accepted by the standard train.py (that trains a text-only NMT model), this script expects the path to the training and validation image features, as well as the multi-modal model type (one of imgd, imge, imgw, or src+img).
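
In other words, a multimodal run differs from a text-only train.py run only in the image-feature paths and the model-type flag. Stripped down to just those extra parameters (reusing the paths from your command above), an invocation looks like:

```
python train_mm.py -data dataset/bpe -save_model model/IMGE \
    -path_to_train_img_feats image_feat/train_vgg19_bn_cnn_features.hdf5 \
    -path_to_valid_img_feats image_feat/valid_vgg19_bn_cnn_features.hdf5 \
    --multimodal_model_type imge
```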


LinuxBeginner commented Aug 14, 2020

@Eurus-Holmes
Thank you for the response; I see what you mean.
It's just that Calixto et al. (2017) only describe the attention mechanism of Bahdanau et al. (2014), not the Transformer model.

We tried the Transformer model both text-only (the train.py script) and multimodal (the train_mm.py script), but there was no improvement: the BLEU scores are almost the same.

So I was under the assumption that, even when I use the train_mm.py script with one of the multi-modal model types to train a Transformer model (as in the command above), the script ignores the multimodal approach and simply trains the text-only version.

Our goal is to train a multi-modal NMT model with the Transformer architecture.

Please correct me if I am wrong.
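
One way to test that assumption would be a sanity check along these lines (purely hypothetical code: model(src, img_feats) is a call signature I am assuming for illustration, not train_mm.py's actual API):

```python
import torch

def image_features_matter(model, src, img_feats, eps=1.0):
    """If perturbing the image features leaves the output unchanged,
    the image branch is effectively unused. `model(src, img_feats)`
    is an assumed interface, for illustration only."""
    model.eval()
    with torch.no_grad():
        out_clean = model(src, img_feats)
        out_noisy = model(src, img_feats + eps * torch.randn_like(img_feats))
    # Identical outputs => the image features never reach the output.
    return not torch.allclose(out_clean, out_noisy)
```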
