
multimodal: initialize hidden state of encoder + transformer #9

Open
LinuxBeginner opened this issue Jun 24, 2020 · 2 comments

LinuxBeginner commented Jun 24, 2020

Hi, could you please explain how using the image as additional data to initialise the encoder hidden states (Calixto et al., 2017) works when implemented with the Transformer model? (I have sketched my understanding of the RNN case at the end of this comment.)

python train_mm.py -data dataset/bpe -save_model model/IMGE_ADAM -gpuid 0 -path_to_train_img_feats image_feat/train_vgg19_bn_cnn_features.hdf5 -path_to_valid_img_feats image_feat/valid_vgg19_bn_cnn_features.hdf5 -enc_layers 6 -dec_layers 6 -encoder_type transformer -decoder_type transformer -position_encoding -epochs 300 -dropout 0.1 -batch_size 128 -batch_type tokens -optim adam -learning_rate 0.01 --multimodal_model_type imge

When running the above command, does the system ignore the image features and simply train a text-only Transformer model? @Eurus-Holmes
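
For context, my understanding of the IMG_E mechanism in the paper is roughly the following. This is only a minimal PyTorch sketch under my own assumptions (the module names, sizes, and single-layer GRU are illustrative, not the actual train_mm.py code):

```python
import torch
import torch.nn as nn

class ImageInitEncoder(nn.Module):
    """Sketch of IMG_E (Calixto et al., 2017): project a global image
    feature vector and use it as the initial hidden state of a recurrent
    encoder. All names and sizes here are illustrative assumptions."""

    def __init__(self, vocab_size=10000, emb_size=512, hidden_size=512,
                 img_feat_size=4096):  # e.g. 4096-dim global VGG19 features
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.rnn = nn.GRU(emb_size, hidden_size, batch_first=True)
        # Maps the image feature vector into the encoder's hidden space
        self.img_proj = nn.Linear(img_feat_size, hidden_size)

    def forward(self, src_tokens, img_feats):
        # src_tokens: (batch, src_len); img_feats: (batch, img_feat_size)
        h0 = torch.tanh(self.img_proj(img_feats)).unsqueeze(0)  # (1, batch, hidden)
        return self.rnn(self.embed(src_tokens), h0)
```

The crux of my question is that a Transformer encoder has no recurrent hidden state to initialize in this way, so it is unclear to me where the image vector would enter.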

Eurus-Holmes (Owner) commented

@LinuxBeginner
Please refer to README.md.

To train a multi-modal NMT model, use the train_mm.py script. In addition to the parameters accepted by the standard train.py (that trains a text-only NMT model), this script expects the path to the training and validation image features, as well as the multi-modal model type (one of imgd, imge, imgw, or src+img).
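
In other words, a multimodal run differs from a text-only train.py run only in the image-feature paths and the model-type flag. Stripped down to just those extra parameters (reusing the paths from your command above), an invocation looks like:

```
python train_mm.py -data dataset/bpe -save_model model/IMGE \
    -path_to_train_img_feats image_feat/train_vgg19_bn_cnn_features.hdf5 \
    -path_to_valid_img_feats image_feat/valid_vgg19_bn_cnn_features.hdf5 \
    --multimodal_model_type imge
```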


LinuxBeginner commented Aug 14, 2020

@Eurus-Holmes
Thank you for the response; I see what you mean.
It's just that Calixto et al. (2017) only describe the attention mechanism of Bahdanau et al. (2014), not the Transformer model.

We tried the Transformer model both text-only (the train.py script) and multimodal (the train_mm.py script), but there was no improvement: the BLEU scores are almost the same.

So I was under the assumption that, even when I use the train_mm.py script with one of the multi-modal model types to train a Transformer model (as in the command above), the script ignores the multimodal approach and simply trains the text-only version.

Our goal is to train a multi-modal NMT model with the Transformer architecture.

Please correct me if I am wrong.
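
One way to test that assumption would be a sanity check along these lines (purely hypothetical code: model(src, img_feats) is a call signature I am assuming for illustration, not train_mm.py's actual API):

```python
import torch

def image_features_matter(model, src, img_feats, eps=1.0):
    """If perturbing the image features leaves the output unchanged,
    the image branch is effectively unused. `model(src, img_feats)`
    is an assumed interface, for illustration only."""
    model.eval()
    with torch.no_grad():
        out_clean = model(src, img_feats)
        out_noisy = model(src, img_feats + eps * torch.randn_like(img_feats))
    # Identical outputs => the image features never reach the output.
    return not torch.allclose(out_clean, out_noisy)
```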
