Attention on different input and output length #14

Open
aayushee opened this issue Nov 17, 2017 · 5 comments

@aayushee

Hello
Thanks a lot for providing an easy-to-understand tutorial and attention layer implementation.
I am trying to use attention on a dataset with different input and output length.
My training data sequences are of size 600×4 (600 four-dimensional points) and the one-hot encoded output is of size 70×66 (a sequence of 70 symbols, each one-hot encoded over a 66-symbol vocabulary). I have to map the 600-point sequence to the 70-symbol sequence for ~15000 such sequences.
Right after the LSTM layer, I tried using a RepeatVector with the output length on a small dataset. I read that RepeatVector is used in encoder-decoder models where the output and input sequences are not of the same length. Here is what I tried:
x_train.shape=(50,600,4)
y_train.shape=(50,70,66)
inputs = Input(shape=(x_train.shape[1:]))
rnn_encoded = Bidirectional(LSTM(32, return_sequences=False),name='bidirectional_1',merge_mode='concat',trainable=True)(inputs)
encoded = RepeatVector(y_train.shape[1])(rnn_encoded)
y_hat = AttentionDecoder(70,name='attention_decoder_1',output_dim=y_train.shape[2], return_probabilities=False, trainable=True)(encoded)

But the prediction from this model always gives the same symbols in the output sequence after every run:
'decoded model output:', ['d', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I'])
('decoded original output:', ['A', ' ', 'M', 'O', 'V', 'E', ' ', 't', 'o', ' ', 's', 't', 'o', 'p', ' ', 'M', 'r', ' ', '.', ' ', 'G', 'a', 'i', 't', 's', 'k', 'e', 'l', 'l', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0'])
Could you please give an idea of where I am going wrong and what I can possibly do to solve the problem?
Any help would be much appreciated.

Thanks
Aayushee


pgyrya commented Dec 13, 2017

Hello, Aayushee -

I've practiced with this library a bit and ultimately made it work for my practice project (though it does conflict with later versions of the Keras installation, as another issue about the time-distributed dense layer suggests). I have also experienced similar, very imperfect translations at some point, when the model was not well tuned, but I was able to make it work eventually.

Notice that the first and second symbols in your translations are different, so your model is technically able to generate different translations. Perhaps the model has simply not learned the right translations yet? With long sequences, the parameter space of the model may be too complex (e.g. high curvature) to be learned quickly. I have chosen to stick with words rather than symbols for output encoding to shorten sequence length and facilitate learning.

Could you confirm what happens if you run the optimization further? Can you see the loss function improving substantially as you train the model? I suggest using a relatively small learning rate and going through many iterations of gradient descent to see if you can notice improvement.
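
For illustration only, a minimal sketch of that suggestion in Keras. The optimizer, learning rate, batch size and epoch count below are assumptions, not values from this thread, and inputs/y_hat are the tensors from the snippet in the original post:

from keras.models import Model
from keras.optimizers import Adam

model = Model(inputs=inputs, outputs=y_hat)             # encoder + attention decoder from above
model.compile(optimizer=Adam(lr=1e-4),                  # relatively small learning rate
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=16, epochs=200)  # many passes of gradient descent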

@chungfu27

Hi Aayushee,
If you use return_sequences=False and RepeatVector, the encoder LSTM will always feed the same hidden vector into the decoder LSTM. The attention mechanism needs return_sequences=True so that the encoder LSTM returns the hidden vector from every timestep and the decoder LSTM can calculate a different weighted-sum vector at each of its timesteps.
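
A rough sketch of that change, using the sizes from the original post. The AttentionDecoder import path is an assumption (adjust it to wherever the layer lives in your copy of this repo), and with this setup the decoder emits one step per encoder timestep, so the targets would need to be padded to the same length (see the padding discussion further down):

from keras.layers import Input, LSTM, Bidirectional
from keras.models import Model
from models.custom_recurrents import AttentionDecoder      # assumed path inside this repo

inputs = Input(shape=(600, 4))                             # 600 encoder timesteps, 4 features
rnn_encoded = Bidirectional(LSTM(32, return_sequences=True),   # hidden state at every timestep
                            name='bidirectional_1',
                            merge_mode='concat')(inputs)       # shape (None, 600, 64)
y_hat = AttentionDecoder(70,                                   # decoder units, as in the original post
                         output_dim=66,                        # one-hot vocabulary size
                         name='attention_decoder_1',
                         return_probabilities=False)(rnn_encoded)
model = Model(inputs=inputs, outputs=y_hat)                    # output: one step per encoder timestep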


ghost commented Jun 4, 2018

@chungfu27 If return_sequences is set to True together with RepeatVector, then you get this error before passing to the decoder:

ValueError: Input 0 is incompatible with layer repeat_vector_1: expected ndim=2, found ndim=3

@Kaustubh1Verma

Yeah, @chungfu27, that's right. As ghost said, with return_sequences set to True it isn't possible to use RepeatVector, which makes this setup incompatible with different input and output lengths.

@NehaTamore

Yeah, @chungfu27, it doesn't make sense to set return_sequences to False.
But have we found any workaround to implement attention with different input and output lengths?
I'm working on abstractive summarization, and it looks like we could concatenate zeros to match the encoder_output and decoder_output lengths; will that work?
Also, has anyone found possible reasons for the repeating words/characters in the inference model?
Many thanks!
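
A rough sketch of the zero-padding idea above, using the shapes from the original post; this is just one way to line the lengths up, not something that has been validated in this thread:

import numpy as np

# y_train: (N, 70, 66) one-hot targets; the encoder sees 600 timesteps
pad_steps = 600 - y_train.shape[1]                     # 600 - 70 = 530
y_train_padded = np.pad(y_train,
                        ((0, 0), (0, pad_steps), (0, 0)),
                        mode='constant')               # all-zero rows -> (N, 600, 66)

A dedicated padding symbol in the vocabulary (instead of all-zero rows) might be cleaner, since the model would then have an explicit class to predict for the padded positions.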
