Hey, I’m trying to understand the outputs of the output_recurrence function in models.py. Right now, I’m not seeing the updates I would expect. I’ve gone through the function step by step, and somewhere it just starts multiplying everything by 0 and loses all values. All the fusion parameters seem to be initialized at 0, and I don’t see where they’re updated.

My first question:
weights_const: it’s defined as np.ones, but then every time it’s used all the values are multiplied by 0, giving it only an output of zeroes?
My second question:
Output recurrence: I went through some steps of the theano.scan; below are the results at the third update.
I have:

Inputs in the theano.scan:
- x_t: context[3], the forward and backward hidden layers concatenated. shape: (50,128,512)
- h_tm1: starts at GRU.h0 = np.zeros((128,256)) in the first iteration, then updates with the GRU.step function. shape: (128,256)
Attention parameters:
- Wa_h: d = 6/(256+512), random between -d and d. shape: (256,512)
- Wa_y: d = 6/(512+1), random between -d and d. shape: (512,1)
Late fusion parameters:
- Wf_h: np.zeros((256,256)). shape: (256,256)
- Wf_c: np.zeros((512,256)). shape: (512,256)
- Wf_f: np.zeros((256,256)). shape: (256,256)
- bf: np.zeros((1,256)). shape: (1,256)
Output model:
- Wy: np.zeros((256,8)). shape: (256,8)
- by: np.zeros((8,)). shape: (8,)
- context: the forward and backward sequences. shape: (50,128,512)
- projected_context: T.dot(context, Wa_c) + ba = (context * random) + 0. shape: (50,128,512)
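To keep my shape bookkeeping honest, here is a minimal numpy sketch of this setup as I read it. This is not the repo code: the names and the random seed are mine, and Wa_c's shape is inferred from the projected_context shape, since it isn't listed above.

```python
import numpy as np

rng = np.random.RandomState(0)

def uniform_init(n_in, n_out):
    # "d = 6/(n_in+n_out), random between -d and d" as listed above
    # (a Glorot-style init would take the square root of that fraction)
    d = 6.0 / (n_in + n_out)
    return rng.uniform(-d, d, size=(n_in, n_out))

# Attention parameters
Wa_h = uniform_init(256, 512)
Wa_y = uniform_init(512, 1)
Wa_c = uniform_init(512, 512)      # shape inferred so (50,128,512) is preserved
ba = np.zeros((512,))

# Late fusion parameters (all zeros at init)
Wf_h = np.zeros((256, 256))
Wf_c = np.zeros((512, 256))
Wf_f = np.zeros((256, 256))
bf = np.zeros((1, 256))

# Output model (all zeros at init)
Wy = np.zeros((256, 8))
by = np.zeros((8,))

context = rng.randn(50, 128, 512)  # forward + backward sequences
projected_context = context.dot(Wa_c) + ba
print(projected_context.shape)     # (50, 128, 512)
```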
Inside the function:
- h_a: values between -1 and 1. shape: (50,128,512)
- alphas (1): T.exp(T.dot(h_a, Wa_y)): random values. shape: (50,128)
- alphas (2): alphas reshaped to only the first two dimensions (extra question: I only have two dimensions here, should I have three? see the shape sketch after this list). shape: (50,128)
- alphas (3): normalized alphas. shape: (50,128)
- weighted_context: (context * alphas[:,:,None]).sum(axis=0), random values. shape: (128,512)
- h_t: GRU.step, the hidden state used in the recurrence; the initial value is h0, and the updated values have gone through sigmoid and tanh activations.
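Regarding my extra question about the dimensions, here is a quick numpy sketch of the attention arithmetic as I understand it (random stand-ins, not the repo code). The dot product leaves a trailing axis of size 1, so after the reshape two dimensions seems to be exactly what I should have:

```python
import numpy as np

rng = np.random.RandomState(0)
h_a = np.tanh(rng.randn(50, 128, 512))   # stand-in: values in (-1, 1)
Wa_y = rng.randn(512, 1)
context = rng.randn(50, 128, 512)

alphas = np.exp(h_a.dot(Wa_y))           # (50, 128, 1): trailing axis of size 1
alphas = alphas.reshape(50, 128)         # the reshape just drops that axis
alphas = alphas / alphas.sum(axis=0)     # normalize over the 50 time steps

weighted_context = (context * alphas[:, :, None]).sum(axis=0)
print(weighted_context.shape)            # (128, 512)
```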
Late fusion (now I stop understanding):
- lfc: T.dot(weighted_context, Wf_c). Wf_c is a matrix of zeros, so this becomes np.zeros((128,256)). shape: (128,256)
- fw (fusion weights): T.nnet.sigmoid(T.dot(lfc, Wf_f) + T.dot(h_t, Wf_h) + bf). Wf_f, lfc, Wf_h, and bf are all zeroes, and since it’s taking the sigmoid of them I get a matrix of 0.5. shape: (128,256)
- hf_t (weighted fused context + hidden state): lfc * fw + h_t. lfc is 0, fw is 0.5, and h_t is just the hidden state, so hf_t = h_t in all steps (the sketch after this list checks this). shape: (128,256)
- z: T.dot(hf_t, Wy) + by. Wy is a zero matrix and by a zero vector, so this becomes np.zeros((128,8)). shape: (128,8)
- y_t: T.nnet.softmax(z): softmax of zeroes in every single step; the result is row_number * ([0.125] * y_vocabulary_size). shape: (128,8)
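A tiny numpy check of the whole late-fusion-to-softmax chain above (random stand-ins for h_t and weighted_context; not the repo code). With all-zero fusion and output parameters, fw is 0.5 everywhere, hf_t collapses to h_t, z is all zeros, and the softmax is uniform at 1/8 = 0.125:

```python
import numpy as np

rng = np.random.RandomState(0)
h_t = rng.randn(128, 256)            # stand-in for the GRU hidden state
weighted_context = rng.randn(128, 512)
Wf_c = np.zeros((512, 256)); Wf_f = np.zeros((256, 256))
Wf_h = np.zeros((256, 256)); bf = np.zeros((1, 256))
Wy = np.zeros((256, 8)); by = np.zeros((8,))

lfc = weighted_context.dot(Wf_c)     # all zeros, shape (128, 256)
fw = 1.0 / (1.0 + np.exp(-(lfc.dot(Wf_f) + h_t.dot(Wf_h) + bf)))  # sigmoid(0) = 0.5
hf_t = lfc * fw + h_t                # 0 * 0.5 + h_t == h_t

z = hf_t.dot(Wy) + by                # all zeros, shape (128, 8)
e = np.exp(z - z.max(axis=1, keepdims=True))
y_t = e / e.sum(axis=1, keepdims=True)   # softmax of zeros is uniform

print(np.allclose(fw, 0.5), np.allclose(hf_t, h_t), np.allclose(y_t, 0.125))
# True True True
```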
I want to ask: shouldn’t I be able to update this? Are the fusion values all supposed to be zero throughout the process? Thanks. :)
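To check whether zero-initialized parameters can be updated at all, I tried a minimal Theano sketch outside the repo (my own names and stand-in loss, not the actual training code). The gradient with respect to a zero weight matrix comes out nonzero, so gradient descent should be able to move it off zero once training runs:

```python
import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX

# Zero-initialized fusion weight, mirroring Wf_h above
Wf_h = theano.shared(np.zeros((256, 256), dtype=floatX), name='Wf_h')

h_t = T.matrix('h_t')
fw = T.nnet.sigmoid(T.dot(h_t, Wf_h))    # 0.5 everywhere on the first pass
cost = fw.sum()                          # stand-in loss, just for the gradient
grad_fn = theano.function([h_t], T.grad(cost, Wf_h))

g = grad_fn(np.random.randn(128, 256).astype(floatX))
print(np.abs(g).max() > 0)               # True: nonzero gradient despite the zero init
```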
Disclaimer: where I write that values are “random”, they may in fact have been carefully updated since; I just mean they were initialized as random.