Hey, I’m trying to understand the outputs of the output_recurrence function in models.py. Right now, I’m not seeing the updates I would expect. I’ve gone through the function step by step, and somewhere it just starts multiplying everything by 0 and loses all values. All the fusion parameters seem to be initialized at 0, and I don’t see where they’re updated.

My first question:
weights_const: it’s defined as np.ones, but then every time it’s used all the values are multiplied by 0, giving it only an output of zeroes?
My second question:
Output recurrence: I went through some steps of the theano.scan; below are the results at the third update.
I have:

Inputs in the theano.scan:
- x_t: context[3], the forward and backward hidden layers concatenated. shape: (50,128,512)
- h_tm1: starts at GRU.h0 = np.zeros((128,256)) in the first iteration, then updates with the GRU.step function. shape: (128,256)
Attention parameters:
- Wa_h: d = 6/(256+512), random between -d and d. shape: (256,512)
- Wa_y: d = 6/(512+1), random between -d and d. shape: (512,1)
Late fusion parameters:
- Wf_h: np.zeros((256,256)). shape: (256,256)
- Wf_c: np.zeros((512,256)). shape: (512,256)
- Wf_f: np.zeros((256,256)). shape: (256,256)
- bf: np.zeros((1,256)). shape: (1,256)
Output model:
- Wy: np.zeros((256,8)). shape: (256,8)
- by: np.zeros((8,)). shape: (8,)
- context: the forward and backward sequences. shape: (50,128,512)
- projected_context: T.dot(context, Wa_c) + ba = (context * random) + 0. shape: (50,128,512)
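To keep my shape bookkeeping honest, here is a minimal numpy sketch of this setup as I read it. This is not the repo code: the names and the random seed are mine, and Wa_c's shape is inferred from the projected_context shape, since it isn't listed above.

```python
import numpy as np

rng = np.random.RandomState(0)

def uniform_init(n_in, n_out):
    # "d = 6/(n_in+n_out), random between -d and d" as listed above
    # (a Glorot-style init would take the square root of that fraction)
    d = 6.0 / (n_in + n_out)
    return rng.uniform(-d, d, size=(n_in, n_out))

# Attention parameters
Wa_h = uniform_init(256, 512)
Wa_y = uniform_init(512, 1)
Wa_c = uniform_init(512, 512)      # shape inferred so (50,128,512) is preserved
ba = np.zeros((512,))

# Late fusion parameters (all zeros at init)
Wf_h = np.zeros((256, 256))
Wf_c = np.zeros((512, 256))
Wf_f = np.zeros((256, 256))
bf = np.zeros((1, 256))

# Output model (all zeros at init)
Wy = np.zeros((256, 8))
by = np.zeros((8,))

context = rng.randn(50, 128, 512)  # forward + backward sequences
projected_context = context.dot(Wa_c) + ba
print(projected_context.shape)     # (50, 128, 512)
```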
Inside the function:
- h_a: values between -1 and 1. shape: (50,128,512)
- alphas (1): T.exp(T.dot(h_a, Wa_y)): random values. shape: (50,128)
- alphas (2): alphas reshaped to only the first two dimensions (extra question: I only have two dimensions here, should I have three? see the shape sketch after this list). shape: (50,128)
- alphas (3): normalized alphas. shape: (50,128)
- weighted_context: (context * alphas[:,:,None]).sum(axis=0), random values. shape: (128,512)
- h_t: GRU.step, the hidden state used in the recurrence; the initial value is h0, and the updated values have gone through sigmoid and tanh activations.
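Regarding my extra question about the dimensions, here is a quick numpy sketch of the attention arithmetic as I understand it (random stand-ins, not the repo code). The dot product leaves a trailing axis of size 1, so after the reshape two dimensions seems to be exactly what I should have:

```python
import numpy as np

rng = np.random.RandomState(0)
h_a = np.tanh(rng.randn(50, 128, 512))   # stand-in: values in (-1, 1)
Wa_y = rng.randn(512, 1)
context = rng.randn(50, 128, 512)

alphas = np.exp(h_a.dot(Wa_y))           # (50, 128, 1): trailing axis of size 1
alphas = alphas.reshape(50, 128)         # the reshape just drops that axis
alphas = alphas / alphas.sum(axis=0)     # normalize over the 50 time steps

weighted_context = (context * alphas[:, :, None]).sum(axis=0)
print(weighted_context.shape)            # (128, 512)
```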
Late fusion (now I stop understanding):
- lfc: T.dot(weighted_context, Wf_c). Wf_c is a matrix of zeros, so this becomes np.zeros((128,256)). shape: (128,256)
- fw (fusion weights): T.nnet.sigmoid(T.dot(lfc, Wf_f) + T.dot(h_t, Wf_h) + bf). Wf_f, lfc, Wf_h, and bf are all zeroes, and since it’s taking the sigmoid of them I get a matrix of 0.5. shape: (128,256)
- hf_t (weighted fused context + hidden state): lfc * fw + h_t. lfc is 0, fw is 0.5, and h_t is just the hidden state, so hf_t = h_t in all steps (the sketch after this list checks this). shape: (128,256)
- z: T.dot(hf_t, Wy) + by. Wy is a zero matrix and by a zero vector, so this becomes np.zeros((128,8)). shape: (128,8)
- y_t: T.nnet.softmax(z): softmax of zeroes in every single step; the result is row_number * ([0.125] * y_vocabulary_size). shape: (128,8)
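A tiny numpy check of the whole late-fusion-to-softmax chain above (random stand-ins for h_t and weighted_context; not the repo code). With all-zero fusion and output parameters, fw is 0.5 everywhere, hf_t collapses to h_t, z is all zeros, and the softmax is uniform at 1/8 = 0.125:

```python
import numpy as np

rng = np.random.RandomState(0)
h_t = rng.randn(128, 256)            # stand-in for the GRU hidden state
weighted_context = rng.randn(128, 512)
Wf_c = np.zeros((512, 256)); Wf_f = np.zeros((256, 256))
Wf_h = np.zeros((256, 256)); bf = np.zeros((1, 256))
Wy = np.zeros((256, 8)); by = np.zeros((8,))

lfc = weighted_context.dot(Wf_c)     # all zeros, shape (128, 256)
fw = 1.0 / (1.0 + np.exp(-(lfc.dot(Wf_f) + h_t.dot(Wf_h) + bf)))  # sigmoid(0) = 0.5
hf_t = lfc * fw + h_t                # 0 * 0.5 + h_t == h_t

z = hf_t.dot(Wy) + by                # all zeros, shape (128, 8)
e = np.exp(z - z.max(axis=1, keepdims=True))
y_t = e / e.sum(axis=1, keepdims=True)   # softmax of zeros is uniform

print(np.allclose(fw, 0.5), np.allclose(hf_t, h_t), np.allclose(y_t, 0.125))
# True True True
```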
I want to ask: shouldn’t I be able to update this? Are the fusion values all supposed to be zero throughout the process? Thanks. :)
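To check whether zero-initialized parameters can be updated at all, I tried a minimal Theano sketch outside the repo (my own names and stand-in loss, not the actual training code). The gradient with respect to a zero weight matrix comes out nonzero, so gradient descent should be able to move it off zero once training runs:

```python
import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX

# Zero-initialized fusion weight, mirroring Wf_h above
Wf_h = theano.shared(np.zeros((256, 256), dtype=floatX), name='Wf_h')

h_t = T.matrix('h_t')
fw = T.nnet.sigmoid(T.dot(h_t, Wf_h))    # 0.5 everywhere on the first pass
cost = fw.sum()                          # stand-in loss, just for the gradient
grad_fn = theano.function([h_t], T.grad(cost, Wf_h))

g = grad_fn(np.random.randn(128, 256).astype(floatX))
print(np.abs(g).max() > 0)               # True: nonzero gradient despite the zero init
```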
Disclaimer: where I write that values are “random”, they may in fact have been carefully updated since; I just mean they were initialized as random.