While implementing the ConvLSTM model for a precipitation estimation project, a couple of confusing points came up as I read the original paper by Shi et al. and code based on it. Here I present some notes on the underlying principles behind the code and how the code illustrates the points made by Shi et al.
```python
import tensorflow as tf

# A new class inherited from tf.nn.rnn_cell.RNNCell
class ConvLSTMCell(tf.nn.rnn_cell.RNNCell):

    def __init__(self, shape, filters, kernel, forget_bias=1.0,
                 activation=tf.tanh, normalize=True, peephole=True,
                 data_format='channels_last', reuse=None):
        # pass reuse through to the parent RNNCell so variables can be shared
        super(ConvLSTMCell, self).__init__(_reuse=reuse)
        self._kernel = kernel
        self._filters = filters
        self._forget_bias = forget_bias
        self._activation = activation
        self._normalize = normalize
        # peephole: whether the gates can see the previous cell state C(t-1)
        self._peephole = peephole
        if data_format == 'channels_last':
            # _size is [spatial shape] + [num_filters], e.g. if every single
            # input is a 64x64 image and the number of filters is 4, then
            # _size is [64, 64, 4]
            self._size = tf.TensorShape(shape + [self._filters])
            # ndims returns the rank of the tensor, e.g. 3 for a 3-D tensor,
            # so the channel axis is the last one
            self._feature_axis = self._size.ndims
            self._data_format = None
        elif data_format == 'channels_first':
            # channels come before the spatial dimensions
            self._size = tf.TensorShape([self._filters] + shape)
            self._feature_axis = 0
            self._data_format = 'NC'
        else:
            raise ValueError('Unknown data format')
```
According to the official documentation of `tf.nn.convolution`:

> data_format: A string or None. Specifies whether the channel dimension of the input and output is the last dimension (default, or if data_format does not start with "NC"), or the second dimension (if data_format starts with "NC"). For N=1, the valid values are "NWC" (default) and "NCW". For N=2, the valid values are "NHWC" (default) and "NCHW". For N=3, the valid values are "NDHWC" (default) and "NCDHW".
>
> Returns: A Tensor with the same type as input of shape [batch_size] + output_spatial_shape + [out_channels] if data_format is None or does not start with "NC", or [batch_size, out_channels] + output_spatial_shape if data_format starts with "NC".
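As a quick illustration (a toy shape check in TF 1.x; the tensors below are made up), the same kernel works for both layouts, only the channel axis moves:

```python
import tensorflow as tf

x_nhwc = tf.zeros([1, 64, 64, 3])  # [batch, height, width, channels]
w = tf.zeros([3, 3, 3, 8])         # [kH, kW, in_channels, out_channels]
y = tf.nn.convolution(x_nhwc, w, 'SAME')  # data_format=None -> "NHWC"
print(y.shape)                     # (1, 64, 64, 8): channels last

x_nchw = tf.zeros([1, 3, 64, 64])  # [batch, channels, height, width]
y = tf.nn.convolution(x_nchw, w, 'SAME', data_format='NCHW')  # starts with "NC"
print(y.shape)                     # (1, 8, 64, 64): channels second
```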
```python
    # override the properties inherited from the parent class (RNNCell)
    @property
    def state_size(self):
        return tf.nn.rnn_cell.LSTMStateTuple(self._size, self._size)

    @property
    def output_size(self):
        return self._size
```
```python
    # override the main method -- call
    def call(self, x, state):
        # state is an LSTMStateTuple handed in by the RNN framework;
        # c is the cell state, h is the hidden state (the output of the
        # previous step's gate operations)
        c, h = state
        x = tf.concat([x, h], axis=self._feature_axis)

        # n: number of input channels (X and H concatenated)
        n = x.shape[-1].value
        # m: number of output channels; since four components (j, i, f, o)
        # contribute to forming the new state, we multiply the filters by 4
        m = 4 * self._filters if self._filters > 1 else 4
        # kernel shape = [3, 3, input_channels, output_channels] for a 3x3 kernel
        W = tf.get_variable('kernel', self._kernel + [n, m])
        # compute the sum of an N-D convolution, see
        # https://www.tensorflow.org/versions/master/api_docs/python/tf/nn/convolution
        # x: input, W: filters
        y = tf.nn.convolution(x, W, 'SAME', data_format=self._data_format)
```
The forget gate, input gate, and output gate all have a similar structure: each takes in both X and H(t-1), the previous hidden state. The author therefore concatenates these two tensors and treats the result as a new X.
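The concatenation is just a computational convenience. A quick check (TF 1.x, toy shapes assumed) confirms that one convolution over the concatenated [X, H] with a stacked kernel equals W_x * X + W_h * H computed separately:

```python
import numpy as np
import tensorflow as tf

x = tf.constant(np.random.rand(1, 8, 8, 2), tf.float32)    # input X
h = tf.constant(np.random.rand(1, 8, 8, 4), tf.float32)    # hidden state H(t-1)
w_x = tf.constant(np.random.rand(3, 3, 2, 4), tf.float32)  # kernel acting on X
w_h = tf.constant(np.random.rand(3, 3, 4, 4), tf.float32)  # kernel acting on H

# one convolution over the concatenated input ...
joint = tf.nn.convolution(tf.concat([x, h], axis=-1),
                          tf.concat([w_x, w_h], axis=2), 'SAME')
# ... equals the sum of the two separate convolutions
split = tf.nn.convolution(x, w_x, 'SAME') + tf.nn.convolution(h, w_h, 'SAME')

with tf.Session() as sess:
    print(np.allclose(*sess.run([joint, split]), atol=1e-5))  # True
```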
Notice that `tf.nn.convolution` is the major change Shi et al. made to the original LSTM model. This operation embodies the paper's main point: capturing temporal and spatial information at the same time.
The only difference between the original LSTM and the ConvLSTM is demonstrated in the following picture: every operation between the weights W and the input [X, H(t-1)] in FC-LSTM has been replaced by a convolution.
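Concretely, these are the ConvLSTM equations from Shi et al., where `*` denotes convolution and `∘` the Hadamard (element-wise) product; they are exactly the FC-LSTM equations with the matrix products swapped for convolutions:

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right) \\
H_t &= o_t \circ \tanh(C_t)
\end{aligned}
```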
```python
        # add a bias only when layer normalization is off (the norm layer
        # includes its own bias term)
        if not self._normalize:
            y += tf.get_variable('bias', [m], initializer=tf.zeros_initializer())
        # split y's channel axis (the _feature_axis declared previously) into
        # four equal parts:
        # j: input contribution (cell candidate); i: input gate;
        # f: forget gate; o: output gate
        j, i, f, o = tf.split(y, 4, axis=self._feature_axis)
```
```python
        # peephole connections: let the input and forget gates see the
        # previous cell state C(t-1); c.shape[0] is the batch dimension,
        # so the peephole weights have shape c.shape[1:]
        if self._peephole:
            i += tf.get_variable('W_ci', c.shape[1:]) * c
            f += tf.get_variable('W_cf', c.shape[1:]) * c
```
If peephole is true, the gates have access to the previous cell state C(t-1): here, i and f update themselves by adding the corresponding element-wise terms W_ci ∘ C(t-1) and W_cf ∘ C(t-1).
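Note the peephole term is an element-wise (Hadamard) product, not another convolution; a tiny NumPy check (toy shapes assumed) shows why the weight is created with shape `c.shape[1:]`:

```python
import numpy as np

W_ci = np.random.rand(64, 64, 12)  # same shape as a single state sample
c = np.random.rand(2, 64, 64, 12)  # [batch] + state shape

# broadcasting multiplies W_ci element-wise against every sample in the batch
print((W_ci * c).shape)            # (2, 64, 64, 12)
```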
```python
        # layer-normalize the gate pre-activations, see
        # https://www.tensorflow.org/versions/master/api_docs/python/tf/contrib/layers/layer_norm
        if self._normalize:
            j = tf.contrib.layers.layer_norm(j)
            i = tf.contrib.layers.layer_norm(i)
            f = tf.contrib.layers.layer_norm(f)

        f = tf.sigmoid(f + self._forget_bias)
        i = tf.sigmoid(i)
        # new cell state: keep part of the old state, add the gated candidate
        c = c * f + i * self._activation(j)

        # the output gate's peephole looks at the *new* cell state C(t)
        if self._peephole:
            o += tf.get_variable('W_co', c.shape[1:]) * c

        if self._normalize:
            o = tf.contrib.layers.layer_norm(o)
            c = tf.contrib.layers.layer_norm(c)

        o = tf.sigmoid(o)
        h = o * self._activation(c)

        state = tf.nn.rnn_cell.LSTMStateTuple(c, h)

        # the output is the hidden state, not the cell state
        return h, state
```
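To see how the cell is meant to be used, here is a minimal usage sketch (the batch size, sequence length, and video dimensions below are made-up toy values) feeding the cell to `tf.nn.dynamic_rnn`:

```python
import tensorflow as tf

batch_size, timesteps = 2, 10
shape, channels = [64, 64], 1
filters, kernel = 12, [3, 3]

# a batch of videos: [batch, time, height, width, channels]
inputs = tf.placeholder(tf.float32, [batch_size, timesteps] + shape + [channels])

cell = ConvLSTMCell(shape, filters, kernel)
# outputs: [batch, time, 64, 64, filters]; state: an LSTMStateTuple (c, h)
outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=inputs.dtype)
```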
Reference
Shi, X., Chen, Z., Wang, H., Yeung, D.-Y., Wong, W.-K., Woo, W.-C. "Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting." NIPS 2015. https://arxiv.org/abs/1506.04214