As the comments explain, the `reduce_mean` call performs global average pooling across dimensions 1 (height) and 2 (width). This yields a feature map of shape Nx1x1xC, which is then passed through a 1x1 conv (no shape change). The `tf.image.resize_bilinear` call then upsamples the spatial dimensions to match the input dimensions so that the tensors can be concatenated.
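To make the shape flow concrete, here is a minimal NumPy sketch of the image-level branch (the tensor sizes N, H, W, C below are made-up example values, not from the repo):

```python
import numpy as np

# Hypothetical input batch in NHWC layout: N=2, H=4, W=4, C=3.
inputs = np.random.rand(2, 4, 4, 3)

# Global average pooling over height (axis 1) and width (axis 2), keeping
# the reduced dimensions -- mirrors tf.reduce_mean(inputs, [1, 2], keepdims=True).
image_level_features = inputs.mean(axis=(1, 2), keepdims=True)

print(image_level_features.shape)  # (2, 1, 1, 3), i.e. Nx1x1xC -- NOT NxHxWxC
```

`keepdims` keeps the rank at 4 so the result can still be treated as an image, but each spatial plane has collapsed to a single value per channel, which is why the later upsample back to HxW is needed before concatenation.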
@haydengunraj, thanks for the explanation. I was mistakenly confused by `keepdims`. So, if `tf.image.resize_bilinear` converts A = Nx1x1xC to B = NxHxWxC, aren't all HxW values equal to the single 1x1 value (channel-wise)? For example, A[n, 0, 0, c] == B[n, h, w, c] for any n, c, h, and w?
If so, isn't that the same as using `tf.tile` in this case?
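A NumPy sketch (not the TF implementation) illustrating why the two are equivalent for a 1x1 source: with only one source pixel, every bilinear interpolation weight combination lands on that same pixel, so the resized map is constant and matches `np.tile`. The `bilinear_resize` helper below is a hypothetical reference implementation written for this demo:

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Naive bilinear resize of an (h, w, c) array (half-pixel centers)."""
    h, w, c = img.shape
    out = np.empty((out_h, out_w, c))
    for i in range(out_h):
        for j in range(out_w):
            # Map output pixel center back into source coordinates.
            y = np.clip((i + 0.5) * h / out_h - 0.5, 0, h - 1)
            x = np.clip((j + 0.5) * w / out_w - 0.5, 0, w - 1)
            y0, x0 = int(np.floor(y)), int(np.floor(x))
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            wy, wx = y - y0, x - x0
            out[i, j] = ((1 - wy) * (1 - wx) * img[y0, x0]
                         + (1 - wy) * wx * img[y0, x1]
                         + wy * (1 - wx) * img[y1, x0]
                         + wy * wx * img[y1, x1])
    return out

# A 1x1 "image" with 2 channels: the only source pixel is both corners,
# so interpolation degenerates to copying it everywhere.
a = np.array([[[2.0, -1.0]]])
b = bilinear_resize(a, 3, 3)
print(np.allclose(b, np.tile(a, (3, 3, 1))))  # True
```

So yes, for this degenerate 1x1 case the bilinear upsample and a tile produce the same tensor; the resize op is just the general-purpose way the model expresses "match the input's spatial size".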
In the function `atrous_spatial_pyramid_pooling` (deeplab_model.py, line 21), `image_level_features` is computed (lines 54--61).
I think `image_level_features` has the same size as `inputs`, since it is just a `reduce_mean` with `keepdims`. Also, `inputs_size = tf.shape(inputs)[1:3]`.
So they are the same size, and why should one call `tf.image.resize_bilinear(image_level_features, inputs_size)`?