When taking video input, the feature maps in each layer have four dimensions, i.e., THW*C. Are the attention maps still query-independent in this case? Could you please give more details? Thanks a lot.