Thank you for sharing this amazing work.
I have a question about learning the partition weight \theta in Eq. (7). As I understand it, \theta learns a mapping from image IDs to the context (environment) space in \mathbb{R}^M.
Would it be possible to directly learn a function that maps from the original feature space \mathcal{X} to the context space, so that \theta is a parameterized model rather than a parameter matrix?
I assume it would be. One difference from the current implementation, however, is that a parameterized model would need to be learned separately for each X. The paper shows that CAAM can be plugged into any intermediate layer of a model, so each intermediate representation X would need its own parameterized model. In contrast, a global parameter matrix \theta is more efficient, since it can be shared across the different intermediate representations of X. Please let me know if I am missing anything.
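To make the comparison concrete, here is a minimal NumPy sketch of the two options as I understand them. It is not the repository's code: the sizes, the softmax normalization, and the names `theta`, `W1`, `W2`, `partition_from_table`, and `partition_from_features` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 6, 3      # hypothetical: 6 training images, M = 3 contexts
D1, D2 = 8, 16   # hypothetical feature widths at two insertion points


def softmax(z, axis=-1):
    """Numerically stable softmax (normalization is an assumption here)."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


# Option A: a global table with one row of context logits per image id.
# The same table serves every layer, because it never looks at X.
theta = rng.normal(size=(N, M))


def partition_from_table(image_ids):
    return softmax(theta[image_ids])


# Option B: a parameterized map from features to context logits.
# Each insertion point has its own feature space, so it needs its own weights.
W1 = rng.normal(size=(D1, M))
W2 = rng.normal(size=(D2, M))


def partition_from_features(x, W):
    return softmax(x @ W)


ids = np.array([0, 2, 5])
x1 = rng.normal(size=(3, D1))  # stand-in features at the first insertion point
x2 = rng.normal(size=(3, D2))  # stand-in features at the second insertion point

p_table = partition_from_table(ids)            # shape (3, M)
p_layer1 = partition_from_features(x1, W1)     # shape (3, M)
p_layer2 = partition_from_features(x2, W2)     # shape (3, M)
```

In this sketch Option A stores N × M values regardless of how many layers use it, while Option B adds a new D × M weight matrix for every insertion point, which is the efficiency difference I describe above.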
Thank you in advance.