Hi authors, thank you for sharing the awesome work.
As far as I understand, only the style representation from the Q-Former is used during inference.
If that is correct, why is the content training needed?
Does it help the Q-Former learn a better-disentangled representation for "style"?
Probably I missed some parts of the paper. Would appreciate it if somebody let me know.
Thanks!
The goal of dual content training is to help the model better distinguish between the style and the semantics of the reference image. This reduces the influence of the reference image's semantics on the generated result and leads to better text alignment, as shown in Table 2 of our paper.
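
To make the distinction concrete, here is a rough sketch of the control flow as I read it, not the actual implementation. `ToyQFormer`, `training_step`, `inference`, the `instruction` argument, and the way the pairs are built are all hypothetical names and assumptions used only for illustration. The point is that both branches are exercised during training, while only the style branch is queried at inference.

```python
from dataclasses import dataclass, field


@dataclass
class ToyQFormer:
    """Hypothetical stand-in for an instruction-conditioned feature extractor."""
    calls: list = field(default_factory=list)

    def __call__(self, image, instruction):
        # Record which branch was requested and return a labeled placeholder.
        self.calls.append(instruction)
        return f"{instruction}-features({image})"


def training_step(qformer, style_pair, content_pair):
    """One hypothetical step: both the style and content branches are used."""
    # Assumption: each pair is (reference image, target image).
    style_ref, style_target = style_pair
    content_ref, content_target = content_pair
    style_feats = qformer(style_ref, instruction="style")
    content_feats = qformer(content_ref, instruction="content")
    # A real system would compute a loss per branch against its target;
    # here we only return which features would be paired with which target.
    return [(style_feats, style_target), (content_feats, content_target)]


def inference(qformer, reference_image, prompt):
    """At inference only the style branch is queried; the prompt gives semantics."""
    style_feats = qformer(reference_image, instruction="style")
    return {"prompt": prompt, "style": style_feats}
```

Under these assumptions, the content branch never appears at inference, but training it alongside the style branch gives the extractor an explicit place to put semantics, so it has less reason to leak them into the style features.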