
About the inference #9

Open
srymaker opened this issue Apr 19, 2024 · 9 comments

Comments

@srymaker

Thanks for your great work!
I want to know: when I do a style transfer task, do I need to give the model a reference picture, a style word corresponding to that reference picture, and a target prompt? That is, a triplet <reference image, reference style word, target prompt>?

@Tianhao-Qi
Collaborator

No, you only need a pair <reference image, target prompt>.
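For what it's worth, here is a minimal sketch of what that pair looks like at inference time; `load_pipeline` is a hypothetical placeholder standing in for the repo's actual entry point, not the released API:

```python
from PIL import Image

# Hypothetical loader standing in for the repo's inference script / checkpoint setup.
pipe = load_pipeline("path/to/checkpoint")

reference = Image.open("reference_style.png")   # image whose style should be transferred
prompt = "a cat sitting on a wooden chair"      # target prompt describing the desired content

# Only the pair <reference image, target prompt> is supplied;
# no extra style word for the reference image is needed.
output = pipe(reference_image=reference, prompt=prompt)
output.save("stylized_result.png")
```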

@srymaker
Author

Thank you for your answer. So what are the inputs and targets during training?

@Tianhao-Qi
Collaborator

There are three kinds of training pairs:

  1. the reference and target images share the same style but depict distinct subjects (STRE);
  2. the reference and target images share the same subject but have distinct styles (SERE);
  3. the reference and target images are identical (Reconstruction).

You can refer to Sec 3.2 in our paper.
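To make the three regimes concrete, here is a small, self-contained sketch (not the actual dataloader) of how (reference, target) pairs could be sampled if images were indexed by (subject, style):

```python
import random

# Toy index: images keyed by (subject, style). In practice these would be
# paths into the synthesized dataset described in Sec 3.2.
images = {
    ("dog", "oil painting"): "dog_oil.png",
    ("dog", "watercolor"): "dog_wc.png",
    ("castle", "oil painting"): "castle_oil.png",
    ("castle", "watercolor"): "castle_wc.png",
}

def sample_pair(mode):
    (subj, style), ref = random.choice(list(images.items()))
    if mode == "STRE":       # same style, distinct subjects
        candidates = [v for (s, st), v in images.items() if st == style and s != subj]
    elif mode == "SERE":     # same subject, distinct styles
        candidates = [v for (s, st), v in images.items() if s == subj and st != style]
    else:                    # Reconstruction: reference == target
        candidates = [ref]
    return ref, random.choice(candidates)

for mode in ("STRE", "SERE", "Reconstruction"):
    print(mode, sample_pair(mode))
```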

@srymaker
Author

Thank you, but in the paper the Q-Former's input should include the text {content} or {style}. What is that exactly?

@Tianhao-Qi
Collaborator

The text input of Q-former is the word "content" or "style".
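Concretely (illustrative stubs only, with random tensors standing in for the repo's real frozen image encoder and Q-Former), the same reference image is encoded twice, once per instruction word:

```python
import torch

# Stand-in modules; the real ones are defined in the repo.
def image_encoder(image_path):
    return torch.randn(1, 257, 1024)        # patch features of the reference image

def qformer(text, image_embeds):
    # Learned query tokens cross-attend to `image_embeds`; the text input
    # ("content" or "style") selects which factor the queries should capture.
    return torch.randn(1, 32, 768)

reference_features = image_encoder("reference.png")
content_tokens = qformer("content", reference_features)  # content representation
style_tokens = qformer("style", reference_features)      # style representation
# These query tokens then condition the diffusion model via cross-attention.
```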

@srymaker
Author

Oh, I see. Thanks for your patience.

@SkylerZheng

Hi @Tianhao-Qi, does the currently released code support the "Stylized Reference Object Generation" function? Basically, I want to convert a given image to a different style by providing only text; the given image is the source image rather than the style image.

@Tianhao-Qi
Collaborator

You can refer to this script. Besides, if you want to keep the structure of the source image as well, you'll need to use ControlNet.
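In case it helps, here is a generic diffusers ControlNet sketch (not wired into this repo's style modules) showing how a structural condition such as Canny edges keeps the source layout while the prompt drives the restyling:

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Extract an edge map from the source image so its structure is preserved.
source = load_image("source.png")
edges = cv2.Canny(np.array(source), 100, 200)
edges = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The prompt carries the target style; the edge map constrains the layout.
result = pipe("a watercolor painting of the same scene", image=edges).images[0]
result.save("restyled.png")
```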

@LiamLiu62

> There are three kinds of training pairs:
>
>   1. the reference and target images share the same style but depict distinct subjects (STRE);
>   2. the reference and target images share the same subject but have distinct styles (SERE);
>   3. the reference and target images are identical (Reconstruction).
>
> You can refer to Sec 3.2 in our paper.

In the Dataset part, for "style", your paper says the same prompts are used to generate the reference and target images. So I think they should have the same subject?
