
About the inference #9

Open
srymaker opened this issue Apr 19, 2024 · 9 comments

Comments

@srymaker

Thanks for your great work!
I want to know: when I do a style transfer task, do I need to give the model a reference picture, a style word corresponding to that reference picture, and a target prompt? That is, a triplet <reference image, reference style word, target prompt>?

@Tianhao-Qi
Collaborator

No, you only need a pair <reference image, target prompt>.
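For what it's worth, here is a minimal sketch of what that pair looks like at inference time; `load_pipeline` is a hypothetical placeholder standing in for the repo's actual entry point, not the released API:

```python
from PIL import Image

# Hypothetical loader standing in for the repo's inference script / checkpoint setup.
pipe = load_pipeline("path/to/checkpoint")

reference = Image.open("reference_style.png")   # image whose style should be transferred
prompt = "a cat sitting on a wooden chair"      # target prompt describing the desired content

# Only the pair <reference image, target prompt> is supplied;
# no extra style word for the reference image is needed.
output = pipe(reference_image=reference, prompt=prompt)
output.save("stylized_result.png")
```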

@srymaker
Author

Thank you for your answer. So what are the inputs and targets during training?

@Tianhao-Qi
Collaborator

There are three kinds of training pairs:

  1. the reference and target images share the same style but depict distinct subjects (STRE);
  2. the reference and target images share the same subject but have distinct styles (SERE);
  3. the reference and target images are identical (Reconstruction).

You can refer to Sec 3.2 in our paper.
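To make the three regimes concrete, here is a small, self-contained sketch (not the actual dataloader) of how (reference, target) pairs could be sampled if images were indexed by (subject, style):

```python
import random

# Toy index: images keyed by (subject, style). In practice these would be
# paths into the synthesized dataset described in Sec 3.2.
images = {
    ("dog", "oil painting"): "dog_oil.png",
    ("dog", "watercolor"): "dog_wc.png",
    ("castle", "oil painting"): "castle_oil.png",
    ("castle", "watercolor"): "castle_wc.png",
}

def sample_pair(mode):
    (subj, style), ref = random.choice(list(images.items()))
    if mode == "STRE":       # same style, distinct subjects
        candidates = [v for (s, st), v in images.items() if st == style and s != subj]
    elif mode == "SERE":     # same subject, distinct styles
        candidates = [v for (s, st), v in images.items() if s == subj and st != style]
    else:                    # Reconstruction: reference == target
        candidates = [ref]
    return ref, random.choice(candidates)

for mode in ("STRE", "SERE", "Reconstruction"):
    print(mode, sample_pair(mode))
```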

@srymaker
Author

Thank you, but in the paper the Q-Former's input should include the text {content} or {style}. What is that exactly?

@Tianhao-Qi
Collaborator

The text input of Q-former is the word "content" or "style".
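Concretely (illustrative stubs only, with random tensors standing in for the repo's real frozen image encoder and Q-Former), the same reference image is encoded twice, once per instruction word:

```python
import torch

# Stand-in modules; the real ones are defined in the repo.
def image_encoder(image_path):
    return torch.randn(1, 257, 1024)        # patch features of the reference image

def qformer(text, image_embeds):
    # Learned query tokens cross-attend to `image_embeds`; the text input
    # ("content" or "style") selects which factor the queries should capture.
    return torch.randn(1, 32, 768)

reference_features = image_encoder("reference.png")
content_tokens = qformer("content", reference_features)  # content representation
style_tokens = qformer("style", reference_features)      # style representation
# These query tokens then condition the diffusion model via cross-attention.
```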

@srymaker
Author

Oh, I see. Thanks for your patience.

@SkylerZheng

Hi @Tianhao-Qi, does the currently released code support the "Stylized Reference Object Generation" function? Basically, I want to convert a given image to a different style by providing only text; the given image is the source image rather than the style image.

@Tianhao-Qi
Collaborator

You can refer to this script. Besides, if you want to keep the structure of the source image as well, you'll need to use ControlNet.
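In case it helps, here is a generic diffusers ControlNet sketch (not wired into this repo's style modules) showing how a structural condition such as Canny edges keeps the source layout while the prompt drives the restyling:

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Extract an edge map from the source image so its structure is preserved.
source = load_image("source.png")
edges = cv2.Canny(np.array(source), 100, 200)
edges = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The prompt carries the target style; the edge map constrains the layout.
result = pipe("a watercolor painting of the same scene", image=edges).images[0]
result.save("restyled.png")
```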

@LiamLiu62

> There are three kinds of training pairs:
>
>   1. the reference and target images share the same style but depict distinct subjects (STRE);
>   2. the reference and target images share the same subject but have distinct styles (SERE);
>   3. the reference and target images are identical (Reconstruction).
>
> You can refer to Sec 3.2 in our paper.

In the Dataset part, for "style", your paper says the same prompts are used to generate the reference and target images. So I think they should have the same subject?
