Pivotal Tuning CLI and how to use them. #121
-
Thanks for your amazing work here! Excited to follow along with the progress. Can you explain what initializer_tokens and placeholder_tokens mean? I'm looking to train a particular person's face: where should I put a name/unique token associated with them, and where does the class token ("man", "girl") go?
-
Thanks for this! Can you let us know the minimum VRAM that is required to run this? Also -- I'm wondering if it might be possible to provide an example with public domain images that we could use to try to reproduce a test result and make sure everything is set up properly. I appreciate that this probably is not a priority for you at the moment!
-
@cloneofsimo . This is my output running your example at https://github.com/cloneofsimo/lora/blob/master/training_scripts/use_face_conditioning_example.sh
-
Using various models from Civitai and others that I trained myself, I get a "Rank should be the same per model" error when using cloneofsimo/lora from Replicate. What can I do to make the LoRAs compatible with that repo? In Automatic1111 those LoRAs work fine.
-
All of the examples use |
-
Hello, I have a question about the word "CLI". I know LoRA, and I know a little about Pivotal Tuning. But what does the word "CLI" mean? Is it short for CLIP? Or does it refer to a new paper?
-
I really don't understand how to prepare data. Some datasets have xxx.json, some have xxx.yaml, and others have imagename.png and imagename.txt, and I cannot relate these to each other. I could not find a clear guide that explains this. PEFT is different, and your LoRA and the Automatic1111 LoRA dataset formats seem different. Civitai has an article on scraping data and training a LoRA, and it is different too. Hugging Face diffusers LoRA is also different. Can you please explain this for a confused person :). I am ready to start from textual inversion and the very basics. But since the guides and code have API-like middleware in between, I cannot understand what lies behind them and their logic. Thank you.
-
Also, I put the safetensors file into Automatic1111 and it does not seem to work. Either I could not get the dataset tags right or safetensors does not work with it. In that case, how can I convert safetensors to pt? By the way, believe me, "I have googled a lot".
-
Most of the recent updates are about lora_pti: a CLI for Pivotal Tuning Inversion, with various tricks and techniques to get extremely good LoRA training output.
All of the README examples, including the example above, were built with the lora_pti CLI with DEFAULT PARAMETERS.
Here are the new parameters and what they mean:
Extended Latent + Dataset
So, with the recent extended latent, you can train multiple tokens for textual inversion. They are declared with the placeholder_tokens argument as "<TOK1>|<TOK2>|...", each of them separated by the | character.
If you are using a caption dataset (that is, the file name of each image in the dataset is its caption), you might want to map a certain token in the caption to the tokens you need.
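A minimal sketch of how such a token mapping could work, assuming captions are taken from image file names and the mapping is given as a "source|replacement" string. The helper names below are hypothetical and for illustration only; they are not lora_pti's actual internals.

```python
from pathlib import Path


def parse_placeholder_tokens(spec: str) -> list[str]:
    """Split a "<s1>|<s2>" style string into individual tokens (hypothetical helper)."""
    return [tok for tok in spec.split("|") if tok]


def parse_token_map(spec: str) -> dict[str, str]:
    """Turn "<tok>|<s1> wearing <s2>" into {"<tok>": "<s1> wearing <s2>"}.

    Assumption: everything before the first "|" is the token found in the
    caption, and everything after it is the replacement text.
    """
    source, replacement = spec.split("|", 1)
    return {source: replacement}


def caption_from_filename(filename: str, token_map: dict[str, str]) -> str:
    """Use the file name (without extension) as the caption, then substitute tokens."""
    caption = Path(filename).stem.strip()
    for source, replacement in token_map.items():
        caption = caption.replace(source, replacement)
    return caption


tokens = parse_placeholder_tokens("<s1>|<s2>")
token_map = parse_token_map("<tok>|<s1> wearing <s2>")
print(tokens)
# ['<s1>', '<s2>']
print(caption_from_filename("a photo of <tok> holding flowers .jpg", token_map))
# a photo of <s1> wearing <s2> holding flowers
```

The sketch only does string handling, so it can be used to check what a caption will look like before starting a training run.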
For example, let's say you initialized two tokens <s1>|<s2>, and you have an image file : a photo of <tok> holding flowers .jpg, where you want to substitute <tok> with <s1> wearing <s2>. Then, you use the argument :
--placeholder_token_at_data="<tok>|<s1> wearing <s2>"\
The argument will transform the caption to :
a photo of <s1> wearing <s2> holding flowers
for the image.
Also, if you want to use a template caption, use the argument :
There are two template types available: style and object.
Mask Conditioned Training
Use this if you want to focus on faces. Otherwise, remove this line. Check this if you are interested in what this does.
Training steps for Two stages
So there are two stages in PTI. One is Bayesian training of textual inversion with a high learning rate, and the other is training LoRA.
If the concept is difficult, you want stage 1 steps to be higher. But 1000 is very likely to be OK. In many cases, 500 steps of textual inversion and 500 steps of tuning work just as well.
If you are going to use a smaller learning rate, you would definitely want to bump up these values, as it would need a longer training time.
Other arguments (default is probably OK)
Here are the less important ones, or ones you might already be familiar with. You can use the default values, but if you want to know :
If you have any questions, comment below. I will update this discussion whenever there is an update on the CLI part.