WIP guide on style training with [filewords]. #443
-
I'll try to tell you what I know.
-
Has the UI changed since this was written? I can't find some of the fields, and there seem to be others.
-
It's all good, man. I followed yours and another one, and between the two I managed to hack together a model! I'll upload it to Hugging Face later. Cheers, man.
On Mon, 12 Dec 2022, 11:03 pm, GunnarQuist wrote:
Yes, the UI seems to get an overhaul at least once a week. This guide
needs to be updated.
I like the improvements. But until the updates stabilize and the UI
remains unchanged, I can't write my own guide. I started writing one but
the UI changed. And then it changed again. So for now I've stopped writing
a guide until I can learn how to train using v2-1 at 512 and 768.
Keep up the good work, folks! I appreciate what you are doing.
-
That was pretty quickly outdated. Be careful when cropping your images. I gave regularization/class images a shot. Keen to try some of the new features, but don't know what they do.
-
My personal experience was that regularization images made my model more flexible, but I have found that if you want full flexibility, you need to mix a strong DB model (weight 0.75) with a base model that is close to your theme. (Model A is the base model, Model B is yours.) So for general content use SD 1.5, and for adult content use NovelAI.
-
@piyarsquare, I'm still in the process of researching the impact of all the standard settings, but I would like you to run one experiment: try raising batch_size to 2, 4 or 8, leave the rest of the parameters unchanged, and then compare the two models. When I increased this parameter in my tests, I got the best quality, especially in faces and anatomy, although training takes longer.
-
@piyarsquare, first, thank you for the guide! For the Dataset Directory field under the Concepts tab, what is the format of the directory path we put into the field?
1: "C:\_CODE_Github\stable-diffusion-webui\__inputs\images_and_captions" OR
2: "__inputs\images_and_captions" OR
3: "/__inputs/images_and_captions"
I'm on Windows 11.
-
@piyarsquare From my understanding, you're using [filewords] incorrectly. As described in the hints for the filewords and prompt boxes, when using [filewords] the Filewords box and the Prompts box should each be filled in as shown in the screenshots (not reproduced here). Also, the following should help: if you're using the 7GB EMA file you should check "Extract EMA", and leave it unchecked if you're using the 4GB checkpoint; mirror this in the Advanced tab (i.e. "Use EMA"). You can generate the filewords descriptions in the Train tab under "Preprocess images" and choose your flavour of interrogation (deepbooru seems to give the best results), then double-check them and edit where necessary.
-
Just wondering, does anyone use the Instance token and the Class token? How exactly do they work? I don't really understand what I'm replacing with them yet. Is it that if I type in a keyword, the model interprets it as the text I enter here? I'd be interested to see how much it improves editability.
-
In the last few days I have been looking for logical relationships by adding extra parameters to the @cerega66 formula. I have taken on a difficult task, because I am not training a person or a style but a given pose, so there are several different people in each picture and the pose is shown from several different angles. I trained on 16 images at a learning rate of 2.5e-6 using the formula, saving a checkpoint every 512 steps. In general, 1.5x the steps worked best, so 1536 and 2048 steps were about right; 1024 was undertrained and higher counts were overtrained. Let the unique keyword be xyz. I tried the following options:
I was looking for a correlation between the following goals:
I tried several different poses with different data sets. Since this involves a lot of variation, I will highlight the context. I'm still not completely clear on how the Instance token and Class token work, but what I have noticed is that they have a big impact on editability. Here are the variations I tried, with the following parameters held constant (the rest varied):
Shuffle tags and Horizontal flip were off in most cases; where I turned them on, I'll note it. Here are the variations:
For the other variations, the models did not perform well:
That's it for now. It looks like I get the most flexible model when I don't use Horizontal flip, I use keywords in the Instance prompt, I use class images for the general appearance, and the tokens are filled in. The [filewords] help a lot with editability if they are tagged correctly and are the words you want to use later. If it's an object, tag it; if it's a feature, tag it too, but mix in more related words. That's why it's better to have lots of images: if a tag is used only once, it will be tied to that image (see this post of mine). So it's worth taking the time to do this, or, instead of using [filewords], enter the keywords in the Instance prompt separated by commas. Another tip: a further way to improve editability is to train a stronger model (so 2048 steps is good in our example) and combine it with the base model at 0.75 and 0.8 weights. This way the mixed model keeps the keywords, yet remains flexible and editable.
-
I have modified this [filewords] post; I now know with almost complete certainty what it is for, and I have tried to explain clearly what the advantages and disadvantages are. It is definitely recommended to use it together with class images!
-
Has anyone figured out how to increase the max token length beyond 300?
-
NEWLY UPDATED (so probably already outdated...)
I tried to train a Simpsons style with best practices gleaned from this site. This is an outline of my workflow. Anything that was uncertain, I marked with a ❓. I would be happy to hear any insights. I will amend/have amended this post with your feedback, denoted with a ❗.
Thanks to @d8ahazard for all of this unbelievable fun we've been having. But do you think you could type a little slower? It's getting hard for us to keep up.
Also, there is an excellent post from @cerega66 detailing the effects of a wide range of parameters. I will update this post to reflect their results soon.
Note:
I use xyz as the new keyword in this example.
❗ This is a bad keyword since it is composed of two tokens ❗
You should pick a short keyword that is rarely used and is a single token. For my project, I used the asim keyword from the keyword lists in this reddit post. I do not know if the ordering of the list matters as the post suggests (❓). I selected a four-letter word from this github list by text-searching for pairs of letters that have something to do with my subject, "The Simpsons." I searched for sim and found asim was a single token near the bottom of this list. I tried asim out in the Tokenizer tab and it confirmed that it is a single token. I also generated a few test images with just "asim" as the prompt and they struck me as pretty vague. Be warned, if you do not test your token, you may end up with the Tottenham Hotspur Football Club. (thfc, also not a great token...)
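If you want to check candidate keywords outside the UI, the Tokenizer tab check can be approximated with a few lines of Python using the Hugging Face transformers library. This is only a sketch: it assumes the standard SD 1.x CLIP tokenizer, and the word list is just an example.

```python
# Rough equivalent of the WebUI "Tokenizer" tab check, using the CLIP
# tokenizer that Stable Diffusion 1.x models are built around.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

for word in ["asim", "xyz", "thfc"]:
    pieces = tokenizer.tokenize(word)  # a good keyword yields exactly one piece
    print(word, "->", pieces, f"({len(pieces)} tokens)")
```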
DO NOT USE "xyz" as your keyword.
However, as a variable name we will use xyz as a placeholder in this post.
❗ According to @cerega66, using a two-token keyword will double your training time.
The image dataset.
I did this in Photoshop.
Here I made some mistakes. I used a crop preset set to 512 px by 512 px. Even though I selected regions that were larger than that, Photoshop's downscaling algorithm would often add a "halo" to the edges, particularly the black edges common to cartoon characters. I think the best thing to do is to crop without scaling, but failing that you can choose a downscaler that minimizes artifacts. GIMP has a "NoHalo" option. In Photoshop, I got the following results for Professor Frink. I think all of them show some halo effect, though nearest neighbor has the least and bilinear looks like the best trade-off:
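If you want to compare downscalers outside of Photoshop, a small Pillow sketch like the one below writes one 512x512 version of a crop per resampling filter so you can eyeball the halos side by side. The crop file name here is a placeholder.

```python
# Compare downscaling filters on one crop to see which adds the least halo.
# "frink_crop.png" is a placeholder for a crop larger than 512x512.
from PIL import Image

src = Image.open("frink_crop.png")
filters = {
    "nearest": Image.NEAREST,
    "bilinear": Image.BILINEAR,
    "bicubic": Image.BICUBIC,
    "lanczos": Image.LANCZOS,
}
for name, resample in filters.items():
    src.resize((512, 512), resample=resample).save(f"frink_512_{name}.png")
```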
Select images and crop them so that you can describe them easily in the next section.
In Windows, select all the files in the image directory.
Select “rename.”
Type xyz and hit return.
All the files should be labeled xyz (1).png through xyz (100).png.
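If you would rather script the renaming (or are not on Windows), a short Python sketch does the same thing. The directory path is a placeholder, and it assumes the files are not already using the target names.

```python
# Script equivalent of the Windows bulk-rename trick:
# name every image "xyz (1).png" ... "xyz (N).png".
from pathlib import Path

image_dir = Path(r"C:\path\to\xyz")  # placeholder path
for i, src in enumerate(sorted(image_dir.glob("*.png")), start=1):
    src.rename(image_dir / f"xyz ({i}).png")
```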
The captions.
I followed this reddit post.
Example caption: "A smiling Caucasian 40-year-old man with thick glasses and buck teeth dressed in a blue NASA jumpsuit with brown boots standing next to a garden with birds of paradise plants and ferns growing inside a large room. A robot arm picks flowers in the foreground."
Important hang-up: I needed a “utf-8” output file, so I used:
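The exact snippet matters less than passing encoding="utf-8" when writing the files. A minimal Python sketch of the idea, where the captions list and the output directory are placeholders:

```python
# Write one caption per image as "xyz (1).txt", "xyz (2).txt", ...
# The important part is encoding="utf-8"; otherwise Windows may default to a
# legacy code page and fail on characters like curly quotes.
from pathlib import Path

out_dir = Path(r"C:\path\to\xyz")                        # placeholder path
captions = ["A smiling Caucasian 40-year-old man ..."]   # placeholder captions

for i, caption in enumerate(captions, start=1):
    (out_dir / f"xyz ({i}).txt").write_text(caption, encoding="utf-8")
```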
At the end, each caption is matched to a file xyz (1).txt through xyz (100).txt and describes the correspondingly numbered .png file.
4. Put them all into the same directory for training: /path/to/xyz
Training.
For now, we are going to stick with the old-fashioned non-LORA stuff.
I am testing the LORA encoding out now. I don't really know what I'm doing, but I'm doing it much faster.
Go to the Create Model tab.
3. Extract EMA Weights = unchecked. ❓
Select your new model
Go to the Concepts tab.
We will be training one concept "xyz style" so we will only use the Concept 1 tab. Set maximum training steps to -1.
Dataset Directory = /path/to/xyz (directory containing xyz (n).png and .txt files.)
Classification Dataset Directory = blank
Instance Token and Class Token = leave these empty. These are useful when swapping a keyword into your captions. For example, if your captions were "A closeup photograph of a woman eating a pie." and you are training that particular woman to the keyword xyz, you would put xyz as the "Instance token" and woman as the "Class token."
Instance Prompt = xyz style [filewords]
Class Prompt = empty, we are not using prior preservation (this time❓)
NOTE: ❗ I tried running the same model but with prior preservation.
See final section for notes on Prior Preservation. For style training, it is the current wisdom that prior preservation is not needed. Without prior preservation, training runs 50% faster. But it may be that you get better results in half the steps. For more details, see below.
Sample Image Prompt = xyz style. Mark Twain.
Sample Prompt Template File = path to file with a list of prompts to randomly select from during sample generation. Leave blank to use Sample Image Prompt.
Sample Image Negative Prompt = watermark, text, signature, cross-eyed. (What you do not want to see in your samples.)
Go to the Parameters tab.
For Settings
Training steps per image (Epochs) = 100
Max Training Steps = 0 (let epochs determine the number of steps)
Pause After N Epochs = 0
Amount of time to pause between Epochs, in seconds = 0
Use Lifetime Steps/Epochs when Saving: checked
Save Preview/Ckpt Every epoch: unchecked (that would be 100 ckpt in this case.)
Save checkpoint frequency = 1000
Save Preview(s) frequency = 1000
Batch Size = 1
Class Batch Size = 1
My 3090 Ti refuses anything higher.
Learning rate = 1.72e-6 (thanks @cerega66!). I will explore this more in time.
Lora unet Learning Rate = does not matter because we're not using lora.
Lora Text Encoder Learning Rate = does not matter because we're not using lora.
Scale learning rate = unchecked ❓
❗ This section allows you to vary the learning rate over the course of training. One may wish to start with a high learning rate and then lower it as training proceeds. This is similar to annealing, where the temperature of glass or steel is reduced over time to obtain a more stable product. I may experiment with this in the future, but it is not high on my list. (See the short scheduler sketch after this settings list.)
Learning rate scheduler = constant (does not matter if SLR unchecked)
Learning rate warmup steps = 0 (does not matter if SLR unchecked)
Resolution 512
Center crop = off
Apply Horizontal flip = off (do not want to sacrifice asymmetry of my images)
Pretrained VAE Name or Path = blank
Use concept list = unchecked
Concept List = blank
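For anyone wondering how the three learning-rate settings above fit together, here is a rough sketch of the underlying idea using the diffusers scheduler helper. This is an illustration, not the extension's actual code; the optimizer and step count are stand-ins.

```python
# Conceptual illustration of "Learning rate scheduler" / "warmup steps".
# Not the extension's actual code; the optimizer and step count are stand-ins.
import torch
from diffusers.optimization import get_scheduler

params = [torch.nn.Parameter(torch.zeros(1))]      # stand-in for model parameters
optimizer = torch.optim.AdamW(params, lr=1.72e-6)  # the learning rate set above

# The guide uses "constant"; swapping in "cosine" or "polynomial" would anneal
# the rate downward over training, as described in the note above.
lr_scheduler = get_scheduler(
    "constant",
    optimizer=optimizer,
    num_warmup_steps=0,         # "Learning rate warmup steps"
    num_training_steps=10_000,  # total steps (100 epochs x 100 images here)
)

for step in range(10_000):
    # ...forward/backward pass would go here...
    optimizer.step()
    lr_scheduler.step()
```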
For Advanced
Use CPU only (SLOW) = unchecked
Use LORA = unchecked
Use EMA = checked (can't say why or what this does)
Use 8 bit Adam = ❗ checked (uses less memory)
Mixed Precision = fp16 (experimenting with bf16, which seems to be better suited to this type of work.)
Memory Attention = ❗ xformers!! I can now run a batch size of 4 (maybe more, waiting impatiently)
Don’t Cache Latents = checked
Train Text Encoder = checked
Prior loss weight = 1
Pad Tokens = checked
Shuffle tags = unchecked (but should it be❓)
Max token length = 300, because some of my captions are very long.
Gradient Checkpointing = checked, but probably not needed if more than 4GB of VRAM still free. I will try without this next time.
Hit Train!
On my 3090 Ti, it takes about 1.25 hours for 10,000 training steps = 100 epochs.
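For reference, and assuming the roughly 100 images implied by the dataset section: total steps = images × epochs ÷ batch size = 100 × 100 ÷ 1 = 10,000.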
Preliminary results look good:
However, some generations give double pupils or no pupils.
I included some images in the training data with multiple characters and I think that may have been a mistake.
Prior Preservation.
I have tried a run with prior preservation turned on.
To do that, we have the following minor deviations from the above notes:
Our only changes are to the Concepts tab.
That is all. When the training begins, if there are no images in your classification dataset, the program will generate them from the text files in your training directory. That way, your classification images reflect "A smiling Caucasian 40-year-old man with thick glasses and buck teeth dressed in a blue NASA jumpsuit with brown boots standing next to a garden with birds of paradise plants and ferns growing inside a large room. A robot arm picks flowers in the foreground."
And Stable Diffusion tries to match asim style to Frink without changing this guy too much.
Testing
The results are difficult to judge. I have not yet found a process for comparing models that makes me happy. I think the right tool is infinity grid generator. It generates an html with a nice interface where you can explore different seeds, parameters and models.
Screenshot of the web-interface:
Here are some images at 10K steps without prior preservation.
Same prompts and seeds at 10K steps with PP.
Which are better? The prior preservation model has a "locked-in" feeling where different seeds seem to generate very similar outputs. Another user suggested mixing the model back into the base.
These are mixed in at 75%. The cicadas defy the tag, but the others do have greater variation.
Mixed at 85%, we start getting cartoon bugs.
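For reference, the 75% and 85% mixes above correspond to the WebUI Checkpoint Merger's weighted sum, which is just a linear interpolation of the two models' weights. A rough sketch of that idea; the file names are placeholders, and it assumes plain .ckpt files with a top-level "state_dict" key.

```python
# Weighted-sum merge of two checkpoints: merged = alpha*tuned + (1 - alpha)*base.
# File names are placeholders; assumes .ckpt files with a "state_dict" key.
import torch

alpha = 0.75
base = torch.load("v1-5-pruned.ckpt", map_location="cpu")["state_dict"]
tuned = torch.load("asim_style.ckpt", map_location="cpu")["state_dict"]

merged = {}
for key, base_tensor in base.items():
    tuned_tensor = tuned.get(key)
    if (
        tuned_tensor is not None
        and tuned_tensor.shape == base_tensor.shape
        and base_tensor.is_floating_point()
    ):
        merged[key] = alpha * tuned_tensor + (1.0 - alpha) * base_tensor
    else:
        merged[key] = base_tensor  # keep the base weight if the key can't be blended

torch.save({"state_dict": merged}, "asim_style_mixed_75.ckpt")
```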
I also compared across people and landscapes, but I do not have a satisfying metric for comparison.
I would like to hear what others do.