The MeshTransformer does not generate coherent results #18
Comments
I am currently on an old version of the codebase, so I don't know. Something probably broke.
The issue is that there is not enough data. The PolyGen and MeshGPT papers stress that they didn't have enough training data and used only 28 000 mesh models. The transformer loss should be below 0.0001 for successful generations. Here is a rough idea of how much data you should use. I recommend that you create a trainer that uses epochs instead of steps (or take a look at the one in my fork), since printing out 400k steps will slow down training. Then train on this for a day or two and use a large batch size (less than 64) to promote generalization. In the paper they used 28 000 3D models. Let's say they generated 10 augmentations per model and then used 10 duplicates of each, since it's more effective to train with a large batch size such as 64; with only a small number of models in the dataset, training will not be effective and you will waste the GPUs' parallelism. I want to stress this:
GPT2 is quite old and there have been improvements since, so it's not very good anymore.
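To put rough numbers on the dataset estimate in this comment, here is a minimal sketch. The 10 augmentations and 10 duplicates are the commenter's guesses rather than figures confirmed by the paper, and `steps_per_epoch` is a hypothetical helper, not part of meshgpt-pytorch.

```python
import math


def steps_per_epoch(num_models, augmentations_per_model, duplicates, batch_size):
    """Return (total training examples, optimizer steps per epoch)."""
    total_examples = num_models * augmentations_per_model * duplicates
    # Round up so a final partial batch still counts as one step.
    return total_examples, math.ceil(total_examples / batch_size)


# 28 000 models comes from the comment; the two factors of 10 are assumed.
examples, steps = steps_per_epoch(28_000, 10, 10, 64)
print(f"{examples} examples, {steps} steps per epoch at batch size 64")
```

Under these assumptions one epoch is about 2.8 million examples, which is why a 200-mesh dataset reaching near-zero loss says little about generalization.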
It appears to be using the text to choose from the learned mesh instances of the chair class?
@MarcusLoppe @fire next get some tables and chairs and see if it can learn to generate two separate classes! so this isn't documented, but in order to do better text binding (once you scale up with more text variety), you can use classifier free guidance, a technique employed in a lot of denoising diffusion models (SD included).
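For readers unfamiliar with classifier free guidance, the usual formulation can be sketched as below. This is the generic technique applied to next-token logits, not necessarily how meshgpt-pytorch wires `cond_scale` internally; `apply_cond_scale` and its arguments are hypothetical names used only for illustration.

```python
def apply_cond_scale(cond_logits, null_logits, cond_scale=1.0):
    """Blend text-conditioned and unconditioned logits.

    cond_scale == 1 returns the conditioned logits unchanged;
    larger values push the prediction further toward the text condition.
    """
    if len(cond_logits) != len(null_logits):
        raise ValueError("logit vectors must have the same length")
    return [n + cond_scale * (c - n) for c, n in zip(cond_logits, null_logits)]


# Toy logits over a 2-token vocabulary (made-up values).
conditioned = [2.0, 0.0]    # forward pass with the text embedding
unconditioned = [1.0, 1.0]  # forward pass with the text dropped
print(apply_cond_scale(conditioned, unconditioned, cond_scale=3.0))
```

Because every sampling step needs both a conditioned and an unconditioned forward pass, guided generation is slower and complicates caching.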
Success! :) Autoencoder training: 6 h, 210 epochs, 756 000 steps, 0.9 loss. Variations: 3, 100 examples each = 3600 examples/steps per epoch.
When changing cond_scale it gave me this error:
@MarcusLoppe i'll fix the classifier free guidance mid next week, there's another issue with it (caching is very tricky)
@MarcusLoppe fixed your current issue for now, but inference will still be super slow. will need a couple days to support kv cache correctly for conditional scaling. congratulations on training these results! enjoy the rest of your sunday
Hi, I also trained the model on around 4k decimated chair meshes with fewer than 800 faces, as suggested by the paper, and I got results very similar to @Kurokabe's. Here is the training loss of my autoencoder and GPT:
@whaohan at first glance, your autoencoder loss is way too high
this hyperparameter doesn't actually improve results, it just gives better alignment to the text description (if the output is not following it)
The transformer loss needs to be around 0.01 to 0.001, and the autoencoder loss can be around 0.25 to 0.35 or lower.
I have trained the MeshTransformer on 200 different meshes from the chair category of ShapeNet, after decimation and filtering for meshes with fewer than 400 vertices and faces. The MeshTransformer reached a loss very close to 0.
But when I call the `generate` method of the MeshTransformer, I get very bad results. From left to right: ground truth, autoencoder output, and MeshTransformer-generated meshes with a temperature of 0, 0.1, 0.7 and 1. This was done with meshgpt-pytorch version 0.3.3.
Note: the MeshTransformer was not conditioned on text or anything else, so the output is not supposed to look exactly like the sofa, but it barely looks like a chair. We can make out the backrest and the legs, but that's it.
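As context for the temperature values compared here, a minimal sketch of temperature sampling over a vector of logits. It assumes the common convention that a temperature of 0 means greedy (argmax) decoding; this is not taken from the meshgpt-pytorch source, and `sample_token` is a hypothetical helper.

```python
import math
import random


def sample_token(logits, temperature=1.0, rng=random):
    """Pick a token index from raw logits at the given temperature."""
    if temperature <= 0:
        # Assumed convention: temperature 0 is deterministic argmax.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


toy_logits = [0.1, 2.0, -1.0]  # made-up values for illustration
greedy = sample_token(toy_logits, temperature=0)
sampled = sample_token(toy_logits, temperature=0.7, rng=random.Random(0))
print(greedy, sampled)
```

Low temperatures concentrate probability on the most likely token, so if even greedy decoding yields an incoherent mesh, sampling noise is unlikely to be the cause.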
Initially I thought that there might have been an error with the KV cache, so here are the results with `cache_kv=False`:
And this one is with meshgpt-pytorch version 0.2.11:
When I trained on a single chair with a version before 0.2.11, the `generate` method was able to create a coherent chair (from left to right: ground truth, autoencoder output, `meshtransformer.generate()`).
Why are the generated results so bad even though the transformer loss was very low?
I have uploaded the autoencoder and MeshTransformer checkpoints (on version 0.3.3) as well as 10 data samples here: https://file.io/nNsfTyHX4aFB
Also quick question, why rewrite the transformer from scratch, and not use the HuggingFace GPT2 transformer?