Data augmentation strategies #11

Open
fire opened this issue Dec 13, 2023 · 56 comments

@fire

fire commented Dec 13, 2023

In #6

For each mesh I generate augments_per_item variants (around 200) and use the augment index to index into the dataset.

Each variant is seeded and augmented with this strategy:

What do you think?

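import random
import numpy as np
from scipy.spatial.transform import Rotation as R  # assumed: R is SciPy's Rotation

scale = random.uniform(0.8, 1.2)  # Uniform scaling
rotation = R.from_euler('y', random.uniform(-180, 180), degrees=True)  # Random rotation around y-axis
translation = np.array([random.uniform(-0.5, 0.5), 0.0, random.uniform(-0.5, 0.5)])  # Random translation in x and z directions (y unchanged)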

The goal is for a chair item to be rotated, moved or scaled, but upright.

Edited:

The idea is to have a chair be displaced but under gravity so it keeps its lowest vertex position.
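
A minimal sketch of the "displaced but under gravity" idea, using NumPy only. The function name augment_upright and its parameters are hypothetical, and it assumes y is the up axis and vertices is an (N, 3) array. It is not the code from the pull request.

```python
import numpy as np

def augment_upright(vertices, rng, scale_range=(0.8, 1.2), max_shift=0.5):
    """Scale, rotate about the y-axis, and shift in x/z, keeping the mesh grounded."""
    vertices = np.asarray(vertices, dtype=np.float64)
    lowest_y = vertices[:, 1].min()            # remember the lowest vertex height

    scale = rng.uniform(*scale_range)          # uniform scaling
    angle = rng.uniform(-np.pi, np.pi)         # rotation about the vertical (y) axis
    c, s = np.cos(angle), np.sin(angle)
    rot_y = np.array([[c, 0.0, s],
                      [0.0, 1.0, 0.0],
                      [-s, 0.0, c]])

    out = (vertices * scale) @ rot_y.T
    out[:, 0] += rng.uniform(-max_shift, max_shift)   # translate in x
    out[:, 2] += rng.uniform(-max_shift, max_shift)   # translate in z
    out[:, 1] += lowest_y - out[:, 1].min()           # put the lowest vertex back
    return out
```

Passing a seeded generator such as np.random.default_rng(augment_idx) would make each augment index reproducible.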

@lucidrains
Owner

yup sounds good! just put all the functions into one file, say augment.py, and if you want to go the distance, have ways to compose / chain any number of augmentations
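
One possible shape for the compose / chain idea, as a sketch. The names compose, random_scale and random_shift_xz are hypothetical and are not part of meshgpt-pytorch. Each augmentation is assumed to be a function of (vertices, rng).

```python
from functools import reduce
import numpy as np

def compose(*augmentations):
    """Chain augmentations left to right: compose(f, g)(v, rng) == g(f(v, rng), rng)."""
    def chained(vertices, rng):
        return reduce(lambda v, aug: aug(v, rng), augmentations, vertices)
    return chained

def random_scale(low=0.8, high=1.2):
    def aug(vertices, rng):
        return vertices * rng.uniform(low, high)
    return aug

def random_shift_xz(max_shift=0.5):
    def aug(vertices, rng):
        # y stays untouched so the mesh keeps its height above the ground
        shift = np.array([rng.uniform(-max_shift, max_shift), 0.0,
                          rng.uniform(-max_shift, max_shift)])
        return vertices + shift
    return aug
```

Usage would look like pipeline = compose(random_scale(), random_shift_xz()) followed by pipeline(vertices, np.random.default_rng(0)).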

@lucidrains
Owner

@fire scale and rotation will go a long way

@fire
Author

fire commented Dec 13, 2023

image

Here's what my current augments do.

@fire
Author

fire commented Dec 13, 2023

vs original

image

Edited:

There's a bias near the center D:

@fire
Author

fire commented Dec 13, 2023

image

The bias is removed.

@fire
Author

fire commented Dec 13, 2023

I have to go for now.

https://github.com/lucidrains/meshgpt-pytorch/pull/6/files#diff-bb1e7e12bca15c4f2fd0faa464db85f6e8cb35c55454247f94c31bfc1483c3bbR100-R150

See def augment_mesh(self, base_mesh, augment_count, augment_idx):

Edited: removed seed

@fire
Author

fire commented Dec 13, 2023

@lucidrains Can you post something for me to extract the resulting mesh from the autoencoder?

@fire
Author

fire commented Dec 13, 2023

You mentioned the topic of overfitting as a first step.

I added the Blender monkey as a validation of mesh input through an autoencoder as an initial step.

I want to send another monkey to the autoencoder and get the same monkey out again. How do I do that?

@fire
Author

fire commented Dec 13, 2023

I was able to train for 1 step, and it outputs a garbage glb 🎉

@adeerAI

adeerAI commented Dec 13, 2023

You mentioned the topic of overfitting as a first step.

I added the Blender monkey as a validation of mesh input through an autoencoder as an initial step.

I want send another monkey to the autoencoder and get the same monkey out again. How do I do that?

I have been using the notebook file Marcus provided to try that, and I am also getting bad obj results. I am going to try the latest @lucidrains changes in this notebook tomorrow. Maybe you can give it a look and try it, or maybe you are already ahead of what I am using. 😆 Thanks!
https://drive.google.com/file/d/1gpLjbnH1WUH6U50MJKrw-8BV6S_-3KH1/view?usp=sharing

@fire
Author

fire commented Dec 13, 2023

image

I am getting bad mesh results too, but it's trying. The selected is the output, the background is the base mesh.

@MarcusLoppe
Contributor

MarcusLoppe commented Dec 13, 2023

Just for testing purposes, give it a go without the data augmentation.
I think the model still needs some improvements, and it will take a long time to train with the data augmentation.
In the paper they used 28 000 shapes and trained the encoder on 2x A100 for 2 days and the transformer on 4x A100 for 5 days.
So it will need lots of training data and time.

When I have been successful, the encoder loss was below 0.200-0.250 and the transformer loss was around 0.00007.
So if you can get the loss down to those levels with the data augmentation it will probably work, but that will require lots of training.

image

Here are some details from the paper: they only use scaling and jitter-shift.
So remove translation and rotation and see if that helps.
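
A rough sketch of a scaling plus jitter-shift augmentation. The function name scale_and_jitter, the per-axis scaling, the per-vertex noise and both default ranges are assumptions for illustration. They are not values or details taken from the paper.

```python
import numpy as np

def scale_and_jitter(vertices, rng, scale_range=(0.75, 1.25), jitter=0.01):
    """Scale each axis by a random factor, then add a small random shift per vertex."""
    vertices = np.asarray(vertices, dtype=np.float64)
    axis_scale = rng.uniform(*scale_range, size=3)                # one factor per axis
    noise = rng.uniform(-jitter, jitter, size=vertices.shape)    # small per-vertex shift
    return vertices * axis_scale + noise
```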

@fire
Author

fire commented Dec 13, 2023

I am currently at:

loss: 1.255
loss: 1.500
loss: 1.786
loss: 1.596
loss: 1.941
loss: 1.583
loss: 1.895
loss: 1.904

So maybe I can dream about 0.200 - 0.250 loss.

@MarcusLoppe
Contributor

MarcusLoppe commented Dec 13, 2023

I am currently at:

loss: 1.255
loss: 1.500
loss: 1.786
loss: 1.596
loss: 1.941
loss: 1.583
loss: 1.895
loss: 1.904

So maybe I can dream about 0.200 - 0.250 loss.

How many steps is that at? I need about 2000 steps, since 200 x 10 epochs = 2000.
Also use tqdm, since print can slow things down quite a lot.

Try doing only scaling and see; it will probably go better.

You can give it a go with my forked version @ https://github.com/MarcusLoppe/meshgpt-pytorch/tree/main

MeshDataset expects a list of dicts shaped like this:

obj_data = {"texts": "chair", "vertices": vertices, "faces": faces} 
import torch
from torch.utils.data import Dataset, DataLoader 
from tqdm import tqdm

class MeshDataset(Dataset): 
    def __init__(self, obj_data): 
        self.obj_data = obj_data
        print(f"Got {len(obj_data)} data")

    def __len__(self):
        return len(self.obj_data)

    def __getitem__(self, idx):
        return self.obj_data[idx]

from meshgpt_pytorch import (
    MeshTransformer,
    MeshTransformerTrainer,
    MeshAutoencoderTrainer
)

autoencoder_trainer = MeshAutoencoderTrainer(model = autoencoder,learning_rate = 1e-3, warmup_steps = 10,dataset = dataset,batch_size=4,grad_accum_every=1,num_train_steps=1)

autoencoder_trainer.train(10, True)

max_length =  max(len(d["faces"]) for d in dataset if "faces" in d)
max_seq =  max_length * 6
print(max_length)
print(max_seq)
transformer = MeshTransformer(
    autoencoder,
    dim = 16,
    max_seq_len = max_seq,
    #condition_on_text = True
)
 
 
trainer = MeshTransformerTrainer(model = transformer,warmup_steps = 10, dataset = dataset,learning_rate = 1e-2,batch_size=2,grad_accum_every=1,num_train_steps=1)
trainer.train(10)

@fire
Author

fire commented Dec 13, 2023

These are my current settings, which run for 200 steps. The outlined mesh is the output. You can see my code in the pull request.

run = wandb.init(
    project="meshgpt-pytorch",
    
    config={
        "learning_rate": 1e-2,
        "architecture": "MeshGPT",
        "dataset": dataset_directory,
        "num_train_steps": 200,
        "warmup_steps": 1,
        "batch_size": 4,
        "grad_accum_every": 1,
        "checkpoint_every": 20,
        "device": str(device),
        "autoencoder": {
            "dim": 512,
            "encoder_depth": 6,
            "decoder_depth": 6,
            "num_discrete_coors": 128,
        },
        "dataset_size": dataset.__len__(),
    }
)

image

image

@fire
Author

fire commented Dec 13, 2023

You are right that I should make sure we stay within the unit square and do fewer augmentations, though.

@MarcusLoppe
Contributor

MarcusLoppe commented Dec 14, 2023

You are right that I should ensure that we're in unit square distance and do less augmentations though.

I think generating two objects is causing some issues; try using a single box.

I tried your s_bed_full.glb file and the result was pretty good, though not so smooth. It would probably be better with data augmentation. The right side is the generated one.

image
image

@fire
Author

fire commented Dec 14, 2023

https://imgsli.com/ is very good for image comparisons.

@fire
Author

fire commented Dec 14, 2023

Writing down an idea. It should be possible to go over the 10 million 3d item set and find a small set of items in a small set of classes similar to the paper and label them manually (like via path name).

@MarcusLoppe
Contributor

MarcusLoppe commented Dec 14, 2023

Writing down an idea. It should be possible to go over the 10 million 3d item set and find a small set of items in a small set of classes similar to the paper and label them manually (like via path name).

Training on 10 million might be overkill, and even going over 28 000 shapes might cost a bit too much $$$.
ShapeNet has 50k 3D models, each with almost a paragraph of description text.

Renting an A100 at $0.79 per hour:
Training the encoder on 2x A100 for 2 days: $75.84
Training the transformer on 4x A100 for 5 days: $379
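
The arithmetic behind those two figures, as a small sketch. The helper name rental_cost is made up for illustration. The price, GPU counts and durations are the ones quoted above.

```python
def rental_cost(price_per_gpu_hour, gpus, days):
    """Total cost of renting `gpus` GPUs around the clock for `days` days."""
    return price_per_gpu_hour * gpus * days * 24

encoder_cost = rental_cost(0.79, gpus=2, days=2)        # 0.79 * 2 GPUs * 48 h
transformer_cost = rental_cost(0.79, gpus=4, days=5)    # 0.79 * 4 GPUs * 120 h
print(f"encoder: ${encoder_cost:.2f}, transformer: ${transformer_cost:.2f}")
```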

However, the H100 promises good performance, but at around $2-3 an hour.

https://imgsli.com/ is very good for image comparisons.

Seems pretty good, but probably not for 3D models

@fire
Author

fire commented Dec 14, 2023

I can't use ShapeNet, but I'm sure we can find 10 classes of 100 models, like ShapeNet's, in that 10 million dataset.

@MarcusLoppe
Contributor

I can't use shapenet, but I'm sure we can find 10 class of 100 models like Shapenet in that 10 million dataset.

I think it's fine, there are many free sources; the trouble might be finding a dataset with descriptions.
But that is for the future. I think someone can get access to ShapeNet.
The bigger issue is the GPU bill, though Phil/lucidrains might be able to improve the models so much that the training time goes down dramatically.

After the model is trained, inference will be a big issue for users: if it's going to generate complex 3D models, it might not work on consumer hardware. But the recent performance boost is a good sign that performance and efficiency are on the right track.

https://github.com/timzhang642/3D-Machine-Learning#3d_models

@fire
Author

fire commented Dec 14, 2023

I want to mention, getting the indices so they're in the right order and making sure they fit in the box and not inside out are problems too.

If you're interested in training the head (image), it's in the dataset. I can't get the autoencoder below 0.5 loss.

@MarcusLoppe
Contributor

MarcusLoppe commented Dec 14, 2023

I want to mention, getting the indices so they're in the right order and making sure they fit in the box and not inside out are problems too.

If you're interested in training the head it's in the dataset. I can't get the autoencoder below 0.5 loss

How many examples/steps of the same 3d mesh did you train it on? I trained for 10-20 epochs @ 2000 examples and got 0.19 loss.
I think you are training on too few examples; it needs massive amounts of data to model. And if you do data augmentation you'll need even more data, maybe 30-40 epochs or more.

I was able to generate a pretty good 3D mesh. It's not as smooth, but it's a very good result for such a small amount of training data.
The transformer and encoder aren't good at generalizing with little training data, but that will resolve itself when training with much more data.

3D mesh:
https://file.io/6JIueypFnRyT

image

@fire
Author

fire commented Dec 15, 2023

I was using the wrong strategy. You were using many identical copies of the mesh and then some augments. I was doing the opposite.

@MarcusLoppe
Contributor

I was using the wrong strategy. You were using many same copies of the mesh and then some augments. I was doing the opposite.

I might have worded that badly, but no, I'm using the same model without any augmentations.
Train for 10-20 epochs at 2000 items per dataset and let me know.
Kaggle has some awesome free GPUs.

@fire
Author

fire commented Dec 15, 2023

Here's how I interpreted it.

  1. model * multiple
  2. model * multiple * augments

You were doing 2000 (same) * 1 * 1.

I was trying 1 * 2000 (augmented) * 1.

Thanks for telling me! I'm trying your suggestion.

@MarcusLoppe
Contributor

MarcusLoppe commented Dec 15, 2023

Here's what I interpreted it.

1. model * multiple

2. model * multiple * augments

You were doing 2000 (same) * 1 * 1.

I was trying 1 * 2000 (agumented) * 1.

Thanks for telling me! I'm trying your suggestion.

No problem. I posted this in another issue, but I think it might help you: according to the paper, they sort the vertices in z-y-x order.
Then they sort the faces by their lowest vertex index.
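
A sketch of that ordering with NumPy, assuming triangle meshes with vertices of shape (N, 3) and faces of shape (F, 3). The function name sort_mesh is hypothetical. The tie-breaking between faces that share a lowest index, and the rotation of each face so its lowest index comes first, are my assumptions about the details.

```python
import numpy as np

def sort_mesh(vertices, faces):
    """Sort vertices in z-y-x order and faces by their lowest vertex index."""
    vertices = np.asarray(vertices)
    faces = np.asarray(faces)

    # np.lexsort treats the LAST key as the primary one: z first, then y, then x.
    order = np.lexsort((vertices[:, 0], vertices[:, 1], vertices[:, 2]))
    sorted_vertices = vertices[order]

    # Map each old vertex index to its new position.
    remap = np.empty(len(order), dtype=np.int64)
    remap[order] = np.arange(len(order))
    new_faces = remap[faces]

    # Rotate each face so its lowest index comes first (winding order is preserved).
    start = new_faces.argmin(axis=1)
    cols = (start[:, None] + np.arange(3)) % 3
    new_faces = np.take_along_axis(new_faces, cols, axis=1)

    # Order faces by lowest vertex index, breaking ties with the following indices.
    face_order = np.lexsort((new_faces[:, 2], new_faces[:, 1], new_faces[:, 0]))
    return sorted_vertices, new_faces[face_order]
```

With a y-up convention the same function can be reused by swapping which column is passed as the primary key.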

Also, I'm currently training on about 6 3D mesh chairs. Each chair has 1500 examples, made up of 3 augmented versions.
So each 3D mesh file has a total of 500 x 3 = 1500 examples.

The total is 12 000 examples.

To give you some type of idea of why you need to train for 2 days on two A100, watch how slow the progress is (33 minutes running):


Epoch 1/20: 100%|██████████| 1125/1125 [03:29<00:00,  5.38it/s, loss=0.296]
Epoch 1 average loss: 0.7889469708336724
Epoch 2/20: 100%|██████████| 1125/1125 [03:23<00:00,  5.52it/s, loss=0.307]
Epoch 2 average loss: 0.29623086002137927
Epoch 3/20: 100%|██████████| 1125/1125 [03:23<00:00,  5.54it/s, loss=0.28] 
Epoch 3 average loss: 0.2731376721594069
Epoch 4/20: 100%|██████████| 1125/1125 [03:22<00:00,  5.54it/s, loss=0.248]
Epoch 4 average loss: 0.25995001827345954
Epoch 5/20: 100%|██████████| 1125/1125 [03:23<00:00,  5.54it/s, loss=0.239]
Epoch 5 average loss: 0.251056260228157
Epoch 6/20: 100%|██████████| 1125/1125 [03:23<00:00,  5.53it/s, loss=0.217]
Epoch 6 average loss: 0.24529405222998726
Epoch 7/20: 100%|██████████| 1125/1125 [03:23<00:00,  5.54it/s, loss=0.227]
Epoch 7 average loss: 0.24055371418264176
Epoch 8/20: 100%|██████████| 1125/1125 [03:22<00:00,  5.54it/s, loss=0.221]
Epoch 8 average loss: 0.23791699058479732
Epoch 9/20: 100%|██████████| 1125/1125 [03:23<00:00,  5.54it/s, loss=0.245]
Epoch 9 average loss: 0.23742892943488228
Epoch 10/20: 100%|██████████| 1125/1125 [03:23<00:00,  5.54it/s, loss=0.208]
Epoch 10 average loss: 0.23614923742082383
Epoch 11/20: 100%|██████████| 1125/1125 [03:23<00:00,  5.53it/s, loss=0.219]
Epoch 11 average loss: 0.23556399891111585

@fire
Author

fire commented Dec 15, 2023

#11 (comment) was the verification of z-y-x order and sort the faces as per their lowest vertex index. Note that I am using the convention that gives me that result like Y-Z-X, but it followed their requirement of sorted vertically.

@fire
Author

fire commented Dec 15, 2023

@MarcusLoppe on your branch, can you add a feature where the first quit request saves a checkpoint and the second one actually quits? Then we can restart from a checkpoint.
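
To make the request concrete, here is a rough sketch of "save on the first quit, quit on the second" using the standard signal module. TwoStageQuit, train, train_step and save_checkpoint are all hypothetical names. The loop stands in for a trainer and is not the meshgpt-pytorch trainer API.

```python
import signal

class TwoStageQuit:
    """First interrupt asks for a checkpoint; the second one exits."""

    def __init__(self):
        self.interrupts = 0
        self.saved = False

    def handle(self, signum=None, frame=None):
        self.interrupts += 1
        if self.interrupts >= 2:
            raise SystemExit("second interrupt: quitting")

    def should_save(self):
        # True exactly once, after the first interrupt has arrived.
        if self.interrupts >= 1 and not self.saved:
            self.saved = True
            return True
        return False

    def install(self):
        # Route Ctrl-C (SIGINT) to this handler instead of KeyboardInterrupt.
        signal.signal(signal.SIGINT, self.handle)

def train(num_steps, train_step, save_checkpoint, quitter):
    for step in range(num_steps):
        train_step(step)
        if quitter.should_save():
            save_checkpoint(step)
```

In a real script, quitter.install() would be called once before train(...) so that Ctrl-C reaches the handler.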

@MarcusLoppe
Contributor

MarcusLoppe commented Dec 15, 2023

#11 (comment) was the verification of z-y-x order and sort the faces as per their lowest vertex index. Note that I am using the convention that gives me that result like Y-Z-X, but it followed their requirement of sorted vertically.

Oh, great :)
I'm currently testing whether it helps to keep 50% of the 3D mesh examples full and give the rest a stepped number of faces, e.g. from 0 to max(faces). My idea is that when generating the 3D mesh, the embedder might freak out since it has never seen an input graph that is not full. I'll let you know how it goes.

One other tip might be to normalize the size and set everything on the ground.
If I'm correct, the code below scales the vertices so the largest absolute coordinate is 1, then sets everything on the ground (lowest y at 0).

I'm limiting the size since I'm currently training on a few different chairs, and some of the chairs were huge like a building while others were "normal" size.

    max_abs = np.max(np.abs(vertices))
    vertices = vertices / max_abs 
    
    min_y = np.min(vertices[:, 1])
    vertices[:, 1] -= min_y

@MarcusLoppe on your branch, can you add a feature that on the first quit I save, on the second quit quit. Then, we can restart from a checkpoint.

I don't understand, can you clarify?

@fire
Author

fire commented Dec 15, 2023

image

This is my current result.

I'll retype the last message in a bit.

output.log See also https://wandb.ai/ernest-lee/meshgpt-pytorch/runs/2fkwahjc/overview

@MarcusLoppe
Contributor

image

This is my current result.

I'll retype the last message in a bit.

output.log See also https://wandb.ai/ernest-lee/meshgpt-pytorch/runs/2fkwahjc/overview

I see that the dataset size is 10. For training efficiency I just duplicate the one model 2000 times, since I think it trains faster when dealing with bigger loads.
Since you are using a 3090 you can probably raise the batch size to 8 or 16. The only reason I had the batch size at 1 or 4 was VRAM constraints, but the encoder and transformer are now much more memory efficient.

The learning rate seems a bit high; for the encoder I used 1e-3 (0.001) and for the transformer I used 1e-2 (0.01).
When the transformer loss becomes quite low you can try a lower learning rate such as 1e-3.

@fire
Author

fire commented Dec 15, 2023

image

https://wandb.ai/ernest-lee/meshgpt-pytorch/runs/9b8k9mfc/overview?workspace=user-ernest-lee

I have some bugs, but this is really promising.

I had to recode my face index asc regularization strategy.

The one with the clipped ears is the MeshGPT output.

@fire
Author

fire commented Dec 15, 2023

I see that the dataset size is 10, for training effective I just duplicate the one model x2000 times since it can train faster I think when dealing with bigger loads.

Instead of duplicating the model, I multiply the epoch by n, but according to the graph the training flattens so I stop early.

@fire
Author

fire commented Dec 15, 2023

image

I broke the counter clockwise triangle order, but it's invisible in this shot.

@MarcusLoppe
Contributor

https://wandb.ai/ernest-lee/meshgpt-pytorch/runs/9b8k9mfc/overview?workspace=user-ernest-lee

I have some bugs, but this is really promising.

I had to recode my face index asc regularization strategy.

The clipped ears is the meshgpt.

That seems very good. I see that you increased num_discrete_coors to 256. Did that help? It seems like that would smooth out the errors and give a higher error margin, so even if it's wrong it looks smoother.

What kind of augmentation are you doing? Are you applying all the augmentations, including the rotation?
I'm a bit unsure about the rotation one, since neither MeshGPT nor PolyGen mentions it, only the scaling and jitter.

Is there any reason why you are adding 2 extra tokens as padding?

seq_len = dataset.get_max_face_count() * 3
seq_len = ((seq_len + 2) // 3) * 3


@fire
Author

fire commented Dec 15, 2023

Is there any reason why you are adding 2 extra tokens as padding?

The generated tokens length needs to be a multiple of 3.

@fire
Author

fire commented Dec 15, 2023

I see that you increased the num_discrete_coors to 256

To be honest I think this only affects the quantization loss on the discretization of the mesh vertex positions.

I don't think it matters, but I haven't tested it off.

@fire
Author

fire commented Dec 15, 2023

@MarcusLoppe
Contributor

MarcusLoppe commented Dec 15, 2023

Is there any reason why you are adding 2 extra tokens as padding?

The generated tokens length needs to be a multiple of 3.

It should be 6 since 1 face = 6 tokens.
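
For reference, a small sketch of the rounding being discussed. round_up_to_multiple and max_faces = 500 are hypothetical; the 6 tokens per face is the figure stated above. Note that a face count times 3 is already a multiple of 3, so the ((seq_len + 2) // 3) * 3 step leaves it unchanged.

```python
def round_up_to_multiple(n, multiple):
    """Smallest multiple of `multiple` that is >= n (for non-negative n)."""
    return ((n + multiple - 1) // multiple) * multiple

tokens_per_face = 6      # as stated in the thread: 1 face = 6 tokens
max_faces = 500          # hypothetical face count, for illustration only

max_seq_len = max_faces * tokens_per_face
print(max_seq_len, round_up_to_multiple(max_seq_len, tokens_per_face))
```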

I see that you increased the num_discrete_coors to 256

To be honest I think this only affects the quantization loss on the discretionary of the mesh vertex positions.

I don't think it matters, but I haven't tested it off.

It should make it smoother: if it guesses the wrong class with 128 vs 256 classes, the step values might be 0.20 vs 0.10, and the 0.10 error will be less visible.
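
A quick sketch of how the bin count changes the step size. It assumes coordinates in [-1, 1] split into uniform bins and decoded to bin centres; quantize is a made-up helper and may not match how meshgpt-pytorch discretizes coordinates internally.

```python
import numpy as np

def quantize(coords, num_bins, low=-1.0, high=1.0):
    """Map continuous coordinates to integer bins and back to bin centres."""
    step = (high - low) / num_bins
    idx = np.clip(np.floor((coords - low) / step), 0, num_bins - 1).astype(int)
    return idx, low + (idx + 0.5) * step

coords = np.linspace(-1.0, 1.0, 1001)
for bins in (128, 256):
    _, recon = quantize(coords, bins)
    # step size, then the worst-case reconstruction error (at most half a step)
    print(bins, 2.0 / bins, np.abs(recon - coords).max())
```

Under these assumptions, being off by one class costs one step, so doubling the bins halves the size of that mistake.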

@MarcusLoppe
Contributor

image

https://wandb.ai/ernest-lee/meshgpt-pytorch/runs/dn4mqfoj/overview?workspace=user-ernest-lee [Edited]

phone.zip

Training a single mesh seems to be going pretty well, if not solved. Have you tried using the texts and multiple meshes?
Try with just 2-3 meshes and see how it goes; it's very slow to train the transformer with more than one mesh.

I'm guessing that you resolved the issue with the mesh getting cut off? I just scale it to fit -0.95 to +0.95; there seem to be some issues when the mesh goes above 1.0.
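
Something like the following would do that scaling. This is a sketch: fit_to_range is a hypothetical name, and centring the bounding box before scaling is my assumption, not necessarily what is done here.

```python
import numpy as np

def fit_to_range(vertices, limit=0.95):
    """Centre the bounding box at the origin and scale so every coordinate is within +/- limit."""
    vertices = np.asarray(vertices, dtype=np.float64)
    centre = (vertices.max(axis=0) + vertices.min(axis=0)) / 2.0
    centred = vertices - centre
    max_abs = np.abs(centred).max()
    if max_abs == 0:          # degenerate mesh: all vertices coincide
        return centred
    return centred * (limit / max_abs)
```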

Also; I was granted access to the shapenet v2 dataset on huggingface, you can probably get access as well.

@fire
Author

fire commented Dec 16, 2023

I was able to train the transformer to use 1172 faces.

mesh_transforms_humanoid_avatar.zip

image

I respect the MIT, Apache-2 and cc-by licenses and so have a reason to not use shapenet.

V-Sekai-fire@a416837

https://wandb.ai/ernest-lee/meshgpt-pytorch/runs/rp8nbw7w?workspace=user-ernest-lee Some logs.

Duration: 1h 13m 26s

upbeat-waterfall-437-618dbfb6d54f78d191f293a55a0c9e7a41147541.json

@fire
Author

fire commented Dec 16, 2023

Training a single mesh seems to be going pretty good/solved, have you tried using the texts & multiple meshes?
Try with just 2-3 meshes and see how it goes, it's very slow to train the transformer with more then one mesh.

I want to do that after a break. Any suggestions? I was thinking of having one human in multiple poses, but different objects are doable too.

@MarcusLoppe
Contributor

I was able to train the transformer to use 1172 faces.

mesh_transforms_humanoid_avatar.zip

image

I respect the MIT, Apache-2 and cc-by licenses and so have a reason to not use shapenet.

V-Sekai-fire@a416837

https://wandb.ai/ernest-lee/meshgpt-pytorch/runs/rp8nbw7w?workspace=user-ernest-lee Some logs.

Duration: 1h 13m 26s

upbeat-waterfall-437-618dbfb6d54f78d191f293a55a0c9e7a41147541.json

I think it's fine to train on it while testing, since it's not for any commercial purpose but pure testing that won't be touched by anyone else.

One benefit of using ShapeNet is that it has nice labels and not just categories like "chair". Examples:
"name": "easy chair,lounge chair,overstuffed chair",
"name": "water faucet,water tap,tap,hydrant",
"name": "ladder-back,ladder-back chair",

I want to do after a break. Any suggestions? I was thinking of having one human be in multiple poses, but different objects is doable too.

Yes, use meshes with very few faces, since conditioning on text makes the training much harder.

Using a dataset of 2 chairs with 5000 examples (2 meshes, 5 augmentations x 500)
I got the encoder to 0.2 loss after 2 epochs, but the transformer is at 0.001695 loss after 40 epochs, which took 2 hours.

@fire
Author

fire commented Dec 16, 2023

@MarcusLoppe I'm pretty sure you can use blip to categorize photos of the mesh so that's not a blocker. https://replicate.com/gfodor/instructblip

@fire
Author

fire commented Dec 16, 2023

Someone wanted me to try https://www.kenney.nl/assets/castle-kit. So I'll need to generate labels for them, but it should work.

image

@MarcusLoppe
Contributor

MarcusLoppe commented Dec 16, 2023

@MarcusLoppe I'm pretty sure you can use blip to categorize photos of the mesh so that's not a blocker. https://replicate.com/gfodor/instructblip

Well the downside is that you'll use blender to take a screenshot with a default camera and since models vary with the orientation/vertical axis you might take a snapshot at back/below of the object.
Why complicate it? :)

Someone wanted me to try https://www.kenney.nl/assets/castle-kit. So I'll need to generate labels for them, but it should work.

Try walking before running :) I've been trying to say that you need a massive amount of data and training time to actually create a good enough model for that.
Currently you have been overfitting a model on a very small sample of data. The harder part is when you want to create a general model that can generate general items.

I've been successful at overfitting it using text + 1 single model for around 40 epochs at 2000 examples per epoch.
If I use two models that are the same type of object, e.g. chair, it fails massively.

If you want to give it a go, use only the models with fewer than 500-600 faces, create 10-20 augmentations per model, then duplicate each variation 200 times.
If you want to train it using 40 objects, that is 10 * 200 * 40 = 80 000 examples per dataset.

Then train on this for a day or two and then try to generate using the texts.

In the PolyGen and MeshGPT papers they stress that they didn't have enough training data and used only 28 000 mesh models.
They needed to augment those; with, let's say, 20 augments each, that means they trained on 560 000 mesh models.
Since they only did autocomplete, the generation is much easier than when using texts.

In the paper they used 28 000 3D models. Let's say they generated 10 augmentations per model and then used 10 duplicates of each, since it's more efficient to train with a big batch size of 64; with a small number of models per dataset it will not train efficiently and you will waste the parallelism of the GPUs.
This means 10 * 10 * 28 000 = 2 800 000 examples.
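
The dataset sizes above follow from models x augmentations x duplicates. A small sketch of that expansion, with hypothetical helper names dataset_size and expand; augment_fn is a placeholder for whatever augmentation is actually used.

```python
def dataset_size(num_models, augments_per_model, duplicates_per_augment):
    return num_models * augments_per_model * duplicates_per_augment

def expand(models, augment_fn, augments_per_model, duplicates_per_augment):
    """Build the training list: each model is augmented, then each variant is repeated."""
    examples = []
    for model in models:
        for i in range(augments_per_model):
            variant = augment_fn(model, i)
            examples.extend([variant] * duplicates_per_augment)
    return examples

print(dataset_size(40, 10, 200))       # 80000
print(dataset_size(28_000, 10, 10))    # 2800000
```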

I want to stress this:
Over fitting a model = super easy.
Training a model to be general enough for many different models = Hard.

@fire
Author

fire commented Dec 16, 2023

From @MarcusLoppe

But since it seems like you are not using the texts you can try to feed the transformer a prompt of 10-30 connected faces of a model and see what happens (like in the paper), it should act as a autocomplete.

@fire
Author

fire commented Dec 17, 2023

@MarcusLoppe what sort of limits are you getting on your triangle count? I think mine is around 1349 triangles per mesh.

@MarcusLoppe
Contributor

@MarcusLoppe what sort of limits are you getting on your triangle count? I think mine is around 1349 triangles per mesh.

I haven't bothered with such large meshes due to hardware constraints.
What limits are you talking about? If you are running out of VRAM while training; lower the batch size.

@fire
Author

fire commented Dec 18, 2023

image

Does anyone know what the meshgpt paper means by jitter below Hausdorff?

@MarcusLoppe
Contributor

@MarcusLoppe what sort of limits are you getting on your triangle count? I think mine is around 1349 triangles per mesh.

What happens when you go above it? Is it the VRAM, or does the transformer get stuck at a loss? If so, have you tried raising the dim to 768 or 1024?

Does anyone know what the meshgpt paper means by jitter below Hausdorff?

I don't think jitter is related to that. It talks about jitter but then switches topic to planar decimation, i.e. simplifying the training mesh while having it look the same as before.

@fire
Author

fire commented Dec 18, 2023

I had an avatar https://booth.pm/en/items/4861008 and I wanted to use it so was trying to optimize. (mirror https://github.com/V-Sekai-fire/SK_faavrs_breadbread)

It was around 15,023 triangles, and I don't think it's reasonable for people to pay for 48 GB GPUs.

@MarcusLoppe
Contributor

MarcusLoppe commented Dec 18, 2023

I had an avatar https://booth.pm/en/items/4861008 and I wanted to use it so was trying to optimize. (mirror https://github.com/V-Sekai-fire/SK_faavrs_breadbread)

Was around 15_023 triangles, and I don't think it's reasonable for people to pay for 48 gb gpus.

@fire
I'm guessing you never made it to the transformer stage and it crashes at the encoder training?

Did you try loading the models without training them and generating a model to see what the inference VRAM requirement was?

@lucidrains
Most meshes contain a lot of triangles, and each time the autoencoder embeds the mesh data, the whole mesh is embedded at once (not sure if the ResNet does the same thing).
This creates a lot of VRAM usage; currently it seems like it's zero-shotting the entire mesh.
This doesn't seem very smart or efficient. Maybe a better idea is to provide a subset of faces but include extra connected faces as padding so it can understand the overall shape? Or maybe figure out a good way of summarizing the rest of the faces in a memory-efficient way.
