Memory issues #42

Open
w1hao opened this issue Jan 8, 2025 · 7 comments

Comments

w1hao commented Jan 8, 2025

Following your method, I am training on a single category of the PCN dataset on an A6000 GPU. No matter how I adjust batch_size, GPU memory usage is 30+ GB; even with batch_size set to 1 it occupies 35 GB. With batch_size set to 64 it occupies only 7 GB, while 63 and 65 both take 30+ GB. How much GPU memory does training on a single PCN category take for you? The 3090 GPU has only 24 GB, so how did you train on it, or is there a problem with my device?
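One way to narrow down numbers like these is to compare what PyTorch has actually allocated with what its caching allocator has reserved, since tools such as nvidia-smi show the larger reserved figure plus the CUDA context. The sketch below is not part of the repository: `measure_peak` and `step_fn` are hypothetical names, and it assumes training runs through PyTorch on a single CUDA device.

```python
import importlib.util


def measure_peak(step_fn):
    """Run step_fn once and report peak GPU memory in GiB.

    step_fn is a hypothetical zero-argument callable wrapping one
    training iteration. Returns None when PyTorch or CUDA is unavailable.
    """
    has_torch = importlib.util.find_spec("torch") is not None
    if has_torch:
        import torch
        has_cuda = torch.cuda.is_available()
    else:
        has_cuda = False

    if not has_cuda:
        step_fn()  # still run the step so callers behave the same way
        return None

    torch.cuda.synchronize()
    torch.cuda.reset_peak_memory_stats()  # start counting from here
    step_fn()
    torch.cuda.synchronize()

    gib = 1024 ** 3
    return {
        # memory held by live tensors at the peak
        "peak_allocated_gib": torch.cuda.max_memory_allocated() / gib,
        # memory the caching allocator requested from the driver
        "peak_reserved_gib": torch.cuda.max_memory_reserved() / gib,
    }
```

Calling this once per batch_size setting would show whether the 30+ GB figure comes from live tensors or from cached blocks that were never returned to the driver.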

Author

w1hao commented Jan 8, 2025

You say you train one category at a time; how does that take so much time? I trained a single category of the PCN dataset and it took me more than an hour.

Author

w1hao commented Jan 8, 2025

I'm a newbie and I'm looking forward to hearing from you.

Owner

CuiRuikai commented Jan 8, 2025 via email

That sounds weird. I usually set the batch size to 32 or less, though obviously larger than 1, and that takes no more than 10 GB, since I usually train two models on a single 3090 GPU, which has only 24 GB of memory. Did you change the number of output points? The decoder is actually the part that consumes the most memory, so increasing that number will take a lot of memory.

Author

w1hao commented Jan 8, 2025

I didn't modify anything.

Author

w1hao commented Jan 8, 2025

I just downloaded the project and trained it the way you described.

Owner

Can you try to disable the normal consistency loss?

This loss involves computing normals and uses PyTorch3D. The operation is also very memory-hungry, and I suspect it may be what leads to such a weird issue.
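As a rough illustration of what "disabling" a loss term can mean in practice, the sketch below skips a term entirely when its weight is zero, so its computation never runs and never allocates memory. This is not the repository's code: `combine_losses` and the term names are hypothetical, and how the project actually exposes the normal consistency loss in its configuration is not shown in this thread.

```python
def combine_losses(loss_fns, weights):
    """Sum weighted loss terms, skipping any term whose weight is zero.

    loss_fns maps a (hypothetical) term name to a zero-argument callable,
    so a disabled term is never evaluated at all.
    """
    total = 0.0
    for name, fn in loss_fns.items():
        weight = weights.get(name, 0.0)
        if weight == 0.0:
            continue  # disabled: do not even compute the term
        total = total + weight * fn()
    return total


# Usage with plain floats standing in for loss tensors.
terms = {
    "chamfer": lambda: 2.0,
    "normal_consistency": lambda: 5.0,  # the expensive term in this sketch
}
print(combine_losses(terms, {"chamfer": 1.0, "normal_consistency": 0.0}))
```

Multiplying an already computed loss by zero would not help here, because the memory is spent while the term is being computed; the point of the sketch is to avoid the call altogether.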

You can also try emptying the CUDA cache after model initialisation but before the first epoch.
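A minimal sketch of the cache-emptying suggestion, assuming a PyTorch training script; `release_cached_memory` is a hypothetical helper name, and where exactly to call it in this project's training loop is left to the reader.

```python
import importlib.util


def release_cached_memory():
    """Empty the CUDA cache and return (reserved_before, reserved_after) in bytes.

    Returns None when PyTorch or a CUDA device is unavailable.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None

    before = torch.cuda.memory_reserved()
    torch.cuda.empty_cache()  # hand unused cached blocks back to the driver
    after = torch.cuda.memory_reserved()
    return before, after


# Intended call site: after the model has been built and moved to the GPU,
# and before the first training epoch starts.
print(release_cached_memory())
```

Note that `torch.cuda.empty_cache()` only releases blocks that no live tensor is using, so it lowers the reserved figure but cannot shrink memory the model and optimizer still hold.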

Author

w1hao commented Jan 8, 2025

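Can you try to disable the normal consistency loss?

This loss involves computing the normal and uses PyTorch3D. This operation is also very memory consuming. I doubt this may lead to such a weird issue.

You can also try to empty CUDA cache before the first epoch but after model initialisation.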

I just thought of a possible cause. When I set up the environment, pip install pytorch3D completed normally, but the program then failed with an error when calling a function. So I installed from source instead, which lets the program run properly:

    git clone https://github.com/facebookresearch/pytorch3d.git
    cd pytorch3d
    pip install -e .

Thanks for the reply, I will try the solution you said.
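Because the pip-installed and source-built PyTorch3D behaved differently here, it may be worth confirming which build the training script actually imports. The sketch below is an assumption-laden check, not something from the repository: `check_pytorch3d` is a hypothetical helper, and it assumes the compiled operators live in the `pytorch3d._C` extension module.

```python
import importlib


def check_pytorch3d():
    """Report whether pytorch3d and its compiled extension can be imported."""
    status = {"installed": False, "version": None, "location": None, "compiled_ops": False}
    try:
        pytorch3d = importlib.import_module("pytorch3d")
    except (ImportError, OSError):
        return status  # not installed, or its dependencies failed to load

    status["installed"] = True
    status["version"] = getattr(pytorch3d, "__version__", None)
    status["location"] = getattr(pytorch3d, "__file__", None)

    try:
        # Assumption: the C++/CUDA operators are exposed as pytorch3d._C.
        importlib.import_module("pytorch3d._C")
        status["compiled_ops"] = True
    except (ImportError, OSError):
        pass  # package is present but the compiled extension did not load
    return status


print(check_pytorch3d())
```

The `location` entry shows whether Python picked up the cloned source tree (the editable install) or a copy in site-packages.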
