Memory issues #42

w1hao · 2025-01-08T08:02:43Z

Following your instructions, I am training on a single category of the PCN dataset on an A6000 GPU. GPU memory usage stays above 30 GB almost regardless of batch_size: even with batch_size set to 1, it uses 35 GB. The one exception is batch_size 64, which uses only 7 GB, while 63 and 65 both use more than 30 GB. How much GPU memory does training on a single PCN category take on your side? A 3090 has only 24 GB, so how did you train on it? Or is there a problem with my device?

w1hao · 2025-01-08T08:07:19Z

You said you train one category at a time. How does that take so much time? Training a single category on the PCN dataset took me more than an hour.

w1hao · 2025-01-08T08:08:05Z

I'm a newbie and I'm looking forward to hearing from you.

CuiRuikai · 2025-01-08T08:09:10Z

That sounds weird. I usually set the batch size to 32 or less, though obviously larger than 1, and it takes no more than 10 GB: I usually train two models on a single 3090, which has only 24 GB of memory. Did you change the number of output points? The decoder is the part that consumes the most memory, so increasing the number of output points will take a lot more memory.


w1hao · 2025-01-08T08:11:07Z


I didn't modify anything.

w1hao · 2025-01-08T08:12:42Z

I just downloaded the project and trained it following your instructions.

CuiRuikai · 2025-01-08T08:18:16Z

Can you try to disable the normal consistency loss?

This loss requires computing normals and uses PyTorch3D, which is also very memory-intensive. I suspect it may be what causes this odd behaviour.
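One detail that may matter when disabling a loss term: multiplying it by a weight of zero still runs the computation and still allocates its intermediate tensors, so the term has to be skipped entirely to save memory. The sketch below shows that pattern in plain Python. The names `total_loss`, `chamfer`, and `normal_consistency` are hypothetical and are not taken from this repository's code or config.

```python
from typing import Callable, Dict


def total_loss(
    terms: Dict[str, Callable[[], float]],
    weights: Dict[str, float],
) -> float:
    """Sum weighted loss terms, never evaluating a term whose weight is 0.

    Each term is passed as a zero-argument callable so that a disabled
    term is not computed at all (and so allocates nothing).
    """
    total = 0.0
    for name, compute in terms.items():
        weight = weights.get(name, 0.0)
        if weight == 0.0:
            continue  # skip the computation itself, not just its contribution
        total += weight * compute()
    return total
```

In a real training loop the callables would wrap the actual loss functions, and the weight would come from whatever configuration the project uses.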

You can also try emptying the CUDA cache after model initialisation but before the first epoch.
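A minimal sketch of that cache-clearing step, which also reports memory so the numbers can be compared across batch sizes. It uses only standard `torch.cuda` calls; the helper names `cuda_memory_snapshot` and `clear_cache_before_training` are made up for illustration and do not exist in this repository. Note that tools such as `nvidia-smi` show memory reserved by PyTorch's caching allocator, which can be much larger than what is actually allocated to tensors.

```python
try:
    import torch
except ImportError:  # lets the sketch be imported on a machine without PyTorch
    torch = None

GIB = 2 ** 30


def cuda_memory_snapshot(device=None):
    """Return allocated, reserved and peak-allocated memory in GiB.

    Returns None when PyTorch or CUDA is not available.
    """
    if torch is None or not torch.cuda.is_available():
        return None
    return {
        "allocated": torch.cuda.memory_allocated(device) / GIB,
        "reserved": torch.cuda.memory_reserved(device) / GIB,
        "peak_allocated": torch.cuda.max_memory_allocated(device) / GIB,
    }


def clear_cache_before_training(device=None):
    """Release cached blocks and reset peak statistics, then report memory.

    Intended to be called once, after the model has been built and moved
    to the GPU but before the first epoch starts.
    """
    if torch is None or not torch.cuda.is_available():
        return None
    torch.cuda.empty_cache()                    # return unused cached blocks to the driver
    torch.cuda.reset_peak_memory_stats(device)  # so the peak reflects training only
    return cuda_memory_snapshot(device)
```

Calling `cuda_memory_snapshot()` again after a few iterations and comparing `peak_allocated` with `reserved` should help show whether the 30+ GB is memory the model really needs or memory the allocator is holding in its cache.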

w1hao · 2025-01-08T08:24:26Z

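Could you try disabling the normal consistency loss?

This loss involves computing normals and uses PyTorch3D. This operation is also very memory-intensive. I suspect it may lead to such a weird issue.

You can also try emptying the CUDA cache before the first epoch but after model initialisation.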

That reminds me of something: when I set up the environment, pip install pytorch3D completed normally, but the program then failed with an error when calling a function. So I installed from source instead, which lets the program run properly:

git clone https://github.com/facebookresearch/pytorch3d.git
cd pytorch3d
pip install -e .

Thanks for the reply, I will try the solution you said.
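Since the install method is a possible difference between the two setups, it may help to confirm which PyTorch3D copy Python actually imports. The sketch below uses only the standard library. The helper name `describe_install` is made up for illustration, and it assumes the installed distribution is named `pytorch3d`.

```python
from importlib import metadata, util
from typing import Optional


def describe_install(package: str, dist_name: Optional[str] = None) -> dict:
    """Report whether a package is importable, its version and its location."""
    try:
        spec = util.find_spec(package)
    except (ImportError, ValueError):
        spec = None
    try:
        version = metadata.version(dist_name or package)
    except metadata.PackageNotFoundError:
        version = None  # importable modules need not have distribution metadata
    return {
        "installed": spec is not None,
        "version": version,
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    # For an editable install the location should point into the cloned repo.
    print(describe_install("pytorch3d"))
```

If `location` points into site-packages rather than the cloned `pytorch3d` directory, the pip-installed copy is probably still the one being imported.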
