System Info
Information
Tasks
One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
My own task or dataset (give details below)

Reproduction
from accelerate import Accelerator
from transformers import CLIPVisionModelWithProjection

# Note: opt, accelerator_project_config, ddp_kwargs, and weight_dtype are
# defined elsewhere in the training script.
accelerator = Accelerator(
    gradient_accumulation_steps=opt["training_opt"]["gradient_accumulation_steps"],
    mixed_precision=opt["training_opt"]["mixed_precision"],
    log_with=opt["tracker_opt"]["report_to"],
    project_config=accelerator_project_config,
    kwargs_handlers=[ddp_kwargs],
)

image_encoder = CLIPVisionModelWithProjection.from_pretrained(opt["module_opt"]["image_encoder"]["pretrained_model_path"])
image_encoder.requires_grad_(False)  # parameters are kept frozen during training
image_encoder.to(accelerator.device, dtype=weight_dtype)
Expected behavior
I am using the accelerate and transformers libraries to train my model in a single-node, multi-GPU environment, with the parameters of CLIPVisionModelWithProjection frozen. When I launch an experiment on, for example, GPUs 4, 5, 6, and 7 (four GPUs in total), the processes for GPUs 5, 6, and 7 each also allocate roughly 510 MB of memory on GPU 4. Once I remove the CLIPVisionModelWithProjection, everything returns to normal. What could cause this, and is there a solution?
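Extra memory allocated on the first visible GPU by every rank is often a symptom of each process initializing a CUDA context on that device before being pinned to its own GPU. As a minimal sketch of a common workaround (an assumption, not a confirmed diagnosis of this particular issue), each process can set its device from the launcher-provided LOCAL_RANK before any model is moved to CUDA:

```python
import os

import torch

# Sketch of a common workaround, not a confirmed fix for this issue: bind each
# process to its own GPU *before* loading any model, so that no rank creates a
# CUDA context (typically several hundred MB) on the first visible device.
# LOCAL_RANK is set by the accelerate/torchrun launcher; the default of 0 here
# only lets the snippet run standalone.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
if torch.cuda.is_available():
    torch.cuda.set_device(local_rank)
```

With CUDA_VISIBLE_DEVICES="4,5,6,7", physical GPU 4 is logical device 0, so rank 0 uses GPU 4, rank 1 uses GPU 5, and so on; no rank ever touches another rank's device.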