
Doesn't work with DataParallel #4

Open
i7p9h9 opened this issue May 8, 2023 · 4 comments
i7p9h9 commented May 8, 2023

Minimal example:

import torch
import timm
from torch import nn
from minlora import add_lora, get_lora_params, get_lora_state_dict


model_timm = timm.create_model("vit_large_patch14_clip_336.openai", pretrained=True, num_classes=0, global_pool='avg')
add_lora(model_timm)
model_timm = nn.DataParallel(model_timm, device_ids=[0,1]).cuda()

with torch.no_grad():
    asdf = model_timm(torch.randn(2, 3, 336, 336).cuda())
  File "/home/anaconda3/envs/face/lib/python3.8/site-packages/minlora/model.py", line 39, in lora_forward
    return X + torch.mm(*self.swap((self.lora_B, self.dropout_fn(self.lora_A)))).view(X.shape) * self.scaling
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
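
A possible workaround is to use DistributedDataParallel instead of nn.DataParallel, so each process keeps a full replica on a single GPU and the LoRA tensors stay on one device. A rough, untested sketch, assuming a torchrun launch:

import os
import torch
import torch.distributed as dist
import timm
from torch.nn.parallel import DistributedDataParallel as DDP
from minlora import add_lora


def main():
    # assumes a launch like: torchrun --nproc_per_node=2 repro_ddp.py
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = timm.create_model(
        "vit_large_patch14_clip_336.openai",
        pretrained=True, num_classes=0, global_pool="avg",
    )
    add_lora(model)                 # inject LoRA before moving / wrapping
    model = model.cuda(local_rank)  # base weights and LoRA tensors land on one device
    model = DDP(model, device_ids=[local_rank])

    with torch.no_grad():
        x = torch.randn(2, 3, 336, 336, device=f"cuda:{local_rank}")
        print(local_rank, model(x).shape)

    dist.destroy_process_group()


if __name__ == "__main__":
    main()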

SKDDJ commented Apr 29, 2024

Use accelerate and wrap the model with accelerator.prepare(model).
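
A minimal sketch of that suggestion applied to the timm model above (untested; it assumes an accelerate launch setup and follows the prepare-then-add_lora order described further down this thread):

import torch
import timm
from accelerate import Accelerator
from minlora import add_lora

accelerator = Accelerator()

model = timm.create_model(
    "vit_large_patch14_clip_336.openai",
    pretrained=True, num_classes=0, global_pool="avg",
)
model = accelerator.prepare(model)  # let accelerate handle device placement / wrapping
add_lora(model)                     # inject LoRA afterwards (see the comments below)

with torch.no_grad():
    x = torch.randn(2, 3, 336, 336, device=accelerator.device)
    out = model(x)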

niklasbubeck commented

@SKDDJ Even when using Hugging Face's accelerate I am struggling to make it work in the multi-GPU setting (it works with only one GPU). It leads to a PyTorch warning message:

an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it.

Furthermore, it breaks when tracking gradients with wandb.watch(), since the grad.data object sent to wandb is None, indicating that the gradients don't get backpropagated properly.

I'm currently using PyTorch 2.2.0; can you say which version you tried?
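
One way to narrow this down, as a rough diagnostic sketch: run a single forward/backward pass and inspect the .grad fields of the LoRA tensors directly, bypassing wandb.watch(). The "lora_" name filter is a heuristic based on how the parametrization tensors show up in named_parameters():

import torch


def check_lora_grads(model, batch):
    """Run one forward/backward pass and report whether the LoRA tensors received gradients."""
    model.zero_grad(set_to_none=True)
    out = model(batch)
    loss = out.float().pow(2).mean()  # dummy scalar loss, just to drive backward()
    loss.backward()

    for name, p in model.named_parameters():
        if "lora_" in name and p.requires_grad:
            status = "None" if p.grad is None else f"norm={p.grad.norm().item():.4g}"
            print(f"{name}: grad {status}")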


SKDDJ commented May 3, 2024

@niklasbubeck Hi, my torch version is 2.2.2; maybe you can try the latest torch to see if that works. :)


SKDDJ commented May 3, 2024

@niklasbubeck Note that you should do this only after you've loaded your model:

from accelerate import Accelerator
from minlora import add_lora
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer

accelerator = Accelerator()

# Load pretrained model and tokenizer
config = AutoConfig.from_pretrained(
    model_args.config_name if model_args.config_name else model_args.model_name_or_path,
    num_labels=num_labels,
    finetuning_task=data_args.task_name,
    cache_dir=model_args.cache_dir,
    revision=model_args.model_revision,
    # use_auth_token=True if model_args.use_auth_token else None,
)
tokenizer = AutoTokenizer.from_pretrained(
    model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path,
    cache_dir=model_args.cache_dir,
    use_fast=model_args.use_fast_tokenizer,
    revision=model_args.model_revision,
    # use_auth_token=True if model_args.use_auth_token else None,
)
model = AutoModelForSequenceClassification.from_pretrained(
    model_args.model_name_or_path,
    from_tf=bool(".ckpt" in model_args.model_name_or_path),
    config=config,
    cache_dir=model_args.cache_dir,
    revision=model_args.model_revision,
    # use_auth_token=True if model_args.use_auth_token else None,
    ignore_mismatched_sizes=model_args.ignore_mismatched_sizes,
)
model, tokenizer = accelerator.prepare(model, tokenizer)
# ... your other code
add_lora(model)  # add LoRA here, after calling prepare(model)

Hope this helps.
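
If only the LoRA weights should be trained, a possible continuation of the snippet above (a sketch; it assumes get_lora_params(model) yields the injected LoRA parameters, matching the import in the first post) is to build the optimizer from those parameters and prepare it as well:

import torch
from minlora import get_lora_params

# train only the LoRA parameters (same Accelerator instance as above)
lora_params = list(get_lora_params(model))
optimizer = torch.optim.AdamW(lora_params, lr=1e-4)
optimizer = accelerator.prepare(optimizer)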
