
Doesn't GISTEmbedLoss support DDP or DP? #2772

Closed
daegonYu opened this issue Jun 23, 2024 · 3 comments · Fixed by #2775
Labels: bug (Something isn't working), good first issue (Good for newcomers)

Comments

@daegonYu (Contributor)

When running CachedGISTEmbedLoss with DDP via torchrun, the following error occurs; the same error occurs with DP. I built my Anaconda environment by running "pip install ." from this GitHub repository. The output of pip list is below. Can you tell me what I need to modify?

Sorry to keep bothering you; I need your help.
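For reference, here is a minimal sketch of the kind of setup that triggers this; the model names and data are placeholders, not my actual script:

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import CachedGISTEmbedLoss

# Placeholder models and data. The guide deliberately uses a different
# tokenizer than the training model, which exercises the retokenization path.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
guide = SentenceTransformer("sentence-transformers/paraphrase-albert-small-v2")
loss = CachedGISTEmbedLoss(model=model, guide=guide)

train_dataset = Dataset.from_dict({
    "anchor": ["A person is riding a horse."],
    "positive": ["Someone rides an animal."],
})

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()  # fails as below when launched via torchrun with more than one process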

Error log:

[rank0]:   File "/home/brianjang7/home1/NLP/sentence_similarity/sbert3_pretrain_ver3.py", line 554, in <module>
[rank0]:     main()
[rank0]:   File "/home/brianjang7/home1/NLP/sentence_similarity/sbert3_pretrain_ver3.py", line 537, in main
[rank0]:     trainer.train()
[rank0]:   File "/home/brianjang7/home1/anaconda3/envs/sbert3_verup/lib/python3.9/site-packages/transformers/trainer.py", line 1885, in train
[rank0]:     return inner_training_loop(
[rank0]:   File "/home/brianjang7/home1/anaconda3/envs/sbert3_verup/lib/python3.9/site-packages/transformers/trainer.py", line 2216, in _inner_training_loop
[rank0]:     tr_loss_step = self.training_step(model, inputs)
[rank0]:   File "/home/brianjang7/home1/anaconda3/envs/sbert3_verup/lib/python3.9/site-packages/transformers/trainer.py", line 3238, in training_step
[rank0]:     loss = self.compute_loss(model, inputs)
[rank0]:   File "/home/brianjang7/home1/anaconda3/envs/sbert3_verup/lib/python3.9/site-packages/sentence_transformers/trainer.py", line 329, in compute_loss
[rank0]:     loss = loss_fn(features, labels)
[rank0]:   File "/home/brianjang7/home1/anaconda3/envs/sbert3_verup/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/brianjang7/home1/anaconda3/envs/sbert3_verup/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/brianjang7/home1/anaconda3/envs/sbert3_verup/lib/python3.9/site-packages/sentence_transformers/losses/CachedGISTEmbedLoss.py", line 369, in forward
[rank0]:     for reps_mb, reps_guided_mb, random_state in self.embed_minibatch_iter(
[rank0]:   File "/home/brianjang7/home1/anaconda3/envs/sbert3_verup/lib/python3.9/site-packages/sentence_transformers/losses/CachedGISTEmbedLoss.py", line 206, in embed_minibatch_iter
[rank0]:     reps, guide_reps, random_state = self.embed_minibatch(
[rank0]:   File "/home/brianjang7/home1/anaconda3/envs/sbert3_verup/lib/python3.9/site-packages/sentence_transformers/losses/CachedGISTEmbedLoss.py", line 175, in embed_minibatch
[rank0]:     decoded = self.model.tokenizer.batch_decode(
[rank0]:   File "/home/brianjang7/home1/anaconda3/envs/sbert3_verup/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1709, in __getattr__
[rank0]:     raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[rank0]: AttributeError: 'DistributedDataParallel' object has no attribute 'tokenizer'

pip list output:

Package                  Version
------------------------ ----------
accelerate               0.31.0
aiohttp                  3.9.5
aiosignal                1.3.1
asttokens                2.4.1
async-timeout            4.0.3
attrs                    23.2.0
backcall                 0.2.0
certifi                  2024.6.2
charset-normalizer       3.3.2
comm                     0.2.2
datasets                 2.20.0
debugpy                  1.6.7
decorator                5.1.1
dill                     0.3.8
entrypoints              0.4
executing                2.0.1
filelock                 3.15.4
frozenlist               1.4.1
fsspec                   2024.5.0
huggingface-hub          0.23.4
idna                     3.7
ipykernel                6.29.4
ipython                  8.12.0
jedi                     0.19.1
Jinja2                   3.1.4
joblib                   1.4.2
jupyter-client           7.3.4
jupyter_core             5.7.2
MarkupSafe               2.1.5
matplotlib-inline        0.1.7
mpmath                   1.3.0
multidict                6.0.5
multiprocess             0.70.16
nest_asyncio             1.6.0
networkx                 3.2.1
numpy                    1.26.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.5.40
nvidia-nvtx-cu12         12.1.105
packaging                24.1
pandas                   2.2.2
parso                    0.8.4
pexpect                  4.9.0
pickleshare              0.7.5
pillow                   10.3.0
pip                      24.0
platformdirs             4.2.2
prompt_toolkit           3.0.47
psutil                   5.9.0
ptyprocess               0.7.0
pure-eval                0.2.2
pyarrow                  16.1.0
pyarrow-hotfix           0.6
Pygments                 2.18.0
python-dateutil          2.9.0
pytz                     2024.1
PyYAML                   6.0.1
pyzmq                    25.1.2
regex                    2024.5.15
requests                 2.32.3
safetensors              0.4.3
scikit-learn             1.5.0
scipy                    1.13.1
sentence-transformers    3.1.0.dev0
setuptools               69.5.1
six                      1.16.0
stack-data               0.6.2
sympy                    1.12.1
threadpoolctl            3.5.0
tokenizers               0.19.1
torch                    2.3.1
tornado                  6.1
tqdm                     4.66.4
traitlets                5.14.3
transformers             4.41.2
triton                   2.3.1
typing_extensions        4.12.2
tzdata                   2024.1
urllib3                  2.2.2
wcwidth                  0.2.13
wheel                    0.43.0
xxhash                   3.4.1
yarl                     1.9.4



@tomaarsen (Collaborator) commented Jun 23, 2024

Hello!

Thank you for reporting this - this is a bug caused by the retokenization that is required when your guide model has a different tokenizer than your training model:

if self.must_retokenize:
    decoded = self.model.tokenizer.batch_decode(
        sentence_feature_minibatch["input_ids"], skip_special_tokens=True
    )
    sentence_feature_minibatch = self.guide.tokenize(decoded)
    sentence_feature_minibatch = {
        key: value.to(self.guide.device) for key, value in sentence_feature_minibatch.items()
    }

I think we can fix this by setting a guide_tokenizer parameter in the init:

super(CachedGISTEmbedLoss, self).__init__()
self.model = model
self.guide = guide

Here we still have the "normal" model rather than the DDP-wrapped one, so we can still access the tokenizer, and we can then use the guide_tokenizer parameter when retokenizing. I'll fix this in the coming days, I reckon, unless someone beats me to it with a PR.
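As a rough sketch of that shape (class and attribute names here are illustrative, not the final API; see #2775 for the actual change):

import torch.nn as nn
from sentence_transformers import SentenceTransformer

class CachedGISTEmbedLossSketch(nn.Module):
    def __init__(self, model: SentenceTransformer, guide: SentenceTransformer):
        super().__init__()
        self.model = model
        self.guide = guide
        # Resolve the tokenizer now, while `model` is still the plain
        # SentenceTransformer; at forward time self.model may be DDP-wrapped.
        self.tokenizer = model.tokenizer

    def retokenize(self, sentence_feature_minibatch: dict) -> dict:
        # Same retokenization as the snippet above, but via the tokenizer
        # stored in __init__ rather than self.model.tokenizer.
        decoded = self.tokenizer.batch_decode(
            sentence_feature_minibatch["input_ids"], skip_special_tokens=True
        )
        sentence_feature_minibatch = self.guide.tokenize(decoded)
        return {
            key: value.to(self.guide.device)
            for key, value in sentence_feature_minibatch.items()
        }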

Until then, perhaps you can try using a guide model with the same tokenizer as the model you're training?
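For example (these model names are only examples of two checkpoints that share a tokenizer):

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CachedGISTEmbedLoss

# Both checkpoints use the same bert-base-uncased tokenizer, so the
# retokenization branch that crashes under DDP is never taken.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
guide = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")
loss = CachedGISTEmbedLoss(model=model, guide=guide)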

  • Tom Aarsen

@tomaarsen (Collaborator)

Thanks for reporting this. You can use (Cached)GISTEmbedLoss again with DDP/DP by installing the "bleeding edge" version of sentence-transformers:

pip install git+https://github.com/UKPLab/sentence-transformers.git
  • Tom Aarsen

@daegonYu (Contributor, Author) commented Jul 1, 2024

But it doesn't work...

Error message:

AttributeError: 'DistributedDataParallel' object has no attribute 'tokenizer'

I ran "pip install git+https://github.com/UKPLab/sentence-transformers.git" in my virtual environment, which updated my install to the version below, and then tested again.

sentence-transformers 3.1.0.dev0
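For what it's worth, this is how I checked the version. Since the dev version string stays the same between commits (my pip list above already showed 3.1.0.dev0 before the fix was merged), it cannot confirm whether the fix is actually in my install:

import sentence_transformers

# Prints 3.1.0.dev0 both before and after the fix was merged, so this
# alone does not tell us which commit is installed.
print(sentence_transformers.__version__)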
