
Doesn't GISTEmbedLoss support DDP or DP? #2772

Closed
daegonYu opened this issue Jun 23, 2024 · 3 comments · Fixed by #2775
Labels: bug (Something isn't working), good first issue (Good for newcomers)

Comments

@daegonYu (Contributor)

When running CachedGISTEmbedLoss with DDP via torchrun, the following error occurs; the same error occurs with DP. I built my Anaconda environment by running "pip install ." from this GitHub repository. The output of pip list is below. Can you tell me what I need to modify?

Sorry to keep bothering you; I need your help.
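For reference, here is a minimal sketch of the kind of setup that triggers this; the model names and data are placeholders, not my actual script:

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import CachedGISTEmbedLoss

# Placeholder models and data. The guide deliberately uses a different
# tokenizer than the training model, which exercises the retokenization path.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
guide = SentenceTransformer("sentence-transformers/paraphrase-albert-small-v2")
loss = CachedGISTEmbedLoss(model=model, guide=guide)

train_dataset = Dataset.from_dict({
    "anchor": ["A person is riding a horse."],
    "positive": ["Someone rides an animal."],
})

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()  # fails as below when launched via torchrun with more than one process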

Error log:

[rank0]:   File "/home/brianjang7/home1/NLP/sentence_similarity/sbert3_pretrain_ver3.py", line 554, in <module>
[rank0]:     main()
[rank0]:   File "/home/brianjang7/home1/NLP/sentence_similarity/sbert3_pretrain_ver3.py", line 537, in main
[rank0]:     trainer.train()
[rank0]:   File "/home/brianjang7/home1/anaconda3/envs/sbert3_verup/lib/python3.9/site-packages/transformers/trainer.py", line 1885, in train
[rank0]:     return inner_training_loop(
[rank0]:   File "/home/brianjang7/home1/anaconda3/envs/sbert3_verup/lib/python3.9/site-packages/transformers/trainer.py", line 2216, in _inner_training_loop
[rank0]:     tr_loss_step = self.training_step(model, inputs)
[rank0]:   File "/home/brianjang7/home1/anaconda3/envs/sbert3_verup/lib/python3.9/site-packages/transformers/trainer.py", line 3238, in training_step
[rank0]:     loss = self.compute_loss(model, inputs)
[rank0]:   File "/home/brianjang7/home1/anaconda3/envs/sbert3_verup/lib/python3.9/site-packages/sentence_transformers/trainer.py", line 329, in compute_loss
[rank0]:     loss = loss_fn(features, labels)
[rank0]:   File "/home/brianjang7/home1/anaconda3/envs/sbert3_verup/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/brianjang7/home1/anaconda3/envs/sbert3_verup/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/brianjang7/home1/anaconda3/envs/sbert3_verup/lib/python3.9/site-packages/sentence_transformers/losses/CachedGISTEmbedLoss.py", line 369, in forward
[rank0]:     for reps_mb, reps_guided_mb, random_state in self.embed_minibatch_iter(
[rank0]:   File "/home/brianjang7/home1/anaconda3/envs/sbert3_verup/lib/python3.9/site-packages/sentence_transformers/losses/CachedGISTEmbedLoss.py", line 206, in embed_minibatch_iter
[rank0]:     reps, guide_reps, random_state = self.embed_minibatch(
[rank0]:   File "/home/brianjang7/home1/anaconda3/envs/sbert3_verup/lib/python3.9/site-packages/sentence_transformers/losses/CachedGISTEmbedLoss.py", line 175, in embed_minibatch
[rank0]:     decoded = self.model.tokenizer.batch_decode(
[rank0]:   File "/home/brianjang7/home1/anaconda3/envs/sbert3_verup/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1709, in __getattr__
[rank0]:     raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[rank0]: AttributeError: 'DistributedDataParallel' object has no attribute 'tokenizer'

pip list output:

Package                  Version
------------------------ ----------
accelerate               0.31.0
aiohttp                  3.9.5
aiosignal                1.3.1
asttokens                2.4.1
async-timeout            4.0.3
attrs                    23.2.0
backcall                 0.2.0
certifi                  2024.6.2
charset-normalizer       3.3.2
comm                     0.2.2
datasets                 2.20.0
debugpy                  1.6.7
decorator                5.1.1
dill                     0.3.8
entrypoints              0.4
executing                2.0.1
filelock                 3.15.4
frozenlist               1.4.1
fsspec                   2024.5.0
huggingface-hub          0.23.4
idna                     3.7
ipykernel                6.29.4
ipython                  8.12.0
jedi                     0.19.1
Jinja2                   3.1.4
joblib                   1.4.2
jupyter-client           7.3.4
jupyter_core             5.7.2
MarkupSafe               2.1.5
matplotlib-inline        0.1.7
mpmath                   1.3.0
multidict                6.0.5
multiprocess             0.70.16
nest_asyncio             1.6.0
networkx                 3.2.1
numpy                    1.26.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.5.40
nvidia-nvtx-cu12         12.1.105
packaging                24.1
pandas                   2.2.2
parso                    0.8.4
pexpect                  4.9.0
pickleshare              0.7.5
pillow                   10.3.0
pip                      24.0
platformdirs             4.2.2
prompt_toolkit           3.0.47
psutil                   5.9.0
ptyprocess               0.7.0
pure-eval                0.2.2
pyarrow                  16.1.0
pyarrow-hotfix           0.6
Pygments                 2.18.0
python-dateutil          2.9.0
pytz                     2024.1
PyYAML                   6.0.1
pyzmq                    25.1.2
regex                    2024.5.15
requests                 2.32.3
safetensors              0.4.3
scikit-learn             1.5.0
scipy                    1.13.1
sentence-transformers    3.1.0.dev0
setuptools               69.5.1
six                      1.16.0
stack-data               0.6.2
sympy                    1.12.1
threadpoolctl            3.5.0
tokenizers               0.19.1
torch                    2.3.1
tornado                  6.1
tqdm                     4.66.4
traitlets                5.14.3
transformers             4.41.2
triton                   2.3.1
typing_extensions        4.12.2
tzdata                   2024.1
urllib3                  2.2.2
wcwidth                  0.2.13
wheel                    0.43.0
xxhash                   3.4.1
yarl                     1.9.4



@tomaarsen (Collaborator) commented Jun 23, 2024

Hello!

Thank you for reporting this - this is a bug caused by the retokenization that is required when your guide model has a different tokenizer than your training model:

if self.must_retokenize:
    decoded = self.model.tokenizer.batch_decode(
        sentence_feature_minibatch["input_ids"], skip_special_tokens=True
    )
    sentence_feature_minibatch = self.guide.tokenize(decoded)
    sentence_feature_minibatch = {
        key: value.to(self.guide.device) for key, value in sentence_feature_minibatch.items()
    }

I think we can fix this by setting a guide_tokenizer parameter in the init:

super(CachedGISTEmbedLoss, self).__init__()
self.model = model
self.guide = guide

Here we still have the "normal" model rather than the DDP-wrapped one, so we can still access the tokenizer, and we can then use the guide_tokenizer parameter when retokenizing. I'll fix this in the coming days, I reckon, unless someone beats me to it with a PR.
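As a rough sketch of that shape (class and attribute names here are illustrative, not the final API; see #2775 for the actual change):

import torch.nn as nn
from sentence_transformers import SentenceTransformer

class CachedGISTEmbedLossSketch(nn.Module):
    def __init__(self, model: SentenceTransformer, guide: SentenceTransformer):
        super().__init__()
        self.model = model
        self.guide = guide
        # Resolve the tokenizer now, while `model` is still the plain
        # SentenceTransformer; at forward time self.model may be DDP-wrapped.
        self.tokenizer = model.tokenizer

    def retokenize(self, sentence_feature_minibatch: dict) -> dict:
        # Same retokenization as the snippet above, but via the tokenizer
        # stored in __init__ rather than self.model.tokenizer.
        decoded = self.tokenizer.batch_decode(
            sentence_feature_minibatch["input_ids"], skip_special_tokens=True
        )
        sentence_feature_minibatch = self.guide.tokenize(decoded)
        return {
            key: value.to(self.guide.device)
            for key, value in sentence_feature_minibatch.items()
        }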

Until then, perhaps you can try using a guide model with the same tokenizer as the model you're training?
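For example (these model names are only examples of two checkpoints that share a tokenizer):

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CachedGISTEmbedLoss

# Both checkpoints use the same bert-base-uncased tokenizer, so the
# retokenization branch that crashes under DDP is never taken.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
guide = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")
loss = CachedGISTEmbedLoss(model=model, guide=guide)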

  • Tom Aarsen

@tomaarsen (Collaborator)

Thanks for reporting this. You can use (Cached)GISTEmbedLoss again with DDP/DP by installing the "bleeding edge" version of sentence-transformers:

pip install git+https://github.com/UKPLab/sentence-transformers.git
  • Tom Aarsen

@daegonYu (Contributor, Author) commented Jul 1, 2024

But it doesn't work...

Error message:

AttributeError: 'DistributedDataParallel' object has no attribute 'tokenizer'

I ran "pip install git+https://github.com/UKPLab/sentence-transformers.git" in my virtual environment, which updated my install to the version below, and then tested again.

sentence-transformers 3.1.0.dev0
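For what it's worth, this is how I checked the version. Since the dev version string stays the same between commits (my pip list above already showed 3.1.0.dev0 before the fix was merged), it cannot confirm whether the fix is actually in my install:

import sentence_transformers

# Prints 3.1.0.dev0 both before and after the fix was merged, so this
# alone does not tell us which commit is installed.
print(sentence_transformers.__version__)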
