
v2.3.0 - Bug fixes, improved model loading & Cached MNRL

@tomaarsen released this on 29 Jan 08:32

This release focuses on various bug fixes & improvements to keep up with adjacent projects like transformers and huggingface_hub. These are the key changes in the release:

Pushing models to the Hugging Face Hub (#2376)

Prior to Sentence Transformers v2.3.0, saving models to the Hugging Face Hub may have resulted in various errors depending on the versions of the dependencies. Sentence Transformers v2.3.0 introduces a refactor to save_to_hub to resolve these issues.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
...
model.save_to_hub("tomaarsen/all-MiniLM-L6-v2-quora")
# pytorch_model.bin: 100%|██████████| 90.9M/90.9M [00:06<00:00, 13.7MB/s]
# Upload 1 LFS files: 100%|██████████| 1/1 [00:07<00:00,  7.11s/it]

Model Loading

Efficient model loading (#2345)

Recently, transformers has shifted towards using safetensors files as its primary model file format. Additionally, various other file formats are commonly used, such as PyTorch (pytorch_model.bin), Rust (rust_model.ot), TensorFlow (tf_model.h5) and ONNX (model.onnx).

Prior to Sentence Transformers v2.3.0, almost all files of a repository would be downloaded, even if they were not strictly required. Since v2.3.0, only the strictly required files are downloaded. For example, when loading sentence-transformers/all-MiniLM-L6-v2, which has its model weights in three formats (pytorch_model.bin, rust_model.ot, tf_model.h5), only pytorch_model.bin will be downloaded. Likewise, when loading intfloat/multilingual-e5-small with two formats (model.safetensors, pytorch_model.bin), only model.safetensors will be downloaded.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
# Downloading modules.json: 100%|██████████| 349/349 [00:00<?, ?B/s]
# Downloading (…)ce_transformers.json: 100%|██████████| 116/116 [00:00<?, ?B/s]
# Downloading README.md: 100%|██████████| 10.6k/10.6k [00:00<?, ?B/s]
# Downloading (…)nce_bert_config.json: 100%|██████████| 53.0/53.0 [00:00<?, ?B/s]
# Downloading config.json: 100%|██████████| 612/612 [00:00<?, ?B/s]
# Downloading pytorch_model.bin: 100%|██████████| 90.9M/90.9M [00:06<00:00, 15.0MB/s]
# Downloading tokenizer_config.json: 100%|██████████| 350/350 [00:00<?, ?B/s]
# Downloading vocab.txt: 100%|██████████| 232k/232k [00:00<00:00, 1.37MB/s]
# Downloading tokenizer.json: 100%|██████████| 466k/466k [00:00<00:00, 4.61MB/s]
# Downloading (…)cial_tokens_map.json: 100%|██████████| 112/112 [00:00<?, ?B/s]
# Downloading 1_Pooling/config.json: 100%|██████████| 190/190 [00:00<?, ?B/s]
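The weight-file selection described above can be sketched as a simple preference order. This is a hypothetical helper (pick_weight_file), not the library's actual loading code: the ordering (model.safetensors before pytorch_model.bin, with Rust, TensorFlow and ONNX files ignored) is an assumption inferred from the two examples, and the repository file lists are illustrative.

```python
# Assumed preference order, inferred from the two examples above:
# safetensors first, then the PyTorch pickle; Rust, TensorFlow and ONNX files are skipped.
PREFERRED_WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")


def pick_weight_file(repo_files):
    """Return the single weights file to download, or None if no PyTorch-loadable file exists."""
    available = set(repo_files)
    for candidate in PREFERRED_WEIGHT_FILES:
        if candidate in available:
            return candidate
    return None


# Illustrative repository listings (config.json is included only as a non-weight file).
minilm_files = ["config.json", "pytorch_model.bin", "rust_model.ot", "tf_model.h5"]
e5_files = ["config.json", "model.safetensors", "pytorch_model.bin"]
print(pick_weight_file(minilm_files))  # pytorch_model.bin
print(pick_weight_file(e5_files))      # model.safetensors
```

Every other weights file in the listing is simply never requested, which is where the bandwidth and disk savings come from.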

Note

This release changes the default cache location from ~/.cache/torch/sentence_transformers to the default cache location of transformers, i.e. ~/.cache/huggingface. You can still specify a custom cache location via the SENTENCE_TRANSFORMERS_HOME environment variable or the cache_folder argument.
Additionally, because newer versions of various dependencies (e.g. huggingface_hub) are now supported, the cache format has changed. As a consequence, models cached by older versions cannot be used from v2.3.0 onwards and need to be redownloaded. Once redownloaded, the model can be loaded as usual on an air-gapped machine without internet access.
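The cache-folder lookup can be summarised as a small precedence chain. This is a minimal sketch with a hypothetical helper name (resolve_cache_folder); it assumes that an explicit cache_folder argument takes priority over SENTENCE_TRANSFORMERS_HOME, and that returning None means "let huggingface_hub use its default cache".

```python
import os


def resolve_cache_folder(cache_folder=None, environ=None):
    """Hypothetical helper: pick the folder in which downloaded models are cached."""
    environ = os.environ if environ is None else environ
    # 1) An explicit cache_folder argument wins (assumed precedence).
    if cache_folder is not None:
        return cache_folder
    # 2) Otherwise fall back to the SENTENCE_TRANSFORMERS_HOME environment variable.
    env_value = environ.get("SENTENCE_TRANSFORMERS_HOME")
    if env_value:
        return env_value
    # 3) Neither is set: None lets huggingface_hub use its own default cache
    #    (under ~/.cache/huggingface unless configured otherwise).
    return None


print(resolve_cache_folder("/data/models", environ={}))                              # /data/models
print(resolve_cache_folder(environ={"SENTENCE_TRANSFORMERS_HOME": "/mnt/st_cache"}))  # /mnt/st_cache
print(resolve_cache_folder(environ={}))                                              # None
```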

Loading custom models (#2398)

This release brings models with custom code to Sentence Transformers through trust_remote_code, such as jinaai/jina-embeddings-v2-base-en.

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("jinaai/jina-embeddings-v2-base-en", trust_remote_code=True)
embeddings = model.encode(['How is the weather today?', 'What is the current weather like today?'])

print(cos_sim(embeddings[0], embeddings[1]))
# => tensor([[0.9341]])
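cos_sim returns a tensor of pairwise scores; for a single pair of embeddings, the score is their dot product divided by the product of their norms. A stdlib-only sketch of that formula on plain Python lists (the function name is ours, not part of sentence_transformers.util):

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(round(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 6))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))                      # 0.0 (orthogonal)
```

A score close to 1, like the 0.9341 above, means the two embeddings point in nearly the same direction.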

Loading specific revisions (#2419)

If an embedding model is ever updated, it would invalidate all of the embeddings that you have created with the prior version of that model. We promise to never update the weights of any sentence-transformers/... model, but we cannot offer this guarantee for models by the community.

That is why this version introduces a revision keyword, allowing you to specify exactly which revision or branch you'd like to load:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5", revision="982532469af0dff5df8e70b38075b0940e863662")
# or a branch:
model = SentenceTransformer("BAAI/bge-small-en-v1.5", revision="main")
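Only a full commit hash pins the exact weights; a branch such as main can later point to newer commits. A small hypothetical check (is_pinned_revision is not a library function), assuming revisions are pinned via full 40-character lowercase hex git commit hashes like the one above:

```python
import re

# Assumption: a pinned revision is a full 40-character lowercase hex git commit hash.
_COMMIT_SHA = re.compile(r"[0-9a-f]{40}")


def is_pinned_revision(revision):
    """True if `revision` is a full commit hash; branch names like "main" can move."""
    return revision is not None and _COMMIT_SHA.fullmatch(revision) is not None


for rev in ["982532469af0dff5df8e70b38075b0940e863662", "main"]:
    print(rev, "->", "pinned" if is_pinned_revision(rev) else "may change over time")
```

Storing the pinned hash next to your saved embeddings makes it easy to reload exactly the model that produced them.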

Soft deprecation of use_auth_token, use token instead (#2376)

Following updates from transformers & huggingface_hub, Sentence Transformers now recommends that you use the token argument to provide your Hugging Face authentication token to download private models.

from sentence_transformers import SentenceTransformer

# new:
model = SentenceTransformer("tomaarsen/all-mpnet-base-v2", token="hf_...")
# old: still works, but emits a warning recommending "token" instead
model = SentenceTransformer("tomaarsen/all-mpnet-base-v2", use_auth_token="hf_...")

Note

The recommended way to include your Hugging Face authentication token is to run huggingface-cli login & paste your User Access Token from your Hugging Face Settings. See these docs for more information. Then, you don't have to include the token argument at all; it'll be automatically read from your filesystem.
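The resulting argument handling can be sketched as follows. The helper (resolve_token) is hypothetical, and the warning category and message are assumptions rather than the library's exact behaviour; it only illustrates the precedence described above: token first, use_auth_token accepted with a warning, and None meaning "read the token saved by huggingface-cli login".

```python
import warnings


def resolve_token(token=None, use_auth_token=None):
    """Hypothetical helper: prefer `token`, accept the deprecated `use_auth_token`."""
    if use_auth_token is not None:
        # Warning category and wording are assumptions, not the library's exact message.
        warnings.warn("`use_auth_token` is deprecated, please use `token` instead.", FutureWarning)
        if token is None:
            token = use_auth_token
    # None means: let huggingface_hub read the token saved by `huggingface-cli login`.
    return token


print(resolve_token(token="hf_new"))           # hf_new
print(resolve_token(use_auth_token="hf_old"))  # hf_old (plus a FutureWarning)
print(resolve_token())                         # None
```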

Device patch (#2351)

Prior to this release, SentenceTransformer.device would not always correspond to the device on which embeddings were computed, or on which a model was trained. This release brings a few fixes:

  • SentenceTransformer.device now always corresponds to the device that the model is on, and on which it will do its computations.
  • Models are now immediately moved to their specified device, rather than lazily whenever the model is used.
  • SentenceTransformer.to(...), SentenceTransformer.cpu(), SentenceTransformer.cuda(), etc. will now work as expected, rather than being ignored.
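The difference between lazy and eager device placement can be illustrated with two toy stand-in classes. Neither is library code: Param only records a device string, and the classes are a simplified model of the behaviour described above, namely a stored target device that silently overrides .to(...) versus a .device that is derived from where the weights actually are.

```python
class Param:
    """Stand-in for a tensor that only tracks which device it lives on."""

    def __init__(self, device="cpu"):
        self.device = device


class OldStyleModel:
    """Toy model of the old behaviour: a stored target device wins at use time."""

    def __init__(self, device="cpu"):
        self._target_device = device
        self.params = [Param(), Param()]  # weights are not moved yet (lazy)

    def to(self, device):
        for p in self.params:
            p.device = device
        return self

    def encode(self):
        self.to(self._target_device)  # lazy move silently undoes an earlier .to(...)
        return self.params[0].device


class NewStyleModel:
    """Toy model of the new behaviour: move eagerly, derive .device from the weights."""

    def __init__(self, device="cpu"):
        self.params = [Param(), Param()]
        self.to(device)  # moved immediately at construction

    def to(self, device):
        for p in self.params:
            p.device = device
        return self

    @property
    def device(self):
        return self.params[0].device

    def encode(self):
        return self.device  # computes wherever the weights actually are


old = OldStyleModel("cpu").to("cuda:0")
new = NewStyleModel("cpu").to("cuda:0")
print(old.encode())  # cpu    (the .to("cuda:0") call was effectively ignored)
print(new.encode())  # cuda:0
```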

Cached Multiple Negatives Ranking Loss (CMNRL) (#1759)

MultipleNegativesRankingLoss (MNRL) is a powerful loss function that is commonly applied to train embedding models. It uses in-batch negative sampling: for each (anchor, positive) pair, the positives of all other pairs in the batch serve as negatives. This produces a large number of negative pairs, giving the model a training signal to push the embeddings of those pairs apart. Larger batch sizes have been shown to result in better-performing models (Qu et al., 2021, Li et al., 2023), but in practice a larger batch size requires more VRAM.

To counteract that, @kwang2049 has implemented a slightly modified GradCache technique that is able to separate the batch computation into mini-batches without any reduction in training quality. This allows the common practitioner to train with competitive batch sizes, e.g. 65536!
The downside is that training with Cached MNRL (CMNRL) is roughly 2 to 2.4 times slower than using normal MNRL.

CachedMultipleNegativesRankingLoss is a drop-in replacement for MultipleNegativesRankingLoss, but with a new mini_batch_size argument. I recommend trying out CMNRL with a large batch size and a comparatively small mini_batch_size, ideally the largest mini-batch size that still fits into memory.

from sentence_transformers import SentenceTransformer, losses, InputExample
from torch.utils.data import DataLoader

model = SentenceTransformer("distilbert-base-uncased")
train_examples = [
    InputExample(texts=['Anchor 1', 'Positive 1']),
    InputExample(texts=['Anchor 2', 'Positive 2']),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=1024)  # Here we can try much larger batch sizes!
train_loss = losses.CachedMultipleNegativesRankingLoss(model=model, mini_batch_size=32)

model.fit([(train_dataloader, train_loss)], ...)
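The principle behind the cached variant can be checked numerically. The sketch below illustrates the GradCache idea and is not the CachedMultipleNegativesRankingLoss implementation: it assumes a toy linear encoder (E = X @ W) and scaled dot-product scores instead of scaled cosine similarity, and all function names are hypothetical. It computes the in-batch-negatives loss and its gradient once over the full batch, and once from embeddings produced mini-batch by mini-batch; the two agree.

```python
import numpy as np


def mnrl_loss_and_grads(A, P, scale=20.0):
    """In-batch negatives loss: anchor i should score highest against positive i,
    with every other positive acting as a negative. Uses scaled dot products
    (a simplification of scaled cosine similarity). Returns loss, dL/dA, dL/dP."""
    n = A.shape[0]
    S = scale * (A @ P.T)                    # (n, n) anchor-positive scores
    S = S - S.max(axis=1, keepdims=True)     # stabilise the softmax
    probs = np.exp(S)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.mean(np.log(probs[np.arange(n), np.arange(n)]))
    G = (probs - np.eye(n)) / n              # dL/d(scores), before the scale factor
    return loss, scale * (G @ P), scale * (G.T @ A)


def full_batch_grad(W, Xa, Xp):
    """Reference: embed the whole batch at once."""
    A, P = Xa @ W, Xp @ W
    loss, dA, dP = mnrl_loss_and_grads(A, P)
    return loss, Xa.T @ dA + Xp.T @ dP


def cached_grad(W, Xa, Xp, mini_batch_size):
    """GradCache-style: only mini_batch_size rows go through the encoder at a time."""
    chunks = [slice(i, i + mini_batch_size) for i in range(0, len(Xa), mini_batch_size)]
    # 1) Embed mini-batch by mini-batch without keeping activations; cache the embeddings.
    A = np.concatenate([Xa[c] @ W for c in chunks])
    P = np.concatenate([Xp[c] @ W for c in chunks])
    # 2) Loss and gradient w.r.t. the cached embeddings over the *full* batch.
    loss, dA, dP = mnrl_loss_and_grads(A, P)
    # 3) Backpropagate each mini-batch's slice of the embedding gradient into W.
    #    A deep encoder would re-run this mini-batch's forward pass with autograd here;
    #    for a linear encoder, X_chunk.T @ dE_chunk is the exact backward pass.
    grad_W = np.zeros_like(W)
    for c in chunks:
        grad_W += Xa[c].T @ dA[c] + Xp[c].T @ dP[c]
    return loss, grad_W


rng = np.random.default_rng(0)
Xa, Xp = rng.normal(size=(64, 16)), rng.normal(size=(64, 16))  # toy anchor/positive inputs
W = 0.1 * rng.normal(size=(16, 8))
loss_full, grad_full = full_batch_grad(W, Xa, Xp)
loss_cached, grad_cached = cached_grad(W, Xa, Xp, mini_batch_size=8)
print(np.isclose(loss_full, loss_cached), np.allclose(grad_full, grad_cached))  # True True
```

Only the embeddings and their gradients need to be held for the full batch, while encoder activations are held for one mini-batch at a time. The price is the second forward pass per mini-batch, which is consistent with the slowdown mentioned above.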

Community Detection (#1879, #2277, #2381)

This release updates the community_detection function in various ways. Notably:

  1. It should no longer run forever when there is only one community (d8982c9).
  2. A new show_progress_bar option has been added (#1879).
  3. The first item in each community is now the cluster centroid, and all subsequent items are sorted by similarity to the centroid (#2277).
  4. Processing speed on GPUs has been heavily improved (#2381).

In the graph below, master refers to Sentence Transformers v2.2.2 and refactor refers to v2.3.0. On GPU, the computation time was heavily reduced.
[Figure: community_detection computation time, master (v2.2.2) vs. refactor (v2.3.0)]
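A simplified sketch of the procedure, assuming a precomputed similarity matrix given as nested lists: every item with at least min_community_size neighbours at or above the threshold seeds a candidate community, listed with itself first (the centroid) and the remaining members ordered by similarity to it. Larger candidates are accepted first, and candidates overlapping an accepted community are skipped. The function name is hypothetical; this is not util.community_detection itself, which operates on embedding tensors.

```python
def detect_communities(sim, threshold=0.75, min_community_size=2):
    """Simplified sketch: `sim` is a symmetric n x n list of lists of similarities."""
    n = len(sim)
    candidates = []
    for i in range(n):
        # Items similar enough to item i (including i itself, since sim[i][i] == 1).
        members = [j for j in range(n) if sim[i][j] >= threshold]
        if len(members) >= min_community_size:
            # Centroid first, remaining members by descending similarity to it.
            members.sort(key=lambda j: (j != i, -sim[i][j]))
            candidates.append(members)
    # Prefer large communities; skip any candidate that overlaps an accepted one.
    candidates.sort(key=len, reverse=True)
    communities, taken = [], set()
    for members in candidates:
        if not taken.intersection(members):
            communities.append(members)
            taken.update(members)
    return communities


sim = [
    [1.0, 0.9, 0.8, 0.1, 0.2],
    [0.9, 1.0, 0.85, 0.2, 0.1],
    [0.8, 0.85, 1.0, 0.1, 0.1],
    [0.1, 0.2, 0.1, 1.0, 0.95],
    [0.2, 0.1, 0.1, 0.95, 1.0],
]
print(detect_communities(sim, threshold=0.75, min_community_size=2))  # [[0, 1, 2], [3, 4]]
```

Because every item is visited exactly once, this sketch also terminates when all items fall into a single community.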

Updated Dependencies (#2376, #2432)

Sentence Transformers has dropped support for Python 3.7 following its end of security support. Additionally, the minimum versions of various dependencies have been raised to prevent functionality from breaking. In particular:

  • torch>=1.11.0
  • transformers>=4.32.0
  • huggingface_hub>=0.15.1

Lastly, torchvision has been removed as a dependency.
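To check an existing environment against these minimums, a stdlib-only sketch can compare release numbers. The helper names are hypothetical, and the simplified parser ignores pre-release and dev suffixes; the packaging library's full PEP 440 version handling would be more robust.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the list above.
MINIMUMS = {"torch": "1.11.0", "transformers": "4.32.0", "huggingface_hub": "0.15.1"}


def release_tuple(v):
    """Leading numeric part of a version, e.g. "2.1.0+cu118" -> (2, 1, 0).
    Simplified: pre-release/dev suffixes are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"Unrecognised version string: {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "4.32" compares equal to "4.32.0".
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_dependencies(minimums=MINIMUMS):
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else f"{installed} < {minimum}"
    return report


print(meets_minimum("1.9.1", "1.11.0"))  # False (a plain string comparison would say True)
print(check_dependencies())              # depends on the local environment
```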

Additional Highlights

See the following for a list of release highlights:

  • Add weighted mean & last token pooling for SGPT support by @Muennighoff (#1613)
  • Prevent community_detection from running forever by @nreimers (d8982c9)
  • Allow loading private transformers models by @su-park (#1682)
  • Add support for multilingual T5 encoders (db34d38)
  • Reduce RAM usage in InformationRetrievalEvaluator and util.semantic_search by @kwang2049 (#1715)
  • Automatically place models on MPS if available by @nikitajz (#2342)
  • Add a progress bar for community detection by @Marlon154 (#1879)
  • Simplify tests, add CI, patch paraphrase_mining_embeddings by @tomaarsen (#2350)
  • Remove unused torchvision dependency by @dvruette (#1881)
  • Introduce Pillow as a dependency by @tomaarsen (#2374)
  • Remove Python 3.7 support by @tomaarsen (#2375)
  • Refactor model loading, no more unnecessary file downloads by @tomaarsen (#2345)
  • Prevent to(...) from being ignored, replace ._target_device with .device by @tomaarsen (#2351)
  • Add normalize_embeddings support to multi-process encoding by @tomaarsen (#2377)
  • Fix multi-process encoding on CUDA devices by @tomaarsen (#2377)
  • Simplify & fix save_to_hub, remove git dependency, add token argument by @tomaarsen (#2376)
  • Update dependencies: transformers>=4.32.0 and huggingface_hub>=0.15.1 by @tomaarsen (#2376)
  • Simplify the smart_batching_collate function by @vsuarezpaniagua (#1852)
  • Fix indexing of lasttoken pooling for longest sequence by @ssharpe42 (#2111)
  • Set the Linear device equal to the main model device in SoftmaxLoss by @tomaarsen (#2378)
  • Ensure the first item in each community is the cluster centroid in community_detection by @dyaaalbakour (#2277)
  • Improve efficiency of community detection on GPU by @tomaarsen (#2381)
  • Use the library_name metadata in the model card by @tomaarsen (#2386)
  • Fix error when encoding empty list with convert_to_tensor=True by @oToToT (#1775)
  • Add return type hints to util methods by @zachschillaci27 (#1754)
  • Also accept word2vec format in WordEmbeddings by @mokha (#1875)
  • Fix LSTM layer on newer torch versions by @lambdaofgod (#1420)
  • Pass token and trust_remote_code to tokenizer_args too by @tomaarsen (#2411)
  • If neither cache_folder nor SENTENCE_TRANSFORMERS_HOME is set, use the HF default cache by @tomaarsen (#2412)
  • Replace unittest with pytest by @bwanglzu (#2407)
  • Add GradCache + MNRL: Go beyond GPU-memory limit for MNRL by @kwang2049 (#1759)
  • Add revision to load a specific model version by @tomaarsen (#2419)
  • Add @k at the end of the CSV file name for RerankingEvaluator by @milistu (#2427)
  • Bump the minimum supported torch version to 1.11 by @statelesshz (#2432)
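The two pooling modes from the first highlight can be sketched on plain lists. The first is position-weighted mean pooling, with weights 1, 2, ..., n over the non-padding tokens as proposed for SGPT. The second is last-token pooling. This is a simplified illustration rather than the library's Pooling module: the last-token version assumes right padding, and the function names are hypothetical.

```python
def weighted_mean_pooling(token_embeddings, attention_mask):
    """Position-weighted mean: the token at position p (1-based) gets weight p,
    padding tokens get weight 0."""
    weights = [(pos + 1) * mask for pos, mask in enumerate(attention_mask)]
    total = sum(weights)
    dim = len(token_embeddings[0])
    return [sum(w * tok[d] for w, tok in zip(weights, token_embeddings)) / total for d in range(dim)]


def last_token_pooling(token_embeddings, attention_mask):
    """Embedding of the last non-padding token; assumes right padding."""
    last_index = sum(attention_mask) - 1
    return list(token_embeddings[last_index])


tokens = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]  # the third token is padding
mask = [1, 1, 0]
print(weighted_mean_pooling(tokens, mask))  # approximately [0.333, 0.667]
print(last_token_pooling(tokens, mask))     # [0.0, 1.0]
```

Later tokens get more weight in the weighted mean because, in a causal (GPT-style) model, they have attended to more of the input.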