
v3.0.1 - Patch introducing new Trainer features, model card improvements and evaluator fixes

@tomaarsen released this 07 Jun 13:01

This patch release introduces several improvements to the SentenceTransformerTrainer, as well as updates to the automatic model card generation. It also fixes some minor evaluator bugs and a bug in MatryoshkaLoss. Lastly, every Sentence Transformer model, including those with Dense, LSTM or CNN modules, can now be saved and loaded with the safer model.safetensors files.

Install this version with:

# Full installation:
pip install "sentence-transformers[train]==3.0.1"

# Inference only:
pip install sentence-transformers==3.0.1

SentenceTransformerTrainer improvements

  • Implement gradient checkpointing for lower memory usage during training (#2717)
  • Implement support for the push_to_hub=True training argument, and implement trainer.push_to_hub(...) (#2718)

Model Cards

This patch release improves on the automatically generated model cards in several ways:

  • Your training datasets are now automatically linked if they're on Hugging Face (#2711)
  • A new generated_from_trainer tag is now also added (#2710)
  • The automatically selected widget examples are improved, especially for question answering: previously, the widget could show examples that compared two questions with each other (#2713)
  • If you saved a model locally, loaded it again and then uploaded it, the model card would previously still show:
...
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
...

This placeholder is now replaced with your new model ID on Hugging Face (#2714)

  • The exact training dataset size is now included in the model metadata, rather than as a bucket such as 1K<n<10K (#2728)
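Two of these model card changes come down to simple text and metadata handling. The dependency-free sketch below is hypothetical and is not the library's code: `fill_model_id` mimics swapping the `sentence_transformers_model_id` placeholder for a real repository ID (as happens when uploading with `SentenceTransformer.push_to_hub(repo_id)`), and `size_bucket` shows the coarse category an exact dataset size collapses into. The labels are modelled on the Hugging Face `size_categories` values, and treating lower bounds as inclusive is an assumption.

```python
# Hypothetical helpers illustrating two model card changes; not sentence-transformers code.
PLACEHOLDER = "sentence_transformers_model_id"

# (exclusive upper bound, label), modelled on Hugging Face `size_categories` values.
# Inclusive lower bounds are an assumption made for this sketch.
SIZE_BUCKETS = [
    (10**3, "n<1K"),
    (10**4, "1K<n<10K"),
    (10**5, "10K<n<100K"),
    (10**6, "100K<n<1M"),
    (10**7, "1M<n<10M"),
    (10**8, "10M<n<100M"),
    (10**9, "100M<n<1B"),
    (10**10, "1B<n<10B"),
    (10**11, "10B<n<100B"),
    (10**12, "100B<n<1T"),
]


def fill_model_id(card_text: str, repo_id: str) -> str:
    """Swap the generic placeholder in a model card's usage snippet for the real repository ID."""
    return card_text.replace(PLACEHOLDER, repo_id)


def size_bucket(num_samples: int) -> str:
    """Reduce an exact dataset size to a coarse category, which is all a bucket conveys."""
    for upper, label in SIZE_BUCKETS:
        if num_samples < upper:
            return label
    return "n>1T"


snippet = 'model = SentenceTransformer("sentence_transformers_model_id")'
print(fill_model_id(snippet, "your-username/my-model"))  # model = SentenceTransformer("your-username/my-model")
print(size_bucket(6_000))  # 1K<n<10K (the exact 6,000 is lost)
```

Recording the exact count and letting the Hub derive the bucket keeps both pieces of information available.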

Evaluator fixes

  • Fix SequentialEvaluator ignoring each evaluator's primary metric when calculating its scores (#2700)
  • Fix confusing print statement in TranslationEvaluator when using print_wrong_matches=True (#1894)
  • Fix a bug that prevented customizing the primary_metric in InformationRetrievalEvaluator (#2701)
  • Allow passing a list of evaluators to the SentenceTransformerTrainer instead of a SequentialEvaluator (#2716)
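To make the SequentialEvaluator fix concrete, here is a hypothetical, dependency-free sketch of the aggregation involved: each evaluator returns a dictionary of metrics and names one of them as its primary metric, and a `main_score_function` combines those primary scores. This mirrors the `main_score_function` argument of SequentialEvaluator, whose default is assumed here to take the last score. `ToyEvaluator` and `sequential_evaluate` are illustrative names, not library APIs, and the metric values are toy numbers; with the library itself you would simply pass `evaluator=[evaluator_a, evaluator_b]` to the SentenceTransformerTrainer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ToyEvaluator:
    """Hypothetical stand-in for an evaluator: a fixed metrics dict plus the name of its primary metric."""

    name: str
    metrics: Dict[str, float]
    primary_metric: str

    def __call__(self) -> Dict[str, float]:
        return dict(self.metrics)


def sequential_evaluate(
    evaluators: List[ToyEvaluator],
    main_score_function: Callable[[List[float]], float] = lambda scores: scores[-1],
) -> Tuple[float, Dict[str, float]]:
    """Run every evaluator, keep all metrics, and combine each evaluator's primary metric."""
    all_metrics: Dict[str, float] = {}
    primary_scores: List[float] = []
    for evaluator in evaluators:
        metrics = evaluator()
        # Prefix metric names so results from different evaluators do not collide
        all_metrics.update({f"{evaluator.name}_{key}": value for key, value in metrics.items()})
        # Use the metric the evaluator declares as primary, not a fixed or first key
        primary_scores.append(metrics[evaluator.primary_metric])
    return main_score_function(primary_scores), all_metrics


sts = ToyEvaluator("sts", {"pearson_cosine": 0.80, "spearman_cosine": 0.82}, "spearman_cosine")
ir = ToyEvaluator("ir", {"ndcg@10": 0.55, "mrr@10": 0.60}, "mrr@10")
score, metrics = sequential_evaluate([sts, ir])
mean_score, _ = sequential_evaluate([sts, ir], lambda scores: sum(scores) / len(scores))
```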

Loss fixes

  • Fix MatryoshkaLoss crash if the first dimension is not the biggest (#2719)
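MatryoshkaLoss trains embeddings that stay useful when truncated to each size in `matryoshka_dims`, e.g. `MatryoshkaLoss(model, base_loss, matryoshka_dims=[768, 512, 256, 128, 64])`; after this fix, the list no longer has to start with the largest dimension. The dependency-free sketch below uses hypothetical names and plain lists in place of tensors to show an order-independent approach: every truncation is sliced from the full embedding rather than from a previous, possibly shorter, truncation. It illustrates the idea and is not the library's implementation.

```python
from typing import Dict, List, Sequence


def truncate_for_dims(full_embedding: Sequence[float], dims: Sequence[int]) -> Dict[int, List[float]]:
    """Return one truncated copy of `full_embedding` per Matryoshka dimension, in any order."""
    max_dim = len(full_embedding)
    truncated: Dict[int, List[float]] = {}
    for dim in dims:
        if dim > max_dim:
            raise ValueError(f"dimension {dim} exceeds the embedding size {max_dim}")
        # Always slice the full embedding, never a previous (possibly shorter) truncation,
        # so a small first dimension cannot starve the larger ones that follow it
        truncated[dim] = list(full_embedding[:dim])
    return truncated


embedding = [float(i) for i in range(8)]  # toy 8-dimensional "embedding"
per_dim = truncate_for_dims(embedding, [2, 8, 4])  # first dimension is not the biggest
```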

Security

  • Integrate safetensors with all modules, including Dense, LSTM and CNN, so that pickled pytorch_model.bin files are no longer needed (#2722)

All changes

  • updating to evaluation_strategy by @higorsilvaa in #2686
  • fix loss link by @Samoed in #2690
  • Fix bug that restricts users from specifying custom primary_function in InformationRetrievalEvaluator by @hetulvp in #2701
  • Fix a bug in SequentialEvaluator to use primary_metric if defined in evaluator. by @hetulvp in #2700
  • [fix] Always override the originally saved version in the ST config by @tomaarsen in #2709
  • [model cards] Also include HF datasets in the model card metadata by @tomaarsen in #2711
  • Add "generated_from_trainer" tag to auto-generated model cards by @tomaarsen in #2710
  • Fix confusing print statement in TranslationEvaluator by @NathanS-Git in #1894
  • [model cards] Improve the widget example selection: not based on embeddings, better for QA by @tomaarsen in #2713
  • [model cards] Replace 'sentence_transformers_model_id' from reused model if possible by @tomaarsen in #2714
  • [feat] Allow passing a list of evaluators to the Trainer by @tomaarsen in #2716
  • [fix] Fix gradient checkpointing to allow for much lower memory usage by @tomaarsen in #2717
  • [fix] Implement create_model_card on the Trainer, allowing args.push_to_hub=True by @tomaarsen in #2718
  • [fix] Fix MatryoshkaLoss crash if the first dimension is not the biggest by @tomaarsen in #2719
  • Update models_en_sentence_embeddings.html by @saikartheekb in #2720
  • [typing] Improve typing for many functions & add py.typed to satisfy mypy by @tomaarsen in #2724
  • [fix] Fix edge case with evaluator being None by @tomaarsen in #2726
  • [simplify] Set can_return_loss=True globally, instead of via the data collator by @tomaarsen in #2727
  • [feat] Integrate safetensors with Dense, etc. modules too. by @tomaarsen in #2722
  • [model cards] Specify the exact dataset size as a tag, will be bucketized by HF by @tomaarsen in #2728

New Contributors

Full Changelog: v3.0.0...v3.0.1