This repository showcases language model pretraining with the awesome TensorFlow Model Garden library.
The following LMs are currently supported (a minimal experiment config sketch follows this list):
- BERT Pretraining - see pretraining instructions
- Token Dropping for efficient BERT Pretraining - see pretraining instructions
- Training ELECTRA Augmented with Multi-word Selection (TEAMS) - see pretraining instructions
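For orientation, here is a minimal sketch of how a pretraining experiment configuration can be instantiated with the TF Model Garden library. This is not the exact entry point used by the linked pretraining instructions: it assumes the `tf-models-official` package is installed and that the `bert/pretraining` experiment name is registered, and the input path is a placeholder.

```python
from official.common import registry_imports  # noqa: F401 -- registers the bundled experiment configs
from official.core import exp_factory

# Build the default BERT pretraining experiment config, then override a few fields.
config = exp_factory.get_exp_config("bert/pretraining")  # assumed experiment name
config.task.train_data.input_path = "gs://my-bucket/pretraining/*.tfrecord"  # placeholder path
config.trainer.train_steps = 1_000_000

print(config.as_dict())
```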
Additionally, the following features are provided:
- A cheatsheet for TPU VM creation (including all necessary dependencies to pretrain models with the TF Model Garden library), which can be found here.
- An extended pretraining data generation script that allows, for example, the use of tokenizers from the Hugging Face Model Hub or different data packing strategies (original BERT packing or RoBERTa-like packing), which can be found here; a tokenizer loading sketch follows this list.
- Conversion scripts that convert TF Model Garden weights to Hugging Face Transformers-compatible models, which can be found here.
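As a rough illustration of the tokenizer side of the data generation script, the sketch below loads a WordPiece tokenizer from the Hugging Face Model Hub; the `bert-base-uncased` model ID is only a stand-in, not necessarily what the script uses by default.

```python
from transformers import AutoTokenizer

# Any tokenizer hosted on the Hugging Face Model Hub can be pulled by its model ID.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # stand-in model ID

# Token IDs produced this way are what the packing strategies operate on.
ids = tokenizer("Pretraining data generation with a Hub tokenizer.")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))
```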
The following LMs were pretrained on the 10BT subset of the famous FineWeb and FineWeb-Edu datasets (a loading example follows this list):
- BERT-based - find the best model checkpoint here
- Token Dropping BERT-based - find the best model checkpoint here
- TEAMS-based - find the best model checkpoint here
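Because the conversion scripts export Hugging Face Transformers-compatible weights, these checkpoints can be loaded directly with Transformers. In the sketch below the model ID is a placeholder; the exact repository names are listed in the Model Hub organization and collection mentioned below.

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "model-garden-lms/bert-base-finewebs"  # placeholder ID -- check the Model Hub collection
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Quick sanity check: run a masked-token input through the model.
inputs = tokenizer("TensorFlow Model Garden is a [MASK] library.", return_tensors="pt")
outputs = model(**inputs)
```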
All models can be found in the TensorFlow Model Garden LMs organization on the Model Hub and in this collection.
Detailed evaluation results with the ScandEval library are available in this repository.
This repository is the outcome of the last two years of working with TPUs from the awesome TRC program and the TensorFlow Model Garden library.
Made from Bavarian Oberland with ❤️ and 🥨.