Description of the feature request:
A cookbook showcasing Continued Pretraining on TPU.
Continuing, and potentially expanding, the pretraining of Gemma to further enhance the model's capabilities and efficiency.
What problem are you trying to solve with this feature?
Large language models (LLMs) like Gemma require massive computational resources for pretraining. The choice of hardware significantly impacts the efficiency, speed, and ultimately the quality of the resulting model. Gemma has already benefited from TPU pretraining, so this strategy should remain a priority to maintain the model's competitive edge and drive further advancements.
Any other information you'd like to share?
No response
This is being worked on by @kinarr et al. It is not hard, but it does require reverting the JAX sharding scheme back to the one used when Gemma v1 first came out; otherwise the Kaggle TPU runs out of memory (Colab TPU is hopeless).
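For context on the sharding point, the sketch below shows the general JAX technique of laying a large parameter out across the cores of a Kaggle TPU v3-8 via a device mesh and a `PartitionSpec`. The parameter shape and the one-axis "model" mesh are illustrative assumptions; this is not the sharding scheme of any particular Gemma release.

```python
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One-dimensional mesh over all available accelerator cores
# (8 cores on a Kaggle TPU v3-8).
devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=("model",))

# Hypothetical parameter: a large embedding-style matrix in bfloat16.
params = jnp.zeros((256_000, 2048), dtype=jnp.bfloat16)

# Split the feature axis across the "model" mesh axis so each core holds
# only a 1/8 slice of the matrix instead of a full replica.
sharding = NamedSharding(mesh, P(None, "model"))
params = jax.device_put(params, sharding)

# Inspect how the array ended up laid out across devices.
jax.debug.visualize_array_sharding(params)
```

In a full cookbook the same idea would be applied across the whole parameter tree and the optimizer state, which is typically where memory pressure on an 8-core Kaggle TPU comes from.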