
Continue to pretrain Gemma on TPU #133

Open
bebechien opened this issue Jan 27, 2025 · 1 comment
Labels
wishlist A wish list of cookbooks

Comments

@bebechien
Collaborator

Description of the feature request:

A cookbook showcasing continued pretraining of Gemma on TPU.
Continuing, and potentially expanding, Gemma's pretraining to further enhance the model's capabilities and efficiency.

What problem are you trying to solve with this feature?

Large language models (LLMs) like Gemma require massive computational resources for pretraining. The choice of hardware significantly impacts the efficiency, speed, and ultimately the quality of the resulting model. While Gemma has benefited from TPU pretraining, it is important that this strategy remains a priority so the model maintains its competitive edge and continues to advance.

Any other information you'd like to share?

No response

@bebechien added the wishlist label on Jan 27, 2025
@windmaple
Collaborator

This is being worked on by @kinarr et al. It is not hard, but it does require reverting the JAX sharding scheme back to the one Gemma v1 used when it first came out; otherwise the Kaggle TPU OOMs (Colab TPU is hopeless).
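
For reference, a minimal sketch of the kind of explicit JAX parameter sharding involved, not taken from the Gemma codebase: the mesh layout, axis name, and the 4096×4096 weight are illustrative assumptions, but the pattern of placing a large array across all TPU cores to keep per-core memory low is the relevant idea.

```python
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D device mesh over all available TPU cores
# (e.g. 8 cores on a Kaggle TPU v3-8).
devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=("model",))

# Shard an example weight matrix along its second dimension so each core
# holds only a slice, keeping per-core HBM usage low enough to avoid OOM.
weight = jnp.zeros((4096, 4096), dtype=jnp.bfloat16)
sharded_weight = jax.device_put(weight, NamedSharding(mesh, P(None, "model")))

print(sharded_weight.sharding)  # shows how the array is laid out across cores
```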
