Description of the feature request:
A cookbook showcasing Continued Pretraining on TPU.
Continuing, and potentially expanding, the pretraining of Gemma to further enhance the model's capabilities and efficiency.
What problem are you trying to solve with this feature?
Large language models (LLMs) like Gemma require massive computational resources for pretraining. The choice of hardware significantly impacts the efficiency, speed, and ultimately the quality of the resulting model. Gemma has already benefited from TPU pretraining, so this strategy should remain a priority to maintain the model's competitive edge and drive further advancements.
Any other information you'd like to share?
No response
This is being worked on by @kinarr et al. It is not hard, but it does require reverting the JAX sharding scheme back to the one used when Gemma v1 first came out; otherwise the Kaggle TPU runs out of memory (Colab TPU is hopeless).
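For context on the sharding point, the sketch below shows the general JAX technique of laying a large parameter out across the cores of a Kaggle TPU v3-8 via a device mesh and a `PartitionSpec`. The parameter shape and the one-axis "model" mesh are illustrative assumptions; this is not the sharding scheme of any particular Gemma release.

```python
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One-dimensional mesh over all available accelerator cores
# (8 cores on a Kaggle TPU v3-8).
devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=("model",))

# Hypothetical parameter: a large embedding-style matrix in bfloat16.
params = jnp.zeros((256_000, 2048), dtype=jnp.bfloat16)

# Split the feature axis across the "model" mesh axis so each core holds
# only a 1/8 slice of the matrix instead of a full replica.
sharding = NamedSharding(mesh, P(None, "model"))
params = jax.device_put(params, sharding)

# Inspect how the array ended up laid out across devices.
jax.debug.visualize_array_sharding(params)
```

In a full cookbook the same idea would be applied across the whole parameter tree and the optimizer state, which is typically where memory pressure on an 8-core Kaggle TPU comes from.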