mugpt is a fork of seqax by MatX, modified to run on Google's TPU v4-32 slices, and uses the hyperparameter transfer described by Everett et al. Previously, I experimented with "SharedKV", a modified form of attention that lives in the sharedkv branch, as well as the muP hyperparameter transfer described by Yang et al. (sketched briefly after the links below). For more information on these investigations, take a look here:
- Benchmarking Learning Rate Transfer
- Minimizing HBM usage through SharedKV
- Exploring the best approaches for implementing muP
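To give a flavor of the muP idea referenced above: with Adam, hidden-layer learning rates scale roughly as 1/width, which is what lets hyperparameters tuned on a small proxy model transfer to a wider one. The sketch below is illustrative only; the function name and constants are made up here, and it is not the parameterization this repo (or Everett et al.) actually uses, since real muP treats embeddings, hidden matrices, and the readout differently.

```python
# Minimal sketch of muP-style learning-rate transfer (Yang et al.), NOT this
# repo's actual code: hidden-matrix LRs shrink as 1/width under Adam.
import optax

def mup_hidden_lr(base_lr: float, base_width: int, width: int) -> float:
    # A base_lr tuned on a narrow proxy model carries over to a wider
    # target model when scaled by base_width / width.
    return base_lr * base_width / width

# Tune once at width 256, then reuse the same base_lr at width 2048.
opt = optax.adam(mup_hidden_lr(base_lr=3e-3, base_width=256, width=2048))
```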
The installation procedure is identical to that described in seqax.
- Install `graphviz` from your system package manager: e.g. `brew install graphviz` or `apt install graphviz`.
- Install Python dependencies, typically inside a virtualenv: `python -m pip install -r requirements-cpu.txt`.

NOTE: `requirements-cpu.txt` is configured for a CPU-based installation. For GPU or TPU installation, you may need a different install of JAX and jaxlib; consult the JAX install documentation. If your GPU environment has a Torch-GPU installation, you may need to switch it to a Torch-CPU installation to avoid conflicts with JAX-GPU.
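For reference, at the time of writing the upstream JAX documentation installs TPU support with a command along the lines of `pip install -U "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html`; that comes from the JAX docs rather than this repo, so check there for the current command before running it.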
For development and testing you can run on CPU. Typically you'd use our synthetic dataset (which is checked into this repository) or the Huggingface data loader and you'd set XLA flags to simulate multiple devices so as to test that parallelism is working as intended:
`XLA_FLAGS=--xla_force_host_platform_device_count=8 python -m train --config-name=local_test_synthetic +paths.model_name=synthetic_000`
The `paths.model_name` flag specifies which subdirectory on disk (inside `/tmp`) to write model checkpoints to. You'll typically want to change this when starting a new model run.
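To confirm that the simulated devices from the XLA flag above are actually visible, a quick standalone check like the following works (this is a generic JAX snippet, not part of the repo); note the flag must be in the environment before `jax` is first imported:

```python
import os
# Must be set before jax is imported, or the flag is ignored.
os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=8"

import jax
print(jax.device_count())  # -> 8
print(jax.devices())       # eight CpuDevice entries
```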
Thanks to the MatX team for their implementation of GPT in seqax, which I used to implement muP and SharedKV attention.
Thanks to the Google TPU Research Cloud, which has supported my investigations.