
Megatron #5

Open
shaileshj2803 opened this issue Apr 3, 2023 · 1 comment

Comments

@shaileshj2803

Is it possible to give more details about which version of Megatron is used and how to reference Megatron during training? Detailed step-by-step instructions would be very helpful. Thanks for the awesome work.

@dropreg
Owner

dropreg commented Apr 5, 2023

Hi,

We have updated to a new version, which is easier to read.

We also provide steps for efficient fine-tuning using Megatron-LoRA (for when you only have two 24 GB 3090s):

  1. First, download the following (as shown in alpaca/scripts/utils/README.md):

    • The LLaMA 7B model: consolidated.00.pth
    • The dictionary: alpaca/scripts/assert/dict.txt
    • The Alpaca training data: alpaca_data.json
  2. Process the model:
    python alpaca_lora/scripts/utils/process_llama_megatron_ckpt.py --llama-model-dir --llama-model-file --parallel-size

  3. Process the data:
    bash prepare_llama_training_data.sh

  4. Training step, as shown in alpaca/scripts/megatron_lora/README.md:
    bash alpaca/scripts/megatron_lora/run_train_megatron_lora.sh

  5. Inference step:

    • bash alpaca/scripts/megatron_lora/inference/run_inf_megatron_lora.sh
    • Alternatively, you can merge multiple Megatron checkpoints into one and run inference on the merged model:
      python merge_llama_megatron_ckpt.py
      bash alpaca/scripts/lora/inference/run_inf_hub.sh

In addition, please pay attention to modifying the parameters in the above scripts, e.g., file paths. A consolidated sketch of the whole pipeline is shown below.
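For convenience, here is a minimal end-to-end sketch of the steps above as a single shell session. The checkpoint directory, the data placement, and the parallel size of 2 (matching two 3090s) are illustrative assumptions; the flag names follow the command in step 2, so please adapt everything to your own setup:

    #!/bin/bash
    # Assumed locations -- adjust to your environment.
    LLAMA_DIR=/path/to/llama-7b   # directory containing consolidated.00.pth
    # alpaca_data.json and dict.txt are placed as described in alpaca/scripts/utils/README.md

    # Step 2: split the LLaMA checkpoint for 2-way model parallelism (two 24 GB 3090s assumed).
    python alpaca_lora/scripts/utils/process_llama_megatron_ckpt.py \
        --llama-model-dir $LLAMA_DIR \
        --llama-model-file consolidated.00.pth \
        --parallel-size 2

    # Step 3: process the Alpaca training data.
    bash prepare_llama_training_data.sh

    # Step 4: train with Megatron-LoRA.
    bash alpaca/scripts/megatron_lora/run_train_megatron_lora.sh

    # Step 5a: run inference on the sharded Megatron checkpoints ...
    bash alpaca/scripts/megatron_lora/inference/run_inf_megatron_lora.sh

    # Step 5b: ... or merge the shards into a single checkpoint and use the hub inference script.
    python merge_llama_megatron_ckpt.py
    bash alpaca/scripts/lora/inference/run_inf_hub.sh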
