add docs for lora generate and hostfile setup
Signed-off-by: ftgreat <[email protected]>
ftgreat committed Jun 26, 2023
1 parent ab0b549 commit 781e153
Showing 12 changed files with 137 additions and 58 deletions.
19 changes: 16 additions & 3 deletions examples/Aquila/Aquila-chat/README.md
@@ -83,32 +83,45 @@ python generate_chat_bminf.py
### Supervised Fine-tuning (SFT)
1. Configure the `hostfile` file
1. Prepare the initial model for fine-tuning (place it in checkpoints_in)
2. Configure the `hostfile` file
<details><summary>Details are as follows:</summary>
Taking a single machine with eight GPUs as an example:
1. Check the IP address of the local machine:
```
ifconfig eth0 | grep "inet " | awk '{print $2}'
```
2. Fill in the `hostfile` with the following (a multi-node example is shown below)
```
[ip address from last step] slots=8
```
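The same `host slots=N` format extends to multi-node training: one line per machine. A hypothetical two-node setup (placeholder addresses) would look like:
```
192.168.0.11 slots=8
192.168.0.12 slots=8
```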
3. Confirm that the local machine can log in without a password by testing with the following command:
```
ssh localhost
```
If passwordless login fails, you can configure it with the following commands
```
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
service sshd restart
```
</details>
2. Run the training script
3. Run the training script
```
bash dist_trigger_docker.sh hostfile Aquila-chat.yaml aquilachat-7b aquila_experiment
```
**To launch LoRA fine-tuning instead (it can run on a single V100), replace the previous step with**
```
bash dist_trigger_docker.sh hostfile Aquila-chat-lora.yaml aquilachat-7b aquila_experiment
```
Note: a model trained with LoRA must be run through `generate_chat_lora.py` for inference, and the LoRA parameters used during training must be supplied when the autoloader loads the model (a sketch follows).
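A minimal sketch of that loading step, assuming the `AutoLoader` interface used elsewhere in these examples; the `lora` keyword below is a placeholder, not a confirmed parameter name, so check `generate_chat_lora.py` for the exact arguments your FlagAI version expects:
```python
from flagai.auto_model.auto_loader import AutoLoader

# Sketch only: the LoRA settings must mirror those used for training
# (see Aquila-chat-lora.yaml). "lora=True" is a placeholder keyword.
loader = AutoLoader(
    "lm",
    model_dir="./checkpoints_in",
    model_name="aquilachat-7b",
    use_cache=True,
    lora=True,
)
model = loader.get_model()
tokenizer = loader.get_tokenizer()
```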
<details><summary>The correct output information is shown below:</summary>
32 changes: 20 additions & 12 deletions examples/Aquila/Aquila-chat/README_en.md
@@ -92,22 +92,30 @@ Note: The Aquila-7B basic model may not perform as well for dialogue reasoning t
### Supervised Fine-tuning (SFT)
1. Configure the `hostfile` file.
1. Prepare the initial model for fine-tuning (place it in checkpoints_in).
2. Configure the `hostfile` file.
<details><summary>Details are as follows:</summary>
Taking a single machine with eight GPUs as an example:
1. Check the IP address of the local machine:
```
ifconfig eth0 | grep "inet " | awk '{print $2}'
```
2. Fill in the `hostfile` with the following
```
[上一步得到的ip地址] slots=8
```
```
[ip address from last step] slots=8
```
3. Confirm that the local machine can log in without a password by testing using the following command:
```
ssh localhost
```
If passwordless login fails, you can configure it with the following commands
```
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
service sshd restart
```
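To verify non-interactively that key-based login now works (assuming OpenSSH), `BatchMode=yes` makes ssh fail instead of prompting for a password:
```
ssh -o BatchMode=yes localhost true && echo "passwordless SSH OK"
```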
</details>
@@ -119,7 +127,7 @@ Note: The Aquila-7B basic model may not perform as well for dialogue reasoning t
```
bash dist_trigger_docker.sh hostfile Aquila-chat-lora.yaml aquila-7b aquila_experiment
```
A model trained with LoRA must be run through `generate_chat_lora.py` for inference, and the LoRA parameters used during training must be supplied when the autoloader loads the model.
<details><summary>The correct output information is shown below:</summary>
The following information will be output. Note that `NODES_NUM` should be equal to the number of nodes, and `LOGFILE` is the log file for the model run.
2 changes: 1 addition & 1 deletion examples/Aquila/Aquila-chat/generate_chat_bminf.py
@@ -24,7 +24,7 @@
model.eval()

with torch.cuda.device(0):
model = bminf.wrapper(model, quantization=False, memory_limit=2 << 30)
model = bminf.wrapper(model, quantization=False, memory_limit=2 << 30) # n << 30 = n * 2**30 bytes, i.e. an n GiB memory limit
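# memory_limit caps the GPU memory bminf may use when offloading parameters; raise it (e.g. 8 << 30 for 8 GiB) if more GPU memory is free.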

predictor = Predictor(model, tokenizer)

32 changes: 21 additions & 11 deletions examples/Aquila/Aquila-code/README.md
@@ -94,20 +94,30 @@ python generate_code_bminf.py
<details><summary>Details are as follows:</summary>
Taking a single machine with eight GPUs as an example:
1. Check the IP address of the local machine:
```
ifconfig eth0 | grep "inet " | awk '{print $2}'
```
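If the interface is not named `eth0`, the command above prints nothing; on a typical Linux host, `hostname -I` is a simple alternative (the first field is usually the primary address):
```
hostname -I | awk '{print $1}'
```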
2. Fill in the `hostfile` with the following
```
[ip address from last step] slots=8
```
3. Confirm that the local machine can log in without a password by testing with the following command:
```
ssh localhost
```
If passwordless login fails, you can configure it with the following commands
```
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
service sshd restart
```
</details>
3. Run the training script
11 changes: 9 additions & 2 deletions examples/Aquila/Aquila-code/README_en.md
@@ -96,13 +96,20 @@ Currently, the minimum requirement for pre-training the 7B base model is to run
```
2. Fill in the `hostfile` with the following
```
[上一步得到的ip地址] slots=8
[ip address from last step] slots=8
```
3. Confirm that the local machine can log in without a password by testing using the following command:
```
ssh localhost
```
If passwordless login fails, you can configure it with the following commands
```
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
service sshd restart
```
</details>
3. Run the training script:
4 changes: 2 additions & 2 deletions examples/Aquila/Aquila-code/generate_code_bminf.py
@@ -25,8 +25,8 @@
tokenizer = loader.get_tokenizer()
model.half()
model.eval()
model.cuda()
model.to(device)
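# Assumes `device` is defined earlier in the script; unlike .cuda(), .to(device) also supports CPU or a specific GPU.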
with torch.cuda.device(0):
model = bminf.wrapper(model, quantization=False, memory_limit=2 << 30) # n << 30 = n * 2**30 bytes, i.e. an n GiB memory limit

vocab = tokenizer.get_vocab()

11 changes: 11 additions & 0 deletions examples/Aquila/Aquila-pretrain/README.md
@@ -89,17 +89,28 @@ python generate_bminf.py
<details><summary>Details are as follows:</summary>
Taking a single machine with eight GPUs as an example:
1. Check the IP address of the local machine:
```
ifconfig eth0 | grep "inet " | awk '{print $2}'
```
2. Fill in the `hostfile` with the following
```
[ip address from last step] slots=8
```
3. Confirm that the local machine can log in without a password by testing with the following command:
```
ssh localhost
```
If passwordless login fails, you can configure it with the following commands (an alternative is shown after the commands)
```
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
service sshd restart
```
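Alternatively, OpenSSH ships `ssh-copy-id`, which installs the public key into `authorized_keys` for you:
```
ssh-copy-id -i ~/.ssh/id_rsa.pub localhost
```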
</details>
28 changes: 18 additions & 10 deletions examples/Aquila/Aquila-pretrain/README_en.md
@@ -96,22 +96,30 @@ Note: The Aquila-7B base model may not perform as well for dialogue reasoning ta
```
2. Configure the `hostfile` file.
3. Configure the `hostfile` file.
<details><summary>Details are as follows:</summary>
Taking a single machine with eight GPUs as an example:
1. Check the IP address of the local machine:
```
ifconfig eth0 | grep "inet " | awk '{print $2}'
```
2. Fill in the `hostfile` with the following
```
[上一步得到的ip地址] slots=8
```
```
[ip address from last step] slots=8
```
3. Confirm that the local machine can log in without a password by testing using the following command:
```
ssh localhost
```
If passwordless login fails, you can configure it with the following commands
```
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
service sshd restart
```
</details>
2 changes: 1 addition & 1 deletion examples/Aquila/Aquila-pretrain/generate_bminf.py
@@ -22,7 +22,7 @@
model.eval()

with torch.cuda.device(0):
model = bminf.wrapper(model, quantization=False, memory_limit=2 << 30)
model = bminf.wrapper(model, quantization=False, memory_limit=2 << 30) # n << 30 = n * 2**30 bytes, i.e. an n GiB memory limit

predictor = Predictor(model, tokenizer)

23 changes: 18 additions & 5 deletions examples/Aquila/README.md
@@ -81,41 +81,54 @@ python generate_bminf.py
### Supervised Fine-tuning (SFT)
1. Enter the chat-model fine-tuning directory and place the pre-trained model to be fine-tuned under the checkpoints_in directory
1. Prepare the initial model for fine-tuning (place it in checkpoints_in)
2. Enter the chat-model fine-tuning directory and place the pre-trained model to be fine-tuned under the checkpoints_in directory
Assuming you have just run the inference script under Aquila-pretrain, you can run
```
cd ./Aquila-chat
mv ../Aquila-pretrain/checkpoints_in ./  # path is relative to Aquila-chat after the cd above
```
2. Configure the `hostfile` file
3. Configure the `hostfile` file
<details><summary>Details are as follows:</summary>
Taking a single machine with eight GPUs as an example:
1. Check the IP address of the local machine:
```
ifconfig eth0 | grep "inet " | awk '{print $2}'
```
2. Fill in the `hostfile` with the following
```
[ip address from last step] slots=8
```
3. Confirm that the local machine can log in without a password by testing with the following command:
```
ssh localhost
```
If passwordless login fails, you can configure it with the following commands
```
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
service sshd restart
```
</details>
3. Run the training script
4. Run the training script
```
bash dist_trigger_docker.sh hostfile Aquila-chat.yaml aquila-7b aquila_experiment
```
**To launch LoRA fine-tuning instead (it can run on a single V100), replace the previous step with**
```
bash dist_trigger_docker.sh hostfile Aquila-chat-lora.yaml aquila-7b aquila_experiment
```
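Comparing the two commands, the launcher's positional arguments follow one pattern (inferred from these examples):
```
bash dist_trigger_docker.sh [hostfile] [training config yaml] [model name] [experiment name]
```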
Note: a model trained with LoRA must be run through `generate_chat_lora.py` for inference, and the LoRA parameters used during training must be supplied when the autoloader loads the model.
<details><summary>The correct output information is shown below:</summary>
The following information will be output. Note that `NODES_NUM` should be equal to the number of nodes, and `LOGFILE` is the log file for the model run.
29 changes: 19 additions & 10 deletions examples/Aquila/README_en.md
@@ -94,22 +94,31 @@ Note: The Aquila-7B basic model may not perform as well for dialogue reasoning t
```
2. Configure the `hostfile` file.
3. Configure the `hostfile` file.
<details><summary>Details are as follows:</summary>
Taking a single machine with eight GPUs as an example:
1. Check the IP address of the local machine:
```
ifconfig eth0 | grep "inet " | awk '{print $2}'
```
2. Fill in the `hostfile` with the following
```
[上一步得到的ip地址] slots=8
```
```
[ip address from last step] slots=8
```
3. Confirm that the local machine can log in without a password by testing using the following command:
```
ssh localhost
```
If passwordless login fails, you can configure it with the following commands
```
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
service sshd restart
```
</details>
2 changes: 1 addition & 1 deletion examples/Aquila/generate_bminf.py
@@ -22,7 +22,7 @@
model.eval()

with torch.cuda.device(0):
model = bminf.wrapper(model, quantization=False, memory_limit=2 << 30)
model = bminf.wrapper(model, quantization=False, memory_limit=2 << 30) # n << 30 = n * 2**30 bytes, i.e. an n GiB memory limit

predictor = Predictor(model, tokenizer)
