add docs for lora generate and hostfile setup
Signed-off-by: ftgreat <[email protected]>
ftgreat committed Jun 26, 2023
1 parent ab0b549 commit 781e153
Showing 12 changed files with 137 additions and 58 deletions.
19 changes: 16 additions & 3 deletions examples/Aquila/Aquila-chat/README.md
@@ -83,32 +83,45 @@ python generate_chat_bminf.py
### Supervised Fine-tuning (SFT)
1. Configure the `hostfile` file
1. Prepare the initial model for fine-tuning (place it in checkpoints_in)
2. Configure the `hostfile` file
<details><summary>Details are as follows:</summary>
Taking a single machine with eight GPUs as an example:
1. Check the IP address of the local machine:
```
ifconfig eth0 | grep "inet " | awk '{print $2}'
```
2. Fill in the `hostfile` with the following (a multi-node example is shown below)
```
[ip address from last step] slots=8
```
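The same `host slots=N` format extends to multi-node training: one line per machine. A hypothetical two-node setup (placeholder addresses) would look like:
```
192.168.0.11 slots=8
192.168.0.12 slots=8
```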
3. Confirm that the local machine can log in without a password by testing with the following command:
```
ssh localhost
```
If passwordless login fails, you can configure it with the following commands
```
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
service sshd restart
```
</details>
2. Run the training script
3. Run the training script
```
bash dist_trigger_docker.sh hostfile Aquila-chat.yaml aquilachat-7b aquila_experiment
```
**To launch LoRA fine-tuning instead (it can run on a single V100), replace the previous step with**
```
bash dist_trigger_docker.sh hostfile Aquila-chat-lora.yaml aquilachat-7b aquila_experiment
```
Note: a model trained with LoRA must be run through `generate_chat_lora.py` for inference, and the LoRA parameters used during training must be supplied when the autoloader loads the model (a sketch follows).
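A minimal sketch of that loading step, assuming the `AutoLoader` interface used elsewhere in these examples; the `lora` keyword below is a placeholder, not a confirmed parameter name, so check `generate_chat_lora.py` for the exact arguments your FlagAI version expects:
```python
from flagai.auto_model.auto_loader import AutoLoader

# Sketch only: the LoRA settings must mirror those used for training
# (see Aquila-chat-lora.yaml). "lora=True" is a placeholder keyword.
loader = AutoLoader(
    "lm",
    model_dir="./checkpoints_in",
    model_name="aquilachat-7b",
    use_cache=True,
    lora=True,
)
model = loader.get_model()
tokenizer = loader.get_tokenizer()
```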
<details><summary>The correct output information is shown below:</summary>
32 changes: 20 additions & 12 deletions examples/Aquila/Aquila-chat/README_en.md
@@ -92,22 +92,30 @@ Note: The Aquila-7B basic model may not perform as well for dialogue reasoning t
### Supervised Fine-tuning (SFT)
1. Configure the `hostfile` file.
1. Prepare the initial model for fine-tuning (place it in checkpoints_in).
2. Configure the `hostfile` file.
<details><summary>Details are as follows:</summary>
Taking a single machine with eight GPUs as an example:
1. Check the IP address of the local machine:
```
ifconfig eth0 | grep "inet " | awk '{print $2}'
```
2. Fill in the `hostfile` with the following
```
[上一步得到的ip地址] slots=8
```
```
[ip address from last step] slots=8
```
3. Confirm that the local machine can log in without a password by testing using the following command:
```
ssh localhost
```
If passwordless login fails, you can configure it with the following commands
```
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
service sshd restart
```
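To verify non-interactively that key-based login now works (assuming OpenSSH), `BatchMode=yes` makes ssh fail instead of prompting for a password:
```
ssh -o BatchMode=yes localhost true && echo "passwordless SSH OK"
```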
</details>
@@ -119,7 +127,7 @@ Note: The Aquila-7B basic model may not perform as well for dialogue reasoning t
```
bash dist_trigger_docker.sh hostfile Aquila-chat-lora.yaml aquila-7b aquila_experiment
```
A model trained with LoRA must be run through `generate_chat_lora.py` for inference, and the LoRA parameters used during training must be supplied when the autoloader loads the model.
<details><summary>The correct output information is shown below:</summary>
The following information will be output. Note that `NODES_NUM` should be equal to the number of nodes, and `LOGFILE` is the log file for the model run.
2 changes: 1 addition & 1 deletion examples/Aquila/Aquila-chat/generate_chat_bminf.py
@@ -24,7 +24,7 @@
model.eval()

with torch.cuda.device(0):
model = bminf.wrapper(model, quantization=False, memory_limit=2 << 30)
model = bminf.wrapper(model, quantization=False, memory_limit=2 << 30) # n << 30 = n * 2**30 bytes, i.e. an n GiB memory limit
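# memory_limit caps the GPU memory bminf may use when offloading parameters; raise it (e.g. 8 << 30 for 8 GiB) if more GPU memory is free.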

predictor = Predictor(model, tokenizer)

32 changes: 21 additions & 11 deletions examples/Aquila/Aquila-code/README.md
@@ -94,20 +94,30 @@ python generate_code_bminf.py
<details><summary>Details are as follows:</summary>
Taking a single machine with eight GPUs as an example:
1. Check the IP address of the local machine:
```
ifconfig eth0 | grep "inet " | awk '{print $2}'
```
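If the interface is not named `eth0`, the command above prints nothing; on a typical Linux host, `hostname -I` is a simple alternative (the first field is usually the primary address):
```
hostname -I | awk '{print $1}'
```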
2. Fill in the `hostfile` with the following
```
[ip address from last step] slots=8
```
3. Confirm that the local machine can log in without a password by testing with the following command:
```
ssh localhost
```
If passwordless login fails, you can configure it with the following commands
```
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
service sshd restart
```
</details>
3. Run the training script
11 changes: 9 additions & 2 deletions examples/Aquila/Aquila-code/README_en.md
@@ -96,13 +96,20 @@ Currently, the minimum requirement for pre-training the 7B base model is to run
```
2. Fill in the `hostfile` with the following
```
[上一步得到的ip地址] slots=8
[ip address from last step] slots=8
```
3. Confirm that the local machine can log in without a password by testing using the following command:
```
ssh localhost
```
If passwordless login fails, you can configure it with the following commands
```
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
service sshd restart
```
</details>
3. Run the training script:
4 changes: 2 additions & 2 deletions examples/Aquila/Aquila-code/generate_code_bminf.py
@@ -25,8 +25,8 @@
tokenizer = loader.get_tokenizer()
model.half()
model.eval()
model.cuda()
model.to(device)
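# Assumes `device` is defined earlier in the script; unlike .cuda(), .to(device) also supports CPU or a specific GPU.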
with torch.cuda.device(0):
model = bminf.wrapper(model, quantization=False, memory_limit=2 << 30) # n << 30 = n * 2**30 bytes, i.e. an n GiB memory limit

vocab = tokenizer.get_vocab()

11 changes: 11 additions & 0 deletions examples/Aquila/Aquila-pretrain/README.md
@@ -89,17 +89,28 @@ python generate_bminf.py
<details><summary>Details are as follows:</summary>
Taking a single machine with eight GPUs as an example:
1. Check the IP address of the local machine:
```
ifconfig eth0 | grep "inet " | awk '{print $2}'
```
2. Fill in the `hostfile` with the following
```
[ip address from last step] slots=8
```
3. Confirm that the local machine can log in without a password by testing with the following command:
```
ssh localhost
```
If passwordless login fails, you can configure it with the following commands (an alternative is shown after the commands)
```
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
service sshd restart
```
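Alternatively, OpenSSH ships `ssh-copy-id`, which installs the public key into `authorized_keys` for you:
```
ssh-copy-id -i ~/.ssh/id_rsa.pub localhost
```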
</details>
28 changes: 18 additions & 10 deletions examples/Aquila/Aquila-pretrain/README_en.md
@@ -96,22 +96,30 @@ Note: The Aquila-7B base model may not perform as well for dialogue reasoning ta
```
2. Configure the `hostfile` file.
3. Configure the `hostfile` file.
<details><summary>Details are as follows:</summary>
Taking a single machine with eight GPUs as an example:
1. Check the IP address of the local machine:
```
ifconfig eth0 | grep "inet " | awk '{print $2}'
```
2. Fill in the `hostfile` with the following
```
[上一步得到的ip地址] slots=8
```
```
[ip address from last step] slots=8
```
3. Confirm that the local machine can log in without a password by testing using the following command:
```
ssh localhost
```
If passwordless login fails, you can configure it with the following commands
```
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
service sshd restart
```
</details>
2 changes: 1 addition & 1 deletion examples/Aquila/Aquila-pretrain/generate_bminf.py
@@ -22,7 +22,7 @@
model.eval()

with torch.cuda.device(0):
model = bminf.wrapper(model, quantization=False, memory_limit=2 << 30)
model = bminf.wrapper(model, quantization=False, memory_limit=2 << 30) # n << 30 = n * 2**30 bytes, i.e. an n GiB memory limit

predictor = Predictor(model, tokenizer)

23 changes: 18 additions & 5 deletions examples/Aquila/README.md
@@ -81,41 +81,54 @@ python generate_bminf.py
### Supervised Fine-tuning (SFT)
1. Enter the chat-model fine-tuning directory and place the pre-trained model to be fine-tuned under the checkpoints_in directory
1. Prepare the initial model for fine-tuning (place it in checkpoints_in)
2. Enter the chat-model fine-tuning directory and place the pre-trained model to be fine-tuned under the checkpoints_in directory
Assuming you have just run the inference script under Aquila-pretrain, you can run
```
cd ./Aquila-chat
mv ../Aquila-pretrain/checkpoints_in ./  # path is relative to Aquila-chat after the cd above
```
2. Configure the `hostfile` file
3. Configure the `hostfile` file
<details><summary>Details are as follows:</summary>
Taking a single machine with eight GPUs as an example:
1. Check the IP address of the local machine:
```
ifconfig eth0 | grep "inet " | awk '{print $2}'
```
2. Fill in the `hostfile` with the following
```
[ip address from last step] slots=8
```
3. Confirm that the local machine can log in without a password by testing with the following command:
```
ssh localhost
```
If passwordless login fails, you can configure it with the following commands
```
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
service sshd restart
```
</details>
3. Run the training script
4. Run the training script
```
bash dist_trigger_docker.sh hostfile Aquila-chat.yaml aquila-7b aquila_experiment
```
**To launch LoRA fine-tuning instead (it can run on a single V100), replace the previous step with**
```
bash dist_trigger_docker.sh hostfile Aquila-chat-lora.yaml aquila-7b aquila_experiment
```
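Comparing the two commands, the launcher's positional arguments follow one pattern (inferred from these examples):
```
bash dist_trigger_docker.sh [hostfile] [training config yaml] [model name] [experiment name]
```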
Note: a model trained with LoRA must be run through `generate_chat_lora.py` for inference, and the LoRA parameters used during training must be supplied when the autoloader loads the model.
<details><summary>The correct output information is shown below:</summary>
The following information will be output. Note that `NODES_NUM` should be equal to the number of nodes, and `LOGFILE` is the log file for the model run.
29 changes: 19 additions & 10 deletions examples/Aquila/README_en.md
@@ -94,22 +94,31 @@ Note: The Aquila-7B basic model may not perform as well for dialogue reasoning t
```
2. Configure the `hostfile` file.
3. Configure the `hostfile` file.
<details><summary>Details are as follows:</summary>
Taking a single machine with eight GPUs as an example:
1. Check the IP address of the local machine:
```
ifconfig eth0 | grep "inet " | awk '{print $2}'
```
2. Fill in the `hostfile` with the following
```
[上一步得到的ip地址] slots=8
```
```
[ip address from last step] slots=8
```
3. Confirm that the local machine can log in without a password by testing using the following command:
```
ssh localhost
```
If passwordless login fails, you can configure it with the following commands
```
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
service sshd restart
```
</details>
2 changes: 1 addition & 1 deletion examples/Aquila/generate_bminf.py
@@ -22,7 +22,7 @@
model.eval()

with torch.cuda.device(0):
model = bminf.wrapper(model, quantization=False, memory_limit=2 << 30)
model = bminf.wrapper(model, quantization=False, memory_limit=2 << 30) # n << 30 = n * 2**30 bytes, i.e. an n GiB memory limit

predictor = Predictor(model, tokenizer)
