Aquila

The Aquila language model is the first open-source language model that combines Chinese and English knowledge, a commercial-use license, and compliance with domestic data regulations.

🌟 Supports open source commercial licenses. The source code of the Aquila series models is released under the Apache 2.0 license, while the model weights are released under the BAAI Aquila Model License Agreement. Users can use them for commercial purposes as long as they meet the licensing restrictions.
✍️ Possesses Chinese and English knowledge. The Aquila series models are trained from scratch on a high-quality corpus of Chinese and English text, with Chinese corpora accounting for about 40%, ensuring that the models accumulate native Chinese world knowledge during the pre-training phase, rather than translated knowledge.
👮‍♀️ Complies with domestic data regulations. The Chinese corpora of the Aquila series models come from the Chinese datasets BAAI has accumulated over the years, including Chinese internet data from over 10,000 sources (more than 99% of which are domestic sources), as well as high-quality Chinese literature and book data supported by authoritative domestic organizations. We will continue to accumulate high-quality and diverse datasets and incorporate them into the subsequent training of the Aquila base models.
🎯 Continuous improvements and open sourcing. We will continue to improve training data, optimize training methods, and enhance model performance, cultivate a flourishing "model tree" on a better base model foundation, and continuously update open-source versions.

Additional details of the Aquila model will be presented in the official technical report, which is expected to be released by the end of June 2023. Please stay tuned for updates on official channels, including the FlagAI GitHub repository, FlagAI's Zhihu account, and FlagAI's official technical communication group.

Model	Model Type	Description	File Path	Standalone Model Download	Status	GPUs Used
Aquila-7B	Base model, 7 billion parameters	The Aquila base model inherits the architectural design advantages of GPT-3 and LLaMA. It adopts a set of more efficient low-level operator implementations, redesigns the bilingual tokenizer, upgrades the BMTrain parallel training method, and achieves nearly 8 times the training efficiency of Megatron+DeepSpeed ZeRO-2.	./examples/Aquila/Aquila-pretrain	Download Aquila-7B	Released	Nvidia-A100
Aquila-33B	Base model, 33 billion parameters	Same as above	——	——	Coming soon	Nvidia-A100
AquilaChat-7B	SFT model, fine-tuned and RL based on Aquila-7B	The AquilaChat dialogue model supports fluent text dialogue and multiple language generation tasks. By defining an extensible special instruction specification, it can call other models and tools. For example, calling BAAI's open-source AltDiffusion multilingual text-to-image generation model gives it smooth image generation capability, and combining it with BAAI's InstructFace multi-step controllable text-to-image model enables multi-step controllable editing of human face images.	./examples/Aquila/Aquila-chat	Download AquilaChat-7B	Released	Nvidia-A100
AquilaChat-33B	SFT model, fine-tuned and RL based on Aquila-33B	Same as above	——	——	Coming soon	Nvidia-A100
AquilaCode-multi	Base model, "text-code" generation model, continue-pre-trained based on Aquila-7B.	AquilaCode is trained on high-quality, filtered, and compliant open-source code data, with a dataset roughly 10-40% the size of those used by other open-source code generation models. By following the official guidelines provided, developers can use the AquilaCode model to build their own code assistant.	./examples/Aquila/Aquila-code	Download AquilaCode-multi	Released	Nvidia-A100
AquilaCode-py	Base model, "text-code" generation model, continue-pre-trained based on Aquila-7B, trained on Horizon Robotics chips	Same as above	./examples/Aquila/Aquila-code	Download AquilaCode-py	Released	Nvidia-A100

We will continue to release improved versions of the Aquila models as open source. To update, delete checkpoints_in/aquila-7b in the original directory and then download the new weights. Other usage remains unchanged. For more details, please refer to the following change log:

2023-08-15: Released v1.0 checkpoint files. AquilaCode-multi and AquilaCode-python have been released, while AquilaCode-7B-NV and AquilaCode-7B-TS are temporarily not maintained. There are no updates to the weights of Aquila-7B and AquilaChat-7B.
- Aquila-7B md5: 5b56d31c8154c9184a38ff7bc6b4d887
- AquilaChat-7B md5: 883e83286ee309dbb624016256e30d4c
- AquilaCode-multi md5: 07cfce9440a0fa1ac2768b39d2cf4286
- AquilaCode-py md5: 3faa85fc03d8fda70a73064f48d02d85
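
The digests above can be used to check a download for corruption. The following is a minimal sketch using only the Python standard library. Which file each digest refers to is not stated here, so the path passed to `checkpoint_matches` is a placeholder, and both function names are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large weight files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Expected digests copied from the change log above.
EXPECTED_MD5 = {
    "Aquila-7B": "5b56d31c8154c9184a38ff7bc6b4d887",
    "AquilaChat-7B": "883e83286ee309dbb624016256e30d4c",
    "AquilaCode-multi": "07cfce9440a0fa1ac2768b39d2cf4286",
    "AquilaCode-py": "3faa85fc03d8fda70a73064f48d02d85",
}


def checkpoint_matches(model_name, weight_file):
    """Return True if weight_file hashes to the digest listed for model_name."""
    return md5_of_file(weight_file) == EXPECTED_MD5[model_name]
```

For example, `checkpoint_matches("Aquila-7B", path_to_downloaded_file)` returns `False` if the file differs from the published one.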

Aquila-7B has shown improvements in the FlagEval large model evaluation ("Objective") compared to the previous version: approximately 10.07% on MMLU_Chinese, 14.84% on TruthfulQA, and 7.94% on MMLU. For detailed evaluation results, please refer to http://flageval.baai.ac.cn. For the detailed version history, see the Change Log.

If you have any questions, please refer to the FAQ first. If that does not resolve them, please submit an issue directly.

Quick Start Aquila-7B (Base model)

Base Model Environment Setup

Clone the FlagAI GitHub repository locally by running the following command:
```
git clone https://github.com/FlagAI-Open/FlagAI.git
```
Navigate to the repository and install FlagAI from source code.
```
cd FlagAI
python setup.py install
```
Note that we currently support running on Ubuntu and Mac. For detailed environment dependency information, please refer to FlagAI requirements and installation.
Navigate to the Aquila-7B Base Model directory.
```
cd examples/Aquila/Aquila-pretrain
```
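
Before moving on, it can help to confirm that the installation step succeeded. The sketch below checks whether a package is visible to the Python import system without importing it. It assumes that `flagai` is the importable package name installed by setup.py, and the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be found by the import system."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "flagai" is assumed to be the package name installed from the repository.
    print("flagai installed:", is_installed("flagai"))
```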

For the Aquila-7B model, we provide three ways of use: model inference, pre-training, and fine-tuning.

Base Model Inference

Normal model inference (which consumes approximately 14.6GB of GPU memory):

python generate.py

Low-resource inference using BMInf (memory usage can be adjusted):

python generate_bminf.py

Under the default parameters, GPU memory consumption is approximately 4.3GB. You can cap the maximum resource consumption manually with the memory_limit parameter, which is given in bytes; for example, 2 << 30 equals 2GB.
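
As a minimal sketch of how such a limit is expressed, the snippet below works through the shift arithmetic. The helper name `gib` is hypothetical, and the commented `bminf.wrapper` call is an assumption about how generate_bminf.py passes the value, not a copy of the script.

```python
def gib(n):
    """Return n GiB expressed in bytes, using the same shift as `n << 30`."""
    return n << 30


# Shifting left by 30 bits multiplies by 2**30, so 2 << 30 is 2 * 2**30 bytes.
memory_limit = gib(2)
print(memory_limit)  # 2147483648

# Assumed usage inside generate_bminf.py (not verified against the script):
#   model = bminf.wrapper(model, quantization=False, memory_limit=memory_limit)
```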

After running the inference program, the Aquila-7B model will be automatically downloaded to ./checkpoints_in.

Example output:

The model provides a random response to the sample prompt "what is car EDR".

Note: The Aquila-7B base model may not perform as well for dialogue reasoning tasks as the supervised fine-tuned AquilaChat-7B chat model.

Supervised Fine-tuning (SFT)

Navigate to the chat model fine-tuning directory and prepare the pre-trained model that needs to be fine-tuned in the checkpoints_in directory.

Assuming you have just run the inference script under Aquila-pretrain, you can run:
```
cd ../Aquila-chat
mv ../Aquila-pretrain/checkpoints_in ./
```
Configure the hostfile as follows.

Taking a single machine with eight GPUs as an example:
1. Check the IP address of the local machine:
```
ifconfig eth0 | grep "inet " | awk '{print $2}'
```
2. Fill in the hostfile with the following:
```
[ip address from last step] slots=8
```
3. Confirm that the local machine can log in without a password by testing using the following command:
```
ssh localhost
```
  If a password is requested, you can set up passwordless login with the following commands:
```
ssh-keygen -t rsa  
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys 
service sshd restart
```
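
The hostfile line from step 2 can also be generated programmatically, which may help when the setup grows beyond one machine. This is a minimal sketch that assumes one `<ip> slots=<gpus>` line per machine. The function name `format_hostfile` and the address `192.168.1.10` are placeholders, not values from this repository.

```python
def format_hostfile(nodes):
    """Build hostfile text from (ip_address, gpu_slots) pairs, one node per line."""
    lines = []
    for ip_address, slots in nodes:
        if slots < 1:
            raise ValueError(f"{ip_address}: slots must be at least 1")
        lines.append(f"{ip_address} slots={slots}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder address; use the one printed by the ifconfig step.
    # Redirect the output to a file named "hostfile" to use it for training.
    print(format_hostfile([("192.168.1.10", 8)]), end="")
```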
Run the training script:
```
bash dist_trigger_docker.sh hostfile Aquila-chat.yaml aquila-7b aquila_experiment
```
To run LoRA fine-tuning instead (which can run on a single V100), replace the previous command with:
```
bash dist_trigger_docker.sh hostfile Aquila-chat-lora.yaml aquila-7b aquila_experiment
```
Note: LoRA training generates adapter_config.json and adapter_model.bin in the output directory (at the same level as the log file). For inference, run Aquila-chat/generate_chat_lora.py. Unlike regular inference, the autoloader must be given the directory of the adapter files through the adapter_dir parameter when loading the model.
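
Before pointing adapter_dir at an output directory, it may be worth checking that both adapter files are actually present. The sketch below does only that and does not load the model. The function name `missing_adapter_files` is hypothetical, and the file names are the two mentioned in the note above.

```python
from pathlib import Path

# File names taken from the note above.
ADAPTER_FILES = ("adapter_config.json", "adapter_model.bin")


def missing_adapter_files(adapter_dir):
    """Return the expected adapter file names that are absent from adapter_dir."""
    directory = Path(adapter_dir)
    return [name for name in ADAPTER_FILES if not (directory / name).is_file()]
```

An empty list means the directory contains both files and can be passed as adapter_dir.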

If the script starts correctly, it outputs its launch information. Note that NODES_NUM should be equal to the number of nodes, and LOGFILE is the log file for the model run.

Before successful training, you may see the following information in the log file with parameters that may differ:

Pre-training of the Base Model

Currently, the minimum requirement for pre-training the 7B base model is to run on a single Nvidia-A100-80G card (batch_size needs to be adjusted).

Enter the pre-training directory Aquila-pretrain and configure the hostfile file.

Run the training script:

bash dist_trigger_docker.sh hostfile Aquila-pretrain.yaml aquila-7b aquila_experiment

Adjust the Parameters

For the above examples, you can modify the following parameters to achieve different training and inference effects:

🌟 Before executing pre-training and fine-tuning tasks, you can modify the parameters in the YAML file of the training script.

Parameter Name	Type	Description
batch_size	int	The number of samples extracted from the dataset at each iteration of training. Generally, the larger the batch size, the faster the processing speed, but it will occupy more memory.
gradient_accumulation_steps	int	The number of times to calculate gradients for multiple small batches before updating the model weights. This is mainly used in cases where the GPU memory is small, and a small batch size can be used to achieve the same effect as a large batch size through gradient accumulation.
lr	float	The step or rate at which the model updates parameters. A high learning rate may cause the model not to converge, while a low learning rate may result in long training times or getting stuck in local optimal solutions.
warm_up	float	The ratio of the initial learning rate to the original learning rate.
save_interval	int	The number of training iterations between model checkpoints. When training runs for a long time, periodic saving prevents all training results from being lost to sudden interruptions or errors.
log_interval	int	The number of training iterations between log outputs.
lora	bool	Whether to enable LoRA optimization method during training. By default, it is set to False (no LoRA).
enable_sft_dataset_dir	str	The directory of the SFT training dataset.
enable_sft_dataset_file	str	The file name of the SFT training dataset.
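
To illustrate how batch_size and gradient_accumulation_steps interact, the sketch below computes the number of samples that contribute to a single weight update. It assumes that batch_size is the per-GPU value and that training is data-parallel, neither of which is stated here. Both function names are hypothetical.

```python
def effective_batch_size(batch_size, gradient_accumulation_steps, num_gpus=1):
    """Samples contributing to one weight update under data parallelism."""
    return batch_size * gradient_accumulation_steps * num_gpus


def accumulation_steps_for(target_batch, batch_size, num_gpus=1):
    """Smallest accumulation step count that reaches at least target_batch samples."""
    per_step = batch_size * num_gpus
    return -(-target_batch // per_step)  # ceiling division


# With 8 GPUs, a per-GPU batch of 4 and 8 accumulation steps give 256 samples per update.
print(effective_batch_size(4, 8, num_gpus=8))  # 256
```

If a smaller batch_size is needed to fit in GPU memory, raising gradient_accumulation_steps by the same factor keeps the effective batch size unchanged.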

Complete parameter information can be found in https://github.com/FlagAI-Open/FlagAI/blob/master/flagai/env_args.py

🌟 For inference tasks, the following parameters can be reset when executing the aquila_generate function in the generate.py file:

Parameter Name	Type	Default Value	Description
temperature	float	0.8	The temperature controls the degree of randomness when the model generates new words. A language model assigns a probability to each candidate word, and the temperature rescales this distribution before sampling. A higher temperature flattens the distribution and makes the model more likely to choose lower-probability words, resulting in more adventurous text. Conversely, a lower temperature sharpens the distribution and makes the model more likely to choose the highest-probability word, resulting in more predictable text. Common temperature values range from 0.5 to 1.5.
topk	int	30	Top-k controls the number of choices when the model generates new words. When generating each new word, the model predicts several possible words, and the Top-k parameter limits the model to select only one of the top k words with the highest probability as the generated word. Top-k can help stabilize the generation process and prevent the model from randomly choosing words with very low probabilities.
topp	float	0.95	Similar to Top-k, Top-p also controls the number of choices when the model generates new words. When generating each new word, the model predicts several possible words, and the Top-p parameter limits the model to select only some of the most likely candidate words until the total probability of these candidate words reaches a threshold (such as 0.9 or 0.8). Top-p can help avoid the generation of words that do not fit the context.
max_length	int	200	To avoid generating infinite length text, we need to limit the length of the generated text. The max_length parameter controls the maximum length of the generated text. Once this length is reached, the model stops generating. The maximum length of the Aquila series models is 2048 tokens.
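
The way temperature, topk, and topp combine can be sketched in a few lines of plain Python. This is an illustration of the general technique, not the code used by aquila_generate. Applying top-k before top-p is an assumption, and all function names are hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, topk, topp):
    """Keep the topk most likely tokens, then the smallest prefix whose mass reaches topp."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:topk]
    kept, mass = [], 0.0
    for index in ranked:
        kept.append(index)
        mass += probs[index]
        if mass >= topp:
            break
    # Renormalize so the surviving probabilities sum to 1.
    return {index: probs[index] / mass for index in kept}


def sample_token(logits, temperature=0.8, topk=30, topp=0.95, rng=random):
    """Sample one token index using temperature scaling, then top-k and top-p filtering."""
    candidates = filter_top_k_top_p(softmax(logits, temperature), topk, topp)
    indices = list(candidates)
    weights = [candidates[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]
```

With `topk=1` the sampler always returns the most likely token, while larger topk and topp values admit more candidates and make the output more varied.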

License

The Aquila-7B and Aquila-33B open-source models are licensed under the BAAI Aquila Model License Agreement. The source code is licensed under the Apache License 2.0.
