Below is a description of the hyperparameters used in the `ai_worker/fine_tune.py` script, with their default values and purposes:
- Parameter: LoRA alpha
- Default Value: 64
- Description: The value of alpha in the LoRA (Low-Rank Adaptation) setup. Alpha scales the contribution of the low-rank adapter relative to the base weights (the adapter update is typically scaled by alpha / r, where r is the adapter rank), so it controls how strongly the adaptation affects the tuned model.
- Parameter: LoRA dropout
- Default Value: 0.05 (5%)
- Description: The dropout rate applied to the LoRA layers. Dropout is a regularization technique to prevent overfitting in neural networks. A higher value increases regularization but can reduce the learning capacity of the model.
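For context, here is a minimal sketch of how these two values would typically be wired into an adapter configuration, assuming the peft library is in use; the rank and target module names below are placeholders, not values taken from fine_tune.py:

```python
from peft import LoraConfig

# Sketch only: r and target_modules are placeholder assumptions,
# not settings read from fine_tune.py.
lora_config = LoraConfig(
    r=32,                                 # adapter rank (placeholder)
    lora_alpha=64,                        # default alpha from above
    lora_dropout=0.05,                    # default dropout from above
    target_modules=["q_proj", "v_proj"],  # placeholder module names
    task_type="CAUSAL_LM",
)
```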
- Parameter: Batch size
- Default Value: 4
- Description: The number of samples processed before the model's internal parameters are updated. A larger batch size can increase the training speed but might require more memory.
- Parameter: Gradient accumulation steps
- Default Value: 4
- Description: Gradient accumulation steps are used to simulate a larger batch size when memory limitations prevent using a larger actual batch size. For instance, if you want an effective batch size of 16 but can only fit 4 samples in memory at a time, you'd use a `batch_size` of 4 and `accumulation_steps` of 4. The gradients from each batch are accumulated over the set number of steps before being applied.
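To make the mechanics concrete, here is a minimal sketch of an accumulation loop in plain PyTorch style (not code from fine_tune.py; `model`, `optimizer`, and `dataloader` are assumed to already exist):

```python
accumulation_steps = 4  # default from above; effective batch = 4 * 4 = 16

optimizer.zero_grad()
for step, batch in enumerate(dataloader):
    loss = model(**batch).loss
    # Scale the loss so the accumulated gradients match what one large
    # batch of batch_size * accumulation_steps samples would produce.
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()       # apply the accumulated gradients
        optimizer.zero_grad()  # reset for the next effective batch
```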
- Parameter: Max steps
- Default Value: 500
- Description: The maximum number of training steps. It limits the total number of iterations the training process will go through, regardless of the number of epochs.
- Parameter: Epochs
- Default Value: 500
- Description: The number of complete passes through the training dataset. One epoch means that each sample in the training dataset has been used once to update the model's weights.
- Parameter: Learning rate
- Default Value: 2.5e-5
- Description: The learning rate, which determines the size of the steps taken to adjust the model's weights during training. A smaller value will lead to slower convergence but might achieve better results, while a larger value could speed up convergence but risk overshooting the optimal weights.
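Assuming a Hugging Face transformers training loop (an assumption about fine_tune.py's internals, not a confirmed detail), the batch, step, epoch, and learning-rate defaults above would map onto a configuration roughly like this; `output_dir` is a placeholder:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./fine_tune_out",    # placeholder path
    per_device_train_batch_size=4,   # batch size
    gradient_accumulation_steps=4,   # accumulation steps
    num_train_epochs=500,            # epochs
    max_steps=500,                   # note: in transformers, a positive
                                     # max_steps overrides num_train_epochs
    learning_rate=2.5e-5,
)
```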
- Parameter: Training split
- Default Value: 0.8 (80%)
- Description: This determines the fraction of the dataset that should be used for training. The remaining portion (1 - `training_split`) is typically used for validation. For instance, a value of 0.8 means 80% of the data is used for training and 20% for validation.
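As a self-contained illustration of the split (using the Hugging Face datasets library here, which is an assumption rather than a detail taken from the script):

```python
from datasets import Dataset

training_split = 0.8  # default from above

# Toy in-memory dataset, purely for illustration.
dataset = Dataset.from_list([{"text": f"example {i}"} for i in range(100)])
split = dataset.train_test_split(test_size=1 - training_split, seed=42)
train_ds, eval_ds = split["train"], split["test"]
print(len(train_ds), len(eval_ds))  # 80 20
```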
- Parameter: Early-stop loss threshold
- Default Value: 0.01
- Description: An early-stopping threshold: training stops once the evaluation loss drops below this value.
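fine_tune.py's actual stopping logic isn't reproduced here, but with a transformers Trainer this kind of threshold stop could be sketched as a callback:

```python
from transformers import TrainerCallback

class StopOnEvalLoss(TrainerCallback):
    """Sketch: stop training once eval loss falls below a threshold."""

    def __init__(self, threshold=0.01):  # default threshold from above
        self.threshold = threshold

    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        # metrics holds the evaluation results, including "eval_loss".
        if metrics and metrics.get("eval_loss", float("inf")) < self.threshold:
            control.should_training_stop = True  # Trainer halts gracefully
        return control
```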
- Parameter: Fine-tune quantization bits
- Default Value: 4
- Description: The bit width for the bitsandbytes quantization configuration used during fine-tuning. Can be 4, 8, or 16; lower bit widths reduce memory usage at some cost in precision.
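For the 4-bit default, a bitsandbytes setup typically looks something like the sketch below; the quant type and compute dtype are common choices and are assumptions here, not values read from fine_tune.py:

```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit default from above
    bnb_4bit_quant_type="nf4",             # assumed quant type
    bnb_4bit_compute_dtype=torch.float16,  # assumed compute dtype
)
# The config is then passed to the model load, e.g.:
# AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
```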
- Default Value: "q6_k"
- Description: Quantization level post-training, for shipping results. Can choose "f16", which results in a large file (takes a long time to upload), but can be more useful for further training, if needed. This is a llama.cpp quantization level.
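Exporting at this level generally means invoking llama.cpp's quantize tool on a GGUF file. A hedged sketch follows; the binary location and file names are assumptions, and fine_tune.py may invoke the tool differently:

```python
import subprocess

# Hypothetical paths: llama.cpp's quantize tool takes an input GGUF,
# an output GGUF, and a quantization type such as "q6_k" or "f16".
subprocess.run(
    ["./quantize", "model-f16.gguf", "model-q6_k.gguf", "q6_k"],
    check=True,
)
```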
When using `ai_worker/fine_tune.py`, you can customize these hyperparameters as needed, bearing in mind their potential effects on training performance and model generalization.