Hi, I am new to the field and am trying to fine-tune a model for the first time. I am working with torchtune, using the lora_finetune_single_device recipe. I was able to fine-tune with the built-in alpaca dataset, and now I am trying to fine-tune on a custom dataset. The data is a CSV file of abusive and non-abusive tweets, and I want to fine-tune the model for abuse detection in Urdu. One column contains "tweets" and the other contains "target" (0 or 1). I thought the instruct_dataset() format would be the best fit for this problem, so I created a custom template.
I wrote the following code in it:
from torchtune.data import InstructTemplate
from typing import Mapping, Any, Optional, Dict

class AbusiveLanguageDetectionTemplate(InstructTemplate):
    template = (
        "You are an abusive language detection model for Urdu. Your job is to detect the abusive language in the Urdu sentences. "
        "Output '1' if the sentence is abusive and output '0' if the sentence is non-abusive. No explanation is required.\n\n"
        "### Input:\n{tweet}\n\n### Response:\n{target}\n"
    )
I saved this code in a file named "abuse_detection.py", which serves as my custom template.
Now I am trying to link this template to my custom_config.yaml file. For the dataset field, I have specified the following:
# Dataset and Sampler
dataset:
  _component_: torchtune.datasets.instruct_dataset
  source: abusive_train.csv
  template: abuse_detection.AbusiveLanguageDetectionTemplate
  max_seq_len: 4096
  train_on_input: False
  packed: False
batch_size: 2
seed: null
shuffle: True
where "abusive_train.csv" is the name of my CSV file.
My custom_config.yaml, abusive_train.csv, and abuse_detection.py files are all located in the same directory, and I am running the following command:
tune run lora_finetune_single_device --config custom_config.yaml
but I am getting the following error:
ModuleNotFoundError("No module named 'abuse_detection'")
Are you sure that module 'abuse_detection' is installed?
Can someone point out what I am doing wrong? Where should I place the abuse_detection.py file so that it gets picked up? Please help.
I tried doing that. It did not work. Also, I am using a shared server with several users, and I am not a root user. I don't know whether that could be a problem. I have found a workaround: I placed my template file inside the installed torchtune package (in site-packages/torchtune) and wrote the path as torchtune.abuse_detection.AbusiveLanguageDetectionTemplate, and that worked.
For others with this problem, I found the following workaround.
Let's say you are currently in some directory we'll call cwd, and your file with the custom function my_function is at cwd/custom/pyfile.py. Then, in your config, put: _component_: custom.pyfile.my_function.
Then, when you run tune, prepend the following: PYTHONPATH=${PWD}:$PYTHONPATH tune ...
This tells Python to also look in cwd, which is what ${PWD} expands to. (Note that ${pwd} in lowercase is normally an unset variable, and the existing path needs the $ in front of PYTHONPATH to be kept.)
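To see why the PYTHONPATH trick helps, here is a minimal sketch of how a dotted path like the one in _component_ can be resolved with a plain import. This is not torchtune's actual code; resolve_component and the package name custom_tt_demo are hypothetical names made up for the illustration. The point is only that the import fails until the directory containing the package is on sys.path, which is what setting PYTHONPATH does at startup.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def resolve_component(dotted_path: str):
    """Hypothetical helper: split 'pkg.module.attr' and import the attribute."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Hypothetical layout mirroring cwd/custom/pyfile.py from the post:
# <workdir>/custom_tt_demo/pyfile.py defining my_function.
workdir = Path(tempfile.mkdtemp())
(workdir / "custom_tt_demo").mkdir()
(workdir / "custom_tt_demo" / "pyfile.py").write_text(
    "def my_function():\n    return 'ok'\n"
)

# Before the directory is on sys.path, the import raises ModuleNotFoundError.
try:
    resolve_component("custom_tt_demo.pyfile.my_function")
    found_before = True
except ModuleNotFoundError:
    found_before = False

# Putting the directory on sys.path has the same effect as PYTHONPATH=<workdir>.
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()
func = resolve_component("custom_tt_demo.pyfile.my_function")
result = func()
```

In the original question, the equivalent would be making sure the directory that holds abuse_detection.py is on PYTHONPATH before running tune.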