
Custom instruct template for task specific finetuning on Llama 3.1 using torchtune : Module not found error for custom template #1295

Closed
muniefht opened this issue Aug 9, 2024 · 4 comments



muniefht commented Aug 9, 2024

Hi, I am new to the field and trying to fine-tune a model for the first time. I am working with torchtune's lora_finetune_single_device recipe. I was able to run fine-tuning with the built-in alpaca dataset, and now I am trying to fine-tune on a custom dataset: a CSV file of abusive and non-abusive tweets, for Urdu-language abuse detection. One column contains the "tweets" and the other contains the "target" (0 or 1). I thought the instruct_dataset() format would be the best fit for this kind of problem, so I created a custom template.
I wrote the following code in it:
```python
from typing import Any, Dict, Mapping, Optional

from torchtune.data import InstructTemplate


class AbusiveLanguageDetectionTemplate(InstructTemplate):
    template = (
        "You are an abusive language detection model for Urdu. Your job is to detect the abusive language in the Urdu sentences. "
        "Output '1' if the sentence is abusive and output '0' if the sentence is non-abusive. No explanation is required.\n\n"
        "### Input:\n{tweet}\n\n### Response:\n{target}\n"
    )

    @classmethod
    def format(
        cls, sample: Mapping[str, Any], column_map: Optional[Dict[str, str]] = None
    ) -> str:
        if column_map:
            input_column = column_map.get("tweet", "tweet")
            response_column = column_map.get("target", "target")
        else:
            input_column = "tweet"
            response_column = "target"

        return cls.template.format(
            tweet=sample[input_column], target=str(sample[response_column])
        )
```
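As a quick standalone sanity check of the formatting logic (the template string is reproduced here so it runs without torchtune installed; the sample values are made up):

```python
# Standalone check of the prompt formatting. The template string is copied
# here so this runs without torchtune installed; the sample is invented.
template = (
    "You are an abusive language detection model for Urdu. Your job is to detect the abusive language in the Urdu sentences. "
    "Output '1' if the sentence is abusive and output '0' if the sentence is non-abusive. No explanation is required.\n\n"
    "### Input:\n{tweet}\n\n### Response:\n{target}\n"
)

sample = {"tweet": "یہ ایک مثال ہے", "target": 0}
prompt = template.format(tweet=sample["tweet"], target=str(sample["target"]))
print(prompt)
```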

I saved this code in a file named "abuse_detection.py", which is my custom template.
Now I am trying to reference this template from my custom_config.yaml file. For the dataset section, I have specified the following:

```yaml
# Dataset and Sampler
dataset:
  _component_: torchtune.datasets.instruct_dataset
  source: abusive_train.csv
  template: abuse_detection.AbusiveLanguageDetectionTemplate
  max_seq_len: 4096
  train_on_input: False
  packed: False
batch_size: 2
seed: null
shuffle: True
```
where "abusive_train.csv" is the file name of my CSV file.
My custom_config.yaml, abusive_train.csv, and abuse_detection.py files are all located in the same directory, and I am running:

```
tune run lora_finetune_single_device --config custom_config.yaml
```

but I am getting the following error:

```
ModuleNotFoundError("No module named 'abuse_detection'")
Are you sure that module 'abuse_detection' is installed?
```

Can someone point out what I am doing wrong? Where should I place the abuse_detection.py file so that it gets picked up? Please help.
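For context on why the error mentions a module: config frameworks such as torchtune typically turn a dotted string like `abuse_detection.AbusiveLanguageDetectionTemplate` into an object roughly as below (an illustrative sketch, not torchtune's actual implementation), so a ModuleNotFoundError means the module part of the path was not importable from sys.path:

```python
import importlib


def resolve_dotted_path(path: str):
    """Resolve a dotted path like 'pkg.module.Attr' to the object it names.
    Illustrative sketch of what config loaders generally do with a
    _component_ or template string; not torchtune's actual code."""
    module_path, _, attr_name = path.rpartition(".")
    # ModuleNotFoundError is raised here if module_path is not on sys.path
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# Works for anything importable, e.g. a stdlib function:
fn = resolve_dotted_path("json.dumps")
print(fn({"ok": True}))  # → {"ok": true}
```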

@felipemello1
Contributor

Hey @muniefht, can you try passing the whole path? my.path.to.abuse_detection.AbusiveLanguageDetectionTemplate

Not sure if that will solve it, but I think it's an easy one to try.

@muniefht
Author

I tried doing that; it did not work. Also, I am using a shared server with multiple users and I am not a root user, but that should not be a problem, should it? I did find a workaround: I placed my template inside the torchtune repository (in site-packages, under torchtune), set the path to torchtune.abuse_detection.AbusiveLanguageDetectionTemplate, and that worked.


zjost commented Oct 9, 2024

For others with this problem, I found the following workaround.

Let's say you are currently in some directory we'll call cwd, and your file with the custom function my_function is at the path cwd/custom/pyfile.py. Then, in your recipe, put: `_component_: custom.pyfile.my_function`.

Then, when you use tune, prepend the following: `PYTHONPATH=$(pwd):$PYTHONPATH tune ...`

This tells Python to also look in cwd, which is what `$(pwd)` returns.
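To illustrate the mechanism (a self-contained sketch; the directory layout and function name are invented): prepending a directory to the module search path is all PYTHONPATH does, and once the directory is on the path the dotted import resolves:

```python
import importlib
import os
import sys
import tempfile

# Simulate the workaround in-process: create cwd/custom/pyfile.py in a
# temp dir, add that dir to sys.path (the equivalent of setting
# PYTHONPATH for a subprocess), then import by the dotted path.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "custom")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "pyfile.py"), "w") as f:
    f.write("def my_function():\n    return 'loaded'\n")

sys.path.insert(0, root)  # what PYTHONPATH=$(pwd) achieves for `tune`
mod = importlib.import_module("custom.pyfile")
print(mod.my_function())  # → loaded
```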

@RdoubleA
Contributor

This should've been fixed in #1760 and #1731. When you run the same command without modifying PYTHONPATH, do you still run into issues? @zjost
