
Custom instruct template for task specific finetuning on Llama 3.1 using torchtune : Module not found error for custom template #1295

Closed
muniefht opened this issue Aug 9, 2024 · 4 comments



muniefht commented Aug 9, 2024

Hi, I am new to the field and trying to fine-tune a model for the first time. I am working with torchtune's lora_finetune_single_device recipe. I was able to run fine-tuning with the built-in alpaca dataset, and now I am trying to fine-tune on a custom dataset: a CSV file of abusive and non-abusive tweets, for Urdu-language abuse detection. One column contains the "tweets" and the other contains the "target" (0 or 1). I thought the instruct_dataset() format would be the best fit for this kind of problem, so I created a custom template.
I wrote the following code in it:
```python
from typing import Any, Dict, Mapping, Optional

from torchtune.data import InstructTemplate


class AbusiveLanguageDetectionTemplate(InstructTemplate):
    template = (
        "You are an abusive language detection model for Urdu. Your job is to detect the abusive language in the Urdu sentences. "
        "Output '1' if the sentence is abusive and output '0' if the sentence is non-abusive. No explanation is required.\n\n"
        "### Input:\n{tweet}\n\n### Response:\n{target}\n"
    )

    @classmethod
    def format(
        cls, sample: Mapping[str, Any], column_map: Optional[Dict[str, str]] = None
    ) -> str:
        if column_map:
            input_column = column_map.get("tweet", "tweet")
            response_column = column_map.get("target", "target")
        else:
            input_column = "tweet"
            response_column = "target"

        return cls.template.format(
            tweet=sample[input_column], target=str(sample[response_column])
        )
```
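As a quick standalone sanity check of the formatting logic (the template string is reproduced here so it runs without torchtune installed; the sample values are made up):

```python
# Standalone check of the prompt formatting. The template string is copied
# here so this runs without torchtune installed; the sample is invented.
template = (
    "You are an abusive language detection model for Urdu. Your job is to detect the abusive language in the Urdu sentences. "
    "Output '1' if the sentence is abusive and output '0' if the sentence is non-abusive. No explanation is required.\n\n"
    "### Input:\n{tweet}\n\n### Response:\n{target}\n"
)

sample = {"tweet": "یہ ایک مثال ہے", "target": 0}
prompt = template.format(tweet=sample["tweet"], target=str(sample["target"]))
print(prompt)
```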

I saved this code in a file named "abuse_detection.py", which is my custom template.
Now I am trying to reference this template from my custom_config.yaml file. For the dataset section, I have specified the following:

```yaml
# Dataset and Sampler
dataset:
  _component_: torchtune.datasets.instruct_dataset
  source: abusive_train.csv
  template: abuse_detection.AbusiveLanguageDetectionTemplate
  max_seq_len: 4096
  train_on_input: False
  packed: False
batch_size: 2
seed: null
shuffle: True
```
where "abusive_train.csv" is the file name of my CSV file.
My custom_config.yaml, abusive_train.csv, and abuse_detection.py files are all located in the same directory, and I am running:

```
tune run lora_finetune_single_device --config custom_config.yaml
```

but I am getting the following error:

```
ModuleNotFoundError("No module named 'abuse_detection'")
Are you sure that module 'abuse_detection' is installed?
```

Can someone point out what I am doing wrong? Where should I place the abuse_detection.py file so that it gets picked up? Please help.
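For context on why the error mentions a module: config frameworks such as torchtune typically turn a dotted string like `abuse_detection.AbusiveLanguageDetectionTemplate` into an object roughly as below (an illustrative sketch, not torchtune's actual implementation), so a ModuleNotFoundError means the module part of the path was not importable from sys.path:

```python
import importlib


def resolve_dotted_path(path: str):
    """Resolve a dotted path like 'pkg.module.Attr' to the object it names.
    Illustrative sketch of what config loaders generally do with a
    _component_ or template string; not torchtune's actual code."""
    module_path, _, attr_name = path.rpartition(".")
    # ModuleNotFoundError is raised here if module_path is not on sys.path
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# Works for anything importable, e.g. a stdlib function:
fn = resolve_dotted_path("json.dumps")
print(fn({"ok": True}))  # → {"ok": true}
```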

@felipemello1
Contributor

Hey @muniefht, can you try passing the whole path? my.path.to.abuse_detection.AbusiveLanguageDetectionTemplate

Not sure if that will solve it, but I think it's an easy one to try.

@muniefht
Author

I tried doing that; it did not work. Also, I am using a shared server with multiple users and I am not a root user, but that should not be a problem, should it? I did find a workaround: I placed my template inside the torchtune repository (in site-packages, under torchtune), set the path to torchtune.abuse_detection.AbusiveLanguageDetectionTemplate, and that worked.


zjost commented Oct 9, 2024

For others with this problem, I found the following workaround.

Let's say you are currently in some directory we'll call cwd, and your file with the custom function my_function is at the path cwd/custom/pyfile.py. Then, in your recipe, put: `_component_: custom.pyfile.my_function`.

Then, when you use tune, prepend the following: `PYTHONPATH=$(pwd):$PYTHONPATH tune ...`

This tells Python to also look in cwd, which is what `$(pwd)` returns.
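To illustrate the mechanism (a self-contained sketch; the directory layout and function name are invented): prepending a directory to the module search path is all PYTHONPATH does, and once the directory is on the path the dotted import resolves:

```python
import importlib
import os
import sys
import tempfile

# Simulate the workaround in-process: create cwd/custom/pyfile.py in a
# temp dir, add that dir to sys.path (the equivalent of setting
# PYTHONPATH for a subprocess), then import by the dotted path.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "custom")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "pyfile.py"), "w") as f:
    f.write("def my_function():\n    return 'loaded'\n")

sys.path.insert(0, root)  # what PYTHONPATH=$(pwd) achieves for `tune`
mod = importlib.import_module("custom.pyfile")
print(mod.my_function())  # → loaded
```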

@RdoubleA
Contributor

This should've been fixed in #1760 and #1731. When you run the same command without modifying PYTHONPATH, do you still run into issues? @zjost
