[fix] update pairwise dataloader. #395
Conversation
In forward() of reward_model.py (Line 62), if "chosen" and "rejected" are exactly the same, "inference" is set to True, which should not happen during training. However, in class PairwiseDataset, "chosen" and "rejected" can end up identical after truncation (this can easily happen when prompts/posts are longer than max_length and padding_side = 'right'). So we filter those cases out of the training data.
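For context, the check being described behaves roughly like the sketch below. This is only an illustration of the logic, not the exact reward_model.py source; the variable names (chosen_ids, rejected_ids, inference) are assumptions.

# Illustrative sketch of the forward() behavior described above;
# names are assumptions, not the exact reward_model.py code.
if torch.all(torch.eq(chosen_ids, rejected_ids)).item():
    inference = True  # identical pair is treated as inference input, not a training pair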
Hi, @Chen9154! Thanks for catching this edge case; looks good to me 👍
Would you be able to also make this same change to the test file for the sake of completeness, please?
trlx/examples/summarize_rlhf/reward_model/gptj_reward_test.py
Lines 40 to 65 in 9fd1f0a
class PairwiseDataset(Dataset):
    def __init__(self, pairs, tokenizer, max_length):
        self.chosen_input_ids = []
        self.chosen_attn_masks = []
        self.rejected_input_ids = []
        self.rejected_attn_masks = []
        for pair in pairs:
            chosen, rejected = pair["chosen"], pair["rejected"]
            chosen_encodings_dict = tokenizer(
                "<|startoftext|>" + chosen + "<|endoftext|>",
                truncation=True,
                max_length=max_length,
                padding="max_length",
                return_tensors="pt",
            )
            rejected_encodings_dict = tokenizer(
                "<|startoftext|>" + rejected + "<|endoftext|>",
                truncation=True,
                max_length=max_length,
                padding="max_length",
                return_tensors="pt",
            )
            self.chosen_input_ids.append(chosen_encodings_dict["input_ids"])
            self.chosen_attn_masks.append(chosen_encodings_dict["attention_mask"])
            self.rejected_input_ids.append(rejected_encodings_dict["input_ids"])
            self.rejected_attn_masks.append(rejected_encodings_dict["attention_mask"])
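Applied to this loop, the filtering described in the PR could look roughly like the sketch below. This is a hedged illustration rather than the exact diff: it assumes torch is imported in the test file and compares the truncated token ids directly, skipping any pair that became identical.

            chosen_ids = chosen_encodings_dict["input_ids"]
            rejected_ids = rejected_encodings_dict["input_ids"]
            # Skip pairs whose token ids became identical after truncation;
            # such pairs would flip the "inference" flag in forward() during training.
            if torch.all(torch.eq(chosen_ids, rejected_ids)).item():
                continue
            self.chosen_input_ids.append(chosen_ids)
            self.chosen_attn_masks.append(chosen_encodings_dict["attention_mask"])
            self.rejected_input_ids.append(rejected_ids)
            self.rejected_attn_masks.append(rejected_encodings_dict["attention_mask"])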
@jon-tow Thanks for the review! I have also made the same change to the test file.
Awesome! Thank you, @Chen9154!