On line 63 of auto_grade_error_reasons.py, the code `to_be_graded_data = [data for data in eval_data if data['error_reason_correctness'] != 'N/A']` references `error_reason_correctness`, but when I run eval_open_source_models.py, the output JSON does not contain `error_reason_correctness`. Why? Is this a code bug or not?
Oh, thanks for pointing this out. The `error_reason_correctness` field is actually a manually labelled field produced by our annotators, who judged the correctness of the error reasons returned by the evaluated models. `auto_grade_error_reasons.py` is meant to use GPT-4 to replace that human effort. However, in the paper we used the human labels to verify the correctness of the GPT-4 labelling, which is why the script filters on this field. For your case, you can safely ignore this field, thanks : )
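If you want to run the grading script directly on output from eval_open_source_models.py, one minimal sketch (my own suggestion, not an official fix from the repo) is to make the filter on line 63 tolerant of the missing field by using `dict.get`, so entries without the annotation are kept instead of raising a `KeyError`. The file path below is illustrative:

```python
import json

# Load the eval output produced by eval_open_source_models.py
# (the filename here is just a placeholder).
with open('eval_results.json') as f:
    eval_data = json.load(f)

# Hypothetical tweak to line 63: dict.get treats a missing
# 'error_reason_correctness' key as "keep the entry", since that
# field only exists in the manually annotated data.
to_be_graded_data = [
    data for data in eval_data
    if data.get('error_reason_correctness', '') != 'N/A'
]
```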