Adding new task: Boxes #1557

irafayabdul · 2024-03-11T15:19:05Z

This PR adds a task that probes to what extent a language model can infer the final state of an entity, given an English description of the initial state and a series of state-changing operations.
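To make the setup concrete, here is a small Python sketch of the ground-truth bookkeeping the model is implicitly asked to do: start from an initial assignment of items to boxes, apply operations in order, and read off the final contents. The operation names (`put`, `remove`, `move`), the tuple encoding, and the helper `apply_operations` are illustrative assumptions, not the dataset's actual format.

```python
# Hypothetical sketch of box-state tracking; the operation encoding below is
# an illustrative assumption, not the Boxes dataset's real format.

def apply_operations(initial, operations):
    """Return the final contents of each box after applying operations in order."""
    # Copy the sets so the caller's initial state is not mutated.
    state = {box: set(items) for box, items in initial.items()}
    for op in operations:
        kind = op[0]
        if kind == "put":          # ("put", item, box)
            _, item, box = op
            state.setdefault(box, set()).add(item)
        elif kind == "remove":     # ("remove", item, box)
            _, item, box = op
            state[box].discard(item)
        elif kind == "move":       # ("move", item, src_box, dst_box)
            _, item, src, dst = op
            state[src].discard(item)
            state.setdefault(dst, set()).add(item)
        else:
            raise ValueError(f"unknown operation: {kind!r}")
    return state


initial = {0: {"apple"}, 1: {"key"}, 2: set()}
operations = [
    ("move", "apple", 0, 2),
    ("remove", "key", 1),
    ("put", "car", 1),
]
final = apply_operations(initial, operations)
for box in sorted(final):
    # Empty boxes are reported as "nothing".
    print(f"Box {box} contains: {', '.join(sorted(final[box])) or 'nothing'}")
# Box 0 contains: nothing
# Box 1 contains: car
# Box 2 contains: apple
```

The model never sees this structured form; it only gets the English description and must produce something equivalent to the final line for the queried box.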

CLAassistant · 2024-03-11T15:19:11Z

All committers have signed the CLA.

haileyschoelkopf

Thanks for the PR! I've left some comments. I see that the dataset is still loaded from a local file, which needs to be addressed.

Would you be able to try to replicate a number from the paper and post the result here?

haileyschoelkopf · 2024-03-11T16:19:51Z

lm_eval/tasks/boxes/boxes-base.yaml

+dataset_path: json
+dataset_name: null
+dataset_kwargs:
+  data_files: {'test': 'test-subsample-states-t5.jsonl'}


This points to a local file?

Yes, this points to a file inside the locally stored dataset directory. I downloaded the dataset from GitHub; it is not on Hugging Face. How should I handle this, short of uploading the data to Hugging Face?
I can try to replicate only the Flan-T5-XL results (since GPT-3, GPT-3.5, and GPT-4 are not an option). Please let me know if I understood correctly :) Thanks in advance.
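For context on why a local path is a problem: as far as I understand, with `dataset_path: json` the harness forwards `dataset_kwargs` (here `data_files`) to Hugging Face `datasets.load_dataset`, so the file is looked up on whatever machine runs the evaluation, typically relative to the current working directory. The stdlib-only function below is a hypothetical stand-in that roughly approximates what the JSON builder does for a single JSONL split; `load_jsonl_splits` is not part of lm-eval or `datasets`.

```python
import json
from pathlib import Path


def load_jsonl_splits(data_files):
    """Hypothetical helper: read each split's JSONL file into a list of dicts.

    `data_files` maps split names to paths, mirroring the shape of the
    `data_files` entry in the YAML above. Blank lines are skipped.
    """
    splits = {}
    for split, path in data_files.items():
        # Relative paths resolve against the current working directory,
        # which is why a locally stored file does not travel with the task.
        with Path(path).open(encoding="utf-8") as f:
            splits[split] = [json.loads(line) for line in f if line.strip()]
    return splits
```

Calling `load_jsonl_splits({"test": "test-subsample-states-t5.jsonl"})` only works if that file happens to exist in the working directory, which is the portability issue raised in review.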

Flan-T5 results would be great!

Could you open an issue on the authors' github repo and ask them if they would be alright with uploading the dataset to Huggingface as a gated repo?

Hi Hailey, apologies for the delay.
I asked Dr. Sebastian Schuster and created this gated repo: Boxes task

I am also attaching results for Flan-T5-Base and Flan-T5-XL, which are almost the same as the results reported in the paper.
Here, I tried to produce results broken down by the number of operations affecting the box state, as reported in the paper. The task does not need this breakdown in general, so the implementation is kept the same.
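A breakdown like the one described above amounts to grouping per-example correctness by the number of operations that affected the queried box and computing accuracy per group. The sketch below assumes hypothetical per-example records with `num_ops` and `correct` fields; these names, and the toy values in the usage, are illustrative and are not the results posted in this PR.

```python
from collections import defaultdict


def accuracy_by_num_ops(records):
    """Return {num_ops: accuracy}, sorted by num_ops.

    Each record is assumed (hypothetically) to carry `num_ops`, the number of
    operations that affected the queried box, and a boolean `correct`.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        n = rec["num_ops"]
        totals[n] += 1
        hits[n] += int(rec["correct"])
    return {n: hits[n] / totals[n] for n in sorted(totals)}


# Toy records for illustration only.
toy = [
    {"num_ops": 0, "correct": True},
    {"num_ops": 0, "correct": True},
    {"num_ops": 1, "correct": True},
    {"num_ops": 1, "correct": False},
    {"num_ops": 2, "correct": False},
]
print(accuracy_by_num_ops(toy))
# {0: 1.0, 1: 0.5, 2: 0.0}
```

Keeping this as a post-hoc analysis, rather than baking it into the task definition, matches the decision to leave the task implementation unchanged.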


Adding new task: Boxes

7b863d4

irafayabdul requested review from haileyschoelkopf and lintangsutawika as code owners March 11, 2024 15:19

haileyschoelkopf requested changes Mar 11, 2024


irafayabdul added 3 commits March 11, 2024 23:46

switching sampling to false

ab733c1

Merge branch 'EleutherAI:main' into boxes-task

e8d4eda

update dataset: referencing to gated repo on huggingface irafayabdul/…

c6d4930


irafayabdul requested a review from haileyschoelkopf April 22, 2024 00:06
