Adding new task: Boxes #1557

Open
wants to merge 4 commits into
base: main

irafayabdul

This PR adds a task that probes to what extent a language model can infer the final state of an entity, given an English description of the initial state and a series of state-changing operations.
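To make the task description concrete, here is a minimal sketch of the kind of state tracking the model is asked to do implicitly. The box names, item names, and the three operation kinds (`move`, `put`, `remove`) are illustrative assumptions, not the dataset's actual format or the PR's implementation.

```python
# Illustrative sketch only: box names, items, and operation kinds are
# assumptions made for this example, not the dataset's real schema.
def apply_operations(initial_state, operations):
    """Return the final contents of each box after applying the operations in order."""
    # Copy the sets so the caller's initial state is left untouched.
    state = {box: set(items) for box, items in initial_state.items()}
    for op in operations:
        kind = op[0]
        if kind == "move":
            _, item, src, dst = op
            state[src].remove(item)
            state[dst].add(item)
        elif kind == "put":
            _, item, box = op
            state[box].add(item)
        elif kind == "remove":
            _, item, box = op
            state[box].remove(item)
        else:
            raise ValueError(f"unknown operation: {kind}")
    return state


initial = {"Box 0": {"the apple"}, "Box 1": {"the key"}, "Box 2": set()}
ops = [
    ("move", "the apple", "Box 0", "Box 2"),
    ("put", "the map", "Box 1"),
    ("remove", "the key", "Box 1"),
]
final = apply_operations(initial, ops)
```

In the task itself the model sees only the English description and must state what a given box contains at the end; the dictionary above plays the role of the gold answer.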

@CLAassistant

CLAassistant commented Mar 11, 2024

CLA assistant check
All committers have signed the CLA.

Collaborator

@haileyschoelkopf haileyschoelkopf left a comment

Thanks for the PR! I've left some comments. I see that the dataset is still local here, which needs to be addressed.

Would you be able to try to replicate a number from the paper and post the result here?

dataset_path: json
dataset_name: null
dataset_kwargs:
  data_files: {'test': 'test-subsample-states-t5.jsonl'}
Collaborator

This points to a local file?

Author

@irafayabdul irafayabdul Mar 11, 2024

Yes, this points to a file inside the dataset directory stored locally. I downloaded the dataset from git; it is not on Hugging Face. How should I go about this, other than uploading the data to Hugging Face?
I can try to replicate only the flan-t5-xl results (since GPT-3, 3.5, and 4 are not an option). Please let me know if I understood correctly :) Thank you in advance.

Collaborator

Flan-T5 results would be great!

Could you open an issue on the authors' GitHub repo and ask them if they would be alright with uploading the dataset to Hugging Face as a gated repo?

Author

Hi Hailey, apologies for the delay.
I asked Dr. Sebastian Schuster and created this gated repo: Boxes task

Further, I am attaching results for flan-t5 base and xl, which are almost the same as the results presented in the paper.
Here I tried to produce results broken down by the number of operations affecting the box state, just as reported in the paper, but the task does not necessarily need this in general, hence the implementation is kept the same.

[Attached images: flan-pop, flan-test]
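For readers who want the same kind of breakdown, a minimal sketch of grouping accuracy by the number of operations affecting a box might look like the following. The field names `num_ops` and `correct` are hypothetical, and the records are made-up toy values, not results from the paper or from this PR.

```python
from collections import defaultdict


def accuracy_by_num_ops(records):
    """Group per-example correctness by operation count and return accuracy per group."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        n = record["num_ops"]  # hypothetical field: operations affecting the queried box
        totals[n] += 1
        hits[n] += int(record["correct"])  # hypothetical field: model answer matched gold
    return {n: hits[n] / totals[n] for n in sorted(totals)}


# Toy records for illustration only; these are not real evaluation results.
toy_records = [
    {"num_ops": 0, "correct": True},
    {"num_ops": 0, "correct": True},
    {"num_ops": 1, "correct": True},
    {"num_ops": 1, "correct": False},
    {"num_ops": 2, "correct": False},
]
breakdown = accuracy_by_num_ops(toy_records)
```

This is a post-hoc analysis over per-example outputs, which is consistent with the comment above that the task implementation itself does not need the breakdown.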

lm_eval/tasks/boxes/boxes-base.yaml (outdated, resolved)