In this fork, I transplanted the LongT5 model into the SCROLLS repository. I also implemented a memory extension for LongT5 using a key-value memory similar to the one in "Memorizing Transformers".
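The fork's memory code is not reproduced here, but the underlying idea is easy to sketch. Below is a minimal, hypothetical PyTorch illustration (the `KeyValueMemory` class and its methods are my own names, not this fork's API): past keys and values are kept in a FIFO buffer, and the current queries attend over that buffer with standard dot-product attention. Note that "Memorizing Transformers" retrieves only the top-k nearest keys; this sketch attends over the entire buffer for brevity.

```python
import torch
import torch.nn.functional as F

class KeyValueMemory:
    """Hypothetical sketch of an external key-value memory for attention."""

    def __init__(self, capacity: int, dim: int):
        self.capacity = capacity
        self.keys = torch.empty(0, dim)
        self.values = torch.empty(0, dim)

    def add(self, keys: torch.Tensor, values: torch.Tensor) -> None:
        # Append the newest (key, value) pairs and evict the oldest
        # entries once the buffer exceeds its capacity (FIFO).
        self.keys = torch.cat([self.keys, keys])[-self.capacity:]
        self.values = torch.cat([self.values, values])[-self.capacity:]

    def attend(self, queries: torch.Tensor) -> torch.Tensor:
        # Scaled dot-product attention of the current queries over the
        # stored memory; returns one retrieved vector per query.
        if self.keys.numel() == 0:
            return torch.zeros_like(queries)
        scores = queries @ self.keys.T / self.keys.shape[-1] ** 0.5
        return F.softmax(scores, dim=-1) @ self.values

# Example: store 128 past positions, then attend with 8 new queries.
mem = KeyValueMemory(capacity=4096, dim=64)
mem.add(torch.randn(128, 64), torch.randn(128, 64))
out = mem.attend(torch.randn(8, 64))  # shape: (8, 64)
```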
This repository contains the official code for the paper "SCROLLS: Standardized CompaRison Over Long Language Sequences".
Setup instructions are in the baselines and evaluator folders.
For the live leaderboard, check out the official website.
The dataset can be accessed in two ways:

- via the 🤗 Datasets (huggingface/datasets) library (recommended):

  Usage:

  ```python
  from datasets import load_dataset

  qasper_dataset = load_dataset("tau/scrolls", "qasper")

  # Options are: ["gov_report", "summ_screen_fd", "qmsum",
  #               "narrative_qa", "qasper", "quality", "contract_nli"]
  ```
- via ZIP files, where each split is in a JSONL file.
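If you use the ZIP files, each split can be read directly as JSON Lines. A minimal sketch, assuming a file `qasper/train.jsonl` extracted from the ZIP (the path is illustrative):

```python
import json

# Hypothetical path; the actual layout depends on where the ZIP was extracted.
with open("qasper/train.jsonl", encoding="utf-8") as f:
    examples = [json.loads(line) for line in f]

print(f"{len(examples)} examples; fields: {sorted(examples[0])}")
```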
```bibtex
@misc{shaham2022scrolls,
    title={SCROLLS: Standardized CompaRison Over Long Language Sequences},
    author={Uri Shaham and Elad Segal and Maor Ivgi and Avia Efrat and Ori Yoran and Adi Haviv and Ankit Gupta and Wenhan Xiong and Mor Geva and Jonathan Berant and Omer Levy},
    year={2022},
    eprint={2201.03533},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
When citing SCROLLS, please make sure to cite all of the original dataset papers. [bibtex]