We surveyed existing publicly available legal datasets and compiled approximately 600k records. These primarily cover criminal and civil cases, and also include cases involving constitutional, social, economic, and other areas of law.
Dataset | Size | Domain | Task | Metric |
---|---|---|---|---|
CAIL2018 | 196k | Criminal | Multi-classification | Acc, F1 |
CAIL-2019-ER | 69k | Civil | Multi-classification | Acc, F1 |
CAIL-2021-IE | 5k | Criminal | Named Entity Recognition | F1, P, R |
Criminal-S | 77k | Criminal | Multi-classification | Acc, P, R, F1 |
MLMN | 1k | Criminal | Multi-classification | P, R, F1 |
MSJudge | 70k | Civil | Multi-classification | F1 |
CAIL2019-SCM | 9k | Civil | Classification | Acc |
CAIL2020-AM | 815 | Criminal, Civil | Multiple-choice questions | Acc |
JEC-QA | 20k | - | Multiple-choice questions | Acc |
CAIL-2020-TS | 9k | Civil | Text summarization | ROUGE |
CAIL-2022-TS | 6k | - | Text summarization | ROUGE |
AC-NLG | 67k | Civil | Text Generation | ROUGE, BLEU |
CJRC | 10k | Criminal, Civil | Reading comprehension | ROUGE, BLEU |
CrimeKgAssitant | 52k | - | Question-answering | ROUGE, BLEU |
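Most of the classification tasks above are scored with accuracy, precision, recall, and F1. As a reference, here is a minimal pure-Python sketch of these metrics for a multi-class task (function names and labels are illustrative, not part of any dataset's official evaluation script):

```python
def accuracy(y_true, y_pred):
    """Fraction of exact label matches."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def prf_per_class(y_true, y_pred):
    """Per-class precision, recall, and F1 for a multi-class task
    such as charge prediction; returns {label: (P, R, F1)}."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[c] = (prec, rec, f1)
    return scores
```

Macro-F1 averages the per-class F1 values, while micro-F1 pools the counts across classes; which variant a benchmark reports should be checked against its evaluation script.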
- CAIL2018: The criminal judgment prediction dataset from CAIL2018 aims to predict the relevant legal provisions, the charges against the defendant, and the length of the defendant's sentence from the factual descriptions in criminal legal documents.
- CAIL-2019-ER: The element recognition dataset from CAIL2019 requires systems to judge each sentence in judicial documents and identify key case elements. The task covers three civil domains: marriage and family, labor disputes, and loan contracts.
- CAIL-2021-IE: Information extraction covers tasks such as named entity recognition and relation extraction. This dataset focuses on fraud cases and requires precise extraction of key information such as suspects, items involved, and criminal facts.
- Criminal-S: Each sample in this dataset contains a single charge; the task is to predict the charge decided by the judge from the fact-finding section of the case.
- MLMN: This task divides sentencing outcomes into five categories by term of imprisonment: no criminal punishment, detention, imprisonment for less than 1 year, imprisonment for 1 year or more but less than 3 years, and imprisonment for 3 years or more but less than 10 years. It covers traffic accident and intentional injury cases, predicting the category of the defendant's sentence from the legal documents.
- MSJudge: This dataset contains civil cases on private lending disputes; the task is to predict the judge's verdict from the case facts and the plaintiff's claims.
- CAIL2019-SCM: This task measures the similarity of legal documents. Specifically, given the title and factual description of each document, participants must find the most similar document in a candidate set for each query document.
- CAIL2020-AM: This task extracts the interacting arguments between defense and prosecution in judgment documents, i.e., the points of contention.
- JEC-QA: As a dataset of objective questions from the national judicial examination, it contains 7,775 knowledge-driven questions and 13,297 case-analysis questions, each a single-answer or multiple-answer multiple-choice question.
- CAIL-2020-TS: This task generates judicial summaries from the original judgment documents.
- CAIL-2022-TS: This task generates correct, complete, and concise summaries of legal public-opinion texts.
- AC-NLG: This dataset contains civil cases on private lending; the task is to generate the court's reasoning text from the factual description of the case.
- CJRC: As a judicial reading comprehension dataset, it contains 10,000 cases and 50,000 question-answer pairs, aiming at substantive understanding of legal documents and answering related questions.
- CrimeKgAssitant: A dataset of real Chinese lawyer consultations, cleaned by LAW-GPT into 52k single-turn question-answer pairs.
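The summarization and generation tasks above (CAIL-2020-TS, CAIL-2022-TS, AC-NLG) are scored with ROUGE. As a rough illustration, a minimal sketch of unigram ROUGE-1 over pre-tokenized text; official evaluations typically use a ROUGE library and also report ROUGE-2 and ROUGE-L:

```python
from collections import Counter

def rouge_1(candidate_tokens, reference_tokens):
    """ROUGE-1: unigram overlap between a candidate summary and a
    reference summary; returns (precision, recall, F1)."""
    cand, ref = Counter(candidate_tokens), Counter(reference_tokens)
    overlap = sum((cand & ref).values())  # clipped unigram matches
    p = overlap / max(sum(cand.values()), 1)
    r = overlap / max(sum(ref.values()), 1)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

For Chinese legal text, tokenization choice (characters vs. words) changes the scores, so it must match the benchmark's own setup.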
Following LAiW, a comprehensive benchmark for Chinese legal LLMs consisting of 14 basic tasks, we constructed the Legal Instruction Tuning Dataset (LIT). The dataset is split with a train/valid/test ratio of 7/1/2. The `train.jsonl` and `valid.jsonl` files are used for model training, while `test.jsonl` serves as the evaluation set for LAiW to guide and advance the development and evaluation of LLMs. At the current stage, only the `test.jsonl` files for each task are publicly available; they can be downloaded from LAiW.
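A 7/1/2 split of JSONL records can be sketched as follows; this is an illustration under our own assumptions (one JSON object per line, a fixed shuffle seed), not the exact procedure used to build LIT:

```python
import json
import random

def split_jsonl(records, seed=0):
    """Shuffle and split records into train/valid/test with a 7/1/2 ratio."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    records = list(records)
    rng.shuffle(records)
    n = len(records)
    n_train, n_valid = int(n * 0.7), int(n * 0.1)
    return (records[:n_train],
            records[n_train:n_train + n_valid],
            records[n_train + n_valid:])

def write_jsonl(path, records):
    """Write one JSON object per line, keeping Chinese text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for r in records:
            f.write(json.dumps(r, ensure_ascii=False) + "\n")
```

In practice the split is usually done per task so that each task keeps the 7/1/2 ratio rather than the pooled corpus.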