Why does roberta_large score much lower than roberta_middle on CMRC2018? #16

ewrfcas · 2019-09-09T01:51:30Z

https://hfl-rc.github.io/cmrc2018/task/#section-1
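I wanted to see how RoBERTa performs on reading comprehension. I converted the middle and large models to PyTorch and ran them on CMRC2018: middle reaches an F1 of 86, but large only reaches 77, which is very strange.
Using the provided PyTorch version of the large weights directly gives the same result.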

YingZiqiang · 2019-09-09T07:11:32Z

@ewrfcas Where is roberta-middle? I can't find it on the page.

ewrfcas · 2019-09-09T07:22:45Z

@YingZiqiang It's Roberta_l24_zh_base: 24 layers, 12 heads, hidden size 768.

brightmart · 2019-09-09T13:15:32Z

In our tests, large performs better than middle. What training hyperparameters did you use? Could you post them, including the batch size?

ewrfcas · 2019-09-09T14:34:02Z

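@brightmart Thanks for the reply. I trained large on 5 GPUs with batch size 30, and middle with batch size 32, both for 3 epochs with lr=3e-5/2e-5 and warmup=0.1. Apart from the batch size, the setup is basically the same as for middle.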

ewrfcas · 2019-09-09T14:35:32Z

Also, large and middle should share the same vocabulary, right? In that case preprocessing shouldn't be the problem.

brightmart · 2019-09-09T15:14:36Z

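The vocabularies are exactly the same. Check the file names under the large and middle folders. Could it be that the large checkpoint was not loaded successfully? Run it again, check whether the checkpoint loads, and use the same batch size of 32.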

ymcui · 2019-09-10T01:04:42Z

Same question here.
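I tried three reading comprehension datasets, CMRC 2018, DRCD, and CJRC, and large performs poorly on all of them (this is not a case of init_ckpt failing to load). On XNLI, however, I get better results than those reported by @brightmart. Perhaps large was not trained with max_seq_len=512?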

ewrfcas · 2019-09-10T02:45:53Z

Loading should have succeeded. I compared the parameters, and the only weights that were not loaded are those related to the cls pooler.
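A comparison like the one described above, checking parameter names to see what was not loaded, might look like the following minimal sketch. The function name `diff_param_names` and the example parameter names are hypothetical and not taken from the thread. In PyTorch, `load_state_dict(..., strict=False)` reports the same two lists as `missing_keys` and `unexpected_keys`.

```python
def diff_param_names(model_keys, checkpoint_keys):
    """Compare parameter names between a model and a checkpoint.

    Returns (missing, unexpected):
      missing    - names the model has but the checkpoint lacks
                   (these stay at their initial values)
      unexpected - names the checkpoint has but the model does not use
    """
    model_set = set(model_keys)
    checkpoint_set = set(checkpoint_keys)
    missing = sorted(model_set - checkpoint_set)
    unexpected = sorted(checkpoint_set - model_set)
    return missing, unexpected


# Hypothetical names, for illustration only.
model_keys = [
    "bert.embeddings.word_embeddings.weight",
    "qa_outputs.weight",
]
checkpoint_keys = [
    "bert.embeddings.word_embeddings.weight",
    "bert.pooler.dense.weight",
    "bert.pooler.dense.bias",
]

missing, unexpected = diff_param_names(model_keys, checkpoint_keys)
print("missing from checkpoint:", missing)
print("unused by model:", unexpected)
```

If only pooler weights and the freshly initialized task head show up in these lists, the encoder itself was most likely loaded correctly.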

brightmart · 2019-09-10T11:35:38Z

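@ymcui Yes, the current RoBERTa was trained with max_seq_len of 256, so it is suited to inputs within that range. For longer text, i.e. more than 256, results may be poor.

What do the reading comprehension test results look like?

@ewrfcas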

ymcui · 2019-09-10T11:45:07Z

@brightmart
OK, got it. Thanks.

ewrfcas · 2019-09-10T13:51:42Z

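My CMRC2018 results are all based on a length of 512. Over 5 runs, middle's F1 is 86~87, while large's F1 is about 10 points lower, around 75~77. Results for large at length 256 are being tested now.
@brightmart I hope the max_position_embeddings in the large model's config file can be adjusted.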
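The mismatch discussed here, a model pretrained at length 256 but fine-tuned at 512, can be caught before fine-tuning. The sketch below is only an illustration under stated assumptions: `check_seq_len` and its `pretrain_max_seq_len` argument are hypothetical names, and the value 512 for `max_position_embeddings` is assumed rather than read from the released config.

```python
import json


def check_seq_len(config_json, finetune_max_seq_len, pretrain_max_seq_len):
    """Return a list of warnings about the chosen fine-tuning length."""
    config = json.loads(config_json)
    table_size = config["max_position_embeddings"]
    warnings = []
    if finetune_max_seq_len > table_size:
        # No position embedding exists beyond the table.
        warnings.append(
            f"length {finetune_max_seq_len} exceeds the position table "
            f"size {table_size}"
        )
    if finetune_max_seq_len > pretrain_max_seq_len:
        # Embeddings exist, but these positions were never trained.
        warnings.append(
            f"positions {pretrain_max_seq_len}..{finetune_max_seq_len - 1} "
            f"were not seen during pretraining"
        )
    return warnings


config_json = '{"max_position_embeddings": 512}'  # assumed value
for length in (512, 256):
    print(length, check_seq_len(config_json, length, pretrain_max_seq_len=256))
```

A config that lists the pretraining length directly would make the second check unnecessary, which is the motivation for the request above.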

ewrfcas · 2019-09-10T14:49:37Z

Current results for roberta-large with length 256 on the CMRC2018 dev set:
F1: 88.365, EM: 69.991
lr=2e-5, best at epoch 1

brightmart · 2019-09-10T15:36:28Z

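So, at a first look, how does this compare with other models on this reading comprehension task? And how can reading comprehension work with the length set this small?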

ewrfcas · 2019-09-11T01:15:03Z

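This result currently falls between ERNIE2.0 base and ERNIE2.0 large, which is fairly good among pretrained models.
With the length set to 256 it can still run by relying on a sliding window, but performance drops a little.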

brightmart · 2019-09-15T14:56:13Z

OK. @ewrfcas Could you test and compare XLNet_zh_Large on the CMRC2018 dataset?

(The current XLNet_zh_Large is an early preview version; if there are problems we will help solve them.)

ewrfcas · 2019-09-16T01:21:52Z

@brightmart If XLNet uses sentencepiece, it does not do well on reading comprehension; see ymcui/Chinese-XLNet#11 for details.

oyjxer · 2019-10-21T12:59:49Z

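This result currently falls between ERNIE2.0 base and ERNIE2.0 large, which is fairly good among pretrained models.
With the length set to 256 it can still run by relying on a sliding window, but performance drops a little.

How exactly does the sliding window work? @ewrfcas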

ahzz1207 · 2019-10-29T07:01:33Z

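This result currently falls between ERNIE2.0 base and ERNIE2.0 large, which is fairly good among pretrained models.
With the length set to 256 it can still run by relying on a sliding window, but performance drops a little.

How exactly does the sliding window work? @ewrfcas

Bookmarking this.. I'm curious too.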

ewrfcas · 2019-10-30T06:27:30Z

For the sliding window, you can refer to Google's official SQuAD code, or https://github.com/ewrfcas/bert_cn_finetune/blob/master/preprocess/cmrc2018_preprocess.py
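As a rough sketch of the sliding-window idea: a passage longer than the model's input budget is split into overlapping spans, each span is fed to the model together with the question, and the best-scoring answer across spans is kept. The code below is modeled on the doc-span loop in Google's `run_squad.py` but is not copied from either referenced script; the name `make_windows` is hypothetical.

```python
def make_windows(num_doc_tokens, max_tokens_for_doc, doc_stride):
    """Split a tokenized passage into overlapping (start, length) spans.

    num_doc_tokens     - number of tokens in the passage
    max_tokens_for_doc - token budget for the passage in one input
    doc_stride         - how far the window start moves each step
    """
    if max_tokens_for_doc <= 0 or doc_stride <= 0:
        raise ValueError("max_tokens_for_doc and doc_stride must be positive")
    spans = []
    start = 0
    while start < num_doc_tokens:
        length = min(num_doc_tokens - start, max_tokens_for_doc)
        spans.append((start, length))
        if start + length == num_doc_tokens:
            break  # this span already reaches the end of the passage
        start += min(length, doc_stride)
    return spans


# A 10-token passage, 4 tokens per window, stride 2.
for start, length in make_windows(10, 4, 2):
    print(start, start + length)
```

In `run_squad.py` the passage budget is the maximum sequence length minus the question length minus 3 special tokens, so with a maximum length of 256 a long passage simply produces more windows. A stride smaller than the window gives overlap, so an answer cut off at one window's edge can still appear whole in a neighboring window.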

brightmart closed this as completed Sep 15, 2019
