Configure different training schemes #146
It would, for instance, be interesting to do curriculum learning that starts with random negatives, then hard negatives from BM25, then hard negatives mined with a previous checkpoint.
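A rough way to stage that with the existing data format, purely as a sketch (the helper and file names below are hypothetical, not an existing DPR utility), is to produce one training file per curriculum stage, assuming the standard retriever JSON format where each example carries "negative_ctxs" (random/gold negatives) and "hard_negative_ctxs" (BM25- or model-mined negatives):

```python
import json

def write_stage(src_path: str, dst_path: str, use_hard_negatives: bool) -> None:
    # Keep only the negative type used in this curriculum stage.
    with open(src_path) as f:
        data = json.load(f)
    for ex in data:
        if use_hard_negatives:
            ex["negative_ctxs"] = []        # later stages: hard negatives only
        else:
            ex["hard_negative_ctxs"] = []   # stage 1: random negatives only
    with open(dst_path, "w") as f:
        json.dump(data, f)

# Placeholder file names.
write_stage("nq-train.json", "nq-train.stage1.json", use_hard_negatives=False)
write_stage("nq-train.json", "nq-train.stage2.json", use_hard_negatives=True)
# A third stage would re-mine "hard_negative_ctxs" with the stage-2 checkpoint
# (e.g. via dense retrieval over the training questions) and resume training
# from that checkpoint.
```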
Hi @Eiriksak, as for your list of questions:
My feeling is that random negatives are mostly useless: the model learns lexical matching pretty quickly, and then the challenge is to distinguish between semantically different but lexically close passages. Random negatives are not very useful for that purpose.
Thanks for a great reply, @vlad-karpukhin! I guess I still have to include random negatives and hard negatives in the dev set when validate_average_rank runs (line 309 in 142d448).
I can see the default is to start this from epoch 30 (val_av_rank_start_epoch) and to include 30 hard negatives and 30 other negatives per question (val_av_rank_hard_neg, val_av_rank_other_neg). Is there any reason why you evaluate with validate_nll for the first 30 epochs instead of this?
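Purely as an illustration of the schedule being asked about (this is not DPR's actual train_dense_encoder.py code, just a sketch using the default values named above; setting val_av_rank_start_epoch to 0 would run average-rank validation from the start):

```python
# Illustrative only: mimics the validation schedule discussed above.
val_av_rank_start_epoch = 30  # epoch at which average-rank validation takes over
val_av_rank_hard_neg = 30     # hard negatives per question pooled for ranking
val_av_rank_other_neg = 30    # other (random) negatives per question pooled for ranking

num_epochs = 40
for epoch in range(num_epochs):
    if epoch >= val_av_rank_start_epoch:
        # rank each question's gold passage among the pooled dev-set passages
        # (positives plus the sampled hard/other negatives)
        metric = "validate_average_rank"
    else:
        # early epochs: NLL loss / correct-prediction ratio on the dev set
        metric = "validate_nll"
    print(f"epoch {epoch}: {metric}")
```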
Neither NLL nor average rank turned out to be a perfect validation metric when we measured their correlation with the final retrieval performance over the entire Wikipedia.
Hi,
I am currently trying to compare different training schemes when training the biencoder for a new downstream task. Can someone please clarify how to properly set up the data and config in order to run experiments similar to those in Table 3 of the paper?
The code already uses in-batch negative training, as stated in #110. I wonder if:
I don't know if it makes sense to do experiments with random negatives or pre-computed gold negatives (adding them to negative_ctxs and setting other_negatives=#N) when the in-batch (IB) setting is on; see the sketch below for where each negative type goes.
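For reference, and purely as an illustration (the field layout follows the retriever training data format described in the repository README; the passage contents are dummies, and hard_negatives / other_negatives are assumed to be the training parameters controlling how many of each type are sampled per question in your DPR version):

```python
# Dummy example of one record in the DPR retriever training JSON; only the
# field layout matters here. "negative_ctxs" holds random / pre-computed gold
# negatives ("other" negatives), "hard_negative_ctxs" holds BM25- or model-mined
# hard negatives; in-batch negatives come from the other questions in the batch
# and do not appear in the file at all.
example = {
    "question": "who wrote the declaration of independence",
    "answers": ["Thomas Jefferson"],
    "positive_ctxs": [
        {"title": "United States Declaration of Independence", "text": "...gold passage..."}
    ],
    "negative_ctxs": [
        {"title": "Some Unrelated Article", "text": "...random passage..."}
    ],
    "hard_negative_ctxs": [
        {"title": "Declaration of Sentiments", "text": "...lexically close but wrong passage..."}
    ],
}
```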