GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding A Wang, A Singh, J Michael, F Hill, O Levy, SR Bowman Proceedings of ICLR, 2019 | 7714 | 2019 |
A large annotated corpus for learning natural language inference SR Bowman, G Angeli, C Potts, CD Manning Proceedings of EMNLP, 2015 | 5052 | 2015 |
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference A Williams, N Nangia, SR Bowman Proceedings of NAACL-HLT, 2018 | 4761 | 2018 |
Generating sentences from a continuous space SR Bowman, L Vilnis, O Vinyals, AM Dai, R Jozefowicz, S Bengio Proceedings of CoNLL, 2016 | 2896 | 2016 |
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems A Wang, Y Pruksachatkun, N Nangia, A Singh, J Michael, F Hill, O Levy, ... Proceedings of NeurIPS, 2019 | 2340 | 2019 |
XNLI: Evaluating Cross-lingual Sentence Representations A Conneau, G Lample, R Rinott, A Williams, SR Bowman, H Schwenk, ... Proceedings of EMNLP, 2018 | 1409 | 2018 |
Neural network acceptability judgments A Warstadt, A Singh, SR Bowman TACL 7, 625-641, 2019 | 1404 | 2019 |
Annotation artifacts in natural language inference data S Gururangan, S Swayamdipta, O Levy, R Schwartz, SR Bowman, ... Proceedings of NAACL, 2018 | 1265 | 2018 |
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ... TMLR, 2023 | 1175 | 2023 |
Constitutional AI: Harmlessness from AI feedback Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ... arXiv preprint arXiv:2212.08073, 2022 | 1145 | 2022 |
What do you learn from context? Probing for sentence structure in contextualized word representations I Tenney, P Xia, B Chen, A Wang, A Poliak, RT McCoy, N Kim, ... Proceedings of ICLR, 2019 | 926 | 2019 |
On Measuring Social Biases in Sentence Encoders C May, A Wang, S Bordia, SR Bowman, R Rudinger Proceedings of NAACL-HLT, 2019 | 660 | 2019 |
CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models N Nangia, C Vania, R Bhalerao, SR Bowman Proceedings of EMNLP, 2020 | 614 | 2020 |
Sentence encoders on STILTs: Supplementary training on intermediate labeled-data tasks J Phang, T Févry, SR Bowman arXiv preprint arXiv:1811.01088, 2018 | 486 | 2018 |
BLiMP: The benchmark of linguistic minimal pairs for English A Warstadt, A Parrish, H Liu, A Mohananey, W Peng, SF Wang, ... TACL, 2020 | 442 | 2020 |
Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned D Ganguli, L Lovitt, J Kernion, A Askell, Y Bai, S Kadavath, B Mann, ... arXiv preprint arXiv:2209.07858, 2022 | 433 | 2022 |
A Fast Unified Model for Parsing and Sentence Understanding SR Bowman, J Gauthier, A Rastogi, R Gupta, CD Manning, C Potts Proceedings of ACL, 2016 | 423 | 2016 |
Identifying and Reducing Gender Bias in Word-Level Language Models S Bordia, SR Bowman Proceedings of the NAACL-HLT Student Research Workshop, 2019 | 374 | 2019 |
Universal Dependencies 2.2 J Nivre, M Abrams, Ž Agić, L Ahrenberg, L Antonsen, MJ Aranzabe, ... | 351 | 2018 |
A Gold Standard Dependency Corpus for English N Silveira, T Dozat, MC de Marneffe, SR Bowman, M Connor, J Bauer, ... Proceedings of LREC, 2014 | 343 | 2014 |