Scaling language models: Methods, analysis & insights from training Gopher JW Rae, S Borgeaud, T Cai, K Millican, J Hoffmann, F Song, J Aslanides, ... arXiv preprint arXiv:2112.11446, 2021 | 619* | 2021 |
Competition-level code generation with AlphaCode Y Li, D Choi, J Chung, N Kushman, J Schrittwieser, R Leblond, T Eccles, ... Science 378 (6624), 1092-1097, 2022 | 551* | 2022 |
Learning and evaluating general linguistic intelligence D Yogatama, CM d'Autume, J Connor, T Kocisky, M Chrzanowski, L Kong, ... arXiv preprint arXiv:1901.11373, 2019 | 181* | 2019 |
Episodic memory in lifelong language learning C de Masson d'Autume, S Ruder, L Kong, D Yogatama Advances in Neural Information Processing Systems 32, 2019 | 160 | 2019 |
A mutual information maximization perspective of language representation learning L Kong, CM d'Autume, W Ling, L Yu, Z Dai, D Yogatama arXiv preprint arXiv:1910.08350, 2019 | 111* | 2019 |
Mind the gap: Assessing temporal generalization in neural language models A Lazaridou, A Kuncoro, E Gribovskaya, D Agrawal, A Liska, T Terzi, ... Advances in Neural Information Processing Systems 34, 29348-29363, 2021 | 94* | 2021 |
Psychlab: a psychology laboratory for deep reinforcement learning agents JZ Leibo, CM d'Autume, D Zoran, D Amos, C Beattie, K Anderson, ... arXiv preprint arXiv:1801.08116, 2018 | 81* | 2018 |
Adaptive semiparametric language models D Yogatama, C de Masson d’Autume, L Kong Transactions of the Association for Computational Linguistics 9, 362-373, 2021 | 80 | 2021 |
Training language GANs from scratch C de Masson d'Autume, S Mohamed, M Rosca, J Rae Advances in Neural Information Processing Systems 32, 2019 | 75 | 2019 |
Pitfalls of static language modelling A Lazaridou, A Kuncoro, E Gribovskaya, D Agrawal, A Liska, T Terzi, ... arXiv preprint arXiv:2102.01951, 2021 | 43* | 2021 |
A systematic investigation of commonsense knowledge in large language models XL Li, A Kuncoro, J Hoffmann, C de Masson d’Autume, P Blunsom, ... Proceedings of the 2022 Conference on Empirical Methods in Natural Language …, 2022 | 21 | 2022 |
Episodic memory in lifelong language learning CM d'Autume, S Ruder, L Kong, D Yogatama arXiv preprint arXiv:1906.01076, 2019 | 16 | 2019 |
StreamingQA: A benchmark for adaptation to new knowledge over time in question answering models A Liska, T Kocisky, E Gribovskaya, T Terzi, E Sezener, D Agrawal, ... International Conference on Machine Learning, 13604-13622, 2022 | 13 | 2022 |
Scaling Language Models: Methods, Analysis & Insights from Training Gopher. arXiv 2021 JW Rae, S Borgeaud, T Cai, K Millican, J Hoffmann, F Song, J Aslanides, ... arXiv preprint arXiv:2112.11446, 2021 | 7 | 2021 |
Do Language Models Learn Commonsense Knowledge? XL Li, A Kuncoro, CM d'Autume, P Blunsom, A Nematzadeh arXiv preprint arXiv:2111.00607, 2021 | 5 | 2021 |
Sentence encoding with tree-constrained relation networks L Yu, CM d'Autume, C Dyer, P Blunsom, L Kong, W Ling arXiv preprint arXiv:1811.10475, 2018 | 4 | 2018 |
A systematic investigation of commonsense understanding in large language models XL Li, A Kuncoro, CM d’Autume, P Blunsom, A Nematzadeh CoRR, abs/2111.00607, 2021 | 2 | 2021 |
StreamingQA: a benchmark for adaptation to new knowledge over time in question answering models A Liška, T Kočiský, E Gribovskaya, T Terzi, E Sezener, D Agrawal, ... arXiv preprint arXiv:2205.11388, 2022 | 1 | 2022 |
Computer code generation from task descriptions using neural networks Y Li, DH Choi, J Chung, NA Kushman, J Schrittwieser, R Leblond, ... US Patent App. 18/105,211, 2023 | | 2023 |