On the opportunities and risks of foundation models R Bommasani, DA Hudson, E Adeli, R Altman, S Arora, S von Arx, ... arXiv preprint arXiv:2108.07258, 2021 | 4396 | 2021 |
Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models A Tamkin, M Brundage, J Clark, D Ganguli arXiv preprint arXiv:2102.02503, https://arxiv.org/abs/2102.02503, 2021 | 329 | 2021 |
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning T Bricken, A Templeton, J Batson, B Chen, A Jermyn, T Conerly, ... https://transformer-circuits.pub/2023/monosemantic-features/index.html, 2023 | 241 | 2023 |
Scaling monosemanticity: Extracting interpretable features from Claude 3 Sonnet A Templeton, T Conerly, J Marcus, J Lindsey, T Bricken, B Chen, ... Transformer Circuits Thread, 2024 | 153 | 2024 |
Towards measuring the representation of subjective global opinions in language models E Durmus, K Nguyen, TI Liao, N Schiefer, A Askell, A Bakhtin, C Chen, ... arXiv preprint arXiv:2306.16388, 2023 | 139 | 2023 |
Studying large language model generalization with influence functions R Grosse, J Bae, C Anil, N Elhage, A Tamkin, A Tajdini, B Steiner, D Li, ... arXiv preprint arXiv:2308.03296, 2023 | 115 | 2023 |
Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy R Keramati, C Dann, A Tamkin, E Brunskill Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), 2020 | 93 | 2020 |
Viewmaker Networks: Learning Views for Unsupervised Representation Learning A Tamkin, M Wu, N Goodman ICLR 2021, 2020 | 73 | 2020 |
Many-shot jailbreaking C Anil, E Durmus, N Rimsky, M Sharma, J Benton, S Kundu, J Batson, ... The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024 | 67 | 2024 |
Drone.io: A Gestural and Visual Interface for Human-Drone Interaction JR Cauchard, A Tamkin, CY Wang, L Vink, M Park, T Fang, JA Landay 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI …, 2019 | 62 | 2019 |
Investigating transferability in pretrained language models A Tamkin, T Singh, D Giovanardi, N Goodman Findings of EMNLP 2020, 2020 | 51 | 2020 |
Language Through a Prism: A Spectral Approach for Multiscale Language Representations A Tamkin, D Jurafsky, N Goodman NeurIPS 2020, 2020 | 41 | 2020 |
Evaluating and mitigating discrimination in language model decisions A Tamkin, A Askell, L Lovitt, E Durmus, N Joseph, S Kravec, K Nguyen, ... arXiv preprint arXiv:2312.03689, 2023 | 40 | 2023 |
Eliciting human preferences with language models BZ Li, A Tamkin, N Goodman, J Andreas arXiv preprint arXiv:2310.11589, 2023 | 40 | 2023 |
DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning A Tamkin, V Liu, R Lu, D Fein, C Schultz, N Goodman NeurIPS 2021, 2021 | 39 | 2021 |
Active Learning Helps Pretrained Models Learn the Intended Task A Tamkin, D Nguyen, S Deshpande, J Mu, N Goodman NeurIPS 2022, 2022 | 38 | 2022 |
Distributionally-Aware Exploration for CVaR Bandits A Tamkin, R Keramati, C Dann, E Brunskill NeurIPS 2019 Workshop on Safety and Robustness in Decision Making, 2019 | 38 | 2019 |
C5T5: Controllable generation of organic molecules with transformers D Rothchild, A Tamkin, J Yu, U Misra, J Gonzalez arXiv preprint arXiv:2108.10307, 2021 | 33 | 2021 |
Recursive Routing Networks: Learning to Compose Modules for Language Understanding I Cases, C Rosenbaum, M Riemer, A Geiger, T Klinger, A Tamkin, O Li, ... NAACL 2019, 2019 | 30 | 2019 |
Task Ambiguity in Humans and Language Models A Tamkin, K Handa, A Shrestha, N Goodman ICLR 2023, 2023 | 27 | 2023 |