Catherine Olsson
Anthropic
Verified email at mit.edu
Title
Cited by
Year
Estimating the reproducibility of psychological science
Open Science Collaboration
Science 349 (6251), aac4716, 2015
Cited by 9258 · 2015
Dota 2 with large scale deep reinforcement learning
C Berner, G Brockman, B Chan, V Cheung, P Dębiak, C Dennison, ...
arXiv preprint arXiv:1912.06680, 2019
Cited by 1699 · 2019
An open, large-scale, collaborative effort to estimate the reproducibility of psychological science
Open Science Collaboration
Perspectives on Psychological Science 7, 657-660, 2012
Cited by 730 · 2012
Training a helpful and harmless assistant with reinforcement learning from human feedback
Y Bai, A Jones, K Ndousse, A Askell, A Chen, N DasSarma, D Drain, ...
arXiv preprint arXiv:2204.05862, 2022
Cited by 711 · 2022
Constitutional AI: Harmlessness from AI feedback
Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ...
arXiv preprint arXiv:2212.08073, 2022
Cited by 602 · 2022
TensorFuzz: Debugging neural networks with coverage-guided fuzzing
A Odena, C Olsson, D Andersen, I Goodfellow
International Conference on Machine Learning, 4901-4911, 2019
Cited by 345 · 2019
Language models (mostly) know what they know
S Kadavath, T Conerly, A Askell, T Henighan, D Drain, E Perez, ...
arXiv preprint arXiv:2207.05221, 2022
Cited by 231 · 2022
Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned
D Ganguli, L Lovitt, J Kernion, A Askell, Y Bai, S Kadavath, B Mann, ...
arXiv preprint arXiv:2209.07858, 2022
Cited by 222 · 2022
A general language assistant as a laboratory for alignment
A Askell, Y Bai, A Chen, D Drain, D Ganguli, T Henighan, A Jones, ...
arXiv preprint arXiv:2112.00861, 2021
Cited by 218 · 2021
In-context learning and induction heads
C Olsson, N Elhage, N Nanda, N Joseph, N DasSarma, T Henighan, ...
arXiv preprint arXiv:2209.11895, 2022
Cited by 193 · 2022
Predictability and surprise in large generative models
D Ganguli, D Hernandez, L Lovitt, A Askell, Y Bai, A Chen, T Conerly, ...
Proceedings of the 2022 ACM Conference on Fairness, Accountability, and …, 2022
Cited by 174 · 2022
A mathematical framework for transformer circuits
N Elhage, N Nanda, C Olsson, T Henighan, N Joseph, B Mann, A Askell, ...
Transformer Circuits Thread 1, 1, 2021
Cited by 153 · 2021
Discriminator rejection sampling
S Azadi, C Olsson, T Darrell, I Goodfellow, A Odena
arXiv preprint arXiv:1810.06758, 2018
Cited by 148 · 2018
Toy models of superposition
N Elhage, T Hume, C Olsson, N Schiefer, T Henighan, S Kravec, ...
arXiv preprint arXiv:2209.10652, 2022
Cited by 145 · 2022
Is generator conditioning causally related to GAN performance?
A Odena, J Buckman, C Olsson, T Brown, C Olah, C Raffel, I Goodfellow
International Conference on Machine Learning, 3849-3858, 2018
Cited by 136 · 2018
Discovering language model behaviors with model-written evaluations
E Perez, S Ringer, K Lukošiūtė, K Nguyen, E Chen, S Heiner, C Pettit, ...
arXiv preprint arXiv:2212.09251, 2022
Cited by 130 · 2022
Dawn Drain
N Elhage, N Nanda, C Olsson, T Henighan, N Joseph, B Mann, A Askell, ...
Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Andy Jones, Jackson …, 2021
Cited by 120 · 2021
Dawn Drain
C Olsson, N Elhage, NJ Neel Nanda, N DasSarma, T Henighan, B Mann, ...
Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy …, 2022
Cited by 112 · 2022
Dota 2 with large scale deep reinforcement learning
CB OpenAI, G Brockman, B Chan, V Cheung, P Debiak, C Dennison, ...
arXiv preprint arXiv:1912.06680 2, 2019
Cited by 104 · 2019
Unrestricted adversarial examples
TB Brown, N Carlini, C Zhang, C Olsson, P Christiano, I Goodfellow
arXiv preprint arXiv:1809.08352, 2018
Cited by 97 · 2018
Articles 1–20