Johannes von Oswald
Research Scientist, Google Research
Continual learning with hypernetworks
J von Oswald, C Henning, BF Grewe, J Sacramento
International Conference on Learning Representations (ICLR 2020), 2019
Transformers learn in-context by gradient descent
J von Oswald, E Niklasson, E Randazzo, J Sacramento, A Mordvintsev, ...
International Conference on Machine Learning, 35151-35174, 2023
Learning where to learn: Gradient sparsity in meta and continual learning
J von Oswald, D Zhao, S Kobayashi, S Schug, M Caccia, N Zucchet, ...
Advances in Neural Information Processing Systems 34, 5250-5263, 2021
Posterior meta-replay for continual learning
C Henning, M Cervera, F D'Angelo, J von Oswald, R Traber, B Ehret, ...
Advances in Neural Information Processing Systems 34, 14135-14149, 2021
Continual learning in recurrent neural networks
B Ehret, C Henning, MR Cervera, A Meulemans, J von Oswald, BF Grewe
arXiv preprint arXiv:2006.12109, 2020
Meta-Learning via Hypernetworks
D Zhao, S Kobayashi, J Sacramento, J von Oswald
4th Workshop on Meta-Learning at NeurIPS 2020, Vancouver, Canada, 2020
Neural networks with late-phase weights
J von Oswald, S Kobayashi, A Meulemans, C Henning, BF Grewe, ...
International Conference on Learning Representations (ICLR 2021), 2020
Approximating the predictive distribution via adversarially-trained hypernetworks
C Henning, J von Oswald, J Sacramento, SC Surace, JP Pfister, ...
2018
A contrastive rule for meta-learning
N Zucchet, S Schug, J von Oswald, D Zhao, J Sacramento
Advances in Neural Information Processing Systems 35, 25921-25936, 2022
Random initialisations performing above chance and how to find them
F Benzing, S Schug, R Meier, J von Oswald, Y Akram, N Zucchet, ...
arXiv preprint arXiv:2209.07509, 2022
Uncovering mesa-optimization algorithms in transformers
J von Oswald, E Niklasson, M Schlegel, S Kobayashi, N Zucchet, ...
arXiv preprint arXiv:2309.05858, 2023
The least-control principle for local learning at equilibrium
A Meulemans, N Zucchet, S Kobayashi, J von Oswald, J Sacramento
Advances in Neural Information Processing Systems 35, 33603-33617, 2022
On the reversed bias-variance tradeoff in deep ensembles
S Kobayashi, J von Oswald, BF Grewe
ICML, 2021
Gated recurrent neural networks discover attention
N Zucchet, S Kobayashi, Y Akram, J von Oswald, M Larcher, A Steger, ...
arXiv preprint arXiv:2309.01775, 2023
Discovering modular solutions that generalize compositionally
S Schug, S Kobayashi, Y Akram, M Wołczyk, A Proca, J von Oswald, ...
arXiv preprint arXiv:2312.15001, 2023
Linear Transformers are Versatile In-Context Learners
M Vladymyrov, J von Oswald, M Sandler, R Ge
arXiv preprint arXiv:2402.14180, 2024
Interpretability of Learning Algorithms Encoded in Deep Neural Networks
J von Oswald
ETH Zurich, 2024
A complementary systems theory of meta-learning
S Schug, N Zucchet, J von Oswald, J Sacramento
Cosyne, 2023
Disentangling the Predictive Variance of Deep Ensembles through the Neural Tangent Kernel
S Kobayashi, P Vilimelis Aceituno, J von Oswald
Advances in Neural Information Processing Systems 35, 25335-25348, 2022