Towards deep learning models resistant to adversarial attacks | A Madry, A Makelov, L Schmidt, D Tsipras, A Vladu | arXiv preprint arXiv:1706.06083, 2017 | 11720 | 2017 |
Towards deep learning models resistant to adversarial attacks | A Mądry, A Makelov, L Schmidt, D Tsipras, A Vladu | stat 1050, 9, 2017 | 22 | 2017 |
Expansion in lifts of graphs | AA Makelov | | 7 | 2015 |
Is this the subspace you are looking for? An interpretability illusion for subspace activation patching | A Makelov, G Lange, A Geiger, N Nanda | The Twelfth International Conference on Learning Representations, 2023 | 3 | 2023 |
Rethinking backdoor attacks | A Khaddaj, G Leclerc, A Makelov, K Georgiev, H Salman, A Ilyas, A Madry | International Conference on Machine Learning, 16216-16236, 2023 | 3 | 2023 |
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control | A Makelov, G Lange, N Nanda | arXiv preprint arXiv:2405.08366, 2024 | | 2024 |
Backdoor or Feature? A New Perspective on Data Poisoning | A Khaddaj, G Leclerc, A Makelov, K Georgiev, A Ilyas, H Salman, A Madry | | | 2022 |