CamemBERT: a Tasty French Language Model L Martin, B Muller, PJ Ortiz Suárez, Y Dupont, L Romary, ... Proceedings of the 58th Annual Meeting of the Association for Computational …, 2020 | 431 | 2020 |
Asynchronous Pipeline for Processing Huge Corpora on Medium to Low Resource Infrastructures PJ Ortiz Suárez, B Sagot, L Romary 7th Workshop on the Challenges in the Management of Large Corpora, 2019 | 147* | 2019 |
A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages PJ Ortiz Suárez, L Romary, B Sagot Proceedings of the 58th Annual Meeting of the Association for Computational …, 2020 | 79* | 2020 |
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets J Kreutzer, I Caswell, L Wang, A Wahab, D van Esch, N Ulzii-Orshikh, ... Transactions of the Association for Computational Linguistics 10, 50-72, 2022 | 41* | 2022 |
Building a user-generated content North-African Arabizi treebank: Tackling hell D Seddah, F Essaidi, A Fethi, M Futeral, B Muller, PJ Ortiz Suárez, ... Proceedings of the 58th Annual Meeting of the Association for Computational …, 2020 | 22 | 2020 |
Establishing a New State-of-the-Art for French Named Entity Recognition PJ Ortiz Suárez, Y Dupont, B Muller, L Romary, B Sagot Proceedings of The 12th Language Resources and Evaluation Conference, 4631–4638, 2020 | 10* | 2020 |
Les modèles de langue contextuels Camembert pour le français: impact de la taille et de l'hétérogénéité des données d'entrainement L Martin, B Muller, PJ Ortiz Suárez, Y Dupont, L Romary, E Clergerie, ... Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP …, 2020 | 4 | 2020 |
Ungoliant: An optimized pipeline for the generation of a very large-scale multilingual web corpus J Abadji, PJ Ortiz Suárez, L Romary, B Sagot Proceedings of the Workshop on Challenges in the Management of Large Corpora …, 2021 | 2 | 2021 |
Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources A McMillan-Major, Z Alyafeai, S Biderman, K Chen, F De Toni, G Dupont, ... arXiv preprint arXiv:2201.10066, 2022 | 1 | 2022 |
SinNer@CLEF-HIPE2020: Sinful Adaptation of SotA models for Named Entity Recognition in Historical French and German Newspapers PJ Ortiz Suárez, Y Dupont, G Lejeune, T Tian CLEF 2020 Working Notes 2696, 2020 | 1* | 2020 |
French Contextualized Word-Embeddings with a sip of CaBeRnet: a New French Balanced Reference Corpus M Popa-Fabre, PJ Ortiz Suárez, B Sagot, ÉV de la Clergerie Proceedings of the 8th Workshop on Challenges in the Management of Large …, 2020 | 1 | 2020 |
How OCR Performance can Impact on the Automatic Extraction of Dictionary Content Structures M Khemakhem, I Galleron, G Williams, L Romary, PJ Ortiz Suárez | 1 | 2019 |
From FreEM to D'AlemBERT: a Large Corpus and a Language Model for Early Modern French S Gabay, P Ortiz Suarez, A Bartz, A Chagué, R Bawden, P Gambette, ... arXiv preprint arXiv:2202.09452, 2022 | | 2022 |
Towards a Cleaner Document-Oriented Multilingual Crawled Corpus J Abadji, P Ortiz Suarez, L Romary, B Sagot arXiv preprint arXiv:2201.06642, 2022 | | 2022 |
Expanding the content model of annotationBlock A Bartz, J Janes, L Romary, P Gambette, R Bawden, PJ Ortiz Suárez, ... Next Gen TEI, 2021-TEI Conference and Members’ Meeting, 2021 | | 2021 |
A dataset for automatic detection of places in (early) modern French texts S Gabay, P Ortiz Suarez NASSCFL 2021-50th Annual North American Society for Seventeenth-Century …, 2021 | | 2021 |
Preparing the Dictionnaire Universel for Automatic Enrichment PJ Ortiz Suárez, L Romary, B Sagot 10th International Conference on Historical Lexicography and Lexicology (ICHLL), 2019 | | 2019 |