Laurençon, H, Saulnier, L, Wang, T, Akiki, C, Villanova del Moral, A, Le Scao, T, von Werra, L, Mou, C, González Ponferrada, E, Nguyen, H, Frohberg, J, Šaško, M, Lhoest, Q, McMillan-Major, A, Dupont, G, Biderman, S, Rogers, A, Ben Allal, L,
de Toni, F, Pistilli, G, Nguyen, O, Nikpoor, S, Masoud, M, Colombo, P, de la Rosa, J, Villegas, P, Thrush, T, Longpre, S, Nagel, S, Weber, L, Romero Muñoz, M, Zhu, J, van Strien, D, Alyafeai, Z, Almubarak, K, Chien, VM, Gonzalez-Dios, I, Soroa, A, Lo, K, Dey, M, Ortiz Suarez, P, Gokaslan, A, Bose, S, Adelani, DI, Phan, L, Tran, H, Yu, I, Pai, S, Chim, J, Lepercq, V, Ilić, S, Mitchell, M, Luccioni, S & Jernite, Y 2022,
The BigScience ROOTS Corpus: A 1.6TB composite multilingual dataset. in
Advances in Neural Information Processing Systems 35 (NeurIPS 2022). Neural Information Processing Systems Foundation, Inc. (NeurIPS), New Orleans, United States, 36th Conference on Neural Information Processing Systems, New Orleans, Louisiana, United States,
28/11/22. <https://openreview.net/forum?id=UoEw6KigkUn>