2212.10785
Cited By
SERENGETI: Massively Multilingual Language Models for Africa
21 December 2022
Ife Adebara
AbdelRahim Elmadany
Muhammad Abdul-Mageed
Alcides Alcoba Inciarte
Papers citing
"SERENGETI: Massively Multilingual Language Models for Africa"
24 / 24 papers shown
Lugha-Llama: Adapting Large Language Models for African Languages
Happy Buzaaba
Alexander Wettig
David Ifeoluwa Adelani
Christiane Fellbaum
34
0
0
09 Apr 2025
An Expanded Massive Multilingual Dataset for High-Performance Language Technologies
Laurie Burchell
Ona de Gibert
Nikolay Arefyev
Mikko Aulamo
Marta Bañón
...
Pavel Stepachev
Jörg Tiedemann
Dušan Variš
Tereza Vojtěchová
Jaume Zaragoza-Bernabeu
43
1
0
13 Mar 2025
Where Are We? Evaluating LLM Performance on African Languages
Ife Adebara
Hawau Olamide Toyin
Nahom Tesfu Ghebremichael
AbdelRahim Elmadany
Muhammad Abdul-Mageed
57
0
0
26 Feb 2025
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages
Amir Hossein Kargaran
François Yvon
Hinrich Schütze
VLM
36
5
0
31 Oct 2024
State of NLP in Kenya: A Survey
Cynthia Jayne Amol
Everlyn Asiko Chimoto
Rose Delilah Gesicho
Antony M. Gitau
Naome A. Etori
...
Catherine Gitau
Antony Ndolo
Lilian D. A. Wanzare
Albert Njoroge Kahira
Ronald Tombe
29
1
0
13 Oct 2024
From N-grams to Pre-trained Multilingual Models For Language Identification
Thapelo Sindane
Vukosi Marivate
24
1
0
11 Oct 2024
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models
David Ifeoluwa Adelani
Jessica Ojo
Israel Abebe Azime
Jian Yun Zhuang
Jesujoba Oluwadara Alabi
...
Salomey Osei
Sokhar Samb
Tadesse Kebede Guge
Pontus Stenetorp
ELM
65
7
0
05 Jun 2024
ANGOFA: Leveraging OFA Embedding Initialization and Synthetic Data for Angolan Language Model
Osvaldo Luamba Quinjica
David Ifeoluwa Adelani
35
0
0
03 Apr 2024
Africa-Centric Self-Supervised Pre-Training for Multilingual Speech Representation in a Sub-Saharan Context
Antoine Caubrière
Elodie Gauthier
26
2
0
02 Apr 2024
EthioLLM: Multilingual Large Language Models for Ethiopian Languages with Task Evaluation
A. Tonja
Israel Abebe Azime
Tadesse Destaw Belay
M. Yigezu
Moges Ahmed Mehamed
...
Olga Kolesnikova
Philipp Slusallek
Dietrich Klakow
Shengwu Xiong
Seid Muhie Yimam
51
5
0
20 Mar 2024
CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean
Eunsu Kim
Juyoung Suk
Philhoon Oh
Haneul Yoo
James Thorne
Alice H. Oh
ELM
72
15
0
11 Mar 2024
MaLA-500: Massive Language Adaptation of Large Language Models
Peiqin Lin
Shaoxiong Ji
Jörg Tiedemann
André F. T. Martins
Hinrich Schütze
ELM
31
15
0
24 Jan 2024
TransliCo: A Contrastive Learning Framework to Address the Script Barrier in Multilingual Pretrained Language Models
Yihong Liu
Chunlan Ma
Haotian Ye
Hinrich Schütze
30
1
0
12 Jan 2024
PuoBERTa: Training and evaluation of a curated language model for Setswana
Vukosi Marivate
Moseli Mots'oehli
Valencia Wagner
Richard Lastrucci
Isheanesu Dzingirai
30
8
0
13 Oct 2023
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
David Ifeoluwa Adelani
Hannah Liu
Xiaoyu Shen
Nikita Vassilyev
Jesujoba Oluwadara Alabi
Yanke Mao
Haonan Gao
Annie En-Shiun Lee
ELM
38
60
0
14 Sep 2023
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
Sneha Kudugunta
Isaac Caswell
Biao Zhang
Xavier Garcia
Christopher A. Choquette-Choo
...
Derrick Xin
Aditya Kusupati
Romi Stella
Ankur Bapna
Orhan Firat
67
118
0
09 Sep 2023
NaijaRC: A Multi-choice Reading Comprehension Dataset for Nigerian Languages
Anuoluwapo Aremu
Jesujoba Oluwadara Alabi
Daud Abolade
Nkechinyere F. Aguobi
Shamsuddeen Hassan Muhammad
David Ifeoluwa Adelani
26
3
0
18 Aug 2023
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
Ayyoob Imani
Peiqin Lin
Amir Hossein Kargaran
Silvia Severini
Masoud Jalili Sabet
...
Chunlan Ma
Helmut Schmid
André F. T. Martins
François Yvon
Hinrich Schütze
ALM
LRM
33
95
0
20 May 2023
Taxi1500: A Multilingual Dataset for Text Classification in 1500 Languages
Chunlan Ma
Ayyoob Imani
Haotian Ye
Renhao Pei
Ehsaneddin Asgari
Hinrich Schütze
27
23
0
15 May 2023
How Good are Commercial Large Language Models on African Languages?
Jessica Ojo
Kelechi Ogueji
26
5
0
11 May 2023
UBC-DLNLP at SemEval-2023 Task 12: Impact of Transfer Learning on African Sentiment Analysis
Gagan Bhatia
Ife Adebara
AbdelRahim Elmadany
Muhammad Abdul-Mageed
29
1
0
21 Apr 2023
SemEval-2023 Task 12: Sentiment Analysis for African Languages (AfriSenti-SemEval)
Shamsuddeen Hassan Muhammad
Idris Abdulmumin
Seid Muhie Yimam
David Ifeoluwa Adelani
I. Ahmad
N. Ousidhoum
A. Ayele
Saif M. Mohammad
Meriem Beloucif
Sebastian Ruder
38
67
0
13 Apr 2023
Larger-Scale Transformers for Multilingual Masked Language Modeling
Naman Goyal
Jingfei Du
Myle Ott
Giridhar Anantharaman
Alexis Conneau
90
98
0
02 May 2021
Diversity and Inclusion Metrics in Subset Selection
Margaret Mitchell
Dylan K. Baker
Nyalleng Moorosi
Emily L. Denton
Ben Hutchinson
A. Hanna
Timnit Gebru
Jamie Morgenstern
FaML
150
85
0
09 Feb 2020