arXiv: 2309.07445
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
14 September 2023
David Ifeoluwa Adelani
Hannah Liu
Xiaoyu Shen
Nikita Vassilyev
Jesujoba Oluwadara Alabi
Yanke Mao
Haonan Gao
Annie En-Shiun Lee
ELM
Papers citing
"SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects"
50 / 50 papers shown
Attention on Multiword Expressions: A Multilingual Study of BERT-based Models with Regard to Idiomaticity and Microsyntax
Iuliia Zaitova
Vitalii Hirak
Badr M. Abdullah
Dietrich Klakow
Bernd Möbius
T. Avgustinova
29
0
0
09 May 2025
HYPEROFA: Expanding LLM Vocabulary to New Languages via Hypernetwork-Based Embedding Initialization
Enes Özeren
Yihong Liu
Hinrich Schütze
31
0
0
21 Apr 2025
Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis
Yizhong Geng
Jizhuo Xu
Zeyu Liang
Jinghan Yang
Xiaoyi Shi
Xiaoyu Shen
19
0
0
10 Apr 2025
Rethinking Multilingual Continual Pretraining: Data Mixing for Adapting LLMs Across Languages and Resources
Zihao Li
Shaoxiong Ji
Hengyu Luo
Jörg Tiedemann
CLL
122
0
0
05 Apr 2025
GlotEval: A Test Suite for Massively Multilingual Evaluation of Large Language Models
Hengyu Luo
Zihao Li
Joseph Attieh
Sawal Devkota
Ona de Gibert
...
Ananda Sreenidhi
Raúl Vázquez
Mengjie Wang
Samea Yusofi
Jörg Tiedemann
ELM
38
0
0
05 Apr 2025
Overcoming Vocabulary Constraints with Pixel-level Fallback
Jonas F. Lotz
Hendra Setiawan
Stephan Peitz
Yova Kementchedjhieva
43
0
0
02 Apr 2025
Who Wrote This? Identifying Machine vs Human-Generated Text in Hausa
Babangida Sani
Aakansha Soy
Sukairaj Hafiz Imam
A. Mustapha
L. Aliyu
Idris Abdulmumin
I. Ahmad
Shamsuddeen Hassan Muhammad
DeLMO
77
0
0
17 Mar 2025
MiLiC-Eval: Benchmarking Multilingual LLMs for China's Minority Languages
Chen Zhang
Mingxu Tao
Zhiyuan Liao
Yansong Feng
41
0
0
03 Mar 2025
Where Are We? Evaluating LLM Performance on African Languages
Ife Adebara
Hawau Olamide Toyin
Nahom Tesfu Ghebremichael
AbdelRahim Elmadany
Muhammad Abdul-Mageed
52
0
0
26 Feb 2025
AFRIDOC-MT: Document-level MT Corpus for African Languages
Jesujoba Oluwadara Alabi
Israel Abebe Azime
Miaoran Zhang
C. España-Bonet
Rachel Bawden
...
Shamsuddeen Hassan Muhammad
Neo Putini
David O. Ademuyiwa
Andrew Caines
Dietrich Klakow
34
0
0
10 Jan 2025
LUSIFER: Language Universal Space Integration for Enhanced Multilingual Embeddings with Large Language Models
Hieu Man
Nghia Trung Ngo
Viet Dac Lai
Ryan Rossi
Franck Dernoncourt
T. Nguyen
154
0
0
01 Jan 2025
Beyond Data Quantity: Key Factors Driving Performance in Multilingual Language Models
Sina Bagheri Nezhad
Ameeta Agrawal
Rhitabrat Pokharel
LRM
74
2
0
17 Dec 2024
Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages
Edward Bayes
Israel Abebe Azime
Jesujoba Oluwadara Alabi
Jonas Kgomo
Tyna Eloundou
...
Shamsuddeen Hassan Muhammad
Choice Mpanza
Igneciah Pocia Thete
Dietrich Klakow
David Ifeoluwa Adelani
HILM
ELM
76
6
0
01 Dec 2024
Thank You, Stingray: Multilingual Large Language Models Can Not (Yet) Disambiguate Cross-Lingual Word Sense
Samuel Cahyawijaya
Ruochen Zhang
Holy Lovenia
Jan Christian Blaise Cruz
Elisa Gilbert
Hiroki Nomoto
Alham Fikri Aji
LRM
36
0
0
28 Oct 2024
Better to Ask in English: Evaluation of Large Language Models on English, Low-resource and Cross-Lingual Settings
Krishno Dey
Prerona Tarannum
Md. Arid Hasan
Imran Razzak
Usman Naseem
35
3
0
17 Oct 2024
State of NLP in Kenya: A Survey
Cynthia Jayne Amol
Everlyn Asiko Chimoto
Rose Delilah Gesicho
Antony M. Gitau
Naome A. Etori
...
Catherine Gitau
Antony Ndolo
Lilian D. A. Wanzare
Albert Njoroge Kahira
Ronald Tombe
21
1
0
13 Oct 2024
LangSAMP: Language-Script Aware Multilingual Pretraining
Yihong Liu
Haotian Ye
Chunlan Ma
Mingyang Wang
Hinrich Schütze
VLM
31
0
0
26 Sep 2024
EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models
Shaoxiong Ji
Zihao Li
Indraneil Paul
Jaakko Paavola
Peiqin Lin
...
Dayyán O'Brien
Hengyu Luo
Hinrich Schütze
Jörg Tiedemann
Barry Haddow
CLL
37
3
0
26 Sep 2024
How Transliterations Improve Crosslingual Alignment
Yihong Liu
Mingyang Wang
Amir Hossein Kargaran
Ayyoob Imani
Orgest Xhelili
Haotian Ye
Chunlan Ma
François Yvon
Hinrich Schütze
34
2
0
25 Sep 2024
InkubaLM: A small language model for low-resource African languages
A. Tonja
Bonaventure F. P. Dossou
Jessica Ojo
Jenalea Rajab
Fadel Thior
...
Anuoluwapo Aremu
Pelonomi Moiloa
Jade Z. Abbott
Vukosi Marivate
Benjamin Rosman
41
8
0
30 Aug 2024
MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization
Orevaoghene Ahia
Sachin Kumar
Hila Gonen
Valentin Hofmann
Tomasz Limisiewicz
Yulia Tsvetkov
Noah A. Smith
43
4
0
11 Jul 2024
Unlocking the Potential of Model Merging for Low-Resource Languages
Mingxu Tao
Chen Zhang
Quzhe Huang
Tianyao Ma
Songfang Huang
Dongyan Zhao
Yansong Feng
CLL
MoMe
27
3
0
04 Jul 2024
Exploring the Role of Transliteration in In-Context Learning for Low-resource Languages Written in Non-Latin Scripts
Chunlan Ma
Yihong Liu
Haotian Ye
Hinrich Schütze
28
2
0
02 Jul 2024
GemmAr: Enhancing LLMs Through Arabic Instruction-Tuning
Hasna Chouikhi
Manel Aloui
Cyrine Ben Hammou
Ghaith Chaabane
Haithem Kchaou
Chehir Dhaouadi
41
0
0
02 Jul 2024
A Recipe of Parallel Corpora Exploitation for Multilingual Large Language Models
Peiqin Lin
André F. T. Martins
Hinrich Schütze
51
2
0
29 Jun 2024
Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment
Orgest Xhelili
Yihong Liu
Hinrich Schütze
31
6
0
28 Jun 2024
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
Holy Lovenia
Rahmad Mahendra
Salsabil Maulana Akbar
Lester James Validad Miranda
Jennifer Santoso
...
Genta Indra Winata
Ruochen Zhang
Fajri Koto
Zheng-Xin Yong
Samuel Cahyawijaya
87
9
0
14 Jun 2024
MINERS: Multilingual Language Models as Semantic Retrievers
Genta Indra Winata
Ruochen Zhang
David Ifeoluwa Adelani
RALM
47
5
0
11 Jun 2024
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models
David Ifeoluwa Adelani
Jessica Ojo
Israel Abebe Azime
Jian Yun Zhuang
Jesujoba Oluwadara Alabi
...
Salomey Osei
Sokhar Samb
Tadesse Kebede Guge
Pontus Stenetorp
ELM
62
7
0
05 Jun 2024
TransMI: A Framework to Create Strong Baselines from Multilingual Pretrained Language Models for Transliterated Data
Yihong Liu
Chunlan Ma
Haotian Ye
Hinrich Schütze
36
4
0
16 May 2024
UCCIX: Irish-eXcellence Large Language Model
Khanh-Tung Tran
Barry O’Sullivan
Hoang D. Nguyen
33
3
0
13 May 2024
XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples
Peiqin Lin
André F. T. Martins
Hinrich Schütze
RALM
45
2
0
08 May 2024
What Drives Performance in Multilingual Language Models?
Sina Bagheri Nezhad
Ameeta Agrawal
LRM
37
9
0
29 Apr 2024
Translation of Multifaceted Data without Re-Training of Machine Translation Systems
Hyeonseok Moon
Seungyoon Lee
Seongtae Hong
Seungjun Lee
Chanjun Park
Heu-Jeoung Lim
28
0
0
25 Apr 2024
SambaLingo: Teaching Large Language Models New Languages
Zoltan Csaki
Bo Li
Jonathan Li
Qiantong Xu
Pian Pawakapan
Leon Zhang
Yun Du
Hengyu Zhao
Changran Hu
Urmish Thakker
32
6
0
08 Apr 2024
Forget NLI, Use a Dictionary: Zero-Shot Topic Classification for Low-Resource Languages with Application to Luxembourgish
Fred Philippy
Shohreh Haddadan
Siwen Guo
27
0
0
05 Apr 2024
ANGOFA: Leveraging OFA Embedding Initialization and Synthetic Data for Angolan Language Model
Osvaldo Luamba Quinjica
David Ifeoluwa Adelani
29
0
0
03 Apr 2024
AAdaM at SemEval-2024 Task 1: Augmentation and Adaptation for Multilingual Semantic Textual Relatedness
Miaoran Zhang
Mingyang Wang
Jesujoba Oluwadara Alabi
Dietrich Klakow
VLM
38
4
0
01 Apr 2024
DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages
Fahim Faisal
Orevaoghene Ahia
Aarohi Srivastava
Kabir Ahuja
David Chiang
Yulia Tsvetkov
Antonios Anastasopoulos
56
27
0
16 Mar 2024
LexC-Gen: Generating Data for Extremely Low-Resource Languages with Large Language Models and Bilingual Lexicons
Zheng-Xin Yong
Cristina Menghini
Stephen H. Bach
43
3
0
21 Feb 2024
Soft Prompt Tuning for Cross-Lingual Transfer: When Less is More
Fred Philippy
Siwen Guo
Shohreh Haddadan
Cedric Lothritz
Jacques Klein
Tegawende F. Bissyande
AAML
VLM
22
1
0
06 Feb 2024
MaLA-500: Massive Language Adaptation of Large Language Models
Peiqin Lin
Shaoxiong Ji
Jörg Tiedemann
André F. T. Martins
Hinrich Schütze
ELM
23
15
0
24 Jan 2024
Exploring the Maze of Multilingual Modeling
Sina Bagheri Nezhad
Ameeta Agrawal
16
1
0
09 Oct 2023
Scaling Speech Technology to 1,000+ Languages
Vineel Pratap
Andros Tjandra
Bowen Shi
Paden Tomasello
Arun Babu
...
Yossi Adi
Xiaohui Zhang
Wei-Ning Hsu
Alexis Conneau
Michael Auli
VLM
77
298
0
22 May 2023
xPQA: Cross-Lingual Product Question Answering across 12 Languages
Xiaoyu Shen
Akari Asai
Bill Byrne
Adria de Gispert
22
7
0
16 May 2023
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
Jingfeng Yang
Hongye Jin
Ruixiang Tang
Xiaotian Han
Qizhang Feng
Haoming Jiang
Bing Yin
Xia Hu
LM&MA
131
622
0
26 Apr 2023
MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition
David Ifeoluwa Adelani
Graham Neubig
Sebastian Ruder
Shruti Rijhwani
Michael Beukman
...
Idris Abdulmumin
Odunayo Ogundepo
Oreen Yousuf
Tatiana Moteu Ngoli
Dietrich Klakow
36
43
0
22 Oct 2022
FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech
Alexis Conneau
Min Ma
Simran Khanuja
Yu Zhang
Vera Axelrod
Siddharth Dalmia
Jason Riesa
Clara E. Rivera
Ankur Bapna
VLM
78
282
0
25 May 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
213
1,657
0
15 Oct 2021
MLQA: Evaluating Cross-lingual Extractive Question Answering
Patrick Lewis
Barlas Oğuz
Ruty Rinott
Sebastian Riedel
Holger Schwenk
ELM
246
491
0
16 Oct 2019