ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2504.10356
  4. Cited By
MultiLoKo: a multilingual local knowledge benchmark for LLMs spanning 31 languages

MultiLoKo: a multilingual local knowledge benchmark for LLMs spanning 31 languages

14 April 2025
Dieuwke Hupkes
Nikolay Bogoychev
ArXivPDFHTML

Papers citing "MultiLoKo: a multilingual local knowledge benchmark for LLMs spanning 31 languages"

33 / 33 papers shown
Title
MultiBLiMP 1.0: A Massively Multilingual Benchmark of Linguistic Minimal Pairs
MultiBLiMP 1.0: A Massively Multilingual Benchmark of Linguistic Minimal Pairs
Jaap Jumelet
Leonie Weissweiler
Arianna Bisazza
68
3
0
03 Apr 2025
MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation
MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation
Weihao Xuan
Rui Yang
Heli Qi
Qingcheng Zeng
Yunze Xiao
...
Edison Marrese-Taylor
Shijian Lu
Yusuke Iwasawa
Yutaka Matsuo
Irene Li
ELM
171
7
0
13 Mar 2025
INCLUDE: Evaluating Multilingual Language Understanding with Regional
  Knowledge
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
Angelika Romanou
Negar Foroutan
Anna Sotnikova
Zeming Chen
Sree Harsha Nelaturu
...
Mike Zhang
Imanol Schlag
Marzieh Fadaee
Sara Hooker
Antoine Bosselut
ELM
147
8
0
29 Nov 2024
Linguini: A benchmark for language-agnostic linguistic reasoning
Linguini: A benchmark for language-agnostic linguistic reasoning
Eduardo Sánchez
Belen Alastruey
C. Ropers
Pontus Stenetorp
Mikel Artetxe
Marta R. Costa-jussá
ReLM
ELM
LRM
84
7
0
18 Sep 2024
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic
Fakhraddin Alwajih
Gagan Bhatia
Muhammad Abdul-Mageed
51
6
0
25 Jul 2024
Is It Good Data for Multilingual Instruction Tuning or Just Bad
  Multilingual Evaluation for Large Language Models?
Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models?
Pinzhen Chen
Simon Yu
Zhicheng Guo
Barry Haddow
ELM
86
3
0
18 Jun 2024
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur
Kartik Choudhary
Venkat Srinik Ramayapally
Sankaran Vaidyanathan
Dieuwke Hupkes
ELM
ALM
119
64
0
18 Jun 2024
Quantifying Variance in Evaluation Benchmarks
Quantifying Variance in Evaluation Benchmarks
Lovish Madaan
Aaditya K. Singh
Rylan Schaeffer
Andrew Poulton
Sanmi Koyejo
Pontus Stenetorp
Sharan Narang
Dieuwke Hupkes
67
13
0
14 Jun 2024
From Form(s) to Meaning: Probing the Semantic Depths of Language Models
  Using Multisense Consistency
From Form(s) to Meaning: Probing the Semantic Depths of Language Models Using Multisense Consistency
Xenia Ohmer
Elia Bruni
Dieuwke Hupkes
AI4CE
83
7
0
18 Apr 2024
MERA: A Comprehensive LLM Evaluation in Russian
MERA: A Comprehensive LLM Evaluation in Russian
Alena Fenogenova
Artem Chervyakov
Nikita Martynov
Anastasia Kozlova
Maria Tikhonova
...
Nikita Savushkin
Polina Mikhailova
Denis Dimitrov
Alexander Panchenko
Sergey Markov
ELM
56
12
0
09 Jan 2024
State of What Art? A Call for Multi-Prompt LLM Evaluation
State of What Art? A Call for Multi-Prompt LLM Evaluation
Moran Mizrahi
Guy Kaplan
Daniel Malkin
Rotem Dror
Dafna Shahaf
Gabriel Stanovsky
ELM
75
139
0
31 Dec 2023
Mind the instructions: a holistic evaluation of consistency and
  interactions in prompt-based learning
Mind the instructions: a holistic evaluation of consistency and interactions in prompt-based learning
Lucas Weber
Elia Bruni
Dieuwke Hupkes
65
28
0
20 Oct 2023
Cross-Lingual Consistency of Factual Knowledge in Multilingual Language Models
Cross-Lingual Consistency of Factual Knowledge in Multilingual Language Models
Jirui Qi
Raquel Fernández
Arianna Bisazza
KELM
HILM
89
71
0
16 Oct 2023
Large Language Models Only Pass Primary School Exams in Indonesia: A
  Comprehensive Test on IndoMMLU
Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU
Fajri Koto
Nurul Aisyah
Haonan Li
Timothy Baldwin
AI4Ed
LRM
ELM
72
44
0
07 Oct 2023
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122
  Language Variants
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
Lucas Bandarkar
Davis Liang
Benjamin Muller
Mikel Artetxe
Satya Narayan Shukla
Don Husa
Naman Goyal
Abhinandan Krishnan
Luke Zettlemoyer
Madian Khabsa
61
152
0
31 Aug 2023
CMMLU: Measuring massive multitask language understanding in Chinese
CMMLU: Measuring massive multitask language understanding in Chinese
Haonan Li
Yixuan Zhang
Fajri Koto
Yifei Yang
Hai Zhao
Yeyun Gong
Nan Duan
Tim Baldwin
ALM
ELM
87
258
0
15 Jun 2023
M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining
  Large Language Models
M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models
Wenxuan Zhang
Sharifah Mahani Aljunied
Chang Gao
Yew Ken Chia
Lidong Bing
ELM
87
85
0
08 Jun 2023
Separating form and meaning: Using self-consistency to quantify task
  understanding across multiple senses
Separating form and meaning: Using self-consistency to quantify task understanding across multiple senses
Xenia Ohmer
Elia Bruni
Dieuwke Hupkes
LRM
47
15
0
19 May 2023
Language Models are Multilingual Chain-of-Thought Reasoners
Language Models are Multilingual Chain-of-Thought Reasoners
Freda Shi
Mirac Suzgun
Markus Freitag
Xuezhi Wang
Suraj Srivats
...
Yi Tay
Sebastian Ruder
Denny Zhou
Dipanjan Das
Jason W. Wei
ReLM
LRM
220
363
0
06 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review
State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Ryan Cotterell
Zhijing Jin
235
97
0
06 Oct 2022
Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End
  Question Answering
Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering
Priyanka Sen
Alham Fikri Aji
Amir Saffari
LRM
130
65
0
04 Oct 2022
Few-shot Learning with Multilingual Language Models
Few-shot Learning with Multilingual Language Models
Xi Lin
Todor Mihaylov
Mikel Artetxe
Tianlu Wang
Shuohui Chen
...
Luke Zettlemoyer
Zornitsa Kozareva
Mona T. Diab
Ves Stoyanov
Xian Li
BDL
ELM
LRM
95
305
0
20 Dec 2021
The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual
  Machine Translation
The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation
Naman Goyal
Cynthia Gao
Vishrav Chaudhary
Peng-Jen Chen
Guillaume Wenzek
Da Ju
Sanjan Krishnan
MarcÁurelio Ranzato
Francisco Guzman
Angela Fan
88
582
0
06 Jun 2021
Multilingual LAMA: Investigating Knowledge in Multilingual Pretrained
  Language Models
Multilingual LAMA: Investigating Knowledge in Multilingual Pretrained Language Models
Nora Kassner
Philipp Dufter
Hinrich Schütze
69
141
0
01 Feb 2021
EXAMS: A Multi-Subject High School Examinations Dataset for
  Cross-Lingual and Multilingual Question Answering
EXAMS: A Multi-Subject High School Examinations Dataset for Cross-Lingual and Multilingual Question Answering
Momchil Hardalov
Todor Mihaylov
Dimitrina Zlatkova
Yoan Dinkov
Ivan Koychev
Preslav Nakov
AI4Ed
ELM
115
53
0
05 Nov 2020
X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained
  Language Models
X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models
Zhengbao Jiang
Antonios Anastasopoulos
Jun Araki
Haibo Ding
Graham Neubig
HILM
KELM
57
143
0
13 Oct 2020
Language Models are Few-Shot Learners
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
702
41,736
0
28 May 2020
XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning
XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning
Edoardo Ponti
Goran Glavaš
Olga Majewska
Qianchu Liu
Ivan Vulić
Anna Korhonen
LRM
61
320
0
01 May 2020
TyDi QA: A Benchmark for Information-Seeking Question Answering in
  Typologically Diverse Languages
TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages
J. Clark
Eunsol Choi
Michael Collins
Dan Garrette
Tom Kwiatkowski
Vitaly Nikolaev
J. Palomaki
130
607
0
10 Mar 2020
On the Cross-lingual Transferability of Monolingual Representations
On the Cross-lingual Transferability of Monolingual Representations
Mikel Artetxe
Sebastian Ruder
Dani Yogatama
177
793
0
25 Oct 2019
PAWS: Paraphrase Adversaries from Word Scrambling
PAWS: Paraphrase Adversaries from Word Scrambling
Yuan Zhang
Jason Baldridge
Luheng He
68
543
0
01 Apr 2019
XNLI: Evaluating Cross-lingual Sentence Representations
XNLI: Evaluating Cross-lingual Sentence Representations
Alexis Conneau
Guillaume Lample
Ruty Rinott
Adina Williams
Samuel R. Bowman
Holger Schwenk
Veselin Stoyanov
ELM
55
1,379
0
13 Sep 2018
SQuAD: 100,000+ Questions for Machine Comprehension of Text
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar
Jian Zhang
Konstantin Lopyrev
Percy Liang
RALM
246
8,113
0
16 Jun 2016
1