ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.16591
  4. Cited By
Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering

Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering

22 May 2025
Bowen Jiang
Runchuan Zhu
Jiang Wu
Zinco Jiang
Yifan He
Junyuan Gao
Jia Yu
Rui Min
Yinfan Wang
Haote Yang
Songyang Zhang
Dahua Lin
Lijun Wu
Conghui He
    ELM
ArXiv (abs)PDFHTML

Papers citing "Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering"

21 / 21 papers shown
Title
MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation
MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation
Weihao Xuan
Rui Yang
Heli Qi
Qingcheng Zeng
Yunze Xiao
...
Edison Marrese-Taylor
Shijian Lu
Yusuke Iwasawa
Yutaka Matsuo
Irene Li
ELM
198
7
0
13 Mar 2025
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models
Xu Huang
Wenhao Zhu
Hanxu Hu
Zeang Sheng
Lei Li
Shujian Huang
Fei Yuan
ELM
126
4
0
11 Feb 2025
Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: A Systematic Scoping Review
Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: A Systematic Scoping Review
Hongyi Yang
Fangyuan Chang
Dian Zhu
Muroi Fumie
Zhao Liu
83
22
0
28 Jan 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLMVLMOffRLAI4TSLRM
380
1,970
0
22 Jan 2025
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages
Junho Myung
Nayeon Lee
Yi Zhou
Jiho Jin
Rifki Afina Putri
...
Seid Muhie Yimam
Mohammad Taher Pilehvar
N. Ousidhoum
Jose Camacho-Collados
Alice Oh
155
55
0
17 Jan 2025
MINTQA: A Multi-Hop Question Answering Benchmark for Evaluating LLMs on New and Tail Knowledge
MINTQA: A Multi-Hop Question Answering Benchmark for Evaluating LLMs on New and Tail Knowledge
Jie He
Nan Hu
Wanqiu Long
Jiaoyan Chen
Jeff Z. Pan
ELMLRM
164
9
0
22 Dec 2024
Measuring short-form factuality in large language models
Measuring short-form factuality in large language models
Jason W. Wei
Nguyen Karina
Hyung Won Chung
Yunxin Joy Jiao
Spencer Papay
Amelia Glaese
John Schulman
W. Fedus
ELMKELMHILM
66
78
0
07 Nov 2024
GPT-4o System Card
GPT-4o System Card
OpenAI OpenAI
:
Aaron Hurst
Adam Lerer
Adam P. Goucher
...
Yuchen He
Yuchen Zhang
Yujia Jin
Yunxing Dai
Yury Malkov
MLLM
204
1,019
0
25 Oct 2024
CaLMQA: Exploring culturally specific long-form question answering across 23 languages
CaLMQA: Exploring culturally specific long-form question answering across 23 languages
Shane Arora
Marzena Karpinska
Hung-Ting Chen
Ipsita Bhattacharjee
Mohit Iyyer
Eunsol Choi
HILM
122
14
0
25 Jun 2024
Culturally Aware and Adapted NLP: A Taxonomy and a Survey of the State of the Art
Culturally Aware and Adapted NLP: A Taxonomy and a Survey of the State of the Art
Chen Cecilia Liu
Iryna Gurevych
Anna Korhonen
137
6
0
06 Jun 2024
Benchmarking Chinese Commonsense Reasoning of LLMs: From
  Chinese-Specifics to Reasoning-Memorization Correlations
Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations
Jiaxing Sun
Weiquan Huang
Jiang Wu
Chenya Gu
Wei Li
Songyang Zhang
Hang Yan
Conghui He
LRM
68
8
0
21 Mar 2024
Can LLM Generate Culturally Relevant Commonsense QA Data? Case Study in
  Indonesian and Sundanese
Can LLM Generate Culturally Relevant Commonsense QA Data? Case Study in Indonesian and Sundanese
Rifki Afina Putri
Faiz Ghifari Haznitrama
Dea Adhista
Alice Oh
74
19
0
27 Feb 2024
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic
Fajri Koto
Haonan Li
Sara Shatnawi
Jad Doughman
Abdelrahman Boda Sadallah
...
Neha Sengupta
Shady Shehata
Nizar Habash
Preslav Nakov
Timothy Baldwin
ELMLRM
126
44
0
20 Feb 2024
KMMLU: Measuring Massive Multitask Language Understanding in Korean
KMMLU: Measuring Massive Multitask Language Understanding in Korean
Guijin Son
Hanwool Albert Lee
Sungdong Kim
Seungone Kim
Niklas Muennighoff
Taekyoon Choi
Cheonbok Park
Kang Min Yoo
Stella Biderman
ALMRALMELM
84
44
0
18 Feb 2024
Do Llamas Work in English? On the Latent Language of Multilingual
  Transformers
Do Llamas Work in English? On the Latent Language of Multilingual Transformers
Chris Wendler
V. Veselovsky
Giovanni Monea
Robert West
108
130
0
16 Feb 2024
Large Language Models Only Pass Primary School Exams in Indonesia: A
  Comprehensive Test on IndoMMLU
Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU
Fajri Koto
Nurul Aisyah
Haonan Li
Timothy Baldwin
AI4EdLRMELM
79
46
0
07 Oct 2023
CMMLU: Measuring massive multitask language understanding in Chinese
CMMLU: Measuring massive multitask language understanding in Chinese
Haonan Li
Yixuan Zhang
Fajri Koto
Yifei Yang
Hai Zhao
Yeyun Gong
Nan Duan
Tim Baldwin
ALMELM
97
272
0
15 Jun 2023
Having Beer after Prayer? Measuring Cultural Bias in Large Language
  Models
Having Beer after Prayer? Measuring Cultural Bias in Large Language Models
Tarek Naous
Michael Joseph Ryan
Alan Ritter
Wei Xu
69
94
0
23 May 2023
Language Models are Multilingual Chain-of-Thought Reasoners
Language Models are Multilingual Chain-of-Thought Reasoners
Freda Shi
Mirac Suzgun
Markus Freitag
Xuezhi Wang
Suraj Srivats
...
Yi Tay
Sebastian Ruder
Denny Zhou
Dipanjan Das
Jason W. Wei
ReLMLRM
244
369
0
06 Oct 2022
XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning
XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning
Edoardo Ponti
Goran Glavaš
Olga Majewska
Qianchu Liu
Ivan Vulić
Anna Korhonen
LRM
84
326
0
01 May 2020
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating
  Cross-lingual Generalization
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
Junjie Hu
Sebastian Ruder
Aditya Siddhant
Graham Neubig
Orhan Firat
Melvin Johnson
ELM
195
975
0
24 Mar 2020
1