2009.03300
Measuring Massive Multitask Language Understanding
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 3,408 papers shown
Title
MCP-RADAR: A Multi-Dimensional Benchmark for Evaluating Tool Use Capabilities in Large Language Models
Xuanqi Gao
Siyi Xie
Juan Zhai
Shiqing Ma
Chao Shen
ELM
119
0
0
22 May 2025
LightRouter: Towards Efficient LLM Collaboration with Minimal Overhead
Yifan Zhang
Xinkui Zhao
Zuxin Wang
Guanjie Cheng
Yueshen Xu
Shuiguang Deng
Yuxiang Cai
93
0
0
22 May 2025
Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering
Bowen Jiang
Runchuan Zhu
Jiang Wu
Zinco Jiang
Yifan He
...
Haote Yang
Songyang Zhang
Dahua Lin
Lijun Wu
Conghui He
ELM
54
0
0
22 May 2025
Guiding Giants: Lightweight Controllers for Weighted Activation Steering in LLMs
Amr Hegazy
Mostafa Elhoushi
Amr Alanwar
LLMSV
58
0
0
22 May 2025
Finetuning-Activated Backdoors in LLMs
Thibaud Gloaguen
Mark Vero
Robin Staab
Martin Vechev
AAML
202
0
0
22 May 2025
INFERENCEDYNAMICS: Efficient Routing Across LLMs through Structured Capability and Knowledge Profiling
Haochen Shi
Tianshi Zheng
Weiqi Wang
Baixuan Xu
Chunyang Li
Chunkit Chan
Tao Fan
Yangqiu Song
Qiang Yang
95
1
0
22 May 2025
SPaRC: A Spatial Pathfinding Reasoning Challenge
Lars Benedikt Kaesberg
Jan Philip Wahle
Terry Ruas
Bela Gipp
LRM
73
0
0
22 May 2025
URLs Help, Topics Guide: Understanding Metadata Utility in LLM Training
Dongyang Fan
Vinko Sabolčec
Martin Jaggi
59
0
0
22 May 2025
NAN: A Training-Free Solution to Coefficient Estimation in Model Merging
Chongjie Si
Kangtao Lv
Jingjing Jiang
Yadao Wang
Yongwei Wang
Xiaokang Yang
Wenbo Su
Bo Zheng
Wei Shen
MoMe
52
0
0
22 May 2025
Accidental Misalignment: Fine-Tuning Language Models Induces Unexpected Vulnerability
Punya Syon Pandey
Samuel Simko
Kellin Pelrine
Zhijing Jin
AAML
52
0
0
22 May 2025
RBench-V: A Primary Assessment for Visual Reasoning Models with Multi-modal Outputs
Meng-Hao Guo
Xuanyu Chu
Qianrui Yang
Zhe-Han Mo
Yiqing Shen
...
Kiyohiro Nakayama
Zhengyang Geng
Houwen Peng
Han Hu
Shi-Min Hu
LRM
197
0
0
22 May 2025
X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs
Rui Ye
Xiangrui Liu
Qimin Wu
Xianghe Pang
Zhenfei Yin
Lei Bai
Siheng Chen
LLMAG
81
0
0
22 May 2025
EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scenarios
Bin Xu
Yu Bai
Huashan Sun
Yiguan Lin
Siming Liu
Xinyue Liang
Yaolin Li
Yang Gao
Heyan Huang
AI4Ed
ELM
212
0
0
22 May 2025
Robust LLM Fingerprinting via Domain-Specific Watermarks
Thibaud Gloaguen
Robin Staab
Nikola Jovanović
Martin Vechev
WaLM
114
0
0
22 May 2025
MPL: Multiple Programming Languages with Large Language Models for Information Extraction
Bo Li
Gexiang Fang
Wei Ye
Zhenghua Xu
Jinglei Zhang
Hao Cheng
Shikun Zhang
54
0
0
22 May 2025
Mitigating Gender Bias via Fostering Exploratory Thinking in LLMs
Kangda Wei
Hasnat Md Abdullah
Ruihong Huang
74
0
0
22 May 2025
MixAT: Combining Continuous and Discrete Adversarial Training for LLMs
Csaba Dékány
Stefan Balauca
Robin Staab
Dimitar I. Dimitrov
Martin Vechev
AAML
55
0
0
22 May 2025
Shape it Up! Restoring LLM Safety during Finetuning
ShengYun Peng
Pin-Yu Chen
Jianfeng Chi
Seongmin Lee
Duen Horng Chau
66
0
0
22 May 2025
CHART-6: Human-Centered Evaluation of Data Visualization Understanding in Vision-Language Models
Arnav Verma
Kushin Mukherjee
Christopher Potts
Elisa Kreiss
Judith E. Fan
34
0
0
22 May 2025
MPO: Multilingual Safety Alignment via Reward Gap Optimization
Weixiang Zhao
Yulin Hu
Yang Deng
Tongtong Wu
Wenxuan Zhang
...
An Zhang
Yanyan Zhao
Bing Qin
Tat-Seng Chua
Ting Liu
100
2
0
22 May 2025
From Surveys to Narratives: Rethinking Cultural Value Adaptation in LLMs
Muhammad Farid Adilazuarda
Chen Cecilia Liu
Iryna Gurevych
Alham Fikri Aji
219
0
0
22 May 2025
ReCopilot: Reverse Engineering Copilot in Binary Analysis
Guoqiang Chen
Huiqi Sun
Daguang Liu
Zhiqi Wang
Qiang Wang
Bin Yin
Lu Liu
Lingyun Ying
27
0
0
22 May 2025
A Japanese Language Model and Three New Evaluation Benchmarks for Pharmaceutical NLP
Issey Sukeda
Takuro Fujii
Kosei Buma
Shunsuke Sasaki
Shinnosuke Ono
ELM
78
1
0
22 May 2025
Transfer of Structural Knowledge from Synthetic Languages
Mikhail Budnikov
Ivan Yamshchikov
68
0
0
21 May 2025
Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack
Silvia Cappelletti
Tobia Poppi
Samuele Poppi
Zheng-Xin Yong
Diego Garcia-Olano
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
KELM
AAML
59
0
0
21 May 2025
TurnaboutLLM: A Deductive Reasoning Benchmark from Detective Games
Yuan Yuan
Muyu He
Muhammad Adil Shahid
Jiani Huang
Ziyang Li
Li Zhang
LRM
55
0
0
21 May 2025
LyapLock: Bounded Knowledge Preservation in Sequential Large Language Model Editing
Peng Wang
Biyu Zhou
Xuehai Tang
Jizhong Han
Songlin Hu
KELM
126
0
0
21 May 2025
KaFT: Knowledge-aware Fine-tuning for Boosting LLMs' Domain-specific Question-Answering Performance
Qihuang Zhong
Liang Ding
Xiantao Cai
Juhua Liu
Bo Du
Dacheng Tao
100
0
0
21 May 2025
Cost-aware LLM-based Online Dataset Annotation
Eray Can Elumar
Cem Tekin
Osman Yagan
93
0
0
21 May 2025
MAPS: A Multilingual Benchmark for Global Agent Performance and Security
Omer Hofman
Oren Rachmil
Shamik Bose
Vikas Pahuja
Jonathan Brokman
Toshiya Shimizu
Trisha Starostina
Kelly Marchisio
Seraphina Goldfarb-Tarrant
Roman Vainshtein
50
0
0
21 May 2025
dKV-Cache: The Cache for Diffusion Language Models
Xinyin Ma
Runpeng Yu
Gongfan Fang
Xinchao Wang
DiffM
103
3
0
21 May 2025
VocalBench: Benchmarking the Vocal Conversational Abilities for Speech Interaction Models
Heyang Liu
Yuhao Wang
Ziyang Cheng
Ronghua Wu
Qunshan Gu
Yanfeng Wang
Yu Wang
AuLLM
72
0
0
21 May 2025
Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval
Taiye Chen
Zeming Wei
Ang Li
Yisen Wang
AAML
69
2
0
21 May 2025
Likelihood Variance as Text Importance for Resampling Texts to Map Language Models
Momose Oyama
Ryo Kishino
Hiroaki Yamagiwa
Hidetoshi Shimodaira
VLM
48
0
0
21 May 2025
ThinkLess: A Training-Free Inference-Efficient Method for Reducing Reasoning Redundancy
Gengyang Li
Yifeng Gao
Yuming Li
Yunfang Wu
ReLM
OffRL
LRM
128
3
0
21 May 2025
Leveraging Online Data to Enhance Medical Knowledge in a Small Persian Language Model
Mehrdad Ghassabi
Pedram Rostami
Hamidreza Baradaran Kashani
Amirhossein Poursina
Zahra Kazemi
Milad Tavakoli
LM&MA
191
0
0
21 May 2025
From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning
David Dinucu-Jianu
Jakub Macina
Nico Daheim
Ido Hakimi
Iryna Gurevych
Mrinmaya Sachan
KELM
LRM
98
0
0
21 May 2025
lmgame-Bench: How Good are LLMs at Playing Games?
Lanxiang Hu
Mingjia Huo
Yu Zhang
Haoyang Yu
Eric P. Xing
Ion Stoica
Tajana Rosing
Haojian Jin
Hao Zhang
136
1
0
21 May 2025
The Effects of Data Augmentation on Confidence Estimation for LLMs
Rui Wang
Renyu Zhu
Minmin Lin
R. Wu
Tangjie Lv
Changjie Fan
Haobo Wang
21
0
0
21 May 2025
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!
Zhexin Zhang
Yuhao Sun
Junxiao Yang
Shiyao Cui
Hongning Wang
Minlie Huang
AAML
87
0
0
21 May 2025
When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners
Weixiang Zhao
Jiahe Guo
Yang Deng
Tongtong Wu
Wenxuan Zhang
...
Yanyan Zhao
Wanxiang Che
Bing Qin
Tat-Seng Chua
Ting Liu
LRM
142
0
0
21 May 2025
An Empirical Study of the Anchoring Effect in LLMs: Existence, Mechanism, and Potential Mitigations
Yiming Huang
Biquan Bie
Zuqiu Na
Weilin Ruan
Songxin Lei
Yutao Yue
Xinlei He
78
0
0
21 May 2025
Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models
Zihao Li
Xu Wang
Yuzhe Yang
Ziyu Yao
Haoyi Xiong
Jundong Li
LLMSV
LRM
122
3
0
21 May 2025
Revealing Language Model Trajectories via Kullback-Leibler Divergence
Ryo Kishino
Yusuke Takase
Momose Oyama
Hiroaki Yamagiwa
Hidetoshi Shimodaira
92
0
0
21 May 2025
Lost in Benchmarks? Rethinking Large Language Model Benchmarking with Item Response Theory
Hongli Zhou
Hui Huang
Ziqing Zhao
Lvyuan Han
Huicheng Wang
...
Jian Dong
Bing Xu
Conghui Zhu
Hailong Cao
Tiejun Zhao
ALM
33
0
0
21 May 2025
Fragments to Facts: Partial-Information Fragment Inference from LLMs
Lucas Rosenblatt
Bin Han
Robert Wolfe
Bill Howe
AAML
61
0
0
20 May 2025
sudoLLM : On Multi-role Alignment of Language Models
Soumadeep Saha
Akshay Chaturvedi
Joy Mahapatra
Utpal Garain
45
0
0
20 May 2025
Incorporating Token Usage into Prompting Strategy Evaluation
Chris Sypherd
Sergei Petrov
Sonny George
Vaishak Belle
LLMAG
54
0
0
20 May 2025
Enhancing LLMs via High-Knowledge Data Selection
Feiyu Duan
Xuemiao Zhang
Sirui Wang
Haoran Que
Yuqi Liu
Wenge Rong
Xunliang Cai
237
0
0
20 May 2025
YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering
Jennifer D'Souza
Hamed Babaei Giglou
Quentin Münch
ELM
109
0
0
20 May 2025