Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2009.03300
Cited By
v1
v2
v3 (latest)
Measuring Massive Multitask Language Understanding
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 3,408 papers shown
Title
Mutual-Taught for Co-adapting Policy and Reward Models
Tianyuan Shi
Canbin Huang
Fanqi Wan
Longguang Zhong
Ziyi Yang
Weizhou Shen
Xiaojun Quan
Ming Yan
36
0
0
17 May 2025
LLM-BABYBENCH: Understanding and Evaluating Grounded Planning and Reasoning in LLMs
Omar Choukrani
Idriss Malek
Daniil Orel
Zhuohan Xie
Zangir Iklassov
Martin Takáč
Salem Lahlou
LLMAG
ELM
LRM
79
0
0
17 May 2025
ReviewInstruct: A Review-Driven Multi-Turn Conversations Generation Method for Large Language Models
Jian Wu
Cong Wang
TianHuang Su
Jun Yang
Haozhi Lin
...
Steve Yang
BinQing Pan
Zehan Li
Ni Yang
ZhenYu Yang
ALM
64
0
0
16 May 2025
ZeroTuning: Unlocking the Initial Token's Power to Enhance Large Language Models Without Training
Feijiang Han
Xiaodong Yu
Jianheng Tang
Lyle Ungar
102
0
0
16 May 2025
MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models
Xiaomin Li
Mingye Gao
Yuexing Hao
Taoran Li
Guangya Wan
Zihan Wang
Yijun Wang
LM&MA
ELM
AI4MH
141
0
0
16 May 2025
IRLBench: A Multi-modal, Culturally Grounded, Parallel Irish-English Benchmark for Open-Ended LLM Reasoning Evaluation
Khanh-Tung Tran
Barry O'Sullivan
Hoang D. Nguyen
ELM
LRM
115
0
0
16 May 2025
InfiJanice: Joint Analysis and In-situ Correction Engine for Quantization-Induced Math Degradation in Large Language Models
Zhen Li
Yupeng Su
Songmiao Wang
Runming Yang
C. Xie
...
Ming Li
Jiannong Cao
Yuan Xie
Ngai Wong
Hongxia Yang
MQ
112
0
0
16 May 2025
HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages
Ziyi Wang
Jiaqi Zeng
Olivier Delalleau
Hoo-Chang Shin
Felipe Soares
Alexander Bukharin
Ellie Evans
Yi Dong
Oleksii Kuchaiev
101
2
0
16 May 2025
PeerGuard: Defending Multi-Agent Systems Against Backdoor Attacks Through Mutual Reasoning
Falong Fan
Xi Li
LLMAG
AAML
90
0
0
16 May 2025
GenKnowSub: Improving Modularity and Reusability of LLMs through General Knowledge Subtraction
Mohammadtaha Bagherifard
Sahar Rajabi
Ali Edalat
Yadollah Yaghoobzadeh
KELM
69
0
0
16 May 2025
A Systematic Analysis of Base Model Choice for Reward Modeling
Kian Ahrabian
Pegah Jandaghi
Negar Mokhberian
Sai Praneeth Karimireddy
Jay Pujara
134
0
0
16 May 2025
Parallel Scaling Law for Language Models
Mouxiang Chen
Binyuan Hui
Zeyu Cui
Jiaxi Yang
Dayiheng Liu
Jianling Sun
Junyang Lin
Zhongxin Liu
MoE
LRM
91
2
0
15 May 2025
Interpretable Risk Mitigation in LLM Agent Systems
Jan Chojnacki
LLMAG
161
1
0
15 May 2025
Evaluations at Work: Measuring the Capabilities of GenAI in Use
Brandon Lepine
Gawesha Weerantunga
Juho Kim
Pamela Mishkin
Matthew Beane
76
0
0
15 May 2025
A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment
Jean-Philippe Corbeil
Amin Dada
Jean-Michel Attendu
Asma Ben Abacha
Alessandro Sordoni
Lucas Caccia
François Beaulieu
Thomas Lin
Jens Kleesiek
Paul Vozila
LM&MA
113
0
0
15 May 2025
Evaluating Model Explanations without Ground Truth
Kaivalya Rawal
Zihao Fu
Eoin Delaney
Chris Russell
FAtt
XAI
137
0
0
15 May 2025
AI-enhanced semantic feature norms for 786 concepts
Siddharth Suresh
Kushin Mukherjee
Tyler Giallanza
Xizheng Yu
Mia Patil
Jonathan Cohen
Timothy T. Rogers
67
0
0
15 May 2025
Mining Hidden Thoughts from Texts: Evaluating Continual Pretraining with Synthetic Data for LLM Reasoning
Yoichi Ishibashi
Taro Yano
Masafumi Oyamada
SyDa
LRM
108
2
0
15 May 2025
On the Evaluation of Engineering Artificial General Intelligence
Sandeep Neema
Susmit Jha
Adam Nagel
Ethan Lew
Chandrasekar Sureshkumar
Aleksa Gordic
Chase Shimmin
Hieu Nguygen
Paul Eremenko
ELM
55
0
0
15 May 2025
Qwen3 Technical Report
An Yang
A. Li
Baosong Yang
Beichen Zhang
Binyuan Hui
...
Zekun Wang
Zeyu Cui
Zhenru Zhang
Zhenhong Zhou
Zihan Qiu
LLMAG
OSLM
LRM
116
100
0
14 May 2025
Layered Unlearning for Adversarial Relearning
Timothy Qian
Vinith Suriyakumar
Ashia Wilson
Dylan Hadfield-Menell
MU
91
1
0
14 May 2025
Reproducibility Study of "Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents"
Pedro M. P. Curvo
Mara Dragomir
Salvador Torpes
Mohammadmahdi Rahimi
LLMAG
111
0
0
14 May 2025
Analog Foundation Models
Julian Büchel
Iason Chalas
Giovanni Acampa
An Chen
Omobayode Fagbohungbe
Sidney Tsai
Kaoutar El Maghraoui
Manuel Le Gallo
Abbas Rahimi
Abu Sebastian
MQ
115
0
0
14 May 2025
WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models
Abdullah Mushtaq
Imran Taj
Rafay Naeem
Ibrahim Ghaznavi
Junaid Qadir
63
0
0
14 May 2025
Evaluating LLM Metrics Through Real-World Capabilities
Justin K Miller
Wenjia Tang
ELM
ALM
96
1
0
13 May 2025
DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models
Xiaoyang Chen
Xinan Dai
Yu Du
Qian Feng
Naixu Guo
...
Jinfeng Xu
Yiyang Yu
Zhiyong Yang
Hongji Zha
Ruichong Zhang
LRM
68
1
0
13 May 2025
Small but Significant: On the Promise of Small Language Models for Accessible AIED
Yumou Wei
Paulo Carvalho
John Stamper
SyDa
101
1
0
13 May 2025
SEM: Reinforcement Learning for Search-Efficient Large Language Models
Zeyang Sha
Shiwen Cui
Weiqiang Wang
KELM
OffRL
LRM
84
0
0
12 May 2025
A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models
Junjie Ye
Caishuang Huang
Zhaoyu Chen
Wenjie Fu
Chenyuan Yang
...
Tao Gui
Qi Zhang
Zhongchao Shi
Jianping Fan
Xuanjing Huang
ALM
96
0
0
12 May 2025
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection
Kai Hua
Steven Wu
Ge Zhang
Ke Shen
LRM
85
0
0
12 May 2025
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
Xiaomi LLM-Core Team
Bingquan Xia
Bo Shen
Cici
Dawei Zhu
...
Yun Wang
Yue Yu
Zhenru Lin
Zhichao Song
Zihao Yue
MoE
ReLM
LRM
AI4CE
171
7
0
12 May 2025
Assessing the Chemical Intelligence of Large Language Models
Nicholas T. Runcie
Charlotte M. Deane
Fergus Imrie
ELM
LRM
112
0
0
12 May 2025
Benchmarking Retrieval-Augmented Generation for Chemistry
Xianrui Zhong
Bowen Jin
Siru Ouyang
Yanzhen Shen
Qiao Jin
Yin Fang
Zhiyong Lu
Jiawei Han
3DV
89
2
0
12 May 2025
DeltaEdit: Enhancing Sequential Editing in Large Language Models by Controlling Superimposed Noise
Ding Cao
Yuchen Cai
Rongxi Guo
Xiaoxiao He
Guiquan Liu
KELM
168
0
0
12 May 2025
Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models
Rei Higuchi
Taiji Suzuki
124
1
0
12 May 2025
Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity
Guang Yan
Yuhui Zhang
Zimu Guo
Lutan Zhao
Xiaojun Chen
Chen Wang
Wenhao Wang
Dan Meng
Rui Hou
76
0
0
12 May 2025
Circuit Partitioning Using Large Language Models for Quantum Compilation and Simulations
Pranav Sinha
Sumit Kumar Jha
Sunny Raj
45
0
0
12 May 2025
LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning
Xiaotian Lin
Yanlin Qi
Yizhang Zhu
Themis Palpanas
Chengliang Chai
Nan Tang
Yuyu Luo
88
2
0
12 May 2025
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning
Zhehao Zhang
Weijie Xu
Fanyou Wu
Chandan K. Reddy
118
2
0
12 May 2025
GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance
Jinuk Kim
Marwa El Halabi
W. Park
Clemens JS Schaefer
Deokjae Lee
Yeonhong Park
Jae W. Lee
Hyun Oh Song
MQ
148
1
0
11 May 2025
Prompt Engineering: How Prompt Vocabulary affects Domain Knowledge
Dimitri Schreiter
94
0
0
10 May 2025
Improving Block-Wise LLM Quantization by 4-bit Block-Wise Optimal Float (BOF4): Analysis and Variations
Patrick Blumenberg
Thomas Graave
Tim Fingscheidt
MQ
102
0
0
10 May 2025
QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration
HamidReza Imani
Jiaxin Peng
Peiman Mohseni
Abdolah Amirany
Tarek A. El-Ghazawi
MoE
129
0
0
10 May 2025
xGen-small Technical Report
Erik Nijkamp
Bo Pang
Egor Pakhomov
Akash Gokul
Jin Qu
Silvio Savarese
Yingbo Zhou
Caiming Xiong
LLMAG
159
0
0
10 May 2025
Attention Is Not All You Need: The Importance of Feedforward Networks in Transformer Models
Isaac Gerber
66
1
0
10 May 2025
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Zihan Qiu
Zhaoxiang Wang
Bo Zheng
Zeyu Huang
Kaiyue Wen
...
Fei Huang
Suozhi Huang
Dayiheng Liu
Jingren Zhou
Junyang Lin
MoE
95
0
0
10 May 2025
Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information
Joshua Harris
Fan Grayson
Felix Feldman
Timothy Laurence
Toby Nonnenmacher
...
Leo Loman
Selina Patel
Thomas Finnie
Samuel Collins
Michael Borowitz
AI4MH
LM&MA
ELM
141
0
0
09 May 2025
Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation
Stefan Vasilev
Christian Herold
Baohao Liao
Seyyed Hadi Hashemi
Shahram Khadivi
Christof Monz
MU
490
0
0
09 May 2025
Arrow-Guided VLM: Enhancing Flowchart Understanding via Arrow Direction Encoding
Takamitsu Omasa
Ryo Koshihara
Masumi Morishige
73
0
0
09 May 2025
LLMs Outperform Experts on Challenging Biology Benchmarks
Lennart Justen
ELM
77
1
0
09 May 2025
Previous
1
2
3
...
8
9
10
...
67
68
69
Next