2009.03300
Cited By
v1
v2
v3 (latest)
Measuring Massive Multitask Language Understanding
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 3,408 papers shown
Title
Human-Centric Evaluation for Foundation Models
Yijin Guo
Kaiyuan Ji
Xiaorong Zhu
Junying Wang
Farong Wen
Chunyi Li
Zicheng Zhang
Guangtao Zhai
ALM
ELM
56
0
0
02 Jun 2025
Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods
Yifan Hao
Xingyuan Pan
Hanning Zhang
Chenlu Ye
Boyao Wang
Tong Zhang
100
0
0
02 Jun 2025
Not Every Token Needs Forgetting: Selective Unlearning to Limit Change in Utility in Large Language Model Unlearning
Yixin Wan
Anil Ramakrishna
Kai-Wei Chang
Volkan Cevher
Rahul Gupta
MU
CLL
29
0
0
01 Jun 2025
SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models
Thinh Pham
Nguyen Nguyen
Pratibha Zunjare
Weiyuan Chen
Yu-Min Tseng
Tu Vu
RALM
ReLM
ELM
ALM
LRM
76
0
0
01 Jun 2025
CC-Tuning: A Cross-Lingual Connection Mechanism for Improving Joint Multilingual Supervised Fine-Tuning
Yangfan Ye
Xiaocheng Feng
Zekun Yuan
Xiachong Feng
L. Qin
...
Yunfei Lu
Xiaohui Yan
Duyu Tang
Dandan Tu
Bing Qin
35
0
0
01 Jun 2025
Conformal Arbitrage: Risk-Controlled Balancing of Competing Objectives in Language Models
William Overman
Mohsen Bayati
27
0
0
01 Jun 2025
LoRA-BAM: Input Filtering for Fine-tuned LLMs via Boxed Abstraction Monitors over LoRA Layers
Changshun Wu
Tianyi Duan
Saddek Bensalem
Chih-Hong Cheng
39
0
0
01 Jun 2025
Probing the Geometry of Truth: Consistency and Generalization of Truth Directions in LLMs Across Logical Transformations and Question Answering Tasks
Yuntai Bao
Xuhong Zhang
Tianyu Du
Xinkui Zhao
Zhengwen Feng
Hao Peng
Jianwei Yin
HILM
56
0
0
01 Jun 2025
Evaluating the Unseen Capabilities: How Many Theorems Do LLMs Know?
Xiang Li
Jiayi Xin
Qi Long
Weijie J. Su
ELM
23
0
0
01 Jun 2025
SafeTy Reasoning Elicitation Alignment for Multi-Turn Dialogues
Martin Kuo
Jianyi Zhang
Aolin Ding
Louis DiValentin
Amin Hass
...
Bhavna Gopal
Maziyar Baran Pouyan
Changwei Liu
H. Li
Yiran Chen
AAML
28
0
0
31 May 2025
Enhancing Clinical Multiple-Choice Questions Benchmarks with Knowledge Graph Guided Distractor Generation
Running Yang
Wenlong Deng
Minghui Chen
Yuyin Zhou
Xiaoxiao Li
29
0
0
31 May 2025
Data Swarms: Optimizable Generation of Synthetic Evaluation Data
Shangbin Feng
Yike Wang
Weijia Shi
Yulia Tsvetkov
57
0
0
31 May 2025
RLAE: Reinforcement Learning-Assisted Ensemble for LLMs
Y. Fu
Yuanheng Zhu
Jiajun Chai
Guojun Yin
Wei Lin
Qichao Zhang
Dongbin Zhao
25
0
0
31 May 2025
SATA-BENCH: Select All That Apply Benchmark for Multiple Choice Questions
Weijie Xu
Shixian Cui
Xi Fang
Chi Xue
Stephanie Eckman
Chandan K. Reddy
ELM
33
0
0
31 May 2025
FLoE: Fisher-Based Layer Selection for Efficient Sparse Adaptation of Low-Rank Experts
Xinyi Wang
Lirong Gao
Haobo Wang
Yiming Zhang
Junbo Zhao
MoE
40
0
0
31 May 2025
SkillVerse: Assessing and Enhancing LLMs with Tree Evaluation
Yufei Tian
Jiao Sun
Nanyun Peng
Zizhao Zhang
25
0
0
31 May 2025
FinS-Pilot: A Benchmark for Online Financial System
Feng Wang
Yiding Sun
Jiaxin Mao
Wei Xue
Danqing Xu
AIFin
20
0
0
31 May 2025
Model Unlearning via Sparse Autoencoder Subspace Guided Projections
Xu Wang
Zihao Li
Benyou Wang
Yan Hu
Difan Zou
MU
36
0
0
30 May 2025
LegalEval-Q: A New Benchmark for The Quality Evaluation of LLM-Generated Legal Text
Li Yunhan
Wu Gengshen
AILaw
ELM
ALM
20
0
0
30 May 2025
An Adversary-Resistant Multi-Agent LLM System via Credibility Scoring
Sana Ebrahimi
Mohsen Dehghankar
Abolfazl Asudeh
41
0
0
30 May 2025
Recipes for Pre-training LLMs with MXFP8
Asit K. Mishra
Dusan Stosic
Simon Layton
MQ
17
0
0
30 May 2025
Stepsize anything: A unified learning rate schedule for budgeted-iteration training
Anda Tang
Yiming Dong
Yutao Zeng
Zhou Xun
Zhouchen Lin
365
0
0
30 May 2025
PhySense: Principle-Based Physics Reasoning Benchmarking for Large Language Models
Yinggan Xu
Yue Liu
Zhiqiang Gao
Changnan Peng
Di Luo
LRM
28
0
0
30 May 2025
Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models' Uncertainty?
Jiayu Liu
Qing Zong
Weiqi Wang
Yangqiu Song
38
0
0
30 May 2025
Disentangling Language and Culture for Evaluating Multilingual Large Language Models
Jiahao Ying
Wei Tang
Yiran Zhao
Yixin Cao
Yu Rong
Wenxuan Zhang
ELM
18
0
0
30 May 2025
SCOUT: Teaching Pre-trained Language Models to Enhance Reasoning via Flow Chain-of-Thought
Guanghao Li
Wenhao Jiang
Mingfeng Chen
Yan Li
Hao Yu
Shuting Dong
Tao Ren
Ming Tang
Chun Yuan
ReLM
LRM
30
0
0
30 May 2025
Evaluating Gemini in an arena for learning
LearnLM Team Google
Abhinit Modi
Aditya Srikanth Veerubhotla
Aliya Rysbek
Andrea Huber
...
Theofilos Strinopoulos
Wei-Jen Ko
Yael Gold-Zamir
Yael Haramaty
Yannis Assael
ELM
30
0
0
30 May 2025
HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts
Neil He
Rishabh Anand
Hiren Madhu
Ali Maatouk
Smita Krishnaswamy
Leandros Tassiulas
Menglin Yang
Rex Ying
41
0
0
30 May 2025
Learning Safety Constraints for Large Language Models
Xin Chen
Yarden As
Andreas Krause
38
0
0
30 May 2025
Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution
Q. Xiao
Alan Ansell
Boqian Wu
Lu Yin
Mykola Pechenizkiy
Shiwei Liu
Decebal Constantin Mocanu
41
0
0
29 May 2025
Actor-Critic based Online Data Mixing For Language Model Pre-Training
Jing Ma
Chenhao Dang
Mingjie Liao
28
0
0
29 May 2025
A Mathematical Framework for AI-Human Integration in Work
Elisa Celis
Lingxiao Huang
Nisheeth K. Vishnoi
56
0
0
29 May 2025
AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora
Jiaxin Bai
Wei Fan
Qi Hu
Qing Zong
Chunyang Li
...
Leijie Wu
Yi Ji
Gong Zhang
Renhai Chen
Yangqiu Song
55
0
0
29 May 2025
Two Is Better Than One: Rotations Scale LoRAs
Hongcan Guo
Guoshun Nan
Yuan Yang
Diyang Zhang
Haotian Li
...
Yuhan Ran
Xinye Cao
Sicong Leng
Xiaofeng Tao
Xudong Jiang
38
0
0
29 May 2025
UAQFact: Evaluating Factual Knowledge Utilization of LLMs on Unanswerable Questions
Chuanyuan Tan
Wenbiao Shao
Hao Xiong
Tong Zhu
Zhenhua Liu
Kai Shi
Wenliang Chen
40
0
0
29 May 2025
Cross-Task Experiential Learning on LLM-based Multi-Agent Collaboration
Yilong Li
Chen Qian
Yu Xia
Ruijie Shi
Yufan Dang
...
Ye Tian
Xuantang Xiong
Lei Han
Zhiyuan Liu
Maosong Sun
LLMAG
76
0
0
29 May 2025
Understanding the Information Propagation Effects of Communication Topologies in LLM-based Multi-Agent Systems
Xu Shen
Yixin Liu
Yiwei Dai
Yili Wang
Rui Miao
Yue Tan
Shirui Pan
Xin Wang
56
0
0
29 May 2025
The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets
Shenzhe Zhu
Jiao Sun
Yi Nian
Tobin South
Alex Pentland
Jiaxin Pei
49
0
0
29 May 2025
From Parameters to Prompts: Understanding and Mitigating the Factuality Gap between Fine-Tuned LLMs
Xuan Gong
Hanbo Huang
Shiyu Liang
37
0
0
29 May 2025
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning
Ziyin Zhang
Jiahao Xu
Zhiwei He
Tian Liang
Qiuzhi Liu
...
Zhuosheng Zhang
Rui Wang
Zhaopeng Tu
Haitao Mi
Dong Yu
OffRL
LRM
73
1
0
29 May 2025
Scalable Complexity Control Facilitates Reasoning Ability of LLMs
Liangkai Hang
Junjie Yao
Zhiwei Bai
Tianyi Chen
Yang Chen
...
Feiyu Xiong
Y. Zhang
Weinan E
Hongkang Yang
Zhi-hai Xu
LRM
60
0
0
29 May 2025
Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs
Yinong Oliver Wang
N. Sivakumar
Falaah Arif Khan
Rin Metcalf Susa
Adam Goliński
Natalie Mackraz
B. Theobald
Luca Zappella
N. Apostoloff
31
0
0
29 May 2025
Evaluating the Sensitivity of LLMs to Prior Context
Robert Hankache
Kingsley Nketia Acheampong
Liang Song
Marek Brynda
Raad Khraishi
Greig A. Cowan
30
0
0
29 May 2025
LoLA: Low-Rank Linear Attention With Sparse Caching
Luke McDermott
Robert W. Heath Jr.
Rahul Parhi
RALM
53
0
0
29 May 2025
Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
William Merrill
Shane Arora
Dirk Groeneveld
Hannaneh Hajishirzi
51
0
0
29 May 2025
Daunce: Data Attribution through Uncertainty Estimation
Xingyuan Pan
Chenlu Ye
Joseph Melkonian
Jiaqi W. Ma
Tong Zhang
TDI
UQCV
51
0
0
29 May 2025
Differential Information: An Information-Theoretic Perspective on Preference Optimization
Yunjae Won
Hyunji Lee
Hyeonbin Hwang
Minjoon Seo
27
0
0
29 May 2025
Revisiting Uncertainty Estimation and Calibration of Large Language Models
Linwei Tao
Yi-Fan Yeh
Minjing Dong
Tao Huang
Philip Torr
Chang Xu
27
0
0
29 May 2025
LLM Performance for Code Generation on Noisy Tasks
Radzim Sendyka
Christian Cabrera
Andrei Paleyes
Diana Robinson
Neil D. Lawrence
60
1
0
29 May 2025
Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert
Zhaokun Wang
Jinyu Guo
Jingwen Pu
Lingfeng Chen
Hongli Pu
Jie Ou
Libo Qin
Wenhong Tian
AAML
37
0
0
29 May 2025