2009.03300
Cited By
v3 (latest)
Measuring Massive Multitask Language Understanding
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 3,408 papers shown
Title
Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models
Thao Nguyen
Yang Li
O. Yu. Golovneva
Luke Zettlemoyer
Sewoong Oh
Ludwig Schmidt
Xian Li
OnRL
149
0
0
05 Jun 2025
AudioLens: A Closer Look at Auditory Attribute Perception of Large Audio-Language Models
Chih-Kai Yang
Neo Ho
Yi-Jyun Lee
Hung-yi Lee
AuLLM
97
0
0
05 Jun 2025
SoK: Are Watermarks in LLMs Ready for Deployment?
Kieu Dang
Phung Lai
Nhathai Phan
Yelong Shen
Ruoming Jin
Abdallah Khreishah
My T. Thai
37
0
0
05 Jun 2025
Truly Self-Improving Agents Require Intrinsic Metacognitive Learning
Tennison Liu
M. Schaar
AIFin
LRM
126
0
0
05 Jun 2025
Demonstrations of Integrity Attacks in Multi-Agent Systems
Can Zheng
Yuhan Cao
Xiaoning Dong
Tianxing He
LLMAG
AAML
94
0
0
05 Jun 2025
Revisiting Test-Time Scaling: A Survey and a Diversity-Aware Method for Efficient Reasoning
Ho-Lam Chung
Teng-Yun Hsiao
Hsiao-Ying Huang
Chunerh Cho
Jian-Ren Lin
Zhang Ziwei
Yun-Nung Chen
LRM
107
0
0
05 Jun 2025
SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat
Yuru Jiang
Wenxuan Ding
Shangbin Feng
Greg Durrett
Yulia Tsvetkov
88
0
0
05 Jun 2025
Quantifying Cross-Modality Memorization in Vision-Language Models
Yuxin Wen
Yangsibo Huang
Tom Goldstein
Ravi Kumar
Badih Ghazi
Chiyuan Zhang
115
0
0
05 Jun 2025
Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers
Yutao Hou
Zeguan Xiao
Fei Yu
Yihan Jiang
Xuetao Wei
Hailiang Huang
Yun-Nung Chen
Guanhua Chen
LRM
111
0
0
05 Jun 2025
Inference-Time Hyper-Scaling with KV Cache Compression
Adrian Łańcucki
Konrad Staniszewski
Piotr Nawrot
Edoardo Ponti
74
0
0
05 Jun 2025
Dissecting Long Reasoning Models: An Empirical Study
Yongyu Mu
Jiali Zeng
Bei Li
Xinyan Guan
Fandong Meng
Jie Zhou
Tong Xiao
Jingbo Zhu
OffRL
LRM
107
0
0
05 Jun 2025
MMTU: A Massive Multi-Task Table Understanding and Reasoning Benchmark
Junjie Xing
Yeye He
Mengyu Zhou
Haoyu Dong
Shi Han
Lingjiao Chen
Dongmei Zhang
S. Chaudhuri
H. V. Jagadish
LMTD
ELM
LRM
41
0
0
05 Jun 2025
Long or short CoT? Investigating Instance-level Switch of Large Reasoning Models
Ruiqi Zhang
Changyi Xiao
Yixin Cao
LRM
99
0
0
04 Jun 2025
APT: Improving Specialist LLM Performance with Weakness Case Acquisition and Iterative Preference Training
Jun Rao
Zepeng Lin
Xuebo Liu
Xiaopeng Ke
Lian Lian
Dong Jin
Shengjun Cheng
Jun Yu
Min Zhang
106
0
0
04 Jun 2025
Matching Markets Meet LLMs: Algorithmic Reasoning with Ranked Preferences
Hadi Hosseini
Samarth Khanna
Ronak Singh
LRM
47
0
0
04 Jun 2025
Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration
Junqi Gao
Zhichang Guo
Dazhi Zhang
Dong Li
Runze Liu
Pengfei Li
Kai Tian
Biqing Qi
14
0
0
04 Jun 2025
Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis
Kejian Zhu
Shangqing Tu
Zhuoran Jin
Lei Hou
Juanzi Li
Jun Zhao
KELM
86
0
0
04 Jun 2025
RadialRouter: Structured Representation for Efficient and Robust Large Language Models Routing
Ruihan Jin
Pengpeng Shao
Zhengqi Wen
Jinyang Wu
Mingkuan Feng
Shuai Zhang
Jianhua Tao
50
0
0
04 Jun 2025
Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language Models
Seungcheol Park
Jeongin Bae
Beomseok Kwon
Minjun Kim
Byeongwook Kim
S. Kwon
U. Kang
Dongsoo Lee
MQ
139
0
0
04 Jun 2025
Lacuna Inc. at SemEval-2025 Task 4: LoRA-Enhanced Influence-Based Unlearning for LLMs
Aleksey Kudelya
Alexander Shirnin
MU
60
0
0
04 Jun 2025
Beyond Memorization: A Rigorous Evaluation Framework for Medical Knowledge Editing
Shigeng Chen
Linhao Luo
Zhangchi Qiu
Yanan Cao
Carl Yang
Shirui Pan
KELM
102
0
0
04 Jun 2025
GEM: Empowering LLM for both Embedding Generation and Language Understanding
Caojin Zhang
Qiang Zhang
Ke Li
Sai Vidyaranya Nuthalapati
Benyu Zhang
Jason Liu
Serena Li
Lizhu Zhang
Xiangjun Fan
35
0
0
04 Jun 2025
From Understanding to Generation: An Efficient Shortcut for Evaluating Language Models
Viktor Hangya
Fabian Küch
Darina Gold
ELM
59
0
0
04 Jun 2025
MANBench: Is Your Multimodal Model Smarter than Human?
Han Zhou
Qitong Xu
Yiheng Dong
Xin Yang
19
0
0
04 Jun 2025
Scaling Fine-Grained MoE Beyond 50B Parameters: Empirical Evaluation and Practical Insights
Jakub Krajewski
Marcin Chochowski
Daniel Korzekwa
MoE
ALM
64
0
0
03 Jun 2025
Pruning General Large Language Models into Customized Expert Models
Yirao Zhao
Guizhen Chen
Kenji Kawaguchi
Lidong Bing
Wenxuan Zhang
76
0
0
03 Jun 2025
MASTER: Enhancing Large Language Model via Multi-Agent Simulated Teaching
Liang Yue
Yihong Tang
Kehai Chen
Jie Liu
Min Zhang
LLMAG
63
0
0
03 Jun 2025
FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models
Yan Gao
Massimo Roberto Scamarcia
Javier Fernandez-Marques
Mohammad Naseri
Chong Shen Ng
...
Junyan Wang
Zheyuan Liu
Daniel J. Beutel
Lingjuan Lyu
Nicholas D. Lane
ALM
52
1
0
03 Jun 2025
How do Pre-Trained Models Support Software Engineering? An Empirical Study in Hugging Face
Alexandra González
Xavier Franch
David Lo
Silverio Martínez-Fernández
VLM
58
0
0
03 Jun 2025
EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving
Shihan Dou
Ming Zhang
Chenhao Huang
Jiayi Chen
F. Chen
...
Wei Chengzhi
Lin Yan
Qi Zhang
Xuanjing Huang
ELM
82
0
0
03 Jun 2025
Beyond Text Compression: Evaluating Tokenizers Across Scales
Jonas F. Lotz
António V. Lopes
Stephan Peitz
Hendra Setiawan
Leonardo Emili
57
0
0
03 Jun 2025
KARE-RAG: Knowledge-Aware Refinement and Enhancement for RAG
Yongjian Li
HaoCheng Chu
Yukun Yan
Zhenghao Liu
S. Yu
Zheni Zeng
Ruobing Wang
Sen Song
Zhiyuan Liu
Maosong Sun
47
0
0
03 Jun 2025
Understanding Gender Bias in AI-Generated Product Descriptions
Markelle Kelly
Mohammad Tahaei
Padhraic Smyth
Lauren Wilcox
27
0
0
03 Jun 2025
Adaptive Graph Pruning for Multi-Agent Communication
Boyi Li
Zhonghan Zhao
Der-Horng Lee
Gaoang Wang
LLMAG
45
0
0
03 Jun 2025
Understanding the Impact of Sampling Quality in Direct Preference Optimization
Kyung Rok Kim
Yumo Bai
Chonghuan Wang
Guanting Chen
22
0
0
03 Jun 2025
TAH-QUANT: Effective Activation Quantization in Pipeline Parallelism over Slow Network
Guangxin He
Yuan Cao
Yutong He
Tianyi Bai
Kun Yuan
Binhang Yuan
MQ
57
0
0
02 Jun 2025
Data Pruning by Information Maximization
Haoru Tan
Sitong Wu
Wei Huang
Shizhen Zhao
Xiaojuan Qi
61
1
0
02 Jun 2025
An Empirical Study of Group Conformity in Multi-Agent Systems
Min Choi
Keonwoo Kim
Sungwon Chae
Sangyeob Baek
LLMAG
AI4CE
61
0
0
02 Jun 2025
VM14K: First Vietnamese Medical Benchmark
T. Nguyen
Duc Duy Nguyen
Minh Dang
Thai Dao
L. T. Nguyen
Quan H. Nguyen
D. Q. Nguyen
Kien Tran
M. Tran
ELM
59
0
0
02 Jun 2025
Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods
Yifan Hao
Xingyuan Pan
Hanning Zhang
Chenlu Ye
Boyao Wang
Tong Zhang
100
0
0
02 Jun 2025
From Guidelines to Practice: A New Paradigm for Arabic Language Model Evaluation
Serry Sibaee
Omer Nacar
Adel Ammar
Yasser Habashi
Abdulrahman S. Al-Batati
W. Boulila
ELM
52
0
0
02 Jun 2025
Multilingual Definition Modeling
Edison Marrese-Taylor
Erica K. Shimomoto
Alfredo Solano
Enrique Reid
59
0
0
02 Jun 2025
Human-Centric Evaluation for Foundation Models
Yijin Guo
Kaiyuan Ji
Xiaorong Zhu
Junying Wang
Farong Wen
Chunyi Li
Zicheng Zhang
Guangtao Zhai
ALM
ELM
56
0
0
02 Jun 2025
Exploring the Potential of LLMs as Personalized Assistants: Dataset, Evaluation, and Analysis
J. Mok
Ik-hwan Kim
Sangkwon Park
Sungroh Yoon
58
0
0
02 Jun 2025
DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation
Jennifer Chen
Aidar Myrzakhan
Yaxin Luo
Hassaan Muhammad Khan
Sondos Mahmoud Bsharat
Zhiqiang Shen
VLM
48
0
0
02 Jun 2025
T-SHIRT: Token-Selective Hierarchical Data Selection for Instruction Tuning
Yanjun Fu
Faisal Hamman
Sanghamitra Dutta
ALM
71
0
0
02 Jun 2025
Invariance Makes LLM Unlearning Resilient Even to Unanticipated Downstream Fine-Tuning
Changsheng Wang
Yihua Zhang
Jinghan Jia
Parikshit Ram
Dennis L. Wei
Yuguang Yao
Soumyadeep Pal
Nathalie Baracaldo
Sijia Liu
MU
67
0
0
02 Jun 2025
Contra4: Evaluating Contrastive Cross-Modal Reasoning in Audio, Video, Image, and 3D
Artemis Panagopoulou
Le Xue
Honglu Zhou
Silvio Savarese
Ran Xu
Caiming Xiong
Chris Callison-Burch
Mark Yatskar
Juan Carlos Niebles
50
0
0
02 Jun 2025
Is Extending Modality The Right Path Towards Omni-Modality?
Tinghui Zhu
Kai Zhang
Muhao Chen
Yu Su
VLM
54
0
0
02 Jun 2025
ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding
Junliang Ye
Zhengyi Wang
Ruowen Zhao
Shenghao Xie
Jun Zhu
54
0
0
02 Jun 2025