2009.03300
Cited By
v1
v2
v3 (latest)
Measuring Massive Multitask Language Understanding
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 3,408 papers shown
Title
Learning to Plan Before Answering: Self-Teaching LLMs to Learn Abstract Plans for Problem Solving
Junxuan Zhang
Flood Sung
Zhiyong Yang
Yang Gao
Chongjie Zhang
LLMAG
126
0
0
28 Apr 2025
Contextual Online Uncertainty-Aware Preference Learning for Human Feedback
Nan Lu
Ethan X. Fang
Junwei Lu
420
0
0
27 Apr 2025
Bi-directional Model Cascading with Proxy Confidence
David Warren
Mark Dras
83
0
0
27 Apr 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Xuzhao Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Tianwei Zhang
ALM
ELM
253
7
0
26 Apr 2025
When2Call: When (not) to Call Tools
Hayley Ross
Ameya Sunil Mahabaleshwarkar
Yoshi Suhara
141
1
0
26 Apr 2025
Efficient Single-Pass Training for Multi-Turn Reasoning
Ritesh Goru
Shanay Mehta
Prateek Jain
LRM
63
0
0
25 Apr 2025
Stabilizing Reasoning in Medical LLMs with Continued Pretraining and Reasoning Preference Optimization
Wataru Kawakami
Keita Suzuki
Junichiro Iwasawa
LRM
138
0
0
25 Apr 2025
Aleph-Alpha-GermanWeb: Improving German-language LLM pre-training with model-based data curation and synthetic data generation
Thomas F Burns
Letitia Parcalabescu
Stephan Wäldchen
Michael Barlow
Gregor Ziegltrum
Volker Stampa
Bastian Harren
Björn Deiseroth
SyDa
138
0
0
24 Apr 2025
Auditing the Ethical Logic of Generative AI Models
W. Russell Neuman
Chad Coleman
Ali Dasdan
Safinah Ali
Manan Shah
ELM
LRM
121
1
0
24 Apr 2025
FLUKE: A Linguistically-Driven and Task-Agnostic Framework for Robustness Evaluation
Yulia Otmakhova
Hung Thinh Truong
Rahmad Mahendra
Zenan Zhai
Rongxin Zhu
Daniel Beck
Jey Han Lau
ELM
159
0
0
24 Apr 2025
Information Leakage of Sentence Embeddings via Generative Embedding Inversion Attacks
Antonios Tragoudaras
Theofanis Aslanidis
Emmanouil Georgios Lionis
Marina Orozco González
Panagiotis Eustratiadis
MIACV
SILM
86
0
0
23 Apr 2025
UrbanPlanBench: A Comprehensive Urban Planning Benchmark for Evaluating Large Language Models
Yu Zheng
Longyi Liu
Yuming Lin
Jie Feng
Guozhen Zhang
Depeng Jin
Yong Li
ELM
132
1
0
23 Apr 2025
ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
Sheng Cao
Mingrui Wu
Karthik Prasad
Yuandong Tian
Zechun Liu
MoMe
139
0
0
23 Apr 2025
The Rise of Small Language Models in Healthcare: A Comprehensive Survey
Muskan Garg
Shaina Raza
Shebuti Rayana
Xingyi Liu
Sunghwan Sohn
LM&MA
AILaw
164
2
0
23 Apr 2025
Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation
Luca Moroni
Giovanni Puccetti
Pere-Lluís Huguet Cabot
Andrei Stefan Bejgu
Edoardo Barba
Alessio Miaschi
F. Dell’Orletta
Andrea Esuli
Roberto Navigli
81
2
0
23 Apr 2025
QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining
Fengze Liu
Weidong Zhou
Binbin Liu
Zhimiao Yu
Yifan Zhang
...
Yifeng Yu
Bingni Zhang
Xiaohuan Zhou
Taifeng Wang
Yong Cao
134
1
0
23 Apr 2025
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models
Shi Qiu
Shaoyang Guo
Zhuo-Yang Song
Yizhou Sun
Zeyu Cai
...
Ming-xing Luo
Muhan Zhang
Yaodong Yang
Muhan Zhang
Hua Xing Zhu
AIMat
LRM
132
9
0
22 Apr 2025
Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction
Yuxin Jiang
Yijiao Wang
Chuhan Wu
Xinyi Dai
Yan Xu
...
Yucheng Wang
Xin Jiang
Lifeng Shang
Ruiming Tang
Wenjie Wang
138
0
0
22 Apr 2025
Compass-V2 Technical Report
Sophia Maria
MoE
LRM
113
0
0
22 Apr 2025
Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism
Aviv Bick
Eric P. Xing
Albert Gu
RALM
146
1
0
22 Apr 2025
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks
Minghao Wu
Weixuan Wang
Sinuo Liu
Huifeng Yin
Xintong Wang
Yu Zhao
Chenyang Lyu
Longyue Wang
Weihua Luo
Kaifu Zhang
ELM
154
5
0
22 Apr 2025
Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark
Jasper Götting
Pedro Medeiros
Jon G Sanders
Nathaniel Li
Long Phan
Karam Elabd
Lennart Justen
Dan Hendrycks
Seth Donoughe
ELM
139
5
0
21 Apr 2025
LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception
Yuan-Hong Liao
Sven Elflein
Liu He
Laura Leal-Taixe
Yejin Choi
Sanja Fidler
David Acuna
ReLM
LRM
VLM
465
2
0
21 Apr 2025
Med-CoDE: Medical Critique based Disagreement Evaluation Framework
Mohit Gupta
Akiko Aizawa
R. Shah
LM&MA
ELM
56
1
0
21 Apr 2025
A Self-Improving Coding Agent
Maxime Robeyns
Martin Szummer
Laurence Aitchison
LLMAG
136
1
0
21 Apr 2025
Efficient Pretraining Length Scaling
Bohong Wu
Shen Yan
Sijun Zhang
Jianqiao Lu
Yutao Zeng
Ya Wang
Xun Zhou
477
0
0
21 Apr 2025
ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data
Tong Chen
Faeze Brahman
Jiacheng Liu
Niloofar Mireshghallah
Weijia Shi
Pang Wei Koh
Luke Zettlemoyer
Hannaneh Hajishirzi
93
1
0
20 Apr 2025
FairSteer: Inference Time Debiasing for LLMs with Dynamic Activation Steering
Yongbin Li
Zhiting Fan
Ruizhe Chen
Xiaotang Gai
Luqi Gong
Yan Zhang
Zuozhu Liu
LLMSV
97
6
0
20 Apr 2025
PROMPTEVALS: A Dataset of Assertions and Guardrails for Custom Production Large Language Model Pipelines
Reya Vir
Shreya Shankar
Harrison Chase
Will Fu-Hinthorn
Aditya G. Parameswaran
AI4TS
85
0
0
20 Apr 2025
Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models
Xinlin Zhuang
Jiahui Peng
Ren Ma
Yucheng Wang
Tianyi Bai
Xingjian Wei
Jiantao Qiu
Chi Zhang
Ying Qian
Conghui He
151
0
0
19 Apr 2025
Hypothetical Documents or Knowledge Leakage? Rethinking LLM-based Query Expansion
Yejun Yoon
Jaeyoon Jung
Seunghyun Yoon
Kunwoo Park
63
0
0
19 Apr 2025
MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space
Yicheng Chen
Yining Li
Kai Hu
Zerun Ma
Haochen Ye
Kai Chen
70
2
0
18 Apr 2025
STAMP Your Content: Proving Dataset Membership via Watermarked Rephrasings
Saksham Rastogi
Pratyush Maini
Danish Pruthi
169
0
0
18 Apr 2025
A mean teacher algorithm for unlearning of language models
Yegor Klochkov
MU
368
0
0
18 Apr 2025
From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs
Jiliang Ni
Jiachen Pu
Zhongyi Yang
Kun Zhou
Hui Wang
Xiaoliang Xiao
Dakui Wang
Xin Li
Jingfeng Luo
Conggang Hu
129
0
0
18 Apr 2025
Continual Pre-Training is (not) What You Need in Domain Adaption
Pin-Er Chen
Da-Chen Lian
S. Hsieh
Sieh-Chuen Huang
Hsuan-Lei Shao
...
Yang-Hsien Lin
Zih-Ching Chen
Cheng-Kuang
Eddie TC Huang
Simon See
CLL
AILaw
147
1
0
18 Apr 2025
CPG-EVAL: A Multi-Tiered Benchmark for Evaluating the Chinese Pedagogical Grammar Competence of Large Language Models
Dong Wang
ELM
57
0
0
17 Apr 2025
Scaling Instruction-Tuned LLMs to Million-Token Contexts via Hierarchical Synthetic Data Generation
Linda He
Jue Wang
Maurice Weber
Shang Zhu
Ben Athiwaratkun
Ce Zhang
SyDa
LRM
78
1
0
17 Apr 2025
MAIN: Mutual Alignment Is Necessary for instruction tuning
Fanyi Yang
Jianfeng Liu
Xinsong Zhang
Haoyu Liu
Xixin Cao
Yuefeng Zhan
H. Sun
Weiwei Deng
Feng Sun
Qi Zhang
ALM
58
0
0
17 Apr 2025
GeoSense: Evaluating Identification and Application of Geometric Principles in Multimodal Reasoning
Liangyu Xu
Yingxiu Zhao
Jiadong Wang
Yingyao Wang
Bu Pi
...
Jihao Gu
Xinfeng Li
Xiaoyong Zhu
Jun Song
Jian Xu
LRM
504
6
0
17 Apr 2025
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
Shizhe Diao
Yu Yang
Y. Fu
Xin Dong
Jane Polak Scowcroft
...
Hongxu Yin
M. Patwary
Yingyan
Jan Kautz
Pavlo Molchanov
122
2
0
17 Apr 2025
SHA256 at SemEval-2025 Task 4: Selective Amnesia -- Constrained Unlearning for Large Language Models via Knowledge Isolation
Saransh Agrawal
Kuan-Hao Huang
MU
KELM
101
0
0
17 Apr 2025
Information Gain-Guided Causal Intervention for Autonomous Debiasing Large Language Models
Zhouhao Sun
Xiao Ding
Li Du
Yunpeng Xu
Yixuan Ma
Yang Zhao
Bing Qin
Ting Liu
82
0
0
17 Apr 2025
ZeroSumEval: Scaling LLM Evaluation with Inter-Model Competition
Haidar Khan
H. A. Alyahya
Yazeed Alnumay
M Saiful Bari
B. Yener
ELM
LRM
92
0
0
17 Apr 2025
Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language Models
Yuanbo Tang
Yan Tang
N. Zhang
Meixuan Chen
Yang Li
MoE
135
1
0
16 Apr 2025
Gauging Overprecision in LLMs: An Empirical Study
Adil Bahaj
Hamed Rahimi
Mohamed Chetouani
Mounir Ghogho
127
0
0
16 Apr 2025
Activated LoRA: Fine-tuned LLMs for Intrinsics
Kristjan Greenewald
Luis A. Lastras
Thomas Parnell
Vraj Shah
Lucian Popa
Giulio Zizzo
Chulaka Gunasekara
Ambrish Rawat
David D. Cox
106
0
0
16 Apr 2025
Open-Medical-R1: How to Choose Data for RLVR Training at Medicine Domain
Zhongxi Qiu
Zhang Zhang
Yan Hu
Heng Li
Jiang-Dong Liu
OffRL
454
1
0
16 Apr 2025
The Digital Cybersecurity Expert: How Far Have We Come?
Dawei Wang
Geng Zhou
Xianglong Li
Yu Bai
Li Chen
Ting Qin
Jian Sun
Didong Li
ELM
113
0
0
16 Apr 2025
Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation
Shizhan Cai
Liang Ding
Dacheng Tao
WaLM
93
0
0
16 Apr 2025