arXiv:2009.03300, v3 (latest)
Measuring Massive Multitask Language Understanding
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 3,408 papers shown
Title
FinLoRA: Benchmarking LoRA Methods for Fine-Tuning LLMs on Financial Datasets
Dannong Wang
Jaisal Patel
Daochen Zha
Steve Yang
Xiao-Yang Liu
31
0
0
26 May 2025
CP-Router: An Uncertainty-Aware Router Between LLM and LRM
Jiayuan Su
Fulin Lin
Zhaopeng Feng
Han Zheng
Teng Wang
Zhenyu Xiao
Xinlong Zhao
Zuozhu Liu
Lu Cheng
Hongwei Wang
54
0
0
26 May 2025
The Birth of Knowledge: Emergent Features across Time, Space, and Scale in Large Language Models
Shashata Sawmya
Micah Adler
Nir Shavit
MILM
31
0
0
26 May 2025
Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking
Pengxiang Li
Shilin Yan
Joey Tsai
Renrui Zhang
Ruichuan An
Ziyu Guo
Xiaowei Gao
63
1
0
26 May 2025
FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models
Hao Kang
Zichun Yu
Chenyan Xiong
MoE
76
0
0
26 May 2025
StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs
J. Yang
Dongfu Jiang
Lipeng He
Sherman Siu
Yuxuan Zhang
...
Yi Lu
Quy Duc Do
Ziyan Jiang
Ping Nie
Wenhu Chen
35
0
0
26 May 2025
Lifelong Safety Alignment for Language Models
Haoyu Wang
Zeyu Qin
Yifei Zhao
C. Du
Min Lin
Xueqian Wang
Tianyu Pang
KELM
CLL
70
1
0
26 May 2025
CulFiT: A Fine-grained Cultural-aware LLM Training Paradigm via Multilingual Critique Data Synthesis
Ruixiang Feng
Shen Gao
Xiuying Chen
Lisi Chen
Shuo Shang
41
0
0
26 May 2025
Token-Importance Guided Direct Preference Optimization
Yang Ning
Lin Hai
Liu Yibo
Tian Baoliang
Liu Guoqing
Zhang Haijun
71
0
0
26 May 2025
BnMMLU: Measuring Massive Multitask Language Understanding in Bengali
Saman Sarker Joy
ELM
52
0
0
25 May 2025
An Embarrassingly Simple Defense Against LLM Abliteration Attacks
Harethah Shairah
Hasan Hammoud
Bernard Ghanem
G. Turkiyyah
63
0
0
25 May 2025
GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling
Jialong Zhou
L. Wang
Xiao Yang
LLMAG
67
0
0
25 May 2025
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models
Fengqi Zhu
Rongzhen Wang
Shen Nie
Xiaolu Zhang
Chunwei Wu
...
Jun Zhou
Jianfei Chen
Yankai Lin
Ji-Rong Wen
Chongxuan Li
192
2
0
25 May 2025
Do Large Language Models (Really) Need Statistical Foundations?
Weijie Su
274
0
0
25 May 2025
The Price of Format: Diversity Collapse in LLMs
Longfei Yun
Chenyang An
Zilong Wang
Letian Peng
Jingbo Shang
47
0
0
25 May 2025
ReadBench: Measuring the Dense Text Visual Reading Ability of Vision-Language Models
Benjamin Clavié
Florian Brand
VLM
CoGe
51
0
0
25 May 2025
AI4Math: A Native Spanish Benchmark for University-Level Mathematical Reasoning in Large Language Models
Miguel Angel Peñaloza Perez
Bruno Lopez Orozco
Jesus Tadeo Cruz Soto
Michelle Bruno Hernandez
Miguel Angel Alvarado Gonzalez
Sandra Malagon
LRM
ELM
32
0
0
25 May 2025
Paying Alignment Tax with Contrastive Learning
Buse Sibel Korkmaz
Rahul Nair
Elizabeth M. Daly
Antonio del Rio Chanona
74
0
0
25 May 2025
Efficient Data Selection at Scale via Influence Distillation
Mahdi Nikdan
Vincent Cohen-Addad
Dan Alistarh
Vahab Mirrokni
TDI
71
0
0
25 May 2025
RvLLM: LLM Runtime Verification with Domain Knowledge
Yedi Zhang
Sun Yi Emma
Annabelle Lee Jia En
Jin Song Dong
39
0
0
24 May 2025
From Generation to Detection: A Multimodal Multi-Task Dataset for Benchmarking Health Misinformation
Zhihao Zhang
Yiran Zhang
Xiyue Zhou
Liting Huang
Imran Razzak
Preslav Nakov
Usman Naseem
22
0
0
24 May 2025
Optimal Transport-Based Token Weighting scheme for Enhanced Preference Optimization
Meng Li
Guangda Huzhang
Haibo Zhang
Xiting Wang
Anxiang Zeng
42
0
0
24 May 2025
Multilingual Question Answering in Low-Resource Settings: A Dzongkha-English Benchmark for Foundation Models
Md. Tanzib Hosain
Rajan Das Gupta
Md. Kishor Morol
24
0
0
24 May 2025
ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models
Hao Chen
Haoze Li
Zhiqing Xiao
Lirong Gao
Qi Zhang
Xiaomeng Hu
Ningtao Wang
Xing Fu
Junbo Zhao
206
0
0
24 May 2025
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps
Sicheng Feng
Song Wang
Shuyi Ouyang
Lingdong Kong
Zikai Song
Jianke Zhu
Huan Wang
Xinchao Wang
LRM
104
0
0
24 May 2025
How Does Sequence Modeling Architecture Influence Base Capabilities of Pre-trained Language Models? Exploring Key Architecture Design Principles to Avoid Base Capabilities Degradation
Xin Lu
Yanyan Zhao
Si Wei
Shijin Wang
Bing Qin
Ting Liu
46
0
0
24 May 2025
LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning
Junyu Chen
Junzhuo Li
Zhen Peng
Wenjie Wang
Yuxiang Ren
Long Shi
Xuming Hu
MQ
33
0
0
24 May 2025
Hybrid Latent Reasoning via Reinforcement Learning
Zhenrui Yue
Bowen Jin
Huimin Zeng
Honglei Zhuang
Zhen Qin
Jinsung Yoon
Lanyu Shang
Jiawei Han
Dong Wang
OffRL
BDL
LRM
68
0
0
24 May 2025
PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval
Zehua Pei
Ying Zhang
Hui-Ling Zhen
Xianzhi Yu
Wulong Liu
Sinno Jialin Pan
Mingxuan Yuan
Bei Yu
MoE
47
0
0
23 May 2025
L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models
Xiaohao Liu
Xiaobo Xia
Weixiang Zhao
Manyi Zhang
Xianzhi Yu
Xiu Su
Shuo Yang
See-Kiong Ng
Tat-Seng Chua
KELM
LRM
94
0
0
23 May 2025
PerMedCQA: Benchmarking Large Language Models on Medical Consumer Question Answering in Persian Language
Naghmeh Jamali
Milad Mohammadi
Danial Baledi
Zahra Rezvani
Hesham Faili
LM&MA
ELM
58
0
0
23 May 2025
Think or Not? Exploring Thinking Efficiency in Large Reasoning Models via an Information-Theoretic Lens
Xixian Yong
Xiao Zhou
Yingying Zhang
Jinlin Li
Yefeng Zheng
X. Wu
LRM
73
0
0
23 May 2025
TAGS: A Test-Time Generalist-Specialist Framework with Retrieval-Augmented Reasoning and Verification
Jianghao Wu
Feilong Tang
Yulong Li
Ming Hu
Haochen Xue
Shoaib Jameel
Yutong Xie
Imran Razzak
LRM
52
0
0
23 May 2025
Language Matters: How Do Multilingual Input and Reasoning Paths Affect Large Reasoning Models?
Zhi Rui Tam
Cheng-Kuang Wu
Yu Ying Chiu
Chieh-Yen Lin
Yun-Nung Chen
Hung-yi Lee
LRM
85
0
0
23 May 2025
Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms
Mengru Wang
Ziwen Xu
Shengyu Mao
Shumin Deng
Zhaopeng Tu
Ningyu Zhang
N. Zhang
LLMSV
135
0
0
23 May 2025
NileChat: Towards Linguistically Diverse and Culturally Aware LLMs for Local Communities
Abdellah El Mekki
Houdaifa Atou
Omer Nacar
Shady Shehata
Muhammad Abdul-Mageed
69
0
0
23 May 2025
Explaining Sources of Uncertainty in Automated Fact-Checking
Jingyi Sun
Greta Warren
Irina Shklovski
Isabelle Augenstein
65
1
0
23 May 2025
ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs
Landon Butler
Abhineet Agarwal
Justin Singh Kang
Yigit Efe Erginbas
Bin Yu
Kannan Ramchandran
141
0
0
23 May 2025
ELSPR: Evaluator LLM Training Data Self-Purification on Non-Transitive Preferences via Tournament Graph Reconstruction
Yan Yu
Yilun Liu
Minggui He
Shimin Tao
Weibin Meng
...
Li Zhang
Hongxia Ma
Chang Su
Hao Yang
Fuliang Li
40
0
0
23 May 2025
DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies
Ning Yang
Fangxin Liu
Junjie Wang
Tao Yang
Kan Liu
Haibing Guan
Li Jiang
AI4CE
93
0
0
23 May 2025
Generalized Fisher-Weighted SVD: Scalable Kronecker-Factored Fisher Approximation for Compressing Large Language Models
Viktoriia Chekalina
Daniil Moskovskiy
Daria Cherniuk
Maxim Kurkin
Andrey Kuznetsov
Evgeny Frolov
217
0
0
23 May 2025
NeuroTrails: Training with Dynamic Sparse Heads as the Key to Effective Ensembling
Bram Grooten
Farid Hasanov
Chenxiang Zhang
Q. Xiao
Boqian Wu
...
Shiwei Liu
L. Yin
Elena Mocanu
Mykola Pechenizkiy
Decebal Constantin Mocanu
60
0
0
23 May 2025
ScholarBench: A Bilingual Benchmark for Abstraction, Comprehension, and Reasoning Evaluation in Academic Contexts
Dongwon Noh
Donghyeok Koh
Junghun Yuk
Gyuwan Kim
Jaeyong Lee
Kyungtae Lim
Cheoneum Park
ELM
71
0
0
22 May 2025
MixAT: Combining Continuous and Discrete Adversarial Training for LLMs
Csaba Dékány
Stefan Balauca
Robin Staab
Dimitar I. Dimitrov
Martin Vechev
AAML
55
0
0
22 May 2025
Robust LLM Fingerprinting via Domain-Specific Watermarks
Thibaud Gloaguen
Robin Staab
Nikola Jovanović
Martin Vechev
WaLM
114
0
0
22 May 2025
URLs Help, Topics Guide: Understanding Metadata Utility in LLM Training
Dongyang Fan
Vinko Sabolčec
Martin Jaggi
59
0
0
22 May 2025
EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scenarios
Bin Xu
Yu Bai
Huashan Sun
Yiguan Lin
Siming Liu
Xinyue Liang
Yaolin Li
Yang Gao
Heyan Huang
AI4Ed
ELM
212
0
0
22 May 2025
SPaRC: A Spatial Pathfinding Reasoning Challenge
Lars Benedikt Kaesberg
Jan Philip Wahle
Terry Ruas
Bela Gipp
LRM
71
0
0
22 May 2025
CHART-6: Human-Centered Evaluation of Data Visualization Understanding in Vision-Language Models
Arnav Verma
Kushin Mukherjee
Christopher Potts
Elisa Kreiss
Judith E. Fan
34
0
0
22 May 2025
X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs
Rui Ye
Xiangrui Liu
Qimin Wu
Xianghe Pang
Zhenfei Yin
Lei Bai
Siheng Chen
LLMAG
81
0
0
22 May 2025