2009.03300
Cited By
v1
v2
v3 (latest)
Measuring Massive Multitask Language Understanding
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 3,408 papers shown
Title
MedBrowseComp: Benchmarking Medical Deep Research and Computer Use
Shan Chen
Pedro Moreira
Yuxin Xiao
Sam Schmidgall
J. Warner
Hugo J. W. L. Aerts
Thomas Hartvigsen
Jack Gallifant
Danielle S. Bitterman
ELM
65
0
0
20 May 2025
sudoLLM : On Multi-role Alignment of Language Models
Soumadeep Saha
Akshay Chaturvedi
Joy Mahapatra
Utpal Garain
45
0
0
20 May 2025
s3: You Don't Need That Much Data to Train a Search Agent via RL
Pengcheng Jiang
Xueqiang Xu
Jiacheng Lin
Jinfeng Xiao
Zifeng Wang
Jimeng Sun
Jiawei Han
OffRL
RALM
AI4TS
LRM
113
1
0
20 May 2025
From Unaligned to Aligned: Scaling Multilingual LLMs with Multi-Way Parallel Corpora
Yingli Shen
Wen Lai
Shuo Wang
Kangyang Luo
Alexander Fraser
Maosong Sun
84
0
0
20 May 2025
Incorporating Token Usage into Prompting Strategy Evaluation
Chris Sypherd
Sergei Petrov
Sonny George
Vaishak Belle
LLMAG
58
0
0
20 May 2025
InfiFPO: Implicit Model Fusion via Preference Optimization in Large Language Models
Yanggan Gu
Zhaoyi Yan
Yuanyi Wang
Yiming Zhang
Qi Zhou
Leilei Gan
Hongxia Yang
72
0
0
20 May 2025
SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment
Wonje Jeung
Sangyeon Yoon
Minsuk Kahng
Albert No
LRM
LLMSV
198
1
0
20 May 2025
Cross-Lingual Optimization for Language Transfer in Large Language Models
Jungseob Lee
Seongtae Hong
Hyeonseok Moon
Heuiseok Lim
64
0
0
20 May 2025
ProMind-LLM: Proactive Mental Health Care via Causal Reasoning with Sensor Data
Xinzhe Zheng
Sijie Ji
Jiawei Sun
Ruoxin Chen
Wei Gao
Mani Srivastava
AI4MH
LRM
58
0
0
20 May 2025
DiagnosisArena: Benchmarking Diagnostic Reasoning for Large Language Models
Yakun Zhu
Zhongzhen Huang
Linjie Mu
Yutong Huang
Wei Nie
Jiaji Liu
Shaoting Zhang
Pengfei Liu
Xiaofan Zhang
LM&MA
ELM
LRM
167
0
0
20 May 2025
Enhancing LLMs via High-Knowledge Data Selection
Feiyu Duan
Xuemiao Zhang
Sirui Wang
Haoran Que
Yuqi Liu
Wenge Rong
Xunliang Cai
237
0
0
20 May 2025
Dual Precision Quantization for Efficient and Accurate Deep Neural Networks Inference
Tomer Gafni
Asaf Karnieli
Yair Hanani
MQ
74
0
0
20 May 2025
The Energy Cost of Reasoning: Analyzing Energy Usage in LLMs with Test-time Compute
Yunho Jin
Gu-Yeon Wei
David Brooks
LRM
115
0
0
20 May 2025
Fragments to Facts: Partial-Information Fragment Inference from LLMs
Lucas Rosenblatt
Bin Han
Robert Wolfe
Bill Howe
AAML
61
0
0
20 May 2025
Safety Alignment Can Be Not Superficial With Explicit Safety Signals
Jianwei Li
Jung-Eun Kim
AAML
189
1
0
19 May 2025
Learnware of Language Models: Specialized Small Language Models Can Do Big
Zhi-Hao Tan
Zi-Chen Zhao
Hao-Yu Shi
Xin-Yu Zhang
Peng Tan
Yang Yu
Zhi Zhou
140
0
0
19 May 2025
Incentivizing Truthful Language Models via Peer Elicitation Games
Baiting Chen
Tong Zhu
Jiale Han
Lexin Li
Gang Li
Xiaowu Dai
124
0
0
19 May 2025
Shadow-FT: Tuning Instruct via Base
Taiqiang Wu
Runming Yang
Jiayi Li
Pengfei Hu
Ngai Wong
Yujiu Yang
239
0
0
19 May 2025
An Empirical Study of Many-to-Many Summarization with Large Language Models
Jiaan Wang
Fandong Meng
Zengkui Sun
Yunlong Liang
Yuxuan Cao
Jiarong Xu
Haoxiang Shi
Jie Zhou
47
0
0
19 May 2025
LEXam: Benchmarking Legal Reasoning on 340 Law Exams
Yu Fan
Jingwei Ni
Jakob Merane
Etienne Salimbeni
Yang Tian
...
Mrinmaya Sachan
Alexander Stremitzer
Christoph Engel
Elliott Ash
Joel Niklaus
AILaw
ELM
128
0
0
19 May 2025
Improving Multilingual Language Models by Aligning Representations through Steering
Omar Mahmoud
B. L. Semage
Thommen George Karimpanal
Santu Rana
LLMSV
84
0
0
19 May 2025
Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning
Debarpan Bhattacharya
Apoorva Kulkarni
Sriram Ganapathy
84
0
0
19 May 2025
ProDS: Preference-oriented Data Selection for Instruction Tuning
Wenya Guo
Zhengkun Zhang
Xumeng Liu
Ying Zhang
Ziyu Lu
Haoze Zhu
Xubo Liu
Ruxue Yan
121
0
0
19 May 2025
CoT-Kinetics: A Theoretical Modeling Assessing LRM Reasoning Process
Jinhe Bi
Danqi Yan
Yifan Wang
Wenke Huang
Haokun Chen
...
Mang Ye
Xun Xiao
Hinrich Schuetze
Volker Tresp
Yunpu Ma
LRM
116
9
0
19 May 2025
R3: Robust Rubric-Agnostic Reward Models
David Anugraha
Zilu Tang
Lester James V. Miranda
Hanyang Zhao
Mohammad Rifqi Farhansyah
Garry Kuwanto
Derry Wijaya
Genta Indra Winata
215
1
0
19 May 2025
Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings
Safal Shrestha
Minwu Kim
Aadim Nepal
Anubhav Shrestha
Keith Ross
OffRL
ReLM
LRM
79
0
0
19 May 2025
Krikri: Advancing Open Large Language Models for Greek
Dimitris Roussis
Leon Voukoutis
Georgios Paraskevopoulos
Sokratis Sofianopoulos
Prokopis Prokopidis
Vassilis Papavasileiou
Athanasios Katsamanis
Stelios Piperidis
Vassilis Katsouros
ALM
103
1
0
19 May 2025
Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference
Shuqing Luo
Pingzhi Li
Jie Peng
Hanrui Wang
Yang Zhao
Yu Cheng
Tianlong Chen
MoE
96
0
0
19 May 2025
Relation Extraction or Pattern Matching? Unravelling the Generalisation Limits of Language Models for Biographical RE
Varvara Arzt
Allan Hanbury
Michael Wiegand
Gábor Recski
Terra Blevins
70
0
0
18 May 2025
ExpertSteer: Intervening in LLMs through Expert Knowledge
Weixuan Wang
Minghao Wu
Barry Haddow
Alexandra Birch
LLMSV
181
0
0
18 May 2025
SGDPO: Self-Guided Direct Preference Optimization for Language Model Alignment
Wenqiao Zhu
Ji Liu
Lulu Wang
Jun Wu
Yulun Zhang
106
0
0
18 May 2025
Truth Neurons
Haohang Li
Yupeng Cao
Yangyang Yu
Jordan W. Suchow
Zining Zhu
HILM
MILM
KELM
73
0
0
18 May 2025
HybridServe: Efficient Serving of Large AI Models with Confidence-Based Cascade Routing
Leyang Xue
Yao Fu
Luo Mai
Mahesh K. Marina
136
0
0
18 May 2025
PSC: Extending Context Window of Large Language Models via Phase Shift Calibration
Wenqiao Zhu
Chao Xu
Lulu Wang
Jun Wu
107
1
0
18 May 2025
UFO-RL: Uncertainty-Focused Optimization for Efficient Reinforcement Learning Data Selection
Yang Zhao
Kai Xiong
Xiao Ding
Li Du
Yangou Ouyang
...
Wentao Zhang
Bin Liu
Dong Hu
Bing Qin
Ting Liu
OffRL
85
0
0
18 May 2025
HBO: Hierarchical Balancing Optimization for Fine-Tuning Large Language Models
Weixuan Wang
Minghao Wu
Barry Haddow
Alexandra Birch
146
0
0
18 May 2025
Improving LLM Outputs Against Jailbreak Attacks with Expert Model Integration
Tatia Tsmindashvili
Ana Kolkhidashvili
Dachi Kurtskhalia
Nino Maghlakelidze
Elene Mekvabishvili
Guram Dentoshvili
Orkhan Shamilov
Zaal Gachechiladze
Steven Saporta
David Dachi Choladze
185
0
0
18 May 2025
Wisdom from Diversity: Bias Mitigation Through Hybrid Human-LLM Crowds
Axel Abels
Tom Lenaerts
56
0
0
18 May 2025
Towards Budget-Friendly Model-Agnostic Explanation Generation for Large Language Models
Junhao Liu
Haonan Yu
Xin Zhang
LRM
185
0
0
18 May 2025
Teach2Eval: An Indirect Evaluation Method for LLM by Judging How It Teaches
Yuhang Zhou
Xutian Chen
Yixin Cao
Yuchen Ni
Yu He
...
Xiang Liu
Jian Zhang
Chuanjun Ji
Guangnan Ye
Xipeng Qiu
ELM
61
0
0
18 May 2025
Mutual-Taught for Co-adapting Policy and Reward Models
Tianyuan Shi
Canbin Huang
Fanqi Wan
Longguang Zhong
Ziyi Yang
Weizhou Shen
Xiaojun Quan
Ming Yan
36
0
0
17 May 2025
OMAC: A Broad Optimization Framework for LLM-Based Multi-Agent Collaboration
Shijun Li
Hilaf Hasson
Joydeep Ghosh
LLMAG
112
0
0
17 May 2025
Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors
Jing Huang
Junyi Tao
Thomas Icard
Diyi Yang
Christopher Potts
OODD
87
0
0
17 May 2025
Evaluating the Logical Reasoning Abilities of Large Reasoning Models
Hanmeng Liu
Yiran Ding
Zhizhang Fu
Chaoli Zhang
Xiaozhang Liu
Yue Zhang
ELM
LRM
71
1
0
17 May 2025
MoL for LLMs: Dual-Loss Optimization to Enhance Domain Expertise While Preserving General Capabilities
Jingxue Chen
Qingkun Tang
Qianchun Lu
Siyuan Fang
98
0
0
17 May 2025
Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation
Vincent Koc
LM&MA
82
0
0
17 May 2025
Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning
Puning Yang
Qizhou Wang
Zhuo Huang
Tongliang Liu
Chengqi Zhang
Bo Han
MU
120
0
0
17 May 2025
Why Not Act on What You Know? Unleashing Safety Potential of LLMs via Self-Aware Guard Enhancement
Peng Ding
Jun Kuang
Zongyu Wang
Xuezhi Cao
Xunliang Cai
Jiajun Chen
Shujian Huang
99
0
0
17 May 2025
Safe Delta: Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets
Ning Lu
Shengcai Liu
Jiahao Wu
Weiyu Chen
Zhirui Zhang
Yew-Soon Ong
Qi Wang
Ke Tang
106
3
0
17 May 2025
The AI Gap: How Socioeconomic Status Affects Language Technology Interactions
Elisa Bassignana
Amanda Cercas Curry
Dirk Hovy
68
0
0
17 May 2025