2009.03300
Cited By
v1
v2
v3 (latest)
Measuring Massive Multitask Language Understanding
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 3,408 papers shown
Title
One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers
Diana Abagyan
Alejandro Salamanca
Andres Felipe Cruz-Salinas
Kris Cao
Hangyu Lin
Acyr Locatelli
Marzieh Fadaee
Ahmet Üstün
Sara Hooker
CLL
131
0
0
12 Jun 2025
Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning
Jikai Jin
Vasilis Syrgkanis
Sham Kakade
Hanlin Zhang
ELM
122
1
0
12 Jun 2025
OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems
Xiaozhe Li
Jixuan Chen
Xinyu Fang
Shengyuan Ding
Haodong Duan
Qingwen Liu
Kai-xiang Chen
LLMAG
LRM
106
0
0
12 Jun 2025
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
Yu Sun
Xingyu Qian
Weiwen Xu
Hao Zhang
Chenghao Xiao
Long Li
Yu Rong
Wenbing Huang
Qifeng Bai
Tingyang Xu
LRM
69
0
0
11 Jun 2025
GigaChat Family: Efficient Russian Language Modeling Through Mixture of Experts Architecture
GigaChat team
Mamedov Valentin
Evgenii Kosarev
Gregory Leleytner
Ilya Shchuckin
...
Ruslan Gaitukiev
Arkadiy Shatenov
Alena Fenogenova
Nikita Savushkin
Fedor Minkin
88
0
0
11 Jun 2025
LLMs Cannot Reliably Judge (Yet?): A Comprehensive Assessment on the Robustness of LLM-as-a-Judge
Songze Li
Chuokun Xu
Jiaying Wang
Xueluan Gong
Chen Chen
J. Zhang
Jun Wang
K. Lam
Shouling Ji
AAML
ELM
85
0
0
11 Jun 2025
Prompt Attacks Reveal Superficial Knowledge Removal in Unlearning Methods
Yeonwoo Jang
Shariqah Hossain
Ashwin Sreevatsa
Diogo Cruz
AAML
MU
54
0
0
11 Jun 2025
The Emergence of Abstract Thought in Large Language Models Beyond Any Language
Yuxin Chen
Yiran Zhao
Yang Zhang
An Zhang
Kenji Kawaguchi
Shafiq Joty
Junnan Li
Tat-Seng Chua
Michael Shieh
Wenxuan Zhang
LRM
63
0
0
11 Jun 2025
TransXSSM: A Hybrid Transformer State Space Model with Unified Rotary Position Embedding
Bingheng Wu
Jingze Shi
Yifan Wu
Nan Tang
Yuyu Luo
93
0
0
11 Jun 2025
DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts
Yuchen Feng
Bowen Shen
Naibin Gu
Jiaxuan Zhao
Peng Fu
Zheng Lin
Weiping Wang
MoMe
MoE
52
0
0
11 Jun 2025
Token Constraint Decoding Improves Robustness on Question Answering for Large Language Models
Jui-Ming Yao
Hao-Yuan Chen
Zi-Xian Tang
Bing-Jia Tan
Sheng-Wei Peng
Bing-Cheng Xie
Shun-Feng Su
AAML
71
0
0
11 Jun 2025
Inv-Entropy: A Fully Probabilistic Framework for Uncertainty Quantification in Language Models
Haoyi Song
Ruihan Ji
Naichen Shi
Fan Lai
Raed Al Kontar
84
0
0
11 Jun 2025
Is Fine-Tuning an Effective Solution? Reassessing Knowledge Editing for Unstructured Data
Hao Xiong
Chuanyuan Tan
Wenliang Chen
KELM
52
0
0
11 Jun 2025
Feature Engineering for Agents: An Adaptive Cognitive Architecture for Interpretable ML Monitoring
Gusseppe Bravo Rocca
Peini Liu
Jordi Guitart
Rodrigo M Carrillo-Larco
Ajay Dholakia
David Ellison
LLMAG
82
0
0
11 Jun 2025
Causal Sufficiency and Necessity Improves Chain-of-Thought Reasoning
Xiangning Yu
Zhuohan Wang
Linyi Yang
Haoxuan Li
Anjie Liu
Xiao Xue
Jun Wang
Mengyue Yang
ReLM
LRM
ELM
77
0
0
11 Jun 2025
Can A Gamer Train A Mathematical Reasoning Model?
Andrew Shin
ReLM
LRM
34
0
0
10 Jun 2025
Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency
Chenlong Wang
Yuanning Feng
Dongping Chen
Zhaoyang Chu
Ranjay Krishna
Tianyi Zhou
LRM
25
0
0
10 Jun 2025
A Survey on Large Language Models for Mathematical Reasoning
Peng-Yuan Wang
Tian-Shuo Liu
Chenyang Wang
Yi-Di Wang
Shu Yan
...
Xu-Hui Liu
Xin-Wei Chen
Jia-Cheng Xu
Ziniu Li
Yang Yu
LRM
25
0
0
10 Jun 2025
SoK: Machine Unlearning for Large Language Models
Jie Ren
Yue Xing
Yingqian Cui
Charu C. Aggarwal
Hui Liu
MU
42
0
0
10 Jun 2025
TableDreamer: Progressive and Weakness-guided Data Synthesis from Scratch for Table Instruction Tuning
Mingyu Zheng
Zhifan Feng
Jia Wang
Lanrui Wang
Zheng Lin
Yang Hao
Weiping Wang
LMTD
53
0
0
10 Jun 2025
Flow Matching Meets PDEs: A Unified Framework for Physics-Constrained Generation
Giacomo Baldan
Qiang Liu
Alberto Guardone
Nils Thuerey
AI4CE
23
1
0
10 Jun 2025
SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
Yizhao Gao
Shuming Guo
Shijie Cao
Yuqing Xia
Yu Cheng
...
Hayden Kwok-Hay So
Yu Hua
Ting Cao
Fan Yang
Mao Yang
VLM
LRM
21
0
0
10 Jun 2025
Unifying Block-wise PTQ and Distillation-based QAT for Progressive Quantization toward 2-bit Instruction-Tuned LLMs
Jung Hyun Lee
Seungjae Shin
Vinnam Kim
Jaeseong You
An Chen
MQ
28
0
0
10 Jun 2025
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions
Polina Kirichenko
Mark Ibrahim
Kamalika Chaudhuri
Samuel J. Bell
LRM
25
0
0
10 Jun 2025
FLoRIST: Singular Value Thresholding for Efficient and Accurate Federated Fine-Tuning of Large Language Models
Hariharan Ramesh
Jyotikrishna Dass
25
0
0
10 Jun 2025
Reinforce LLM Reasoning through Multi-Agent Reflection
Yurun Yuan
Tengyang Xie
LRM
25
0
0
10 Jun 2025
RAISE: Enhancing Scientific Reasoning in LLMs via Step-by-Step Retrieval
Minhae Oh
Jeonghye Kim
Nakyung Lee
Donggeon Seo
Taeuk Kim
Jungwoo Lee
ReLM
LRM
29
0
0
10 Jun 2025
Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models
Mickel Liu
L. Jiang
Yancheng Liang
S. Du
Yejin Choi
Tim Althoff
Natasha Jaques
AAML
LRM
21
0
0
09 Jun 2025
LLM Unlearning Should Be Form-Independent
Xiaotian Ye
Mengqi Zhang
Shu Wu
MU
17
0
0
09 Jun 2025
Beyond Benchmarks: A Novel Framework for Domain-Specific LLM Evaluation and Knowledge Mapping
Nitin Sharma
Thomas Wolfers
Çağatay Yıldız
ALM
22
0
0
09 Jun 2025
From Calibration to Collaboration: LLM Uncertainty Quantification Should Be More Human-Centered
Siddartha Devic
Tejas Srinivasan
Jesse Thomason
Willie Neiswanger
Vatsal Sharan
26
0
0
09 Jun 2025
MiniCPM4: Ultra-Efficient LLMs on End Devices
MiniCPM Team
Chaojun Xiao
Yuxuan Li
Xu Han
Yuzhuo Bai
...
Zhiyuan Liu
Guoyang Zeng
Chao Jia
Dahai Li
Maosong Sun
MLLM
32
0
0
09 Jun 2025
Correlated Errors in Large Language Models
Elliot Kim
Avi Garg
Kenny Peng
Nikhil Garg
27
0
0
09 Jun 2025
Plug-in and Fine-tuning: Bridging the Gap between Small Language Models and Large Language Models
Kyeonghyun Kim
Jinhee Jang
Juhwan Choi
Yoonji Lee
Kyohoon Jin
Youngbin Kim
24
0
0
09 Jun 2025
ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
Yongkang Li
Kaixin Xiong
Xiangyu Guo
Fang Li
Sixu Yan
...
Bing Wang
Guang Chen
Hangjun Ye
Wenyu Liu
Xinggang Wang
VLM
48
0
0
09 Jun 2025
Enhancing Watermarking Quality for LLMs via Contextual Generation States Awareness
Peiru Yang
Xintian Li
Wanchun Ni
Jinhua Yin
Huili Wang
Guoshun Nan
Shangguang Wang
Yongfeng Huang
Tao Qi
24
0
0
09 Jun 2025
Federated In-Context Learning: Iterative Refinement for Improved Answer Quality
Ruhan Wang
Zhiyong Wang
Chengkai Huang
Rui Wang
Tong Yu
Lina Yao
John C. S. Lui
Dongruo Zhou
15
0
0
09 Jun 2025
Infinity Instruct: Scaling Instruction Selection and Synthesis to Enhance Language Models
Jijie Li
Li Du
Hanyu Zhao
Bo Zhang
Liangdong Wang
Boyan Gao
Guang Liu
Yonghua Lin
ALM
SyDa
27
0
0
09 Jun 2025
SELT: Self-Evaluation Tree Search for LLMs with Task Decomposition
Mengsong Wu
Di Zhang
Yuqiang Li
Dongzhan Zhou
Wenliang Chen
ReLM
LRM
18
0
0
09 Jun 2025
ConfQA: Answer Only If You Are Confident
Yin Huang
Yifan Ethan Xu
Kai Sun
Vera Yan
Alicia Sun
...
Yue Liu
Aaron Colak
Anuj Kumar
Wen-tau Yih
Xin Luna Dong
HILM
20
0
0
08 Jun 2025
MedCite: Can Language Models Generate Verifiable Text for Medicine?
Xiao Wang
Mengjue Tan
Qiao Jin
Guangzhi Xiong
Yu Hu
Aidong Zhang
Zhiyong Lu
Minjia Zhang
23
0
0
07 Jun 2025
What Makes a Good Natural Language Prompt?
Do Xuan Long
Duy Dinh
Ngoc-Hai Nguyen
Kenji Kawaguchi
Nancy F. Chen
Shafiq Joty
Min-Yen Kan
33
0
0
07 Jun 2025
Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit
Charles Goddard
Fernando Fernandes Neto
30
0
0
07 Jun 2025
From Threat to Tool: Leveraging Refusal-Aware Injection Attacks for Safety Alignment
Kyubyung Chae
Hyunbin Jin
Taesup Kim
27
0
0
07 Jun 2025
On the Adaptive Psychological Persuasion of Large Language Models
Tianjie Ju
Yujia Chen
Hao Fei
Mong Li Lee
Wynne Hsu
Pengzhou Cheng
Zongru Wu
Zhuosheng Zhang
Gongshen Liu
17
0
0
07 Jun 2025
Contextual Experience Replay for Self-Improvement of Language Agents
Yitao Liu
Chenglei Si
Karthik Narasimhan
Shunyu Yao
LLMAG
23
0
0
07 Jun 2025
MATP-BENCH: Can MLLM Be a Good Automated Theorem Prover for Multimodal Problems?
Zhitao He
Zongwei Lyu
Dazhong Chen
Dadi Guo
Yi R. Fung
LRM
54
0
0
06 Jun 2025
Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness
Rongzhe Wei
Peizhi Niu
Hans Hao-Hsun Hsu
Ruihan Wu
Haoteng Yin
...
Vamsi K. Potluru
Eli Chien
Kamalika Chaudhuri
Olgica Milenković
P. Li
MU
KELM
64
0
0
06 Jun 2025
dots.llm1 Technical Report
Bi Huo
Bin Tu
Cheng Qin
Da Zheng
Debing Zhang
...
Yuqiu Ji
Ze Wen
Zhenhai Liu
Zichao Li
Zilong Liao
MoE
49
0
0
06 Jun 2025
Benchmarking Misuse Mitigation Against Covert Adversaries
Davis Brown
Mahdi Sabbaghi
Luze Sun
Alexander Robey
George Pappas
Eric Wong
Hamed Hassani
28
0
0
06 Jun 2025