Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2009.03300
Cited By
v1
v2
v3 (latest)
Measuring Massive Multitask Language Understanding
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 3,408 papers shown
Title
Compression Laws for Large Language Models
Ayan Sengupta
Siddhant Chaudhary
Tanmoy Chakraborty
57
0
0
06 Apr 2025
Sensitivity Meets Sparsity: The Impact of Extremely Sparse Parameter Patterns on Theory-of-Mind of Large Language Models
Yuheng Wu
Wentao Guo
Zirui Liu
Heng Ji
Zhaozhuo Xu
Denghui Zhang
84
0
0
05 Apr 2025
PipeDec: Low-Latency Pipeline-based Inference with Dynamic Speculative Decoding towards Large-scale Models
Haofei Yin
Mengbai Xiao
Rouzhou Lu
Xiao Zhang
Dongxiao Yu
Guanghui Zhang
AI4CE
79
0
0
05 Apr 2025
Among Us: A Sandbox for Measuring and Detecting Agentic Deception
Satvik Golechha
Adrià Garriga-Alonso
LLMAG
85
2
0
05 Apr 2025
Efficient Evaluation of Large Language Models via Collaborative Filtering
Xu-Xiang Zhong
Chao Yi
Han-Jia Ye
118
0
0
05 Apr 2025
Entropy-Based Block Pruning for Efficient Large Language Models
Liangwei Yang
Yuhui Xu
Juntao Tan
Doyen Sahoo
Siyang Song
Caiming Xiong
Han Wang
Shelby Heinecke
AAML
64
0
0
04 Apr 2025
Do LLM Evaluators Prefer Themselves for a Reason?
Wei-Lin Chen
Zhepei Wei
Xinyu Zhu
Shi Feng
Yu Meng
ELM
LRM
89
3
0
04 Apr 2025
Using Attention Sinks to Identify and Evaluate Dormant Heads in Pretrained LLMs
Pedro Sandoval-Segura
Xijun Wang
Ashwinee Panda
Micah Goldblum
Ronen Basri
Tom Goldstein
David Jacobs
100
1
0
04 Apr 2025
Think When You Need: Self-Adaptive Chain-of-Thought Learning
Junjie Yang
Ke Lin
Xing Yu
ReLM
LRM
AI4CE
123
2
0
04 Apr 2025
Can AI Master Construction Management (CM)? Benchmarking State-of-the-Art Large Language Models on CM Certification Exams
Ruoxin Xiong
Yanyu Wang
Suat Gunhan
Yimin Zhu
Charles Berryman
ELM
53
0
0
04 Apr 2025
Layers at Similar Depths Generate Similar Activations Across LLM Architectures
Christopher Wolfram
Aaron Schein
100
2
0
03 Apr 2025
Large (Vision) Language Models are Unsupervised In-Context Learners
Artyom Gadetsky
Andrei Atanov
Yulun Jiang
Zhitong Gao
Ghazal Hosseini Mighan
Amir Zamir
Maria Brbić
VLM
MLLM
LRM
279
0
0
03 Apr 2025
MegaMath: Pushing the Limits of Open Math Corpora
Fan Zhou
Zengzhi Wang
Nikhil Ranjan
Zhoujun Cheng
Liping Tang
Guowei He
Zhengzhong Liu
Eric P. Xing
LRM
135
3
0
03 Apr 2025
More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment
Yifan Wang
Runjin Chen
Bolian Li
David Cho
Yihe Deng
Ruqi Zhang
Tianlong Chen
Zhangyang Wang
A. Grama
Junyuan Hong
SyDa
72
2
0
03 Apr 2025
Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models
Liangjie Huang
Dawei Li
Huan Liu
Lu Cheng
LRM
112
0
0
03 Apr 2025
MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators
Beichen Huang
Yueming Yuan
Zelei Shao
Minjia Zhang
MQ
MoE
152
0
0
03 Apr 2025
How Deep Do Large Language Models Internalize Scientific Literature and Citation Practices?
Andres Algaba
Vincent Holst
Floriano Tori
Melika Mobini
Brecht Verbeken
Sylvia Wenmackers
Vincent Ginis
123
1
0
03 Apr 2025
YourBench: Easy Custom Evaluation Sets for Everyone
Shivalika Singh
Clémentine Fourrier
Alina Lozovskia
Thomas Wolf
Gokhan Tur
Dilek Hakkani-Tur
91
4
0
02 Apr 2025
PROPHET: An Inferable Future Forecasting Benchmark with Causal Intervened Likelihood Estimation
Zhengwei Tao
Zhi Jin
Bincheng Li
Xiaoying Bai
Haiyan Zhao
Chengfeng Dou
Xiancai Chen
Jia Li
Linyu Li
Chongyang Tao
AI4TS
73
1
0
02 Apr 2025
LRAGE: Legal Retrieval Augmented Generation Evaluation Tool
Minhu Park
Hongseok Oh
Eunkyung Choi
Wonseok Hwang
AILaw
RALM
ELM
170
0
0
02 Apr 2025
Register Always Matters: Analysis of LLM Pretraining Data Through the Lens of Language Variation
A. Myntti
Erik Henriksson
Veronika Laippala
S. Pyysalo
146
0
0
02 Apr 2025
SemEval-2025 Task 4: Unlearning sensitive content from Large Language Models
Anil Ramakrishna
Yixin Wan
Xiaomeng Jin
Kai-Wei Chang
Zhiqi Bu
Bhanukiran Vinzamuri
Volkan Cevher
Mingyi Hong
Rahul Gupta
AILaw
MU
481
1
0
02 Apr 2025
Exploring LLM Reasoning Through Controlled Prompt Variations
Giannis Chatziveroglou
Richard Yun
Maura Kelleher
AAML
LRM
48
2
0
02 Apr 2025
Representation Bending for Large Language Model Safety
Ashkan Yousefpour
Taeheon Kim
Ryan S. Kwon
Seungbeen Lee
Wonje Jeung
Seungju Han
Alvin Wan
Harrison Ngan
Youngjae Yu
Jonghyun Choi
AAML
ALM
KELM
129
4
0
02 Apr 2025
Biomedical Question Answering via Multi-Level Summarization on a Local Knowledge Graph
Lingxiao Guan
Yanwen Huang
Jie Liu
94
0
0
02 Apr 2025
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Kai Yan
Yufei Xu
Zhengyin Du
Xuesong Yao
Ziyi Wang
Xiaowen Guo
Jiecao Chen
ReLM
ELM
LRM
200
5
0
01 Apr 2025
RECKON: Large-scale Reference-based Efficient Knowledge Evaluation for Large Language Model
Lin Zhang
Zhouhong Gu
Xiaoran Shi
Hongwei Feng
Yanghua Xiao
55
0
0
01 Apr 2025
MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs
Juncheng Wu
Wenlong Deng
Xiaochen Li
Sheng Liu
Taomian Mi
...
Yihan Cao
Hui Ren
Xuzhao Li
Xiaoxiao Li
Yuyin Zhou
AI4MH
LRM
134
16
0
01 Apr 2025
On the Consistency of Multilingual Context Utilization in Retrieval-Augmented Generation
Jirui Qi
Raquel Fernández
Arianna Bisazza
RALM
135
0
0
01 Apr 2025
Are you really listening? Boosting Perceptual Awareness in Music-QA Benchmarks
Yongyi Zang
Sean O'Brien
Taylor Berg-Kirkpatrick
Julian McAuley
Cheng-i Wang
AuLLM
142
2
0
01 Apr 2025
Multi-Token Attention
O. Yu. Golovneva
Tianlu Wang
Jason Weston
Sainbayar Sukhbaatar
89
1
0
01 Apr 2025
Pay More Attention to the Robustness of Prompt for Instruction Data Mining
Qiang Wang
Dawei Feng
Xu Zhang
Ao Shen
Yang Xu
Bo Ding
H. Wang
AAML
87
0
0
31 Mar 2025
MKA: Leveraging Cross-Lingual Consensus for Model Abstention
Sharad Duwal
100
0
0
31 Mar 2025
Adaptive Layer-skipping in Pre-trained LLMs
Xuan Luo
Weizhi Wang
Xifeng Yan
461
1
0
31 Mar 2025
Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning
J. Lin
Tian Wang
Kun Qian
LRM
127
7
0
31 Mar 2025
Order Independence With Finetuning
Katrina Brown
Reid McIlroy
64
0
0
30 Mar 2025
Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base
Linxin Song
Xuwei Ding
Jieyu Zhang
Taiwei Shi
Ryotaro Shimizu
Rahul Gupta
Yang Liu
Jian Kang
Jieyu Zhao
KELM
114
1
0
30 Mar 2025
SUV: Scalable Large Language Model Copyright Compliance with Regularized Selective Unlearning
Tianyang Xu
Xiaoze Liu
Feijie Wu
Xiaoqian Wang
Jing Gao
MU
162
1
0
29 Mar 2025
Efficient Inference for Large Reasoning Models: A Survey
Yi Liu
Jiaying Wu
Yufei He
Hongcheng Gao
Hongyu Chen
Baolong Bi
Jiaheng Zhang
Zhiqi Huang
Bryan Hooi
Bryan Hooi
LLMAG
LRM
174
17
0
29 Mar 2025
FindTheFlaws: Annotated Errors for Detecting Flawed Reasoning and Scalable Oversight Research
Gabriel Recchia
Chatrik Singh Mangat
Issac Li
Gayatri Krishnakumar
ALM
177
0
0
29 Mar 2025
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
Hung-Yueh Chiang
Chi-chih Chang
N. Frumkin
Kai-Chiang Wu
Mohamed S. Abdelfattah
Diana Marculescu
MQ
499
0
0
28 Mar 2025
Firm or Fickle? Evaluating Large Language Models Consistency in Sequential Interactions
Yubo Li
Yidi Miao
Xueying Ding
Ramayya Krishnan
R. Padman
134
0
0
28 Mar 2025
Breach in the Shield: Unveiling the Vulnerabilities of Large Language Models
Runpeng Dai
Run Yang
Fan Zhou
Hongtu Zhu
60
0
0
28 Mar 2025
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
Zhanke Zhou
Zhaocheng Zhu
Xuan Li
Mikhail Galkin
Xiao Feng
Sanmi Koyejo
Jian Tang
Bo Han
LRM
169
6
0
28 Mar 2025
ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models
Chung-En Sun
Ge Yan
Tsui-Wei Weng
KELM
LRM
102
3
0
27 Mar 2025
MSPLoRA: A Multi-Scale Pyramid Low-Rank Adaptation for Efficient Model Fine-Tuning
Jiancheng Zhao
Xingda Yu
Zhen Yang
MoE
88
3
0
27 Mar 2025
The Risks of Using Large Language Models for Text Annotation in Social Science Research
Hao Lin
Yongjun Zhang
59
2
0
27 Mar 2025
SWI: Speaking with Intent in Large Language Models
Yuwei Yin
EunJeong Hwang
Giuseppe Carenini
LRM
131
0
0
27 Mar 2025
Qwen2.5-Omni Technical Report
Jin Xu
Zhifang Guo
Jinzheng He
Hangrui Hu
Ting He
...
K. Dang
Bin Zhang
Xinyu Wang
Yunfei Chu
Junyang Lin
VGen
AuLLM
164
55
0
26 Mar 2025
Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark
Sondos Mahmoud Bsharat
Mukul Ranjan
Aidar Myrzakhan
Jiacheng Liu
Bowei Guo
Shengkun Tang
Zhuang Liu
Yuanzhi Li
Zhiqiang Shen
ELM
118
1
0
26 Mar 2025
Previous
1
2
3
...
12
13
14
...
67
68
69
Next