Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2009.03300
Cited By
Measuring Massive Multitask Language Understanding
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
D. Song
Jacob Steinhardt
ELM
RALM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 875 papers shown
Title
ALLVB: All-in-One Long Video Understanding Benchmark
Xichen Tan
Yuanjing Luo
Yunfan Ye
Fang Liu
Zhiping Cai
MLLM
VLM
85
0
0
10 Mar 2025
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning
Xiangru Tang
Daniel Shao
Jiwoong Sohn
Jiapeng Chen
Jiayi Zhang
...
Yilun Zhao
Chenglin Wu
Wenqi Shi
Arman Cohan
Mark B. Gerstein
AI4MH
LRM
ELM
LM&MA
70
4
0
10 Mar 2025
WildIFEval: Instruction Following in the Wild
Gili Lior
Asaf Yehudai
Ariel Gera
L. Ein-Dor
71
0
0
09 Mar 2025
MoFE: Mixture of Frozen Experts Architecture
Jean Seo
Jaeyoon Kim
Hyopil Shin
MoE
203
0
0
09 Mar 2025
Evaluating and Aligning Human Economic Risk Preferences in LLMs
Jiaheng Liu
Yi Yang
Kar Yan Tam
67
0
0
09 Mar 2025
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs
Ling Team
B. Zeng
Chenyu Huang
Chao Zhang
Changxin Tian
...
Zhaoxin Huan
Zujie Wen
Zhenhang Sun
Zhuoxuan Du
Z. He
MoE
ALM
111
2
0
07 Mar 2025
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Weigao Sun
Disen Lan
Tong Zhu
Xiaoye Qu
Yu-Xi Cheng
MoE
103
2
0
07 Mar 2025
The Society of HiveMind: Multi-Agent Optimization of Foundation Model Swarms to Unlock the Potential of Collective Intelligence
Noah Mamie
Susie Xi Rao
LLMAG
AI4CE
51
0
0
07 Mar 2025
Efficient Algorithms for Verifying Kruskal Rank in Sparse Linear Regression and Related Applications
Fengqin Zhou
62
0
0
06 Mar 2025
Balcony: A Lightweight Approach to Dynamic Inference of Generative Language Models
Benyamin Jamialahmadi
Parsa Kavehzadeh
Mehdi Rezagholizadeh
Parsa Farinneya
Hossein Rajabzadeh
A. Jafari
Boxing Chen
Marzieh S. Tahaei
42
0
0
06 Mar 2025
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
Zhijian Zhuo
Yutao Zeng
Ya Wang
Sijun Zhang
Jian Yang
Xiaoqing Li
Xun Zhou
Jinwen Ma
51
0
0
06 Mar 2025
LLMs Can Generate a Better Answer by Aggregating Their Own Responses
Zichong Li
Xinyu Feng
Yuheng Cai
Zixuan Zhang
Tianyi Liu
Chen Liang
Weizhu Chen
Haoyu Wang
T. Zhao
LRM
55
1
0
06 Mar 2025
Robust Learning of Diverse Code Edits
Tushar Aggarwal
Swayam Singh
Abhijeet Awasthi
Aditya Kanade
Nagarajan Natarajan
SyDa
195
0
0
05 Mar 2025
A Practical Memory Injection Attack against LLM Agents
Shen Dong
Shaocheng Xu
Pengfei He
Y. Li
Jiliang Tang
Tianming Liu
Hui Liu
Zhen Xiang
LLMAG
AAML
43
2
0
05 Mar 2025
EchoQA: A Large Collection of Instruction Tuning Data for Echocardiogram Reports
L. Moukheiber
Mira Moukheiber
Dana Moukheiiber
Jae-Woo Ju
Hyung-Chul Lee
LM&MA
79
0
0
04 Mar 2025
Adaptively profiling models with task elicitation
Davis Brown
Prithvi Balehannina
Helen Jin
Shreya Havaldar
Hamed Hassani
Eric Wong
ALM
ELM
108
0
0
03 Mar 2025
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
Lu Dai
Yijie Xu
Jinhui Ye
Hao Liu
Hui Xiong
3DV
RALM
86
2
0
03 Mar 2025
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin
Atabak Ashfaq
Adam Atkinson
Hany Awadalla
Nguyen Bach
...
Ishmam Zabir
Yunan Zhang
Li Zhang
Wenjie Qu
Xiren Zhou
MoE
SyDa
76
28
0
03 Mar 2025
Predictive Data Selection: The Data That Predicts Is the Data That Teaches
Kashun Shum
Yuanmin Huang
Hongjian Zou
Qi Ding
Yixuan Liao
Xiao Chen
Qian Liu
Junxian He
67
2
0
02 Mar 2025
Triple Phase Transitions: Understanding the Learning Dynamics of Large Language Models from a Neuroscience Perspective
Yuko Nakagi
Keigo Tada
Sota Yoshino
Shinji Nishimoto
Yu Takagi
LRM
37
0
0
28 Feb 2025
BixBench: a Comprehensive Benchmark for LLM-based Agents in Computational Biology
Ludovico Mitchener
Jon M. Laurent
Benjamin Tenmann
Siddharth Narayanan
Geemi P Wellawatte
A. White
Lorenzo Sani
Samuel G. Rodriques
LLMAG
LM&MA
ELM
64
3
0
28 Feb 2025
A Pilot Empirical Study on When and How to Use Knowledge Graphs as Retrieval Augmented Generation
Xujie Yuan
Y. Liu
Shimin Di
Shiwen Wu
Libin Zheng
Rui Meng
Lei Chen
Xiaofang Zhou
Jian Yin
36
0
0
28 Feb 2025
The Power of Personality: A Human Simulation Perspective to Investigate Large Language Model Agents
Yifan Duan
Yihong Tang
Xuefeng Bai
Kehai Chen
J. Li
Min Zhang
LLMAG
228
0
0
28 Feb 2025
PaCA: Partial Connection Adaptation for Efficient Fine-Tuning
Sunghyeon Woo
Sol Namkung
Sunwoo Lee
Inho Jeong
Beomseok Kim
Dongsuk Jeon
39
0
0
28 Feb 2025
Plan2Align: Predictive Planning Based Test-Time Preference Alignment in Paragraph-Level Machine Translation
Kuang-Da Wang
Teng-Ruei Chen
Yu-Heng Hung
Shuoyang Ding
Yueh-Hua Wu
Yu-Chun Wang
Chao-Han Huck Yang
Wen-Chih Peng
Ping-Chun Hsieh
74
0
0
28 Feb 2025
Similarity-Distance-Magnitude Universal Verification
Allen Schmaltz
UQCV
AAML
182
0
0
27 Feb 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-WeiChang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
90
3
0
26 Feb 2025
Stay Focused: Problem Drift in Multi-Agent Debate
Jonas Becker
Lars Benedikt Kaesberg
Andreas Stephan
Jan Philip Wahle
Terry Ruas
Bela Gipp
59
1
0
26 Feb 2025
ZEBRA: Leveraging Model-Behavioral Knowledge for Zero-Annotation Preference Dataset Construction
Jeesu Jung
Chanjun Park
Sangkeun Jung
76
0
0
26 Feb 2025
Learning to Generate Structured Output with Schema Reinforcement Learning
Yunfan LU
Haolun Li
Xin Cong
Zhong Zhang
Yesai Wu
Yankai Lin
Zhiyuan Liu
Fangming Liu
Maosong Sun
47
1
0
26 Feb 2025
BIG-Bench Extra Hard
Mehran Kazemi
Bahare Fatemi
Hritik Bansal
John Palowitch
Chrysovalantis Anastasiou
...
Kate Olszewska
Yi Tay
Vinh Q. Tran
Quoc V. Le
Orhan Firat
ELM
LRM
122
6
0
26 Feb 2025
Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement
Siyuan Zhang
Y. Zhang
Yinpeng Dong
Hang Su
HILM
KELM
227
0
0
26 Feb 2025
ANPMI: Assessing the True Comprehension Capabilities of LLMs for Multiple Choice Questions
Gyeongje Cho
Yeonkyoung So
Jaejin Lee
ELM
62
0
0
26 Feb 2025
Shh, don't say that! Domain Certification in LLMs
Cornelius Emde
Alasdair Paren
Preetham Arvind
Maxime Kayser
Tom Rainforth
Thomas Lukasiewicz
Guohao Li
Philip Torr
Adel Bibi
53
1
0
26 Feb 2025
Kanana: Compute-efficient Bilingual Language Models
Kanana LLM Team
Yunju Bak
Hojin Lee
Minho Ryu
Jiyeon Ham
...
Daniel Lee
Minchul Lee
M. Lee
Shinbok Lee
Gaeun Seo
98
1
0
26 Feb 2025
Reversal Blessing: Thinking Backward May Outpace Thinking Forward in Multi-choice Questions
Yizhe Zhang
Richard He Bai
Zijin Gu
Ruixiang Zhang
Jiatao Gu
Emmanuel Abbe
Samy Bengio
Navdeep Jaitly
LRM
BDL
70
1
0
25 Feb 2025
Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support
G. Wang
Minyu Gao
Shuai Yang
Ya Zhang
Lizhi He
...
Yexuan Zhang
Wanyue Li
Lu Chen
Jintao Fei
Xin Li
146
1
0
25 Feb 2025
A General Framework to Enhance Fine-tuning-based LLM Unlearning
J. Ren
Zhenwei Dai
Xianfeng Tang
Hui Liu
Jingying Zeng
...
R. Goutam
Suhang Wang
Yue Xing
Qi He
Hui Liu
MU
163
1
0
25 Feb 2025
Faster, Cheaper, Better: Multi-Objective Hyperparameter Optimization for LLM and RAG Systems
Matthew Barker
Andrew Bell
Evan Thomas
James Carr
Thomas Andrews
Umang Bhatt
87
1
0
25 Feb 2025
Understand User Opinions of Large Language Models via LLM-Powered In-the-Moment User Experience Interviews
Mengqiao Liu
Tevin Wang
Cassandra A. Cohen
Sarah Li
Chenyan Xiong
LRM
74
0
0
24 Feb 2025
PiCO: Peer Review in LLMs based on the Consistency Optimization
Kun-Peng Ning
Shuo Yang
Yu-Yang Liu
Jia-Yu Yao
Zhen-Hui Liu
Yu Wang
Ming Pang
Li Yuan
ALM
71
8
0
24 Feb 2025
When Compression Meets Model Compression: Memory-Efficient Double Compression for Large Language Models
Weilan Wang
Yu Mao
Dongdong Tang
Hongchao Du
Nan Guan
Chun Jason Xue
MQ
67
1
0
24 Feb 2025
LongAttn: Selecting Long-context Training Data via Token-level Attention
Longyun Wu
Dawei Zhu
Guangxiang Zhao
Zhuocheng Yu
Junfeng Ran
Xiangyu Wong
Lin Sun
Sujian Li
48
0
0
24 Feb 2025
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer
Marthe Ballon
Andres Algaba
Vincent Ginis
LRM
ReLM
44
5
0
24 Feb 2025
Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Rylan Schaeffer
Punit Singh Koura
Binh Tang
R. Subramanian
Aaditya K. Singh
...
Vedanuj Goswami
Sergey Edunov
Dieuwke Hupkes
Sanmi Koyejo
Sharan Narang
ALM
71
0
0
24 Feb 2025
Model Lakes
Koyena Pal
David Bau
Renée J. Miller
67
0
0
24 Feb 2025
R-LoRA: Random Initialization of Multi-Head LoRA for Multi-Task Learning
Jinda Liu
Yi-Ju Chang
Yuan Wu
59
0
0
24 Feb 2025
Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models
Raeid Saqur
Anastasis Kratsios
Florian Krach
Yannick Limmer
Jacob-Junqi Tian
John Willes
Blanka Horvath
Frank Rudzicz
MoE
53
0
0
24 Feb 2025
TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking
Shahriar Kabir Nahin
R. N. Nandi
Sagor Sarker
Quazi Sarwar Muhtaseem
Md. Kowsher
Apu Chandraw Shill
Md Ibrahim
Mehadi Hasan Menon
Tareq Al Muntasir
Firoj Alam
68
0
0
24 Feb 2025
Proactive Privacy Amnesia for Large Language Models: Safeguarding PII with Negligible Impact on Model Utility
Martin Kuo
Jingyang Zhang
Jianyi Zhang
Minxue Tang
Louis DiValentin
...
William Chen
Amin Hass
Tianlong Chen
Yuxiao Chen
Yiming Li
MU
KELM
51
2
0
24 Feb 2025
Previous
1
2
3
4
5
...
16
17
18
Next