Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2009.03300
Cited By
v1
v2
v3 (latest)
Measuring Massive Multitask Language Understanding
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 3,408 papers shown
Title
Refining Time Series Anomaly Detectors using Large Language Models
Alan Yang
Yulin Chen
Sean Lee
Venus Montes
AI4TS
66
0
0
26 Mar 2025
Efficient Model Development through Fine-tuning Transfer
Pin-Jie Lin
Rishab Balasubramanian
Fengyuan Liu
Nikhil Kandpal
Tu Vu
187
2
0
25 Mar 2025
Gemma 3 Technical Report
Gemma Team
Aishwarya B Kamath
Johan Ferret
Shreya Pathak
Nino Vieillard
...
Harshal Tushar Lehri
Hussein Hazimeh
Ian Ballantyne
Idan Szpektor
Ivan Nardini
VLM
193
136
0
25 Mar 2025
Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling
Haebin Shin
Lei Ji
Xiao Liu
Yeyun Gong
116
0
0
24 Mar 2025
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng
Yuzhen Huang
Qian Liu
Wei Liu
Keqing He
Zejun Ma
Junxian He
OffRL
ReLM
LRM
202
137
0
24 Mar 2025
A Survey of Large Language Model Agents for Question Answering
Murong Yue
LLMAG
LM&MA
ELM
112
5
0
24 Mar 2025
Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning
Sky CH-Wang
Darshan Deshpande
Smaranda Muresan
Anand Kannappan
Rebecca Qian
129
1
0
24 Mar 2025
DeLoRA: Decoupling Angles and Strength in Low-rank Adaptation
Massimo Bini
Leander Girrbach
Zeynep Akata
220
1
0
23 Mar 2025
OmniScience: A Domain-Specialized LLM for Scientific Reasoning and Discovery
Vignesh Prabhakar
Md Amirul Islam
Adam Atanas
Yansen Wang
J. N. Han
...
Rucha Apte
Robert Clark
Kang Xu
Zihan Wang
Kai Liu
LRM
226
5
0
22 Mar 2025
ChatBench: From Static Benchmarks to Human-AI Evaluation
Serina Chang
Ashton Anderson
Jake M. Hofman
ELM
AI4MH
95
5
0
22 Mar 2025
A Survey on Mathematical Reasoning and Optimization with Large Language Models
Ali Forootani
OffRL
LRM
AI4CE
118
1
0
22 Mar 2025
Think Before Refusal : Triggering Safety Reflection in LLMs to Mitigate False Refusal Behavior
Siyang Song
Xinpeng Wang
Guangyao Zhai
Nassir Navab
Yun Xue
LLMAG
99
0
0
22 Mar 2025
Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM
Codefuse
Ling Team
Wenting Cai
Yuchen Cao
Cai Chen
...
Wei Zhang
Zhenru Zhang
Hailin Zhao
Xunjin Zheng
Jun Zhou
ALM
MoE
104
1
0
22 Mar 2025
How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities
Aly M. Kassem
Bernhard Schölkopf
Zhijing Jin
42
1
0
20 Mar 2025
CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners
Yunzhi Yao
Jizhan Fang
Jia-Chen Gu
N. Zhang
Shumin Deng
Ningyu Zhang
Nanyun Peng
KELM
117
3
0
20 Mar 2025
The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination
Yifan Sun
Han Wang
Dongbai Li
Gang Wang
Huan Zhang
AAML
94
0
0
20 Mar 2025
Survey on Evaluation of LLM-based Agents
Asaf Yehudai
Lilach Eden
Alan Li
Guy Uziel
Yilun Zhao
Roy Bar-Haim
Arman Cohan
Michal Shmueli-Scheuer
LLMAG
ELM
Presented at
ResearchTrend Connect | LLMAG
on
07 May 2025
200
14
0
20 Mar 2025
Tuning LLMs by RAG Principles: Towards LLM-native Memory
Jiale Wei
Shuchi Wu
Ruochen Liu
Xiang Ying
Jingbo Shang
Fangbo Tao
RALM
102
0
0
20 Mar 2025
Advancing Problem-Based Learning in Biomedical Engineering in the Era of Generative AI
Micky C. Nnamdi
J. Ben Tamo
Wenqi Shi
M. D. Wang
AI4CE
79
0
0
20 Mar 2025
CodeReviewQA: The Code Review Comprehension Assessment for Large Language Models
Hong Yi Lin
Chunhua Liu
Haoyu Gao
Patanamon Thongtanunam
Christoph Treude
ELM
91
1
0
20 Mar 2025
Prada: Black-Box LLM Adaptation with Private Data on Resource-Constrained Devices
Ziyi Wang
Yexiao He
Zheyu Shen
Yu Li
Guoheng Sun
Myungjin Lee
Ang Li
89
0
0
19 Mar 2025
SkyLadder: Better and Faster Pretraining via Context Window Scheduling
Tongyao Zhu
Qian Liu
Haonan Wang
Shiqi Chen
Xiangming Gu
Tianyu Pang
Min-Yen Kan
102
0
0
19 Mar 2025
Reasoning Effort and Problem Complexity: A Scaling Analysis in LLMs
Benjamin Estermann
Roger Wattenhofer
LRM
71
2
0
19 Mar 2025
Right Answer, Wrong Score: Uncovering the Inconsistencies of LLM Evaluation in Multiple-Choice Question Answering
Francesco Maria Molfese
Luca Moroni
Luca Gioffrè
Alessandro Sciré
Simone Conia
Roberto Navigli
ELM
117
2
0
19 Mar 2025
Empowering Smaller Models: Tuning LLaMA and Gemma with Chain-of-Thought for Ukrainian Exam Tasks
Mykyta Syromiatnikov
Victoria Ruvinskaya
Nataliia Komleva
ALM
LRM
95
0
0
18 Mar 2025
On the clustering behavior of sliding windows
Boris Alexeev
Wenyan Luo
Dustin G. Mixon
Yan X Zhang
AI4TS
145
1
0
18 Mar 2025
Mapping the Trust Terrain: LLMs in Software Engineering -- Insights and Perspectives
Dipin Khati
Yijin Liu
David Nader-Palacio
Yixuan Zhang
Denys Poshyvanyk
71
1
0
18 Mar 2025
Conformal Prediction and MLLM aided Uncertainty Quantification in Scene Graph Generation
Sayak Nag
Udita Ghosh
Sarosij Bose
Calvin-Khang Ta
Jiachen Li
Amit K. Roy-Chowdhury
221
0
0
18 Mar 2025
Uncertainty Distillation: Teaching Language Models to Express Semantic Confidence
Sophia Hager
David Mueller
Kevin Duh
Nicholas Andrews
152
1
0
18 Mar 2025
How much do LLMs learn from negative examples?
Shadi S. Hamdan
Deniz Yuret
72
0
0
18 Mar 2025
Measuring AI Ability to Complete Long Tasks
Thomas Kwa
Ben West
Joel Becker
Amy Deng
Katharyn Garcia
...
Lucas Jun Koba Sato
H. Wijk
Daniel M. Ziegler
Elizabeth Barnes
Lawrence Chan
ELM
286
18
0
18 Mar 2025
ConQuer: A Framework for Concept-Based Quiz Generation
Yicheng Fu
Zikui Wang
Liuxin Yang
Meiqing Huo
Zhongdongming Dai
AI4Ed
89
0
0
18 Mar 2025
CARE: A QLoRA-Fine Tuned Multi-Domain Chatbot With Fast Learning On Minimal Hardware
Ankit Dutta
Nabarup Ghosh
Ankush Chatterjee
87
0
0
18 Mar 2025
Triad: Empowering LMM-based Anomaly Detection with Vision Expert-guided Visual Tokenizer and Manufacturing Process
Yuanze Li
Shihao Yuan
Haolin Wang
Qizhang Li
Ming-Yu Liu
Chen Xu
Guangming Shi
Wangmeng Zuo
85
2
0
17 Mar 2025
Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis
Alexander Ku
Declan Campbell
Xuechunzi Bai
Jiayi Geng
Ryan Liu
...
Ilia Sucholutsky
Veniamin Veselovsky
Liyi Zhang
Jian-Qiao Zhu
Thomas L. Griffiths
ELM
154
4
0
17 Mar 2025
ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning
Baohao Liao
Christian Herold
Seyyed Hadi Hashemi
Stefan Vasilev
Shahram Khadivi
Christof Monz
MQ
132
0
0
17 Mar 2025
Code-Driven Inductive Synthesis: Enhancing Reasoning Abilities of Large Language Models with Sequences
Kedi Chen
Zhikai Lei
Fan Zhang
Yinqi Zhang
Qin Chen
Jie Zhou
Liang He
Qipeng Guo
Kai Chen
Wei-na Zhang
ELM
LRM
102
1
0
17 Mar 2025
Why Do Multi-Agent LLM Systems Fail?
Mert Cemri
Melissa Z. Pan
Shuyi Yang
Lakshya A Agrawal
Bhavya Chopra
...
Dan Klein
Kannan Ramchandran
Matei A. Zaharia
Joseph E. Gonzalez
Ion Stoica
LLMAG
Presented at
ResearchTrend Connect | LLMAG
on
23 Apr 2025
234
39
0
17 Mar 2025
SuperBPE: Space Travel for Language Models
Alisa Liu
J. Hayase
Valentin Hofmann
Sewoong Oh
Noah A. Smith
Yejin Choi
157
10
0
17 Mar 2025
MetaScale: Test-Time Scaling with Evolving Meta-Thoughts
Qin Liu
Wenxuan Zhou
Nan Xu
James Y. Huang
Fei Wang
Sheng Zhang
Hoifung Poon
Mengzhao Chen
LLMAG
ReLM
AI4Cl
LRM
159
3
0
17 Mar 2025
Atyaephyra at SemEval-2025 Task 4: Low-Rank Negative Preference Optimization
Jan Bronec
Jindřich Helcl
MU
135
0
0
17 Mar 2025
The Amazon Nova Family of Models: Technical Report and Model Card
Amazon AGI
Aaron Langford
A. Shah
Abhanshu Gupta
Abhimanyu Bhatter
...
Benjamin Biggs
Benjamin Ott
Bhanu Vinzamuri
Bharath Venkatesh
Bhavana Ganesh
26
21
0
17 Mar 2025
A Survey on the Optimization of Large Language Model-based Agents
Shangheng Du
Jiabao Zhao
Jinxin Shi
Zhentao Xie
Xin Jiang
Yanhong Bai
Liang He
LLMAG
LM&Ro
LM&MA
539
5
0
16 Mar 2025
HKCanto-Eval: A Benchmark for Evaluating Cantonese Language Understanding and Cultural Comprehension in LLMs
Tsz Chung Cheng
Chung Shing Cheng
Chaak Ming Lau
Eugene Tin-Ho Lam
Chun Yat Wong
Hoi On Yu
Cheuk Hei Chong
ELM
105
2
0
16 Mar 2025
The Lucie-7B LLM and the Lucie Training Dataset: Open resources for multilingual language generation
Olivier Gouvert
Julie Hunter
Jérôme Louradour
Christophe Cerisara
Evan Dufraisse
Yaya Sy
Laura Rivière
Jean-Pierre Lorré
OpenLLM-France community
454
0
0
15 Mar 2025
TLUE: A Tibetan Language Understanding Evaluation Benchmark
Fan Gao
Cheng Huang
Nyima Tashi
Xiangxiang Wang
Thupten Tsering
...
Gadeng Luosang
Rinchen Dongrub
Dorje Tashi
Xiao Feng
Yongbin Yu
ELM
257
2
0
15 Mar 2025
A Survey on Federated Fine-tuning of Large Language Models
Yebo Wu
Chunlin Tian
Jingguang Li
He Sun
Kahou Tam
Zhanting Zhou
Haicheng Liao
Zhijiang Guo
Li Li
Chengzhong Xu
FedML
154
5
0
15 Mar 2025
Don't Forget It! Conditional Sparse Autoencoder Clamping Works for Unlearning
Matthew Khoriaty
Andrii Shportko
Gustavo Mercier
Zach Wood-Doughty
MU
87
3
0
14 Mar 2025
Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques
Neusha Javidnia
B. Rouhani
F. Koushanfar
554
0
0
14 Mar 2025
X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression
Guihong Li
Mehdi Rezagholizadeh
Mingyu Yang
Vikram Appia
Emad Barsoum
VLM
106
1
0
14 Mar 2025
Previous
1
2
3
...
13
14
15
...
67
68
69
Next