2009.03300
v3 (latest)
Measuring Massive Multitask Language Understanding
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 3,408 papers shown
Title
AI Benchmarks and Datasets for LLM Evaluation
Todor Ivanov
Valeri Penchev
157
2
0
02 Dec 2024
Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking
Marco Federici
Davide Belli
M. V. Baalen
Amir Jalalirad
Andrii Skliar
Bence Major
Markus Nagel
Paul N. Whatmough
212
1
0
02 Dec 2024
Text Is Not All You Need: Multimodal Prompting Helps LLMs Understand Humor
Ashwin Baluja
103
3
0
01 Dec 2024
TAROT: Targeted Data Selection via Optimal Transport
Lan Feng
Fan Nie
Yuejiang Liu
Alexandre Alahi
OT
218
1
0
30 Nov 2024
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
Angelika Romanou
Negar Foroutan
Anna Sotnikova
Zeming Chen
Sree Harsha Nelaturu
...
Mike Zhang
Imanol Schlag
Marzieh Fadaee
Sara Hooker
Antoine Bosselut
ELM
181
8
0
29 Nov 2024
Marconi: Prefix Caching for the Era of Hybrid LLMs
Rui Pan
Zhuang Wang
Zhen Jia
Can Karakus
Luca Zancato
Tri Dao
Ravi Netravali
Yida Wang
197
4
0
28 Nov 2024
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
Akhiad Bercovich
Tomer Ronen
Talor Abramovich
Nir Ailon
Nave Assaf
...
Ido Shahaf
Oren Tropp
Omer Ullman Argov
Ran Zilberstein
Ran El-Yaniv
213
4
0
28 Nov 2024
FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving
Ao Shen
Zhiyao Li
Mingyu Gao
95
4
0
27 Nov 2024
A gentle push funziona benissimo: making instructed models in Italian via contrastive activation steering
Daniel Scalena
Elisabetta Fersini
Malvina Nissim
LLMSV
131
0
0
27 Nov 2024
Curriculum Demonstration Selection for In-Context Learning
Duc Anh Vu
Nguyen Tran Cong Duy
Xiaobao Wu
Hoang Minh Nhat
Du Mingzhe
Nguyen Thanh Thong
Anh Tuan Luu
137
0
0
27 Nov 2024
Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference
Andrii Skliar
T. V. Rozendaal
Romain Lepert
Todor Boinovski
M. V. Baalen
Markus Nagel
Paul N. Whatmough
B. Bejnordi
MoE
174
2
0
27 Nov 2024
Pushing the Limits of Large Language Model Quantization via the Linearity Theorem
Vladimir Malinovskii
Andrei Panferov
Ivan Ilin
Han Guo
Peter Richtárik
Dan Alistarh
MQ
165
7
0
26 Nov 2024
Enhancing Character-Level Understanding in LLMs through Token Internal Structure Learning
Zhu Xu
Zhiqiang Zhao
Zihan Zhang
Yuchi Liu
Quanwei Shen
Fei Liu
Yu Kuang
Jian He
Conglin Liu
187
2
0
26 Nov 2024
BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching
Yilong Zhao
Shuo Yang
Kan Zhu
Lianmin Zheng
Baris Kasikci
Yang Zhou
Jiarong Xing
Ion Stoica
230
7
0
25 Nov 2024
Predicting Emergent Capabilities by Finetuning
Charlie Snell
Eric Wallace
Dan Klein
Sergey Levine
ELM
LRM
142
6
0
25 Nov 2024
Enhancing Answer Reliability Through Inter-Model Consensus of Large Language Models
Alireza Amiri-Margavi
Iman Jebellat
Ehsan Jebellat
Seyed Pouyan Mousavi Davoudi
165
3
0
25 Nov 2024
LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
Xiaoye Qu
Daize Dong
Xuyang Hu
Tong Zhu
Weigao Sun
Yu Cheng
MoE
150
13
0
24 Nov 2024
AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset
Tobi Olatunji
Charles Nimo
A. Owodunni
Tassallah Abdullahi
Emmanuel Ayodele
...
Michael Best
Irfan Essa
Stephen E. Moore
Chris Fourie
Mercy Nyamewaa Asiedu
LM&MA
148
3
0
23 Nov 2024
Reassessing Layer Pruning in LLMs: New Insights and Methods
Yao Lu
Hao Cheng
Yujie Fang
Zeyu Wang
Jiaheng Wei
Dongwei Xu
Qi Xuan
Xiaoniu Yang
Zhaowei Zhu
130
4
0
23 Nov 2024
From Jack of All Trades to Master of One: Specializing LLM-based Autoraters to a Test Set
M. Finkelstein
Dan Deutsch
Parker Riley
Juraj Juraska
Geza Kovacs
Markus Freitag
119
0
0
23 Nov 2024
Ex Uno Pluria: Insights on Ensembling in Low Precision Number Systems
G. Nam
Juho Lee
135
0
0
22 Nov 2024
DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization
Hexuan Deng
Wenxiang Jiao
Xuebo Liu
Min Zhang
Zhaopeng Tu
VLM
267
0
0
21 Nov 2024
Quantization without Tears
Minghao Fu
Hao Yu
Jie Shao
Junjie Zhou
Ke Zhu
Jianxin Wu
MQ
201
3
0
21 Nov 2024
BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices
Anka Reuel
Amelia F. Hardy
Chandler Smith
Max Lamparth
Malcolm Hardy
Mykel J. Kochenderfer
ELM
199
24
0
20 Nov 2024
Are Large Language Models Memorizing Bug Benchmarks?
Daniel Ramos
Claudia Mamede
Kush Jain
Paulo Canelas
Catarina Gamboa
Claire Le Goues
PILM
ELM
180
10
0
20 Nov 2024
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
Davide Paglieri
Bartłomiej Cupiał
Samuel Coward
Ulyana Piterbarg
Maciej Wolczyk
...
Lerrel Pinto
Rob Fergus
Jakob Foerster
Jack Parker-Holder
Tim Rocktäschel
LLMAG
LRM
213
22
0
20 Nov 2024
Loss-to-Loss Prediction: Scaling Laws for All Datasets
David Brandfonbrener
Nikhil Anand
Nikhil Vyas
Eran Malach
Sham Kakade
132
4
0
19 Nov 2024
A Comparative Study of Text Retrieval Models on DaReCzech
Jakub Stetina
Martin Fajcik
Michal Stefanik
Michal Hradis
113
0
0
19 Nov 2024
MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis
Yingjie Zhou
Zicheng Zhang
Jiezhang Cao
Jun Jia
Yanwei Jiang
Farong Wen
Xiaohong Liu
Xiongkuo Min
Guangtao Zhai
108
4
0
18 Nov 2024
VersaTune: An Efficient Data Composition Framework for Training Multi-Capability LLMs
Keer Lu
Keshi Zhao
Zhuoran Zhang
Zheng Liang
Da Pan
...
Xin Wu
Guosheng Dong
Bin Cui
Tengjiao Wang
Wentao Zhang
VLM
CLL
120
0
0
18 Nov 2024
Steering Language Model Refusal with Sparse Autoencoders
Kyle O'Brien
David Majercak
Xavier Fernandes
Richard Edgar
Blake Bullwinkel
Jingya Chen
Harsha Nori
Dean Carignan
Eric Horvitz
Forough Poursabzi-Sangdeh
LLMSV
164
18
0
18 Nov 2024
AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers
Jake Grigsby
Justin Sasek
Samyak Parajuli
Daniel Adebi
Amy Zhang
Yuke Zhu
OffRL
70
6
0
17 Nov 2024
LLäMmlein: Transparent, Compact and Competitive German-Only Language Models from Scratch
Jan Pfister
Julia Wunderle
Andreas Hotho
133
0
0
17 Nov 2024
SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization
Jintao Zhang
Haofeng Huang
Pengle Zhang
Jia Wei
Jun-Jie Zhu
Jianfei Chen
MQ
VLM
181
28
0
17 Nov 2024
CODECLEANER: Elevating Standards with A Robust Data Contamination Mitigation Toolkit
Jialun Cao
Songqiang Chen
Wuqi Zhang
Hau Ching Lo
Shing-Chi Cheung
66
1
0
16 Nov 2024
AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment
Y. Fu
Zhongzhi Yu
Junwei Li
Jiayi Qian
Yongan Zhang
Xiangchi Yuan
Dachuan Shi
Roman Yakunin
Y. Lin
98
4
0
15 Nov 2024
Does Prompt Formatting Have Any Impact on LLM Performance?
Jia He
Mukund Rungta
David Koleczek
Arshdeep Sekhon
Franklin X Wang
Sadid Hasan
LLMAG
LRM
102
59
0
15 Nov 2024
Measuring Non-Adversarial Reproduction of Training Data in Large Language Models
Michael Aerni
Javier Rando
Edoardo Debenedetti
Nicholas Carlini
Daphne Ippolito
F. Tramèr
75
5
0
15 Nov 2024
Compound-QA: A Benchmark for Evaluating LLMs on Compound Questions
Yutao Hou
Yajing Luo
Zhiwen Ruan
Hongru Wang
Weifeng Ge
Yuxiao Chen
Guanhua Chen
ELM
80
0
0
15 Nov 2024
Xmodel-1.5: An 1B-scale Multilingual LLM
Wang Qun
Liu Yang
Lin Qingquan
Jiang Ling
LRM
71
0
0
15 Nov 2024
Large Language Models as User-Agents for Evaluating Task-Oriented-Dialogue Systems
Taaha Kazi
Ruiliang Lyu
Sizhe Zhou
Dilek Hakkani-Tur
Gokhan Tur
ELM
LLMAG
68
2
0
15 Nov 2024
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Weiyun Wang
Zhe Chen
Wenhai Wang
Yue Cao
Yangzhou Liu
...
Jinguo Zhu
X. Zhu
Lewei Lu
Yu Qiao
Jifeng Dai
LRM
143
93
1
15 Nov 2024
A dataset of questions on decision-theoretic reasoning in Newcomb-like problems
Caspar Oesterheld
Emery Cooper
Miles Kodama
Linh Chi Nguyen
Ethan Perez
152
1
0
15 Nov 2024
AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference
Janghwan Lee
Jiwoong Park
Jinseok Kim
Yongjik Kim
Jungju Oh
Jinwook Oh
Jungwook Choi
80
2
0
15 Nov 2024
A Benchmark for Long-Form Medical Question Answering
Pedram Hosseini
Jessica M. Sin
Bing Ren
Bryceton G. Thomas
Elnaz Nouri
Ali Farahanchi
Saeed Hassanpour
ELM
LM&MA
AI4MH
102
6
0
14 Nov 2024
How do Machine Learning Models Change?
Joel Castaño
Rafael Cabañas
Antonio Salmerón
David Lo
Silverio Martínez-Fernández
VLM
89
0
0
14 Nov 2024
Sparse Upcycling: Inference Inefficient Finetuning
Sasha Doubov
Nikhil Sardana
Vitaliy Chiley
MoE
71
1
0
13 Nov 2024
ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization
Weibo Zhao
Yubin Shi
Xinyu Lyu
Wanchen Sui
Shen Li
MQ
78
1
0
12 Nov 2024
SetLexSem Challenge: Using Set Operations to Evaluate the Lexical and Semantic Robustness of Language Models
Bardiya Akhbari
Manish Gawali
Nicholas A. Dronen
AAML
115
0
0
11 Nov 2024
Transformer verbatim in-context retrieval across time and scale
Kristijan Armeni
Marko Pranjic
Senja Pollak
62
1
0
11 Nov 2024