Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2009.03300
Cited By
v1
v2
v3 (latest)
Measuring Massive Multitask Language Understanding
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 3,408 papers shown
Title
Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models
Xi Li
Ruofan Mao
Yusen Zhang
Renze Lou
Chen Wu
Jiaqi Wang
LRM
AAML
108
14
0
10 Jun 2024
Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching
Xiaoying Zhang
Baolin Peng
Ye Tian
Jingyan Zhou
Yipeng Zhang
Haitao Mi
Helen Meng
CLL
KELM
168
8
0
10 Jun 2024
Zero-Shot End-To-End Spoken Question Answering In Medical Domain
Yanis Labrak
Adel Moumen
Richard Dufour
Mickael Rouvier
ELM
LM&MA
MedIm
79
1
0
09 Jun 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Seungone Kim
Juyoung Suk
Ji Yong Cho
Shayne Longpre
Chaeeun Kim
...
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
ELM
ALM
LM&MA
208
44
0
09 Jun 2024
Do LLMs Recognize me, When I is not me: Assessment of LLMs Understanding of Turkish Indexical Pronouns in Indexical Shift Contexts
Metehan Oguz
Yusuf Umut Ciftci
Yavuz Faruk Bakman
71
3
0
08 Jun 2024
A Fine-tuning Dataset and Benchmark for Large Language Models for Protein Understanding
Yiqing Shen
Zan Chen
Michail Mamalakis
Luhan He
Haiyang Xia
Tianbin Li
Yanzhou Su
Junjun He
Yu Guang Wang
AI4MH
150
10
0
08 Jun 2024
Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models
Kalyan Nakka
Jimmy Dani
Nitesh Saxena
171
1
0
08 Jun 2024
A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques
Megh Thakkar
Quentin Fournier
Matthew D Riemer
Pin-Yu Chen
Payel Das
Payel Das
Sarath Chandar
ALM
87
11
0
07 Jun 2024
FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models
Guangyi Liu
Rui Ge
Xinyu Zhu
Jingyi Chai
Yaxin Du
Yang Liu
Yanfeng Wang
Siheng Chen
FedML
113
19
0
07 Jun 2024
Revisiting Catastrophic Forgetting in Large Language Model Tuning
Hongyu Li
Liang Ding
Meng Fang
Dacheng Tao
CLL
KELM
84
19
0
07 Jun 2024
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
Bill Yuchen Lin
Yuntian Deng
Khyathi Chandu
Faeze Brahman
Abhilasha Ravichander
Valentina Pyatkin
Nouha Dziri
Ronan Le Bras
Yejin Choi
108
82
0
07 Jun 2024
Large Language Model-guided Document Selection
Xiang Kong
Tom Gunter
Ruoming Pang
70
4
0
07 Jun 2024
Improving Alignment and Robustness with Circuit Breakers
Andy Zou
Long Phan
Justin Wang
Derek Duenas
Maxwell Lin
Maksym Andriushchenko
Rowan Wang
Zico Kolter
Matt Fredrikson
Dan Hendrycks
AAML
147
114
0
06 Jun 2024
Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models
Xiang Ji
Sanjeev Kulkarni
Mengdi Wang
Tengyang Xie
OffRL
112
5
0
06 Jun 2024
DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning
Shangqing Tu
Kejian Zhu
Yushi Bai
Zijun Yao
Lei Hou
Juanzi Li
107
7
0
06 Jun 2024
Are We Done with MMLU?
Aryo Pradipta Gema
Joshua Ong Jun Leang
Giwon Hong
Alessio Devoto
Alberto Carlo Maria Mancino
...
R. McHardy
Joshua Harris
Jean Kaddour
Emile van Krieken
Pasquale Minervini
ELM
148
44
0
06 Jun 2024
Uncovering Limitations of Large Language Models in Information Seeking from Tables
Chaoxu Pang
Yixuan Cao
Chunhao Yang
Ping Luo
RALM
LMTD
75
6
0
06 Jun 2024
UltraMedical: Building Specialized Generalists in Biomedicine
Kaiyan Zhang
Sihang Zeng
Ermo Hua
Ning Ding
Zhang-Ren Chen
...
Xuekai Zhu
Xingtai Lv
Hu Jinfang
Zhiyuan Liu
Bowen Zhou
LM&MA
115
33
0
06 Jun 2024
Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices
Ruiyang Qin
Dancheng Liu
Zheyu Yan
Zhaoxuan Tan
Zixuan Pan
Zhenge Jia
Meng Jiang
Ahmed Abbasi
Jinjun Xiong
Yiyu Shi
101
15
0
06 Jun 2024
A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions
Lei Liu
Xiaoyan Yang
Junchi Lei
Xiaoyang Liu
Yue Shen
...
Peng Wei
Jinjie Gu
Zhixuan Chu
Zhan Qin
Kui Ren
LM&MA
AILaw
105
19
0
06 Jun 2024
M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering
Anand Subramanian
Viktor Schlegel
Abhinav Ramesh Kashyap
Thanh-Tung Nguyen
Vijay Prakash Dwivedi
Stefan Winkler
ELM
LM&MA
AI4MH
66
3
0
06 Jun 2024
Culturally Aware and Adapted NLP: A Taxonomy and a Survey of the State of the Art
Chen Cecilia Liu
Iryna Gurevych
Anna Korhonen
191
6
0
06 Jun 2024
Wings: Learning Multimodal LLMs without Text-only Forgetting
Yi-Kai Zhang
Shiyin Lu
Yang Li
Yanqing Ma
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
VLM
131
10
0
05 Jun 2024
Does your data spark joy? Performance gains from domain upsampling at the end of training
Cody Blakeney
Mansheej Paul
Brett W. Larsen
Sean Owen
Jonathan Frankle
89
20
0
05 Jun 2024
Cycles of Thought: Measuring LLM Confidence through Stable Explanations
Evan Becker
Stefano Soatto
107
11
0
05 Jun 2024
HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits
Tim Franzmeyer
Aleksandar Shtedritski
Samuel Albanie
Philip Torr
João F. Henriques
Jakob N. Foerster
54
1
0
05 Jun 2024
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models
David Ifeoluwa Adelani
Jessica Ojo
Israel Abebe Azime
Jian Yun Zhuang
Jesujoba Oluwadara Alabi
...
Salomey Osei
Sokhar Samb
Tadesse Kebede Guge
Pontus Stenetorp
Pontus Stenetorp
ELM
207
10
0
05 Jun 2024
Unveiling Selection Biases: Exploring Order and Token Sensitivity in Large Language Models
Sheng-Lun Wei
Cheng-Kuang Wu
Hen-Hsen Huang
Hsin-Hsi Chen
99
12
0
05 Jun 2024
MultifacetEval: Multifaceted Evaluation to Probe LLMs in Mastering Medical Knowledge
Yuxuan Zhou
Xien Liu
Chen Ning
Ji Wu
ELM
83
3
0
05 Jun 2024
Xmodel-LM Technical Report
Yichuan Wang
Yang Liu
Yu Yan
Qun Wang
Xucheng Huang
Ling Jiang
OSLM
ALM
64
1
0
05 Jun 2024
On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept
Guangliang Liu
Haitao Mao
Bochuan Cao
Zhiyu Xue
K. Johnson
Jiliang Tang
Rongrong Wang
LRM
112
10
0
04 Jun 2024
Language Models Do Hard Arithmetic Tasks Easily and Hardly Do Easy Arithmetic Tasks
Andrew Gambardella
Yusuke Iwasawa
Yutaka Matsuo
LRM
56
6
0
04 Jun 2024
Conditional Language Learning with Context
X. Zhang
Miao Li
Ji Wu
96
4
0
04 Jun 2024
Mitigate Position Bias in Large Language Models via Scaling a Single Dimension
Yijiong Yu
Huiqiang Jiang
Xufang Luo
Qianhui Wu
Chin-Yew Lin
Dongsheng Li
Yuqing Yang
Yongfeng Huang
L. Qiu
125
0
0
04 Jun 2024
Pattern Recognition or Medical Knowledge? The Problem with Multiple-Choice Questions in Medicine
Maxime Griot
Jean Vanderdonckt
D. Yüksel
C. Hemptinne
AI4Ed
ELM
LM&MA
129
6
0
04 Jun 2024
JBBQ: Japanese Bias Benchmark for Analyzing Social Biases in Large Language Models
Hitomi Yanaka
Namgi Han
Ryoma Kumon
Jie Lu
Masashi Takeshita
Ryo Sekizawa
Taisei Kato
Hiromi Arai
110
4
0
04 Jun 2024
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
Marianna Nezhurina
Lucia Cipolina-Kun
Mehdi Cherti
J. Jitsev
LLMAG
LRM
ELM
ReLM
195
37
0
04 Jun 2024
MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering
Robert Osazuwa Ness
Katie Matton
Hayden Helm
Sheng Zhang
Junaid Bajwa
Carey E. Priebe
Eric Horvitz
ELM
64
13
0
03 Jun 2024
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Yubo Wang
Xueguang Ma
Ge Zhang
Yuansheng Ni
Abhranil Chandra
...
Kai Wang
Alex Zhuang
Rongqi Fan
Xiang Yue
Wenhu Chen
LRM
ELM
179
465
0
03 Jun 2024
LoFiT: Localized Fine-tuning on LLM Representations
Fangcong Yin
Xi Ye
Greg Durrett
106
23
0
03 Jun 2024
Decoupled Alignment for Robust Plug-and-Play Adaptation
Haozheng Luo
Jiahao Yu
Wenxin Zhang
Jialong Li
Jerry Yao-Chieh Hu
Xingyu Xing
Han Liu
108
11
0
03 Jun 2024
SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM
Quandong Wang
Yuxuan Yuan
Xiaoyu Yang
Ruike Zhang
Kang Zhao
Wei Liu
Jian Luan
Daniel Povey
Bin Wang
84
0
0
03 Jun 2024
Sparsity-Accelerated Training for Large Language Models
Da Ma
Lu Chen
Pengyu Wang
Hongshen Xu
Hanqi Li
Liangtai Sun
Su Zhu
Shuai Fan
Kai Yu
LRM
62
1
0
03 Jun 2024
Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function
Keyon Vafa
Ashesh Rambachan
S. Mullainathan
ELM
ALM
83
17
0
03 Jun 2024
R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models
Ken Deng
Jiaheng Liu
He Zhu
Congnan Liu
Jingxin Li
...
Yuanxing Zhang
Wenbo Su
Bangyu Xiang
Tiezheng Ge
Bo Zheng
116
4
0
03 Jun 2024
Demonstration Augmentation for Zero-shot In-context Learning
Yi Su
Yunpeng Tai
Yixin Ji
Juntao Li
Bowen Yan
Min Zhang
RALM
104
10
0
03 Jun 2024
Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization
Yu-Min Tseng
Yu-Chao Huang
Teng-Yun Hsiao
Yu-Ching Hsu
Chao-Wei Huang
Jia-Yin Foo
Yun-Nung Chen
LLMAG
430
92
0
03 Jun 2024
Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors
Mengge Xue
Zhenyu Hu
Liqun Liu
Kuo Liao
Shuang Li
Honglin Han
Meng Zhao
Chengguo Yin
85
8
0
03 Jun 2024
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures
Jinjie Ni
Fuzhao Xue
Xiang Yue
Yuntian Deng
Mahir Shah
Kabir Jain
Graham Neubig
Yang You
ELM
82
48
0
03 Jun 2024
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
Tianwen Wei
Bo Zhu
Liang Zhao
Cheng Cheng
Biye Li
...
Yutuan Ma
Rui Hu
Shuicheng Yan
Han Fang
Yahui Zhou
MoE
155
32
0
03 Jun 2024
Previous
1
2
3
...
40
41
42
...
67
68
69
Next