2009.03300
Measuring Massive Multitask Language Understanding
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
D. Song
Jacob Steinhardt
ELM
RALM
Papers citing "Measuring Massive Multitask Language Understanding"
50 / 875 papers shown
Title
CBQ: Cross-Block Quantization for Large Language Models
Xin Ding
Xiaoyu Liu
Zhijun Tu
Yun-feng Zhang
Wei Li
...
Hanting Chen
Yehui Tang
Zhiwei Xiong
Baoqun Yin
Yunhe Wang
MQ
38
13
0
13 Dec 2023
LLMEval: A Preliminary Study on How to Evaluate Large Language Models
Yue Zhang
Ming Zhang
Haipeng Yuan
Shichun Liu
Yongyao Shi
Tao Gui
Qi Zhang
Xuanjing Huang
ALM
ELM
22
10
0
12 Dec 2023
Rethinking the Instruction Quality: LIFT is What You Need
Yang Xu
Yongqiang Yao
Yufan Huang
Mengnan Qi
Maoquan Wang
Bin Gu
Neel Sundaresan
ALM
27
35
0
12 Dec 2023
Learning to Break: Knowledge-Enhanced Reasoning in Multi-Agent Debate System
Haotian Wang
Xiyuan Du
Weijiang Yu
Qianglong Chen
Kun Zhu
Zheng Chu
Lian Yan
Yi Guan
32
10
0
08 Dec 2023
Efficient Online Data Mixing For Language Model Pre-Training
Alon Albalak
Liangming Pan
Colin Raffel
Wei Wang
30
34
0
05 Dec 2023
The Philosopher's Stone: Trojaning Plugins of Large Language Models
Tian Dong
Minhui Xue
Guoxing Chen
Rayne Holland
Shaofeng Li
Yan Meng
Zhen Liu
Haojin Zhu
AAML
25
11
0
01 Dec 2023
ArcMMLU: A Library and Information Science Benchmark for Large Language Models
Shitou Zhang
Zuchao Li
Xingshen Liu
Liming Yang
Ping Wang
ELM
19
0
0
30 Nov 2023
TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models
Zheng Chu
Jingchang Chen
Qianglong Chen
Weijiang Yu
Haotian Wang
Ming Liu
Bing Qin
LRM
ELM
35
9
0
29 Nov 2023
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
Munan Ning
Bin Zhu
Yujia Xie
Bin Lin
Jiaxi Cui
Lu Yuan
Dongdong Chen
Li-ming Yuan
ELM
MLLM
27
58
0
27 Nov 2023
LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning
Han Guo
P. Greengard
Eric P. Xing
Yoon Kim
MQ
38
43
0
20 Nov 2023
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
Zhuosheng Zhang
Yao Yao
Aston Zhang
Xiangru Tang
Xinbei Ma
...
Yiming Wang
Mark B. Gerstein
Rui Wang
Gongshen Liu
Hai Zhao
LLMAG
LM&Ro
LRM
42
53
0
20 Nov 2023
HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs
Junying Chen
Xidong Wang
Anningzhe Gao
Feng Jiang
Shunian Chen
...
Chuyi Kong
Jianquan Li
Xiang Wan
Haizhou Li
Benyou Wang
LM&MA
24
60
0
16 Nov 2023
R-Tuning: Instructing Large Language Models to Say 'I Don't Know'
Hanning Zhang
Shizhe Diao
Yong Lin
Yi R. Fung
Qing Lian
Xingyao Wang
Yangyi Chen
Heng Ji
Tong Zhang
UQLM
42
38
0
16 Nov 2023
How Well Do Large Language Models Truly Ground?
Hyunji Lee
Se June Joo
Chaeeun Kim
Joel Jang
Doyoung Kim
Kyoung-Woon On
Minjoon Seo
HILM
33
6
0
15 Nov 2023
MELA: Multilingual Evaluation of Linguistic Acceptability
Ziyin Zhang
Yikang Liu
Wei Huang
Junyu Mao
Rui Wang
Hai Hu
30
3
0
15 Nov 2023
Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation
Vaishnavi Shrivastava
Percy Liang
Ananya Kumar
28
28
0
15 Nov 2023
Towards Publicly Accountable Frontier LLMs: Building an External Scrutiny Ecosystem under the ASPIRE Framework
Markus Anderljung
Everett Thornton Smith
Joe O'Brien
Lisa Soder
Ben Bucknall
Emma Bluemke
Jonas Schuett
Robert F. Trager
Lacey Strahm
Rumman Chowdhury
43
17
0
15 Nov 2023
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
Keming Lu
Hongyi Yuan
Runji Lin
Junyang Lin
Zheng Yuan
Chang Zhou
Jingren Zhou
MoE
LRM
45
52
0
15 Nov 2023
ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences
Yuanhe Tian
Ruyi Gan
Yan Song
Jiaxing Zhang
Yongdong Zhang
AI4MH
AI4CE
LM&MA
27
31
0
10 Nov 2023
Chain of Images for Intuitively Reasoning
Fanxu Meng
Haotong Yang
Yiding Wang
Muhan Zhang
LRM
36
7
0
09 Nov 2023
Rethinking Benchmark and Contamination for Language Models with Rephrased Samples
Shuo Yang
Wei-Lin Chiang
Lianmin Zheng
Joseph E. Gonzalez
Ion Stoica
ALM
27
111
0
08 Nov 2023
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
Simon Lermen
Charlie Rogers-Smith
Jeffrey Ladish
ALM
31
83
0
31 Oct 2023
Constituency Parsing using LLMs
Xuefeng Bai
Jialong Wu
Yulong Chen
Zhongqing Wang
Yue Zhang
41
1
0
30 Oct 2023
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Lianghui Zhu
Xinggang Wang
Xinlong Wang
ELM
ALM
62
110
0
26 Oct 2023
OccuQuest: Mitigating Occupational Bias for Inclusive Large Language Models
Mingfeng Xue
Dayiheng Liu
Kexin Yang
Guanting Dong
Wenqiang Lei
Zheng Yuan
Chang Zhou
Jingren Zhou
LLMAG
22
2
0
25 Oct 2023
Exploring the Boundaries of GPT-4 in Radiology
Qianchu Liu
Stephanie L. Hyland
Shruthi Bannur
Kenza Bouzid
Daniel Coelho De Castro
...
Anja Thieme
A. Nori
M. Lungren
Ozan Oktay
Javier Alvarez-Valle
LM&MA
AI4CE
40
36
0
23 Oct 2023
AlpaCare: Instruction-tuned Large Language Models for Medical Application
Xinlu Zhang
Chenxin Tian
Xianjun Yang
Lichang Chen
Zekun Li
Linda R. Petzold
LM&MA
32
59
0
23 Oct 2023
PromptCBLUE: A Chinese Prompt Tuning Benchmark for the Medical Domain
Wei-wei Zhu
Xiaoling Wang
Huanran Zheng
Mosha Chen
Buzhou Tang
ELM
LM&MA
21
33
0
22 Oct 2023
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
David Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
221
198
0
20 Oct 2023
Evaluating the Symbol Binding Ability of Large Language Models for Multiple-Choice Questions in Vietnamese General Education
Duc-Vu Nguyen
Quoc-Nam Nguyen
83
6
0
18 Oct 2023
AdaLomo: Low-memory Optimization with Adaptive Learning Rate
Kai Lv
Hang Yan
Qipeng Guo
Haijun Lv
Xipeng Qiu
ODL
27
20
0
16 Oct 2023
Improving Large Language Model Fine-tuning for Solving Math Problems
Yixin Liu
Avi Singh
C. D. Freeman
John D. Co-Reyes
Peter J. Liu
LRM
ReLM
43
45
0
16 Oct 2023
KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models
Yuyang Bai
Shangbin Feng
Vidhisha Balachandran
Zhaoxuan Tan
Shiqi Lou
Tianxing He
Yulia Tsvetkov
ELM
40
2
0
15 Oct 2023
Instruction Tuning with Human Curriculum
Bruce W. Lee
Hyunsoo Cho
Kang Min Yoo
45
3
0
14 Oct 2023
Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model
Qichen Ye
Junling Liu
Dading Chong
Peilin Zhou
Yining Hua
...
Meng Cao
Ziming Wang
Xuxin Cheng
Andrew Liu
Zhenhua Guo
AI4MH
LM&MA
ELM
35
20
0
13 Oct 2023
InstructTODS: Large Language Models for End-to-End Task-Oriented Dialogue Systems
Willy Chung
Samuel Cahyawijaya
Bryan Wilie
Holy Lovenia
Pascale Fung
27
5
0
13 Oct 2023
GLoRE: Evaluating Logical Reasoning of Large Language Models
Hanmeng Liu
Zhiyang Teng
Ruoxi Ning
Jian Liu
Qiji Zhou
Yuexin Zhang
Yue Zhang
ReLM
ELM
LRM
70
8
0
13 Oct 2023
A Confederacy of Models: a Comprehensive Evaluation of LLMs on Creative Writing
Carlos Gómez-Rodríguez
Paul Williams
29
67
0
12 Oct 2023
QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources
Zhikai Li
Xiaoxuan Liu
Banghua Zhu
Zhen Dong
Qingyi Gu
Kurt Keutzer
MQ
32
7
0
11 Oct 2023
Deep Variational Multivariate Information Bottleneck -- A Framework for Variational Losses
Eslam Abdelaleem
I. Nemenman
K. M. Martini
30
5
0
05 Oct 2023
JsonTuning: Towards Generalizable, Robust, and Controllable Instruction Tuning
Chang Gao
Wenxuan Zhang
Guizhen Chen
Wai Lam
55
5
0
04 Oct 2023
NOLA: Compressing LoRA using Linear Combination of Random Basis
Soroush Abbasi Koohpayegani
K. Navaneet
Parsa Nooralinejad
Soheil Kolouri
Hamed Pirsiavash
40
12
0
04 Oct 2023
FELM: Benchmarking Factuality Evaluation of Large Language Models
Shiqi Chen
Yiran Zhao
Jinghan Zhang
Ethan Chern
Siyang Gao
Pengfei Liu
Junxian He
HILM
41
33
0
01 Oct 2023
LawBench: Benchmarking Legal Knowledge of Large Language Models
Zhiwei Fei
Xiaoyu Shen
D. Zhu
Fengzhe Zhou
Zhuo Han
Songyang Zhang
Kai-xiang Chen
Zongwen Shen
Jidong Ge
ELM
AILaw
36
36
0
28 Sep 2023
Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models
Jung Hwan Heo
Jeonghoon Kim
Beomseok Kwon
Byeongwook Kim
Se Jung Kwon
Dongsoo Lee
MQ
43
9
0
27 Sep 2023
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian
Elahe Khatibi
Iman Azimi
David Oniani
Zahra Shakeri Hossein Abad
...
Bryant Lin
Olivier Gevaert
Li-Jia Li
Ramesh C. Jain
Amir M. Rahmani
LM&MA
ELM
AI4MH
43
66
0
21 Sep 2023
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
Nolan Dey
Daria Soboleva
Faisal Al-Khateeb
Bowen Yang
Ribhu Pathria
...
Robert Myers
Jacob Robert Steeves
Natalia Vassilieva
Marvin Tom
Joel Hestness
MoE
27
15
0
20 Sep 2023
Baichuan 2: Open Large-scale Language Models
Ai Ming Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Zenan Zhou
Zhiying Wu
ELM
LRM
77
703
0
19 Sep 2023
Anchor Points: Benchmarking Models with Much Fewer Examples
Rajan Vivek
Kawin Ethayarajh
Diyi Yang
Douwe Kiela
ALM
29
22
0
14 Sep 2023
Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models
Yangyi Chen
Karan Sikka
Michael Cogswell
Heng Ji
Ajay Divakaran
LRM
36
25
0
08 Sep 2023