2009.03300
Cited By
v1
v2
v3 (latest)
Measuring Massive Multitask Language Understanding
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 3,408 papers shown
Title
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
Anselm Paulus
Arman Zharmagambetov
Chuan Guo
Brandon Amos
Yuandong Tian
AAML
145
67
0
21 Apr 2024
Beyond Accuracy: Investigating Error Types in GPT-4 Responses to USMLE Questions
Soumyadeep Roy
A. Khatua
Fatemeh Ghoochani
Uwe Hadler
Wolfgang Nejdl
Niloy Ganguly
ELM
LM&MA
86
11
0
20 Apr 2024
Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration
Yi-Chong Huang
Xiaocheng Feng
Baohang Li
Yang Xiang
Hui Wang
Bing Qin
Ting Liu
FedML
97
30
0
19 Apr 2024
RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation
Chao Jin
Zili Zhang
Xuanlin Jiang
Fangyue Liu
Xin Liu
Xuanzhe Liu
Xin Jin
118
47
0
18 Apr 2024
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Aitor Ormazabal
Che Zheng
Cyprien de Masson d'Autume
Dani Yogatama
Deyu Fu
...
Yazheng Yang
Yi Tay
Yuqi Wang
Zhongkai Zhu
Zhihui Xie
LRM
VLM
ReLM
98
52
0
18 Apr 2024
Large Language Models in Targeted Sentiment Analysis
Nicolay Rusnachenko
A. Golubev
Natalia Loukachevitch
LRM
70
3
0
18 Apr 2024
From Form(s) to Meaning: Probing the Semantic Depths of Language Models Using Multisense Consistency
Xenia Ohmer
Elia Bruni
Dieuwke Hupkes
AI4CE
113
7
0
18 Apr 2024
CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment
Geyu Lin
Bin Wang
Zhengyuan Liu
Nancy F. Chen
148
8
0
18 Apr 2024
AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective Intelligence
Minbeom Kim
Hwanhee Lee
Joonsuk Park
Hwaran Lee
Kyomin Jung
122
3
0
18 Apr 2024
The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey
Tula Masterman
Sandi Besen
Mason Sawtell
Alex Chao
LM&Ro
LLMAG
114
58
0
17 Apr 2024
Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization
Costas Mavromatis
Petros Karypis
George Karypis
MoMe
79
30
0
17 Apr 2024
Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models
Yue Zhou
Yada Zhu
Diego Antognini
Yoon Kim
Yang Zhang
ReLM
LRM
41
3
0
17 Apr 2024
AgentKit: Flow Engineering with Graphs, not Coding
Yue Wu
Yewen Fan
So Yeon Min
Shrimai Prabhumoye
Stephen Marcus McAleer
Yonatan Bisk
Ruslan Salakhutdinov
Yuanzhi Li
Tom Michael Mitchell
AI4CE
104
1
0
17 Apr 2024
ViLLM-Eval: A Comprehensive Evaluation Suite for Vietnamese Large Language Models
Trong-Hieu Nguyen
Anh-Cuong Le
Viet-Cuong Nguyen
60
1
0
17 Apr 2024
A Survey on Retrieval-Augmented Text Generation for Large Language Models
Yizheng Huang
Jimmy X. Huang
3DV
RALM
154
51
0
17 Apr 2024
SuRe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs
Jaehyung Kim
Jaehyun Nam
Sangwoo Mo
Jongjin Park
Sang-Woo Lee
Minjoon Seo
Jung-Woo Ha
Jinwoo Shin
AIFin
RALM
ELM
121
51
0
17 Apr 2024
Self-playing Adversarial Language Game Enhances LLM Reasoning
Pengyu Cheng
Tianhao Hu
Han Xu
Zhisong Zhang
Yong Dai
Lei Han
Nan Du
Xiaolong Li
SyDa
LRM
ReLM
193
38
0
16 Apr 2024
HLAT: High-quality Large Language Model Pre-trained on AWS Trainium
Haozheng Fan
Hao Zhou
Guangtai Huang
Parameswaran Raman
Xinwei Fu
Gaurav Gupta
Dhananjay Ram
Yida Wang
Jun Huan
81
6
0
16 Apr 2024
DESTEIN: Navigating Detoxification of Language Models via Universal Steering Pairs and Head-wise Activation Fusion
Yu Li
Zhihua Wei
Han Jiang
Chuanyang Gong
LLMSV
84
3
0
16 Apr 2024
Compression Represents Intelligence Linearly
Yuzhen Huang
Jinghan Zhang
Zifei Shan
Junxian He
82
29
0
15 Apr 2024
Resilience of Large Language Models for Noisy Instructions
Bin Wang
Chengwei Wei
Zhengyuan Liu
Geyu Lin
Nancy F. Chen
142
15
0
15 Apr 2024
Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model
Hyunsoo Cho
ALM
31
0
0
15 Apr 2024
LoRA Dropout as a Sparsity Regularizer for Overfitting Control
Yang Lin
Xinyu Ma
Xu Chu
Yujie Jin
Zhibang Yang
Yasha Wang
Hong-yan Mei
97
27
0
15 Apr 2024
Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models
Siyan Zhao
Daniel Israel
Guy Van den Broeck
Aditya Grover
KELM
VLM
73
6
0
15 Apr 2024
Learn Your Reference Model for Real Good Alignment
Alexey Gorbatovski
Boris Shaposhnikov
Alexey Malakhov
Nikita Surnachev
Yaroslav Aksenov
Ian Maksimov
Nikita Balagansky
Daniil Gavrilov
OffRL
131
35
0
15 Apr 2024
LLeMpower: Understanding Disparities in the Control and Access of Large Language Models
Vishwas Sathish
Hannah Lin
Aditya K Kamath
Anish Nyayachavadi
86
5
0
14 Apr 2024
Unveiling LLM Evaluation Focused on Metrics: Challenges and Solutions
Taojun Hu
Xiao-Hua Zhou
ELM
88
18
0
14 Apr 2024
Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation
Ruixin Yang
Dheeraj Rajagopal
S. Hayati
Bin Hu
Dongyeop Kang
LLMAG
136
7
0
14 Apr 2024
MING-MOE: Enhancing Medical Multi-Task Learning in Large Language Models with Sparse Mixture of Low-Rank Adapter Experts
Yusheng Liao
Shuyang Jiang
Yu Wang
Yanfeng Wang
MoE
116
5
0
13 Apr 2024
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Xuezhe Ma
Xiaomeng Yang
Wenhan Xiong
Beidi Chen
Lili Yu
Hao Zhang
Jonathan May
Luke Zettlemoyer
Omer Levy
Chunting Zhou
97
33
0
12 Apr 2024
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think
Xinpeng Wang
Chengzhi Hu
Bolei Ma
Paul Röttger
Barbara Plank
OOD
95
6
0
12 Apr 2024
Do Large Language Models Learn Human-Like Strategic Preferences?
Jesse Roberts
Kyle Moore
Douglas H. Fisher
57
5
0
11 Apr 2024
MSciNLI: A Diverse Benchmark for Scientific Natural Language Inference
Mobashir Sadat
Cornelia Caragea
87
5
0
11 Apr 2024
Rho-1: Not All Tokens Are What You Need
Zheng-Wen Lin
Zhibin Gou
Yeyun Gong
Xiao Liu
Yelong Shen
...
Chen Lin
Yujiu Yang
Jian Jiao
Nan Duan
Weizhu Chen
CLL
160
75
0
11 Apr 2024
Post-Hoc Reversal: Are We Selecting Models Prematurely?
Rishabh Ranjan
Saurabh Garg
Mrigank Raman
Carlos Guestrin
Zachary Chase Lipton
78
0
0
11 Apr 2024
UltraEval: A Lightweight Platform for Flexible and Comprehensive Evaluation for LLMs
Chaoqun He
Renjie Luo
Shengding Hu
Yuanqian Zhao
Jie Zhou
Hanghao Wu
Jiajie Zhang
Xu Han
Zhiyuan Liu
Maosong Sun
ELM
62
17
0
11 Apr 2024
MM-PhyQA: Multimodal Physics Question-Answering With Multi-Image CoT Prompting
Avinash Anand
Janak Kapuriya
Apoorv Singh
Jay Saraf
Naman Lal
Astha Verma
Rushali Gupta
R. Shah
LRM
46
15
0
11 Apr 2024
Scalable Language Model with Generalized Continual Learning
Bohao Peng
Zhuotao Tian
Shu Liu
Mingchang Yang
Jiaya Jia
ALM
CLL
KELM
89
18
0
11 Apr 2024
CQIL: Inference Latency Optimization with Concurrent Computation of Quasi-Independent Layers
Longwei Zou
Qingyang Wang
Han Zhao
Jiangang Kong
Yi Yang
Yangdong Deng
107
0
0
10 Apr 2024
CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge
Yu Ying Chiu
Amirhossein Ajalloeian
Maria Antoniak
Chan Young Park
Shuyue Stella Li
Mehar Bhatia
Sahithya Ravi
Yulia Tsvetkov
Vered Shwartz
Yejin Choi
87
23
0
10 Apr 2024
Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition
Kehua Feng
Keyan Ding
Hongzhi Tan
Kede Ma
Zhihua Wang
...
Yuzhou Cheng
Ge Sun
Guozhou Zheng
Qiang Zhang
H. Chen
128
13
0
10 Apr 2024
Khayyam Challenge (PersianMMLU): Is Your LLM Truly Wise to The Persian Language?
Omid Ghahroodi
Marzia Nouri
Mohammad V. Sanian
Alireza Sahebi
D. Dastgheib
Ehsaneddin Asgari
M. Baghshah
M. Rohban
ELM
AAML
85
11
0
09 Apr 2024
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks
Chonghua Wang
Haodong Duan
Songyang Zhang
Dahua Lin
Kai-xiang Chen
ELM
82
23
0
09 Apr 2024
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Shengding Hu
Yuge Tu
Xu Han
Chaoqun He
Ganqu Cui
...
Chaochao Jia
Guoyang Zeng
Dahai Li
Zhiyuan Liu
Maosong Sun
MoE
131
347
0
09 Apr 2024
FreeEval: A Modular Framework for Trustworthy and Efficient Evaluation of Large Language Models
Zhuohao Yu
Chang Gao
Wenjin Yao
Yidong Wang
Zhengran Zeng
Wei Ye
Jindong Wang
Yue Zhang
Shikun Zhang
63
3
0
09 Apr 2024
WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents
Michael Lutz
Arth Bohra
Manvel Saroyan
Artem Harutyunyan
Giovanni Campagna
LLMAG
63
15
0
08 Apr 2024
Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge
Weikai Lu
Ziqian Zeng
Jianwei Wang
Zhengdong Lu
Zelin Chen
Huiping Zhuang
Cen Chen
MU
AAML
KELM
88
30
0
08 Apr 2024
CodecLM: Aligning Language Models with Tailored Synthetic Data
Zifeng Wang
Chun-Liang Li
Vincent Perot
Long T. Le
Jin Miao
Zizhao Zhang
Chen-Yu Lee
Tomas Pfister
SyDa
ALM
73
21
0
08 Apr 2024
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger
Fabio Pernisi
Bertie Vidgen
Dirk Hovy
ELM
KELM
167
39
0
08 Apr 2024
PORTULAN ExtraGLUE Datasets and Models: Kick-starting a Benchmark for the Neural Processing of Portuguese
T. Osório
Bernardo Leite
Henrique Lopes Cardoso
Luís Gomes
João Rodrigues
Rodrigo Santos
António Branco
96
3
0
08 Apr 2024