Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2009.03300
Cited By
v1
v2
v3 (latest)
Measuring Massive Multitask Language Understanding
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 3,408 papers shown
Title
TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs
Shuyi Xie
Wenlin Yao
Yong Dai
Shaobo Wang
Donlin Zhou
...
Zhichao Hu
Dong Yu
Zhengyou Zhang
Jing Nie
Yuhong Liu
ELM
ALM
98
4
0
09 Nov 2023
Chain of Images for Intuitively Reasoning
Fanxu Meng
Haotong Yang
Yiding Wang
Muhan Zhang
LRM
78
10
0
09 Nov 2023
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Lei Huang
Weijiang Yu
Weitao Ma
Weihong Zhong
Zhangyin Feng
...
Qianglong Chen
Weihua Peng
Xiaocheng Feng
Bing Qin
Ting Liu
LRM
HILM
142
939
0
09 Nov 2023
Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization
Jangwhan Lee
Minsoo Kim
Seungcheol Baek
Seok Joong Hwang
Wonyong Sung
Jungwook Choi
MQ
70
17
0
09 Nov 2023
A Survey of Large Language Models in Medicine: Progress, Application, and Challenge
Hongjian Zhou
Fenglin Liu
Boyang Gu
Xinyu Zou
Jinfa Huang
...
Yefeng Zheng
Lei A. Clifton
Zheng Li
Fenglin Liu
David Clifton
LM&MA
175
128
0
09 Nov 2023
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs
Zhenfang Chen
Rui Sun
Wenjun Liu
Yining Hong
Chuang Gan
LRM
113
15
0
08 Nov 2023
Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs
Shashank Gupta
Vaishnavi Shrivastava
Ameet Deshpande
Ashwin Kalyan
Peter Clark
Ashish Sabharwal
Tushar Khot
210
122
0
08 Nov 2023
Rethinking Benchmark and Contamination for Language Models with Rephrased Samples
Shuo Yang
Wei-Lin Chiang
Lianmin Zheng
Joseph E. Gonzalez
Ion Stoica
ALM
64
129
0
08 Nov 2023
Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment
Geyang Guo
Ranchi Zhao
Tianyi Tang
Wayne Xin Zhao
Ji-Rong Wen
ALM
101
32
0
07 Nov 2023
Ziya2: Data-centric Learning is All LLMs Need
Ruyi Gan
Ziwei Wu
Renliang Sun
Junyu Lu
Xiaojun Wu
...
Ping Yang
Qi Yang
Hao Wang
Jiaxing Zhang
Yan Song
VLM
ALM
103
19
0
06 Nov 2023
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Le Yu
Yu Bowen
Haiyang Yu
Fei Huang
Yongbin Li
MoMe
124
337
0
06 Nov 2023
CogVLM: Visual Expert for Pretrained Language Models
Weihan Wang
Qingsong Lv
Wenmeng Yu
Wenyi Hong
Ji Qi
...
Bin Xu
Juanzi Li
Yuxiao Dong
Ming Ding
Jie Tang
VLM
MLLM
176
517
0
06 Nov 2023
Can LLMs Follow Simple Rules?
Norman Mu
Sarah Chen
Zifan Wang
Sizhe Chen
David Karamardian
Lulwa Aljeraisy
Basel Alomair
Dan Hendrycks
David Wagner
ALM
97
32
0
06 Nov 2023
QualEval: Qualitative Evaluation for Model Improvement
Vishvak Murahari
Ameet Deshpande
Peter Clark
Tanmay Rajpurohit
Ashish Sabharwal
Karthik Narasimhan
Ashwin Kalyan
63
5
0
06 Nov 2023
Post Turing: Mapping the landscape of LLM Evaluation
Alexey Tikhonov
Ivan P. Yamshchikov
ELM
102
4
0
03 Nov 2023
Don't Make Your LLM an Evaluation Benchmark Cheater
Kun Zhou
Yutao Zhu
Zhipeng Chen
Wentong Chen
Wayne Xin Zhao
Xu Chen
Yankai Lin
Ji-Rong Wen
Jiawei Han
ELM
196
156
0
03 Nov 2023
AFPQ: Asymmetric Floating Point Quantization for LLMs
Yijia Zhang
Sicheng Zhang
Shijie Cao
Dayou Du
Jianyu Wei
Ting Cao
Ningyi Xu
MQ
58
5
0
03 Nov 2023
DialogBench: Evaluating LLMs as Human-like Dialogue Systems
Jiao Ou
Junda Lu
Che Liu
Yihong Tang
Fuzheng Zhang
Di Zhang
Kun Gai
ALM
LM&MA
104
15
0
03 Nov 2023
Making Harmful Behaviors Unlearnable for Large Language Models
Xin Zhou
Yi Lu
Ruotian Ma
Tao Gui
Qi Zhang
Xuanjing Huang
MU
84
12
0
02 Nov 2023
Effective Human-AI Teams via Learned Natural Language Rules and Onboarding
Hussein Mozannar
Jimin J Lee
Dennis L. Wei
P. Sattigeri
Subhro Das
David Sontag
131
13
0
02 Nov 2023
Unleashing the Creative Mind: Language Model As Hierarchical Policy For Improved Exploration on Challenging Problem Solving
Z. Ling
Yunhao Fang
Xuanlin Li
Tongzhou Mu
Mingu Lee
Reza Pourreza
Roland Memisevic
Hao Su
LRM
93
4
0
01 Nov 2023
Instructive Decoding: Instruction-Tuned Large Language Models are Self-Refiner from Noisy Instructions
Taehyeon Kim
Joonkee Kim
Gihun Lee
Se-Young Yun
103
14
0
01 Nov 2023
Continuous Training and Fine-tuning for Domain-Specific Language Models in Medical Question Answering
Zhen Guo
Yining Hua
LM&MA
CLL
ALM
AI4MH
65
5
0
01 Nov 2023
ChipNeMo: Domain-Adapted LLMs for Chip Design
Mingjie Liu
Teodor-Dumitru Ene
Robert M. Kirby
Chris Cheng
N. Pinckney
...
Pratik P Suthar
Varun Tej
Walker J. Turner
Kaizhe Xu
Haoxin Ren
184
164
0
31 Oct 2023
Defining a New NLP Playground
Sha Li
Chi Han
Pengfei Yu
Carl Edwards
Manling Li
...
Yi R. Fung
Charles Yu
Joel R. Tetreault
Eduard H. Hovy
Heng Ji
123
5
0
31 Oct 2023
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
Simon Lermen
Charlie Rogers-Smith
Jeffrey Ladish
ALM
77
92
0
31 Oct 2023
CapsFusion: Rethinking Image-Text Data at Scale
Qiying Yu
Quan-Sen Sun
Xiaosong Zhang
Yufeng Cui
Fan Zhang
Yue Cao
Xinlong Wang
Jingjing Liu
VLM
119
62
0
31 Oct 2023
Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models
Tian Liang
Zhiwei He
Jen-tse Huang
Wenxuan Wang
Wenxiang Jiao
Rui Wang
Yujiu Yang
Zhaopeng Tu
Shuming Shi
Xing Wang
LLMAG
127
5
0
31 Oct 2023
FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models
Yuxin Jiang
Yufei Wang
Xingshan Zeng
Wanjun Zhong
Liangyou Li
Fei Mi
Lifeng Shang
Xin Jiang
Qun Liu
Wei Wang
ALM
122
32
0
31 Oct 2023
Improving Input-label Mapping with Demonstration Replay for In-context Learning
Zhuocheng Gong
Jiahao Liu
Qifan Wang
Jingang Wang
Xunliang Cai
Dongyan Zhao
Rui Yan
70
2
0
30 Oct 2023
MiLe Loss: a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models
Zhenpeng Su
Xing Wu
Xue Bai
Zijia Lin
Hui Chen
Guiguang Ding
Wei Zhou
Songlin Hu
138
5
0
30 Oct 2023
Constituency Parsing using LLMs
Xuefeng Bai
Jialong Wu
Yulong Chen
Zhongqing Wang
Yue Zhang
115
1
0
30 Oct 2023
Skywork: A More Open Bilingual Foundation Model
Tianwen Wei
Liang Zhao
Lichang Zhang
Bo Zhu
Lijie Wang
...
Yongyi Peng
Xiaojuan Liang
Shuicheng Yan
Han Fang
Yahui Zhou
93
102
0
30 Oct 2023
M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models
Wai-Chung Kwan
Xingshan Zeng
Yufei Wang
Yusen Sun
Liangyou Li
Lifeng Shang
Qun Liu
Kam-Fai Wong
ELM
152
11
0
30 Oct 2023
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
Nan He
Hanyu Lai
Chenyang Zhao
Zirui Cheng
Junting Pan
...
Zhaohui Hou
Zhiyuan Huang
Shaoqing Lu
Ding Liang
Mingjie Zhan
LRM
68
14
0
29 Oct 2023
Debiasing Algorithm through Model Adaptation
Tomasz Limisiewicz
David Marecek
Tomáš Musil
112
14
0
29 Oct 2023
TarGEN: Targeted Data Generation with Large Language Models
Himanshu Gupta
Kevin Scaria
Ujjwala Anantheswaran
Shreyas Verma
Mihir Parmar
Saurabh Arjun Sawant
Chitta Baral
Swaroop Mishra
SyDa
70
9
0
27 Oct 2023
Evaluation of large language models using an Indian language LGBTI+ lexicon
Aditya Joshi
S. Rawat
A. Dange
33
1
0
26 Oct 2023
In-Context Learning Dynamics with Random Binary Sequences
Eric J. Bigelow
Ekdeep Singh Lubana
Robert P. Dick
Hidenori Tanaka
T. Ullman
92
4
0
26 Oct 2023
Proving Test Set Contamination in Black Box Language Models
Yonatan Oren
Nicole Meister
Niladri Chatterji
Faisal Ladhak
Tatsunori B. Hashimoto
HILM
124
146
0
26 Oct 2023
An Open Source Data Contamination Report for Large Language Models
Yucheng Li
Frank Guerin
Chenghua Lin
ELM
104
19
0
26 Oct 2023
Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models
Dingli Yu
Simran Kaur
Arushi Gupta
Jonah Brown-Cohen
Anirudh Goyal
Sanjeev Arora
ALM
LLMAG
78
47
0
26 Oct 2023
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Lianghui Zhu
Xinggang Wang
Xinlong Wang
ELM
ALM
181
143
0
26 Oct 2023
The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI
Shayne Longpre
Robert Mahari
Anthony Chen
Naana Obeng-Marnu
Damien Sileo
...
K. Bollacker
Tongshuang Wu
Luis Villa
Sandy Pentland
Sara Hooker
95
65
0
25 Oct 2023
SuperHF: Supervised Iterative Learning from Human Feedback
Gabriel Mukobi
Peter Chatain
Su Fong
Robert Windesheim
Gitta Kutyniok
Kush S. Bhatia
Silas Alberti
ALM
92
8
0
25 Oct 2023
OccuQuest: Mitigating Occupational Bias for Inclusive Large Language Models
Mingfeng Xue
Dayiheng Liu
Kexin Yang
Guanting Dong
Wenqiang Lei
Zheng Yuan
Chang Zhou
Jingren Zhou
LLMAG
62
3
0
25 Oct 2023
Evaluating, Understanding, and Improving Constrained Text Generation for Large Language Models
Xiang Chen
Xiaojun Wan
55
0
0
25 Oct 2023
SoK: Memorization in General-Purpose Large Language Models
Valentin Hartmann
Anshuman Suri
Vincent Bindschaedler
David Evans
Shruti Tople
Robert West
KELM
LLMAG
95
24
0
24 Oct 2023
Self-Guard: Empower the LLM to Safeguard Itself
Zezhong Wang
Fangkai Yang
Lu Wang
Pu Zhao
Hongru Wang
Liang Chen
Qingwei Lin
Kam-Fai Wong
166
35
0
24 Oct 2023
MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications
Yizhe Yang
Huashan Sun
Jiawei Li
Runheng Liu
Yinghao Li
Yuhang Liu
Heyan Huang
Yang Gao
ALM
LRM
43
10
0
24 Oct 2023
Previous
1
2
3
...
58
59
60
...
67
68
69
Next