ResearchTrend.AI
Measuring Massive Multitask Language Understanding

7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM
arXiv:2009.03300

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 3,408 papers shown
MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria
Wentao Ge
Shunian Chen
Guiming Hardy Chen
Zhihong Chen
Junying Chen
...
Anningzhe Gao
Zhiyi Zhang
Jianquan Li
Xiang Wan
Benyou Wang
MLLM
101
6
0
23 Nov 2023
LM-Cocktail: Resilient Tuning of Language Models via Model Merging
Shitao Xiao
Zheng Liu
Peitian Zhang
Xingrun Xing
MoMe, KELM
158
28
0
22 Nov 2023
On the Calibration of Large Language Models and Alignment
Chiwei Zhu
Benfeng Xu
Quan Wang
Yongdong Zhang
Zhendong Mao
157
45
0
22 Nov 2023
ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization
Prateek Yadav
Leshem Choshen
Colin Raffel
Mohit Bansal
89
14
0
22 Nov 2023
LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms
Aditi Jha
Sam Havens
Jeremey Dohmann
Alex Trott
Jacob P. Portes
ALM
50
11
0
22 Nov 2023
GAIA: a benchmark for General AI Assistants
Grégoire Mialon
Clémentine Fourrier
Craig Swift
Thomas Wolf
Yann LeCun
Thomas Scialom
AI4MH, ALM, ELM, RALM
98
186
0
21 Nov 2023
Data Diversity Matters for Robust Instruction Tuning
Alexander Bukharin
Tuo Zhao
164
44
0
21 Nov 2023
How Far Have We Gone in Vulnerability Detection Using Large Language Models
Zeyu Gao
Hao Wang
Yuchen Zhou
Wenyu Zhu
Chao Zhang
81
22
0
21 Nov 2023
AcademicGPT: Empowering Academic Research
Shufa Wei
Xiaolong Xu
Xianbiao Qi
Xi Yin
Jun Xia
...
Chihao Dai
Lihua Wang
Xiaohui Liu
Lei Zhang
Yutao Xie
LM&MA
84
3
0
21 Nov 2023
ATLANTIC: Structure-Aware Retrieval-Augmented Language Model for Interdisciplinary Science
Sai Munikoti
Anurag Acharya
S. Wagle
Sameera Horawalavithana
RALM
92
8
0
21 Nov 2023
LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning
Han Guo
P. Greengard
Eric P. Xing
Yoon Kim
MQ
146
57
0
20 Nov 2023
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
David Rein
Betty Li Hou
Asa Cooper Stickland
Jackson Petty
Richard Yuanzhe Pang
Julien Dirani
Julian Michael
Samuel R. Bowman
AI4MH, ELM
148
738
0
20 Nov 2023
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
Zhuosheng Zhang
Yao Yao
Aston Zhang
Xiangru Tang
Xinbei Ma
...
Yiming Wang
Mark B. Gerstein
Rui Wang
Gongshen Liu
Hai Zhao
LLMAG, LM&Ro, LRM
151
61
0
20 Nov 2023
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning
Yiming Wang
Yu Lin
Xiaodong Zeng
Guannan Zhang
MoMe
142
21
0
20 Nov 2023
TaCo: Enhancing Cross-Lingual Transfer for Low-Resource Languages in LLMs through Translation-Assisted Chain-of-Thought Processes
Bibek Upadhayay
Vahid Behzadan
73
11
0
17 Nov 2023
Advancements in Generative AI: A Comprehensive Review of GANs, GPT, Autoencoders, Diffusion Model, and Transformers
Staphord Bengesi
Hoda El-Sayed
Md Kamruzzaman Sarker
Yao Houkpati
John Irungu
T. Oladunni
128
93
0
17 Nov 2023
ConceptPsy: A Benchmark Suite with Conceptual Comprehensiveness in Psychology
Junlei Zhang
Hongliang He
Nirui Song
Zhanchao Zhou
Shuyuan He
...
Huachuan Qiu
Anqi Li
Yong Dai
Lizhi Ma
Zhenzhong Lan
CoGe, ELM, LRM
94
2
0
16 Nov 2023
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning
Xiangru Tang
Anni Zou
Zhuosheng Zhang
Ziming Li
Yilun Zhao
Xingyao Zhang
Arman Cohan
Mark B. Gerstein
LRM, LM&MA
159
172
0
16 Nov 2023
Investigating Data Contamination in Modern Benchmarks for Large Language Models
Chunyuan Deng
Yilun Zhao
Xiangru Tang
Mark B. Gerstein
Arman Cohan
AAML, ELM
102
63
0
16 Nov 2023
HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs
Junying Chen
Xidong Wang
Anningzhe Gao
Feng Jiang
Shunian Chen
...
Chuyi Kong
Jianquan Li
Xiang Wan
Haizhou Li
Benyou Wang
LM&MA
76
68
0
16 Nov 2023
Whispers of Doubt Amidst Echoes of Triumph in NLP Robustness
Ashim Gupta
Rishanth Rajendhran
Nathan Stringham
Vivek Srikumar
Ana Marasović
AAML
88
3
0
16 Nov 2023
R-Tuning: Instructing Large Language Models to Say `I Don't Know'
Hanning Zhang
Shizhe Diao
Yong Lin
Yi R. Fung
Qing Lian
Xingyao Wang
Yangyi Chen
Heng Ji
Tong Zhang
UQLM
137
47
0
16 Nov 2023
Bergeron: Combating Adversarial Attacks through a Conscience-Based Alignment Framework
Matthew Pisano
Peter Ly
Abraham Sanders
Bingsheng Yao
Dakuo Wang
T. Strzalkowski
Mei Si
AAML
75
26
0
16 Nov 2023
Digital Socrates: Evaluating LLMs through Explanation Critiques
Yuling Gu
Oyvind Tafjord
Peter Clark
ELM, LRM
77
2
0
16 Nov 2023
A Speed Odyssey for Deployable Quantization of LLMs
Qingyuan Li
Ran Meng
Yiduo Li
Bo Zhang
Liang Li
Yifan Lu
Xiangxiang Chu
Yerui Sun
Yuchen Xie
MQ
92
8
0
16 Nov 2023
Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections
Yuanpu Cao
Bochuan Cao
Jinghui Chen
99
28
0
15 Nov 2023
Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models
Fangzhi Xu
Zhiyong Wu
Qiushi Sun
Siyu Ren
Fei Yuan
Shuai Yuan
Qika Lin
Yu Qiao
Jun Liu
LLMAG
108
37
0
15 Nov 2023
PsyEval: A Suite of Mental Health Related Tasks for Evaluating Large Language Models
Haoan Jin
Siyuan Chen
Dilawaier Dilixiati
Yewei Jiang
Mengyue Wu
Ke Zhu
ELM, AI4MH, LM&MA
79
4
0
15 Nov 2023
CLEAN-EVAL: Clean Evaluation on Contaminated Large Language Models
Wenhong Zhu
Hong-ping Hao
Zhiwei He
Yun-Ze Song
Yumeng Zhang
Hanxu Hu
Yiran Wei
Rui Wang
Hongyuan Lu
AAML, ELM
52
12
0
15 Nov 2023
How Well Do Large Language Models Truly Ground?
Hyunji Lee
Se June Joo
Chaeeun Kim
Joel Jang
Doyoung Kim
Kyoung-Woon On
Minjoon Seo
HILM
95
8
0
15 Nov 2023
GRASP: A novel benchmark for evaluating language GRounding And Situated Physics understanding in multimodal language models
Serwan Jassim
Mario S. Holubar
Annika Richter
Cornelius Wolff
Xenia Ohmer
Elia Bruni
ELM
94
14
0
15 Nov 2023
MELA: Multilingual Evaluation of Linguistic Acceptability
Ziyin Zhang
Yikang Liu
Wei-Ping Huang
Junyu Mao
Rui Wang
Hai Hu
74
3
0
15 Nov 2023
Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation
Vaishnavi Shrivastava
Percy Liang
Ananya Kumar
82
32
0
15 Nov 2023
Towards Publicly Accountable Frontier LLMs: Building an External Scrutiny Ecosystem under the ASPIRE Framework
Markus Anderljung
Everett Thornton Smith
Joe O'Brien
Lisa Soder
Ben Bucknall
Emma Bluemke
Jonas Schuett
Robert F. Trager
Lacey Strahm
Rumman Chowdhury
111
18
0
15 Nov 2023
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
Keming Lu
Hongyi Yuan
Runji Lin
Junyang Lin
Zheng Yuan
Chang Zhou
Jingren Zhou
MoE, LRM
105
63
0
15 Nov 2023
Safer-Instruct: Aligning Language Models with Automated Preference Data
Taiwei Shi
Kai Chen
Jieyu Zhao
ALM, SyDa
115
28
0
15 Nov 2023
Are You Sure? Challenging LLMs Leads to Performance Drops in The FlipFlop Experiment
Philippe Laban
Lidiya Murakhovs'ka
Caiming Xiong
Chien-Sheng Wu
LRM
93
23
0
14 Nov 2023
CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation
Weixiang Yan
Haitian Liu
Yunkun Wang
Yunzhe Li
Qian Chen
...
Tingyu Lin
Weishan Zhao
Li Zhu
Hari Sundaram
Shuiguang Deng
ELM, LRM
144
37
0
14 Nov 2023
Efficient Continual Pre-training for Building Domain Specific Large Language Models
Yong Xie
Karan Aggarwal
Aitzaz Ahmad
CLL
105
24
0
14 Nov 2023
Towards Open-Ended Visual Recognition with Large Language Model
Qihang Yu
Xiaohui Shen
Liang-Chieh Chen
VLM
74
8
0
14 Nov 2023
A Survey of Confidence Estimation and Calibration in Large Language Models
Jiahui Geng
Fengyu Cai
Yuxia Wang
Heinz Koeppl
Preslav Nakov
Iryna Gurevych
UQCV
148
82
0
14 Nov 2023
How Well Do Large Language Models Understand Syntax? An Evaluation by Asking Natural Language Questions
Houquan Zhou
Yang Hou
Zhenghua Li
Xuebin Wang
Zhefeng Wang
Xinyu Duan
Min Zhang
ELM
81
5
0
14 Nov 2023
SAIE Framework: Support Alone Isn't Enough -- Advancing LLM Training with Adversarial Remarks
Mengsay Loem
Masahiro Kaneko
Naoaki Okazaki
LRM
68
5
0
14 Nov 2023
Fair Abstractive Summarization of Diverse Perspectives
Yusen Zhang
Nan Zhang
Yixin Liu
Alexander R. Fabbri
Junru Liu
...
Caiming Xiong
Jieyu Zhao
Dragomir R. Radev
Kathleen McKeown
Rui Zhang
79
11
0
14 Nov 2023
Towards the Law of Capacity Gap in Distilling Language Models
Chen Zhang
Dawei Song
Zheyu Ye
Yan Gao
ELM
74
22
0
13 Nov 2023
BizBench: A Quantitative Reasoning Benchmark for Business and Finance
Rik Koncel-Kedziorski
Michael Krumdick
Viet Dac Lai
Varshini Reddy
Charles Lovering
Chris Tanner
AIMat
52
8
0
11 Nov 2023
Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
Weiyang Liu
Zeju Qiu
Yao Feng
Yuliang Xiu
Yuxuan Xue
...
Songyou Peng
Yandong Wen
Michael J. Black
Adrian Weller
Bernhard Schölkopf
111
72
0
10 Nov 2023
ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences
Yuanhe Tian
Ruyi Gan
Yan Song
Jiaxing Zhang
Yongdong Zhang
AI4MH, AI4CE, LM&MA
129
41
0
10 Nov 2023
CFBenchmark: Chinese Financial Assistant Benchmark for Large Language Model
Yang Lei
Jiangtong Li
Dawei Cheng
Zhijun Ding
Changjun Jiang
47
11
0
10 Nov 2023
Removing RLHF Protections in GPT-4 via Fine-Tuning
Qiusi Zhan
Richard Fang
R. Bindu
Akul Gupta
Tatsunori Hashimoto
Daniel Kang
MU, AAML
101
104
0
09 Nov 2023