Measuring Massive Multitask Language Understanding

7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
    ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 3,408 papers shown
TRAM: Benchmarking Temporal Reasoning for Large Language Models
Yuqing Wang
Yun Zhao
LRM
111
14
0
02 Oct 2023
FELM: Benchmarking Factuality Evaluation of Large Language Models
Shiqi Chen
Yiran Zhao
Jinghan Zhang
Ethan Chern
Siyang Gao
Pengfei Liu
Junxian He
HILM
126
41
0
01 Oct 2023
Understanding In-Context Learning from Repetitions
Jianhao Yan
Jin Xu
Chiyu Song
Chenming Wu
Yafu Li
Yue Zhang
112
24
0
30 Sep 2023
Self-Specialization: Uncovering Latent Expertise within Large Language Models
Junmo Kang
Hongyin Luo
Yada Zhu
Jacob A. Hansen
James R. Glass
David D. Cox
Alan Ritter
Rogerio Feris
Leonid Karlinsky
ALM, MoMe
100
4
0
29 Sep 2023
Open-Sourcing Highly Capable Foundation Models: An evaluation of risks, benefits, and alternative methods for pursuing open-source objectives
Elizabeth Seger
Noemi Dreksler
Richard Moulange
Emily Dardaman
Jonas Schuett
...
Emma Bluemke
Michael Aird
Patrick Levermore
Julian Hazell
Abhishek Gupta
74
43
0
29 Sep 2023
LoRA ensembles for large language model fine-tuning
Xi Wang
Laurence Aitchison
Maja Rudolph
UQCV
111
39
0
29 Sep 2023
Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4
Jiaxian Guo
Bo Yang
Paul D. Yoo
Bill Yuchen Lin
Yusuke Iwasawa
Yutaka Matsuo
LLMAG
118
45
0
29 Sep 2023
DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks
Kaijie Zhu
Jiaao Chen
Jindong Wang
Neil Zhenqiang Gong
Diyi Yang
Xing Xie
ELM, LRM
158
54
0
29 Sep 2023
Benchmarking Cognitive Biases in Large Language Models as Evaluators
Ryan Koo
Minhwa Lee
Vipul Raheja
Jong Inn Park
Zae Myung Kim
Dongyeop Kang
ALM
114
87
0
29 Sep 2023
Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs
Lu Yin
Ajay Jaiswal
Shiwei Liu
Souvik Kundu
Zhangyang Wang
74
7
0
29 Sep 2023
Qwen Technical Report
Jinze Bai
Shuai Bai
Yunfei Chu
Zeyu Cui
Kai Dang
...
Zhenru Zhang
Chang Zhou
Jingren Zhou
Xiaohuan Zhou
Tianhang Zhu
OSLM
369
1,921
0
28 Sep 2023
GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond
Shen Zheng
Yuyu Zhang
Yijie Zhu
Chenguang Xi
Pengyang Gao
Xun Zhou
Kevin Chen-Chuan Chang
ELM
68
24
0
28 Sep 2023
LawBench: Benchmarking Legal Knowledge of Large Language Models
Zhiwei Fei
Xiaoyu Shen
D. Zhu
Fengzhe Zhou
Zhuo Han
Songyang Zhang
Kai-xiang Chen
Zongwen Shen
Jidong Ge
ELM, AILaw
134
46
0
28 Sep 2023
The Confidence-Competence Gap in Large Language Models: A Cognitive Study
Aniket Kumar Singh
Suman Devkota
Bishal Lamichhane
Uttam Dhakal
Chandra Dhakal
LRM
71
10
0
28 Sep 2023
Large Language Model Routing with Benchmark Datasets
Tal Shnitzer
Anthony Ou
Mírian Silva
Kate Soule
Yuekai Sun
Justin Solomon
Neil Thompson
Mikhail Yurochkin
RALM
83
71
0
27 Sep 2023
NLPBench: Evaluating Large Language Models on Solving NLP Problems
Linxin Song
Jieyu Zhang
Lechao Cheng
Pengyuan Zhou
Dinesh Manocha
Irene Li
ELM, LM&MA, LRM
109
12
0
27 Sep 2023
Beyond the Chat: Executable and Verifiable Text-Editing with LLMs
Philippe Laban
Jesse Vig
Marti A. Hearst
Caiming Xiong
Chien-Sheng Wu
KELM
102
30
0
27 Sep 2023
Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models
Jung Hwan Heo
Jeonghoon Kim
Beomseok Kwon
Byeongwook Kim
Se Jung Kwon
Dongsoo Lee
MQ
129
10
0
27 Sep 2023
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Yuhui Xu
Lingxi Xie
Xiaotao Gu
Xin Chen
Heng Chang
Hengheng Zhang
Zhensu Chen
Xiaopeng Zhang
Qi Tian
MQ
82
110
0
26 Sep 2023
Are Human-generated Demonstrations Necessary for In-context Learning?
Rui Li
Guoyin Wang
Jiwei Li
LRM
51
14
0
26 Sep 2023
BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models
Zican Dong
Tianyi Tang
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
RALM, ALM
152
39
0
23 Sep 2023
AI Risk Profiles: A Standards Proposal for Pre-Deployment AI Risk Disclosures
E. Sherman
Ian W. Eisenberg
81
6
0
22 Sep 2023
HANS, are you clever? Clever Hans Effect Analysis of Neural Systems
Leonardo Ranaldi
Fabio Massimo Zanzotto
71
3
0
21 Sep 2023
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian
Elahe Khatibi
Iman Azimi
David Oniani
Zahra Shakeri Hossein Abad
...
Bryant Lin
Olivier Gevaert
Li-Jia Li
Ramesh C. Jain
Amir M. Rahmani
LM&MA, ELM, AI4MH
139
78
0
21 Sep 2023
Code Soliloquies for Accurate Calculations in Large Language Models
Shashank Sonkar
Myco Le
Xinghe Chen
Naiming Liu
D. B. Mallick
Richard G. Baraniuk
SyDa
60
15
0
21 Sep 2023
AceGPT, Localizing Large Language Models in Arabic
Huang Huang
Fei Yu
Jianqing Zhu
Xuening Sun
Hao Cheng
...
Lian Zhang
Ruoyu Sun
Xiang Wan
Haizhou Li
Jinchao Xu
158
57
0
21 Sep 2023
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Tianle Li
Siyuan Zhuang
...
Zi Lin
Eric P. Xing
Joseph E. Gonzalez
Ion Stoica
Haotong Zhang
128
221
0
21 Sep 2023
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
Nolan Dey
Daria Soboleva
Faisal Al-Khateeb
Bowen Yang
Ribhu Pathria
...
Robert Myers
Jacob Robert Steeves
Natalia Vassilieva
Marvin Tom
Joel Hestness
MoE
87
16
0
20 Sep 2023
DreamLLM: Synergistic Multimodal Comprehension and Creation
Runpei Dong
Chunrui Han
Yuang Peng
Zekun Qi
Zheng Ge
...
Hao-Ran Wei
Xiangwen Kong
Xiangyu Zhang
Kaisheng Ma
Li Yi
MLLM
111
199
0
20 Sep 2023
SlimPajama-DC: Understanding Data Combinations for LLM Training
Zhiqiang Shen
Tianhua Tao
Liqun Ma
Willie Neiswanger
Zhengzhong Liu
...
Bowen Tan
Joel Hestness
Natalia Vassilieva
Daria Soboleva
Eric Xing
109
50
0
19 Sep 2023
OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch
Juntao Li
Zecheng Tang
Yuyang Ding
Pinzheng Wang
Pei Guo
...
Wenliang Chen
Guohong Fu
Qiaoming Zhu
Guodong Zhou
Hao Fei
109
5
0
19 Sep 2023
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
Xingyao Wang
Zihan Wang
Jiateng Liu
Yangyi Chen
Lifan Yuan
Hao Peng
Heng Ji
LRM
225
162
0
19 Sep 2023
Estimating Contamination via Perplexity: Quantifying Memorisation in Language Model Evaluation
Yucheng Li
91
35
0
19 Sep 2023
Baichuan 2: Open Large-scale Language Models
Ai Ming Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Guosheng Dong
Zhiying Wu
ELM, LRM
332
755
0
19 Sep 2023
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
Yadong Lu
Chunyuan Li
Haotian Liu
Jianwei Yang
Jianfeng Gao
Yelong Shen
MLLM
164
31
0
18 Sep 2023
Pruning Large Language Models via Accuracy Predictor
Yupeng Ji
Yibo Cao
Jiu-si Liu
KELM
77
4
0
18 Sep 2023
Can Large Language Models Understand Real-World Complex Instructions?
Qi He
Jie Zeng
Wenhao Huang
Lina Chen
Jin Xiao
...
Shisong Chen
Yikai Zhang
Zhouhong Gu
Jiaqing Liang
Yanghua Xiao
ALM, LRM, ELM
155
59
0
17 Sep 2023
Contrastive Decoding Improves Reasoning in Large Language Models
Sean O'Brien
Mike Lewis
SyDa, LRM, ReLM
102
39
0
17 Sep 2023
Cross-Lingual Knowledge Editing in Large Language Models
Jiaan Wang
Yunlong Liang
Zengkui Sun
Yu Cao
Jiarong Xu
Fandong Meng
KELM
86
12
0
16 Sep 2023
Rethinking Learning Rate Tuning in the Era of Large Language Models
Hongpeng Jin
Wenqi Wei
Xuyu Wang
Wenbin Zhang
Yanzhao Wu
75
11
0
16 Sep 2023
EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning
Rajasekhar Reddy Mekala
Yasaman Razeghi
Sameer Singh
LRM
91
11
0
16 Sep 2023
Advancing the Evaluation of Traditional Chinese Language Models: Towards a Comprehensive Benchmark Suite
Chan-Jan Hsu
Chang-Le Liu
Feng-Ting Liao
Po-Chun Hsu
Yi-Chang Chen
Da-shan Shiu
ELM, ALM
58
13
0
15 Sep 2023
Investigating Answerability of LLMs for Long-Form Question Answering
Meghana Moorthy Bhat
Rui Meng
Ye Liu
Yingbo Zhou
Semih Yavuz
75
11
0
15 Sep 2023
Using Large Language Model to Solve and Explain Physics Word Problems Approaching Human Level
Jingzhe Ding
Yan Cen
Xinyuan Wei
AI4CE
98
11
0
15 Sep 2023
Anchor Points: Benchmarking Models with Much Fewer Examples
Rajan Vivek
Kawin Ethayarajh
Diyi Yang
Douwe Kiela
ALM
116
28
0
14 Sep 2023
ExpertQA: Expert-Curated Questions and Attributed Answers
Chaitanya Malaviya
Subin Lee
Sihao Chen
Elizabeth Sieber
Mark Yatskar
Dan Roth
ELM, HILM
116
58
0
14 Sep 2023
SwitchGPT: Adapting Large Language Models for Non-Text Outputs
Xinyu Wang
Bohan Zhuang
Qi Wu
MLLM
76
3
0
14 Sep 2023
Pretraining on the Test Set Is All You Need
Rylan Schaeffer
118
30
0
13 Sep 2023
In-Contextual Gender Bias Suppression for Large Language Models
Daisuke Oba
Masahiro Kaneko
Danushka Bollegala
88
9
0
13 Sep 2023
SafetyBench: Evaluating the Safety of Large Language Models
Zhexin Zhang
Leqi Lei
Lindong Wu
Rui Sun
Yongkang Huang
Chong Long
Xiao Liu
Xuanyu Lei
Jie Tang
Minlie Huang
LRM, LM&MA, ELM
134
112
0
13 Sep 2023