arXiv: 2009.03300
Measuring Massive Multitask Language Understanding

7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 3,408 papers shown
Title
UNO Arena for Evaluating Sequential Decision-Making Capability of Large Language Models
Zhanyue Qin
Haochuan Wang
Deyuan Liu
Ziyang Song
Cunhang Fan
...
Zhen Lei
Zhiying Tu
Dianhui Chu
Xiaoyan Yu
Dianbo Sui
ELM, LRM
96
2
0
24 Jun 2024
AnnotatedTables: A Large Tabular Dataset with Language Model Annotations
Yaojie Hu
Ilias Fountalis
Jin Tian
N. Vasiloglou
LMTD
77
5
0
24 Jun 2024
Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other
Yifei Gao
Jie Ou
Lei Wang
Yuting Xiao
Zhiyuan Xiang
Ruiting Dai
Jun Cheng
MQ
61
3
0
24 Jun 2024
Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation
Yuchen Yang
Yingdong Shi
Cheems Wang
Xiantong Zhen
Yuxuan Shi
Jun Xu
79
3
0
24 Jun 2024
Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging
Deyuan Liu
Zhan Qin
Han Wang
Zhao Yang
Zecheng Wang
...
Zhao Lv
Zhiying Tu
Dianhui Chu
Bo Li
Dianbo Sui
127
2
0
24 Jun 2024
Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning
Somnath Basu Roy Chowdhury
Krzysztof Choromanski
Arijit Sehanobish
Avinava Dubey
Snigdha Chaturvedi
MU
108
10
0
24 Jun 2024
Preference Tuning For Toxicity Mitigation Generalizes Across Languages
Xiaochen Li
Zheng-Xin Yong
Stephen H. Bach
CLL
98
18
0
23 Jun 2024
SEAM: A Stochastic Benchmark for Multi-Document Tasks
Gili Lior
Avi Caciularu
Arie Cattan
Shahar Levy
Ori Shapira
Gabriel Stanovsky
RALM
82
5
0
23 Jun 2024
Chain-of-Probe: Examining the Necessity and Accuracy of CoT Step-by-Step
Zezhong Wang
Xingshan Zeng
Weiwen Liu
Yufei Wang
Liangyou Li
Yasheng Wang
Lifeng Shang
Xin Jiang
Qun Liu
Kam-Fai Wong
LRM
113
4
0
23 Jun 2024
Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models
Lynn Chua
Badih Ghazi
Yangsibo Huang
Pritish Kamath
Ravi Kumar
Pasin Manurangsi
Amer Sinha
Chulin Xie
Chiyuan Zhang
161
2
0
23 Jun 2024
AudioBench: A Universal Benchmark for Audio Large Language Models
Bin Wang
Xunlong Zou
Geyu Lin
Siyang Song
Zhuohan Liu
Wenyu Zhang
Zhengyuan Liu
AiTi Aw
Nancy F. Chen
AuLLM, ELM, LM&MA
169
35
0
23 Jun 2024
The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models
Jiajia Li
Lu Yang
Mingni Tang
Cong Chen
Zuchao Li
Ping Wang
Hai Zhao
LM&MA
86
6
0
22 Jun 2024
Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration
Zhongzhi Yu
Zheng Wang
Yonggan Fu
Huihong Shi
Khalid Shaikh
Yingyan Celine Lin
118
25
0
22 Jun 2024
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models
Xinrong Zhang
Yingfa Chen
Shengding Hu
Xu Han
Zihang Xu
Yuanwei Xu
Weilin Zhao
Maosong Sun
Zhiyuan Liu
90
11
0
22 Jun 2024
RuleR: Improving LLM Controllability by Rule-based Data Recycling
Ming Li
Han Chen
Chenguang Wang
Dang Nguyen
Dianqi Li
Dinesh Manocha
147
11
0
22 Jun 2024
Pistis-RAG: A Scalable Cascading Framework Towards Trustworthy Retrieval-Augmented Generation
Yu Bai
Yukai Miao
Li Chen
Dan Li
Yanyu Ren
Hongtao Xie
Ce Yang
Xuhui Cai
62
2
0
21 Jun 2024
Optimised Grouped-Query Attention Mechanism for Transformers
Yuang Chen
Cheng Zhang
Xitong Gao
Robert D. Mullins
George A. Constantinides
Yiren Zhao
75
9
0
21 Jun 2024
ICLEval: Evaluating In-Context Learning Ability of Large Language Models
Wentong Chen
Yankai Lin
ZhenHao Zhou
HongYun Huang
Yantao Jia
Bo Zhao
Ji-Rong Wen
ELM
86
4
0
21 Jun 2024
Data Efficient Evaluation of Large Language Models and Text-to-Image Models via Adaptive Sampling
Cong Xu
Gayathri Saranathan
Mahammad Parwez Alam
Arpit Shah
James Lim
Soon Yee Wong
Foltin Martin
Suparna Bhattacharya
VLM
83
5
0
21 Jun 2024
Efficient Continual Pre-training by Mitigating the Stability Gap
Yiduo Guo
Jie Fu
Huishuai Zhang
Dongyan Zhao
Songlin Yang
79
15
0
21 Jun 2024
Do Large Language Models Exhibit Cognitive Dissonance? Studying the Difference Between Revealed Beliefs and Stated Answers
Manuel Mondal
Ljiljana Dolamic
Gérôme Bovet
Philippe Cudré-Mauroux
Julien Audiffren
100
2
0
21 Jun 2024
Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph
Roman Vashurin
Ekaterina Fadeeva
Artem Vazhentsev
Akim Tsvigun
Daniil Vasilev
...
Timothy Baldwin
Preslav Nakov
Maxim Panov
Artem Shelmanov
HILM
184
28
0
21 Jun 2024
Understanding Finetuning for Factual Knowledge Extraction
Gaurav R. Ghosal
Tatsunori Hashimoto
Aditi Raghunathan
85
18
0
20 Jun 2024
MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate
Alfonso Amayuelas
Xianjun Yang
Antonis Antoniades
Wenyue Hua
Liangming Pan
William Wang
AAML, LLMAG
84
18
0
20 Jun 2024
Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation
Chunyuan Deng
Yilun Zhao
Yuzhao Heng
Yitong Li
Jiannan Cao
Xiangru Tang
Arman Cohan
91
15
0
20 Jun 2024
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch
Hasan Hammoud
Umberto Michieli
Fabio Pizzati
Philip Torr
Adel Bibi
Guohao Li
Mete Ozay
MoMe
73
18
0
20 Jun 2024
Instruction Pre-Training: Language Models are Supervised Multitask Learners
Daixuan Cheng
Yuxian Gu
Shaohan Huang
Junyu Bi
Minlie Huang
Furu Wei
SyDa
137
27
0
20 Jun 2024
LiveMind: Low-latency Large Language Models with Simultaneous Inference
Chuangtao Chen
Grace Li Zhang
Xunzhao Yin
Cheng Zhuo
Ulf Schlichtmann
Bing Li
LRM
110
5
0
20 Jun 2024
Timo: Towards Better Temporal Reasoning for Language Models
Zhaochen Su
Jun Zhang
Tong Zhu
Xiaoye Qu
Juntao Li
Min Zhang
Yu Cheng
LRM
96
23
0
20 Jun 2024
Inference-Time Decontamination: Reusing Leaked Benchmarks for Large Language Model Evaluation
Qin Zhu
Qingyuan Cheng
Runyu Peng
Xiaonan Li
Tengxiao Liu
Ru Peng
Xipeng Qiu
Xuanjing Huang
76
7
0
20 Jun 2024
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics
Seungbeen Lee
Seungwon Lim
Seungju Han
Giyeong Oh
Hyungjoo Chae
...
Beong-woo Kwak
Yeonsoo Lee
Dongha Lee
Jinyoung Yeo
Youngjae Yu
101
16
0
20 Jun 2024
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
Tinghao Xie
Xiangyu Qi
Yi Zeng
Yangsibo Huang
Udari Madhushani Sehwag
...
Bo Li
Kai Li
Danqi Chen
Peter Henderson
Prateek Mittal
ALM, ELM
191
79
0
20 Jun 2024
CityGPT: Empowering Urban Spatial Cognition of Large Language Models
Jie Feng
Tianhui Liu
Junbo Yan
Siqi Guo
Yuming Lin
Yong Li
125
16
0
20 Jun 2024
Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing
Han Jiang
Xiaoyuan Yi
Zhihua Wei
Ziang Xiao
Shu Wang
Xing Xie
ELM, ALM
166
8
0
20 Jun 2024
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World
Weixiang Yan
Haitian Liu
Tengxiao Wu
Qian Chen
Wen Wang
...
Jiayi Wang
Weishan Zhao
Yixin Zhang
Renjun Zhang
Li Zhu
LM&MA
88
13
0
19 Jun 2024
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models
Guanting Dong
Keming Lu
Chengpeng Li
Tingyu Xia
Bowen Yu
Chang Zhou
Jingren Zhou
SyDa, ALM, LRM
126
21
0
19 Jun 2024
BeHonest: Benchmarking Honesty in Large Language Models
Steffi Chern
Zhulin Hu
Yuqing Yang
Ethan Chern
Yuan Guo
Jiahe Jin
Binjie Wang
Pengfei Liu
HILM, ALM
143
6
0
19 Jun 2024
Data Contamination Can Cross Language Barriers
Feng Yao
Yufan Zhuang
Zihao Sun
Sunan Xu
Animesh Kumar
Jingbo Shang
94
12
0
19 Jun 2024
Evaluating $n$-Gram Novelty of Language Models Using Rusty-DAWG
William Merrill
Noah A. Smith
Yanai Elazar
ELM, TDI
116
12
0
18 Jun 2024
Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization levels and Rank Values trough Differentiable Bayesian Gates
Cristian Meo
Ksenia Sycheva
Anirudh Goyal
Justin Dauwels
MQ
75
5
0
18 Jun 2024
Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models?
Pinzhen Chen
Simon Yu
Zhicheng Guo
Barry Haddow
ELM
116
3
0
18 Jun 2024
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
Team GLM
Aohan Zeng
Bin Xu
Bowen Wang
...
Zhaoyu Wang
Zhen Yang
Zhengxiao Du
Zhenyu Hou
Zihan Wang
ALM
162
650
0
18 Jun 2024
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning
Bingchen Zhao
Yongshuo Zong
Letian Zhang
Timothy Hospedales
VLM
118
19
0
18 Jun 2024
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur
Kartik Choudhary
Venkat Srinik Ramayapally
Sankaran Vaidyanathan
Dieuwke Hupkes
ELM, ALM
179
66
0
18 Jun 2024
Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling
Yao-Ching Yu
Chun-Chih Kuo
Ziqi Ye
Yu-Cheng Chang
Yueh-Se Li
88
12
0
18 Jun 2024
Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models
Eldar Kurtic
Amir Moeini
Dan Alistarh
LRM
101
2
0
18 Jun 2024
Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models
Jie Chen
Yupeng Zhang
Bingning Wang
Wayne Xin Zhao
Ji-Rong Wen
Weipeng Chen
SyDa
85
6
0
18 Jun 2024
QOG:Question and Options Generation based on Language Model
Jincheng Zhou
82
3
0
18 Jun 2024
Problem-Solving in Language Model Networks
Ciaran Regan
Alexandre Gournail
Mizuki Oka
LRM, LLMAG, KELM
70
3
0
18 Jun 2024
WebCanvas: Benchmarking Web Agents in Online Environments
Yichen Pan
Dehan Kong
Sida Zhou
Cheng Cui
Yifei Leng
...
Hangyu Liu
Yanyi Shang
Shuyan Zhou
Tongshuang Wu
Zhengyang Wu
152
43
0
18 Jun 2024