Measuring Massive Multitask Language Understanding

7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
    ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 3,408 papers shown
Searching for Best Practices in Retrieval-Augmented Generation
Xiaohua Wang
Zhenghua Wang
Xuan Gao
Feiran Zhang
Yixin Wu
...
Qi Qian
Ruicheng Yin
Changze Lv
Xiaoqing Zheng
Xuanjing Huang
113
62
0
01 Jul 2024
Development of Cognitive Intelligence in Pre-trained Language Models
Raj Sanjay Shah
Khushi Bhardwaj
Sashank Varma
115
2
0
01 Jul 2024
Exploring Advanced Large Language Models with LLMsuite
Giorgio Roffo
LLMAG
36
0
0
01 Jul 2024
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
Enshu Liu
Junyi Zhu
Zinan Lin
Xuefei Ning
Matthew B. Blaschko
Shengen Yan
Guohao Dai
Huazhong Yang
Yu Wang
MoE
107
7
0
01 Jul 2024
FoldGPT: Simple and Effective Large Language Model Compression Scheme
Songwei Liu
Chao Zeng
Lianqiang Li
Chenqian Yan
Lean Fu
Xing Mei
Fangmin Chen
86
5
0
01 Jul 2024
BAPO: Base-Anchored Preference Optimization for Personalized Alignment in Large Language Models
Gihun Lee
Minchan Jeong
Yujin Kim
Hojung Jung
Jaehoon Oh
Sangmook Kim
Se-Young Yun
78
3
0
30 Jun 2024
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
Yuheng Zhang
Dian Yu
Baolin Peng
Linfeng Song
Ye Tian
Mingyue Huo
Nan Jiang
Haitao Mi
Dong Yu
228
18
0
30 Jun 2024
LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement
Jiahao Ying
Mingbao Lin
Yixin Cao
Wei Tang
Bo Wang
Qianru Sun
Xuanjing Huang
Shuicheng Yan
LRM
80
11
0
29 Jun 2024
It's Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization
Bingdong Li
Zixiang Di
Yanting Yang
Hong Qian
Peng Yang
Hao Hao
Ke Tang
Aimin Zhou
MoMe
119
6
0
29 Jun 2024
GraphArena: Evaluating and Exploring Large Language Models on Graph Computation
Jianheng Tang
Qifan Zhang
Yuhan Li
Nuo Chen
Jia Li
115
1
0
29 Jun 2024
Closed-Form Test Functions for Biophysical Sequence Optimization Algorithms
Samuel Stanton
R. Alberstein
Nathan C. Frey
Andrew Watkins
Kyunghyun Cho
107
5
0
28 Jun 2024
YuLan: An Open-source Large Language Model
Yutao Zhu
Kun Zhou
Kelong Mao
Wentong Chen
Yiding Sun
...
Wenbing Huang
Ze-Feng Gao
Yueguo Chen
Weizheng Lu
Ji-Rong Wen
ALM, ELM
70
1
0
28 Jun 2024
CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models
Zhong-Zhi Li
Ming-Liang Zhang
Fei Yin
Zhi-Long Ji
Jin-Feng Bai
Zhen-Ru Pan
Fan-Hu Zeng
Jian Xu
Jia-Xin Zhang
Cheng-Lin Liu
ELM
103
14
0
28 Jun 2024
ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting
Boyao Wang
Dylan Zhang
Hanning Zhang
Xingyuan Pan
Minrui Xu
Jipeng Zhang
Renjie Pi
Xiaoyu Wang
Tong Zhang
139
10
0
28 Jun 2024
Changing Answer Order Can Decrease MMLU Accuracy
Vipul Gupta
David Pantoja
Candace Ross
Adina Williams
Megan Ung
101
25
0
27 Jun 2024
From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data
Zheyang Xiong
Vasilis Papageorgiou
Kangwook Lee
Dimitris Papailiopoulos
SyDa, RALM
96
13
0
27 Jun 2024
FernUni LLM Experimental Infrastructure (FLEXI) -- Enabling Experimentation and Innovation in Higher Education Through Access to Open Large Language Models
Torsten Zesch
Michael Hanses
Niels Seidel
Piush Aggarwal
Dirk Veiel
Claudia de Witt
53
0
0
27 Jun 2024
Improving Weak-to-Strong Generalization with Reliability-Aware Alignment
Yue Guo
Yi Yang
98
11
0
27 Jun 2024
Length Optimization in Conformal Prediction
Shayan Kiyani
George Pappas
Hamed Hassani
115
17
0
27 Jun 2024
LiveBench: A Challenging, Contamination-Limited LLM Benchmark
Colin White
Samuel Dooley
Manley Roberts
Arka Pal
Ben Feuer
...
Tom Goldstein
Willie Neiswanger
Micah Goldblum
ELM
125
20
0
27 Jun 2024
DataGen: Unified Synthetic Dataset Generation via Large Language Models
Yue Huang
Siyuan Wu
Chujie Gao
Dongping Chen
Qihui Zhang
...
Tianyi Zhou
Xiangliang Zhang
Jianfeng Gao
Chaowei Xiao
Lichao Sun
SyDa
127
20
0
27 Jun 2024
Categorical Syllogisms Revisited: A Review of the Logical Reasoning Abilities of LLMs for Analyzing Categorical Syllogism
Shi Zong
Jimmy Lin
ELM, LRM
73
3
0
26 Jun 2024
Evaluating Copyright Takedown Methods for Language Models
Boyi Wei
Weijia Shi
Yangsibo Huang
Noah A. Smith
Chiyuan Zhang
Luke Zettlemoyer
Kai Li
Peter Henderson
147
25
0
26 Jun 2024
IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons
Dan Shi
Renren Jin
Tianhao Shen
Weilong Dong
Xinwei Wu
Deyi Xiong
105
11
0
26 Jun 2024
Evaluating Quality of Answers for Retrieval-Augmented Generation: A Strong LLM Is All You Need
Yang Wang
Alberto Garcia Hernandez
Roman Kyslyi
Nicholas S. Kersting
101
3
0
26 Jun 2024
A Survey on Mixture of Experts in Large Language Models
Weilin Cai
Juyong Jiang
Fan Wang
Jing Tang
Sunghun Kim
Jiayi Huang
MoE
100
123
0
26 Jun 2024
PaCoST: Paired Confidence Significance Testing for Benchmark Contamination Detection in Large Language Models
Huixuan Zhang
Yun Lin
Xiaojun Wan
144
0
0
26 Jun 2024
RouteLLM: Learning to Route LLMs with Preference Data
Isaac Ong
Amjad Almahairi
Vincent Wu
Wei-Lin Chiang
Tianhao Wu
Joseph E. Gonzalez
M. W. Kadous
Ion Stoica
174
106
0
26 Jun 2024
Evaluating the Efficacy of Foundational Models: Advancing Benchmarking Practices to Enhance Fine-Tuning Decision-Making
O. Amujo
S. Yang
88
1
0
25 Jun 2024
VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation
Kun Qian
Shunji Wan
Claudia Tang
Youzhi Wang
Xuanming Zhang
Maximillian Chen
Zhou Yu
AAML
93
12
0
25 Jun 2024
Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients
Aashiq Muhamed
Oscar Li
David Woodruff
Mona Diab
Virginia Smith
101
13
0
25 Jun 2024
Banishing LLM Hallucinations Requires Rethinking Generalization
Johnny Li
Saksham Consul
Eda Zhou
James Wong
Naila Farooqui
...
Zhuxiaona Wei
Tian Wu
Ben Echols
Sharon Zhou
Gregory Diamos
LRM
68
13
0
25 Jun 2024
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Guilherme Penedo
Hynek Kydlícek
Loubna Ben Allal
Anton Lozhkov
Margaret Mitchell
Colin Raffel
Leandro von Werra
Thomas Wolf
144
265
0
25 Jun 2024
Disce aut Deficere: Evaluating LLMs Proficiency on the INVALSI Italian Benchmark
Fabio Mercorio
Mario Mezzanzanica
Daniele Potertì
Antonio Serino
Andrea Seveso
119
5
0
25 Jun 2024
MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation
Yusheng Liao
Shuyang Jiang
Yanfeng Wang
Yu Wang
105
3
0
25 Jun 2024
MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting
Tianhao Li
Shangjie Li
Binbin Xie
Deyi Xiong
Baosong Yang
CLL
122
4
0
25 Jun 2024
TALEC: Teach Your LLM to Evaluate in Specific Domain with In-house Criteria by Criteria Division and Zero-shot Plus Few-shot
Kaiqi Zhang
Shuai Yuan
Honghan Zhao
ALM, ELM
71
2
0
25 Jun 2024
Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs Beyond Integer Bit-Levels
Razvan-Gabriel Dumitru
Vikas Yadav
Rishabh Maheshwary
Paul-Ioan Clotan
Sathwik Tejaswi Madhusudhan
Mihai Surdeanu
MQ
127
2
0
25 Jun 2024
Make Some Noise: Unlocking Language Model Parallel Inference Capability through Noisy Training
Yixuan Wang
Xianzhen Luo
Fuxuan Wei
Yijun Liu
Qingfu Zhu
Xuanyu Zhang
Qing Yang
Dongliang Xu
Wanxiang Che
71
4
0
25 Jun 2024
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph
Zhehao Zhang
Jiaao Chen
Diyi Yang
LRM
78
13
0
25 Jun 2024
Mitigating Hallucination in Fictional Character Role-Play
Nafis Sadeq
Zhouhang Xie
Byungkyu Kang
Prarit Lamba
Xiang Gao
Julian McAuley
HILM
113
11
0
25 Jun 2024
CaLMQA: Exploring culturally specific long-form question answering across 23 languages
Shane Arora
Marzena Karpinska
Hung-Ting Chen
Ipsita Bhattacharjee
Mohit Iyyer
Eunsol Choi
HILM
150
14
0
25 Jun 2024
RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale
Beck Labash
August Rosedale
Alex Reents
Lucas Negritto
Colin Wiel
KELM
51
10
0
24 Jun 2024
Modulating Language Model Experiences through Frictions
Katherine M. Collins
Valerie Chen
Ilia Sucholutsky
Hannah Rose Kirk
Malak Sadek
Holli Sargeant
Ameet Talwalkar
Adrian Weller
Umang Bhatt
KELM
118
5
0
24 Jun 2024
WARP: On the Benefits of Weight Averaged Rewarded Policies
Alexandre Ramé
Johan Ferret
Nino Vieillard
Robert Dadashi
Léonard Hussenot
Pierre-Louis Cedoz
Pier Giuseppe Sessa
Sertan Girgin
Arthur Douillard
Olivier Bachem
136
23
0
24 Jun 2024
AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models
Jiale Cheng
Yida Lu
Xiaotao Gu
Pei Ke
Xiao-Yang Liu
Yuxiao Dong
Hongning Wang
Jie Tang
Minlie Huang
77
6
0
24 Jun 2024
Task Oriented In-Domain Data Augmentation
Xiao Liang
Xinyu Hu
Simiao Zuo
Yeyun Gong
Qiang Lou
Yi Liu
Shao-Lun Huang
Jian Jiao
86
5
0
24 Jun 2024
Scaling Laws for Linear Complexity Language Models
Xuyang Shen
Dong Li
Ruitao Leng
Zhen Qin
Weigao Sun
Yiran Zhong
LRM
83
8
0
24 Jun 2024
Evaluation of Language Models in the Medical Context Under Resource-Constrained Settings
Andrea Posada
Daniel Rueckert
Felix Meissen
Philip Muller
LM&MA, ELM
63
0
0
24 Jun 2024
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training
Tong Zhu
Xiaoye Qu
Daize Dong
Jiacheng Ruan
Jingqi Tong
Conghui He
Yu Cheng
MoE, ALM
116
89
0
24 Jun 2024