ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Measuring Massive Multitask Language Understanding

7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 3,408 papers shown
Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance
Kaitlyn Zhou
Jena D. Hwang
Xiang Ren
Nouha Dziri
Dan Jurafsky
Maarten Sap
87
7
0
10 Jul 2024
A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models : Safety, Consensus, Objectivity, Reproducibility and Explainability
Ting Fang Tan
Kabilan Elangovan
J. Ong
Nigam Shah
J. Sung
...
Haibo Wang
Chang Fu Kuo
Simon Chesterman
Zee Kin Yeong
Daniel Ting
ELM
39
6
0
10 Jul 2024
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
Mengzhao Chen
Wenqi Shao
Peng Xu
Jiahao Wang
Peng Gao
Kaipeng Zhang
Ping Luo
MQ
163
35
0
10 Jul 2024
Training on the Test Task Confounds Evaluation and Emergence
Ricardo Dominguez-Olmedo
Florian E. Dorner
Moritz Hardt
ELM
162
9
1
10 Jul 2024
Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models
Jupinder Parmar
Sanjev Satheesh
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
118
38
0
09 Jul 2024
Self-Recognition in Language Models
Tim R. Davidson
Viacheslav Surkov
V. Veselovsky
Giuseppe Russo
Robert West
Çağlar Gülçehre
PILM
321
4
0
09 Jul 2024
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
Rulin Shao
Jacqueline He
Akari Asai
Weijia Shi
Tim Dettmers
Sewon Min
Luke Zettlemoyer
Pang Wei Koh
RALM
102
26
0
09 Jul 2024
Virtual Personas for Language Models via an Anthology of Backstories
Suhong Moon
Marwa Abdulhai
Minwoo Kang
Joseph Suh
Widyadewi Soedarmadji
Eran Kohen Behar
David M. Chan
88
16
0
09 Jul 2024
Composable Interventions for Language Models
Arinbjorn Kolbeinsson
Kyle O'Brien
Tianjin Huang
Shanghua Gao
Shiwei Liu
...
Anurag J. Vaidya
Faisal Mahmood
Marinka Zitnik
Tianlong Chen
Thomas Hartvigsen
KELM, MU
201
4
0
09 Jul 2024
A Single Transformer for Scalable Vision-Language Modeling
Yangyi Chen
Xingyao Wang
Hao Peng
Heng Ji
LRM
107
17
0
08 Jul 2024
Data, Data Everywhere: A Guide for Pretraining Dataset Construction
Jupinder Parmar
Shrimai Prabhumoye
Pritam Gundecha
Bo Liu
Aastha Jhunjhunwala
Zhilin Wang
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
126
10
0
08 Jul 2024
Igea: a Decoder-Only Language Model for Biomedical Text Generation in Italian
T. M. Buonocore
Simone Rancati
Enea Parimbelli
89
0
0
08 Jul 2024
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages
Yinquan Lu
Wenhao Zhu
Lei Li
Yu Qiao
Fei Yuan
94
32
0
08 Jul 2024
A Survey on LoRA of Large Language Models
Yuren Mao
Yuhang Ge
Yijiang Fan
Wenyi Xu
Yu Mi
Zhonghao Hu
Yunjun Gao
ALM
158
41
0
08 Jul 2024
PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation
Jinpeng Hu
Tengteng Dong
Luo Gang
Hui Ma
Peng Zou
Xiao Sun
Dan Guo
Meng Wang
AI4MH
92
7
0
08 Jul 2024
LLMBox: A Comprehensive Library for Large Language Models
Tianyi Tang
Yiwen Hu
Bingqian Li
Wenyang Luo
Zijing Qin
...
Chunxuan Xia
Junyi Li
Kun Zhou
Wayne Xin Zhao
Ji-Rong Wen
65
2
0
08 Jul 2024
On Speeding Up Language Model Evaluation
Jin Peng Zhou
Christian K. Belardi
Ruihan Wu
Travis Zhang
Carla P. Gomes
Wen Sun
Kilian Q. Weinberger
165
2
0
08 Jul 2024
SBoRA: Low-Rank Adaptation with Regional Weight Updates
L. Po
Yuyang Liu
Haoxuan Wu
Tianqi Zhang
Weikang Yu
Zeyu Jiang
Kun Li
78
1
0
07 Jul 2024
ElecBench: a Power Dispatch Evaluation Benchmark for Large Language Models
Xiyuan Zhou
Huan Zhao
Yuheng Cheng
Yuji Cao
Gaoqi Liang
Guolong Liu
Wenxuan Liu
Yan Xu
Junhua Zhao
ELM
85
6
0
07 Jul 2024
Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression
Zhichao Xu
Ashim Gupta
Tao Li
Oliver Bentham
Vivek Srikumar
111
13
0
06 Jul 2024
On scalable oversight with weak LLMs judging strong LLMs
Zachary Kenton
Noah Y. Siegel
János Kramár
Jonah Brown-Cohen
Samuel Albanie
...
Rishabh Agarwal
David Lindner
Yunhao Tang
Noah D. Goodman
Rohin Shah
ELM
108
36
0
05 Jul 2024
Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition
Aditya K Surikuchi
Raquel Fernández
Sandro Pezzelle
66
6
0
05 Jul 2024
LoCo: Low-Bit Communication Adaptor for Large-scale Model Training
Xingyu Xie
Zhijie Lin
Kim-Chuan Toh
Pan Zhou
98
3
0
05 Jul 2024
Trustworthy Classification through Rank-Based Conformal Prediction Sets
Rui Luo
Zhixin Zhou
186
15
0
05 Jul 2024
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Md Tahmid Rahman Laskar
Sawsan Alqahtani
M Saiful Bari
Mizanur Rahman
Mohammad Abdullah Matin Khan
...
Chee Wei Tan
Md. Rizwan Parvez
Enamul Hoque
Shafiq Joty
Jimmy Huang
ELM, ALM
105
41
0
04 Jul 2024
On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards
Zhimin Zhao
A. A. Bangash
F. Côgo
Bram Adams
Ahmed E. Hassan
202
1
0
04 Jul 2024
Offline Energy-Optimal LLM Serving: Workload-Based Energy Models for LLM Inference on Heterogeneous Systems
Grant Wilkins
Srinivasan Keshav
Richard Mortier
91
9
0
04 Jul 2024
M5 -- A Diverse Benchmark to Assess the Performance of Large Multimodal Models Across Multilingual and Multicultural Vision-Language Tasks
Florian Schneider
Sunayana Sitaram
VLM
84
12
0
04 Jul 2024
GPT-4 vs. Human Translators: A Comprehensive Evaluation of Translation Quality Across Languages, Domains, and Expertise Levels
Jianhao Yan
Pingchuan Yan
Yulong Chen
Judy Li
Xianchao Zhu
Yue Zhang
LM&MA, ELM
131
18
0
04 Jul 2024
Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation
Yi-Chen Li
Fuxiang Zhang
Wenjie Qiu
Lei Yuan
Chengxing Jia
Zongzhang Zhang
Yang Yu
Bo An
83
3
0
04 Jul 2024
DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs
Zhen Tan
Daize Dong
Xinyu Zhao
Jie Peng
Yu Cheng
Tianlong Chen
MoE
91
4
0
03 Jul 2024
How Does Quantization Affect Multilingual LLMs?
Kelly Marchisio
Saurabh Dash
Hongyu Chen
Dennis Aumiller
Ahmet Üstün
Sara Hooker
Sebastian Ruder
MQ
129
15
0
03 Jul 2024
Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment
Janghwan Lee
Seongmin Park
S. Hong
Minsoo Kim
Du-Seong Chang
Jungwook Choi
44
6
0
03 Jul 2024
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs
Yue Yu
Ming-Yu Liu
Zihan Liu
Wei Ping
Jiaxuan You
Chao Zhang
Mohammad Shoeybi
Bryan Catanzaro
ALM, RALM
130
74
0
02 Jul 2024
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
John Dang
Arash Ahmadian
Kelly Marchisio
Julia Kreutzer
Ahmet Üstün
Sara Hooker
103
28
0
02 Jul 2024
CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models
Ying Nie
Binwei Yan
Tianyu Guo
Hao Liu
Haoyu Wang
...
Weihao Wang
Qiang Li
Weijian Sun
Yunhe Wang
Dacheng Tao
ELM
143
3
0
02 Jul 2024
GemmAr: Enhancing LLMs Through Arabic Instruction-Tuning
Hasna Chouikhi
Manel Aloui
Cyrine Ben Hammou
Ghaith Chaabane
Haithem Kchaou
Chehir Dhaouadi
76
0
0
02 Jul 2024
Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning
Yifang Chen
Shuohang Wang
Ziyi Yang
Hiteshi Sharma
Nikos Karampatziakis
Donghan Yu
Kevin Jamieson
Simon Shaolei Du
Yelong Shen
OffRL
102
5
0
02 Jul 2024
The Art of Saying No: Contextual Noncompliance in Language Models
Faeze Brahman
Sachin Kumar
Vidhisha Balachandran
Pradeep Dasigi
Valentina Pyatkin
...
Jack Hessel
Yulia Tsvetkov
Noah A. Smith
Yejin Choi
Hannaneh Hajishirzi
142
32
0
02 Jul 2024
To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models
Bozhong Tian
Xiaozhuan Liang
Siyuan Cheng
Qingbin Liu
Mengru Wang
Dianbo Sui
Xi Chen
Huajun Chen
Xin Xu
MU
89
14
0
02 Jul 2024
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models
Zihan Wang
Deli Chen
Damai Dai
Runxin Xu
Zhuoshu Li
Y. Wu
MoE, ALM
79
3
0
02 Jul 2024
Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application
Chuanpeng Yang
Wang Lu
Yao Zhu
Yidong Wang
Qian Chen
Chenlong Gao
Bingjie Yan
Yiqiang Chen
ALM, KELM
103
32
0
02 Jul 2024
Helpful assistant or fruitful facilitator? Investigating how personas affect language model behavior
Pedro Henrique Luz de Araujo
Benjamin Roth
104
5
0
02 Jul 2024
The #Somos600M Project: Generating NLP resources that represent the diversity of the languages from LATAM, the Caribbean, and Spain
María Grandury
SyDa
76
2
0
01 Jul 2024
AI Agents That Matter
Sayash Kapoor
Benedikt Stroebl
Zachary S. Siegel
Nitya Nadgir
Arvind Narayanan
99
43
0
01 Jul 2024
RegMix: Data Mixture as Regression for Language Model Pre-training
Qian Liu
Xiaosen Zheng
Niklas Muennighoff
Guangtao Zeng
Longxu Dou
Tianyu Pang
Jing Jiang
Min Lin
MoE
182
54
1
01 Jul 2024
Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning
Siwei Li
Yifan Yang
Yifei Shen
Fangyun Wei
Zongqing Lu
L. Qiu
Yuqing Yang
AI4CE
91
3
0
01 Jul 2024
LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives
Luísa Shimabucoro
Sebastian Ruder
Julia Kreutzer
Marzieh Fadaee
Sara Hooker
SyDa
74
5
0
01 Jul 2024
Collaborative Performance Prediction for Large Language Models
Qiyuan Zhang
Fuyuan Lyu
Xue Liu
Chen Ma
56
5
0
01 Jul 2024
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
Runqi Qiao
Qiuna Tan
Guanting Dong
Minhui Wu
Chong Sun
...
Yida Xu
Muxi Diao
Zhimin Bao
Chen Li
Honggang Zhang
VLM, LRM
115
56
0
01 Jul 2024