ResearchTrend.AI
Measuring Massive Multitask Language Understanding

7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 3,408 papers shown
Title
What is the best model? Application-driven Evaluation for Large Language Models
Shiguo Lian
Kaikai Zhao
Xinhui Liu
Xuejiao Lei
Bikun Yang
Wenjing Zhang
Kai Wang
Zhaoxiang Liu
ALM, ELM
102
3
0
14 Jun 2024
A Survey on Large Language Models from General Purpose to Medical Applications: Datasets, Methodologies, and Evaluations
Jinqiang Wang
Huansheng Ning
Yi Peng
Qikai Wei
Daniel Tesfai
Wenwei Mao
Tao Zhu
Runhe Huang
LM&MA, AI4MH, ELM
162
8
0
14 Jun 2024
Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning
Jiaqi Li
Yixuan Tang
Yi Yang
155
8
0
14 Jun 2024
Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming
Victor-Alexandru Pădurean
Adish Singla
ELM
117
4
0
14 Jun 2024
Explore the Limits of Omni-modal Pretraining at Scale
Yiyuan Zhang
Handong Li
Jing Liu
Xiangyu Yue
VLM, LRM
87
1
0
13 Jun 2024
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
Hamish Ivison
Yizhong Wang
Jiacheng Liu
Zeqiu Wu
Valentina Pyatkin
Nathan Lambert
Noah A. Smith
Yejin Choi
Hannaneh Hajishirzi
110
64
0
13 Jun 2024
ReMI: A Dataset for Reasoning with Multiple Images
Mehran Kazemi
Nishanth Dikkala
Ankit Anand
Petar Dević
Ishita Dasgupta
...
Bahare Fatemi
Pranjal Awasthi
Dee Guo
Sreenivas Gollapudi
Ahmed Qureshi
LRM, VLM
110
17
0
13 Jun 2024
SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models
Kehua Feng
Keyan Ding
Weijie Wang
Xiang Zhuang
Zeyuan Wang
Ming Qin
Yu Zhao
Jianhua Yao
Qiang Zhang
H. Chen
ELM
93
9
0
13 Jun 2024
ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models
Jing Liu
Ruihao Gong
Mingyang Zhang
Yefei He
Jianfei Cai
Bohan Zhuang
MoE
78
0
0
13 Jun 2024
Mixture-of-Skills: Learning to Optimize Data Usage for Fine-Tuning Large Language Models
Minghao Wu
Thuy-Trang Vu
Zhuang Li
Gholamreza Haffari
75
6
0
13 Jun 2024
StreamBench: Towards Benchmarking Continuous Improvement of Language Agents
Cheng-Kuang Wu
Zhi Rui Tam
Chieh-Yen Lin
Yun-Nung Chen
Hung-yi Lee
LLMAG
105
8
0
13 Jun 2024
REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
Tomer Ashuach
Martin Tutek
Yonatan Belinkov
MU, KELM
182
7
0
13 Jun 2024
MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases
Rithesh Murthy
Liangwei Yang
Juntao Tan
Tulika Awalgaonkar
Yilun Zhou
...
Zuxin Liu
Ming Zhu
Huan Wang
Caiming Xiong
Silvio Savarese
104
6
0
12 Jun 2024
Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs
Chen Zheng
Ke Sun
Xun Zhou
MoE
85
0
0
12 Jun 2024
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Zhangchen Xu
Fengqing Jiang
Luyao Niu
Yuntian Deng
Radha Poovendran
Yejin Choi
Bill Yuchen Lin
SyDa
126
161
0
12 Jun 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Qingyun Li
Zhe Chen
Weiyun Wang
Wenhai Wang
Shenglong Ye
...
Dahua Lin
Yu Qiao
Botian Shi
Conghui He
Jifeng Dai
VLM, OffRL
122
27
0
12 Jun 2024
Large Language Models Must Be Taught to Know What They Don't Know
Sanyam Kapoor
Nate Gruver
Manley Roberts
Katherine Collins
Arka Pal
Umang Bhatt
Adrian Weller
Samuel Dooley
Micah Goldblum
Andrew Gordon Wilson
110
25
0
12 Jun 2024
Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling
Zile Qiao
Wei Ye
Yong Jiang
Tong Mo
Pengjun Xie
Weiping Li
Fei Huang
Shikun Zhang
KELM
58
4
0
12 Jun 2024
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
Edoardo Debenedetti
Javier Rando
Daniel Paleka
Silaghi Fineas Florin
Dragos Albastroiu
...
Stefan Kraft
Mario Fritz
Florian Tramèr
Sahar Abdelnabi
Lea Schonherr
114
14
0
12 Jun 2024
An Empirical Study of Mamba-based Language Models
R. Waleffe
Wonmin Byeon
Duncan Riach
Brandon Norick
V. Korthikanti
...
Vartika Singh
Jared Casper
Jan Kautz
Mohammad Shoeybi
Bryan Catanzaro
125
79
0
12 Jun 2024
ALPS: Improved Optimization for Highly Sparse One-Shot Pruning for Large Language Models
Xiang Meng
Kayhan Behdin
Haoyue Wang
Rahul Mazumder
80
6
0
12 Jun 2024
Are Large Language Models Good Statisticians?
Yizhang Zhu
Shiyin Du
Boyan Li
Yuyu Luo
Nan Tang
ELM
93
18
0
12 Jun 2024
Collective Constitutional AI: Aligning a Language Model with Public Input
Saffron Huang
Divya Siddarth
Liane Lovitt
Thomas I. Liao
Esin Durmus
Alex Tamkin
Deep Ganguli
ELM
140
83
0
12 Jun 2024
OLMES: A Standard for Language Model Evaluations
Yuling Gu
Oyvind Tafjord
Bailey Kuehl
Dany Haddad
Jesse Dodge
Hannaneh Hajishirzi
ELM
134
20
0
12 Jun 2024
Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks
Justin Zhao
Flor Miriam Plaza del Arco
Amanda Cercas Curry
ELM, ALM
90
1
0
12 Jun 2024
QuantMoE-Bench: Examining Post-Training Quantization for Mixture-of-Experts
Pingzhi Li
Xiaolong Jin
Yu Cheng
Tianlong Chen
MQ, MoE
110
2
0
12 Jun 2024
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Xiaoshuai Song
Muxi Diao
Guanting Dong
Zhengyang Wang
Yujia Fu
...
Yejie Wang
Zhuoma Gongque
Jianing Yu
Qiuna Tan
Weiran Xu
ELM
171
15
0
12 Jun 2024
MultiPragEval: Multilingual Pragmatic Evaluation of Large Language Models
Dojun Park
Jiwoo Lee
Seohyun Park
Hyeyun Jeong
Youngeun Koo
Soonha Hwang
Seonwoo Park
Sungeun Lee
ELM
63
2
0
11 Jun 2024
OPTune: Efficient Online Preference Tuning
Lichang Chen
Jiuhai Chen
Chenxi Liu
John Kirchenbauer
Davit Soselia
Chen Zhu
Tom Goldstein
Dinesh Manocha
Heng Huang
70
5
0
11 Jun 2024
Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena
Aidar Myrzakhan
Sondos Mahmoud Bsharat
Zhiqiang Shen
ELM
75
39
0
11 Jun 2024
TextGrad: Automatic "Differentiation" via Text
Mert Yuksekgonul
Federico Bianchi
Joseph Boen
Sheng Liu
Zhi Huang
Carlos Guestrin
James Zou
LLMAG, OOD, AI4CE
110
48
0
11 Jun 2024
CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence
Md Tanvirul Alam
Dipkamal Bhusal
Le Nguyen
Nidhi Rastogi
ELM
64
22
0
11 Jun 2024
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
Haoran You
Yichao Fu
Zheng Wang
Amir Yazdanbakhsh
Yingyan Celine Lin
133
5
0
11 Jun 2024
BertaQA: How Much Do Language Models Know About Local Culture?
Julen Etxaniz
Gorka Azkune
A. Soroa
Oier López de Lacalle
Mikel Artetxe
111
11
0
11 Jun 2024
Effectively Compress KV Heads for LLM
Hao Yu
Zelan Yang
Shen Li
Jianxin Wu
MQ, VLM
64
16
0
11 Jun 2024
Crayon: Customized On-Device LLM via Instant Adapter Blending and Edge-Server Hybrid Inference
Jihwan Bang
Juntae Lee
Kyuhong Shim
Seunghan Yang
Simyung Chang
82
7
0
11 Jun 2024
Flextron: Many-in-One Flexible Large Language Model
Ruisi Cai
Saurav Muralidharan
Greg Heinrich
Hongxu Yin
Zhangyang Wang
Jan Kautz
Pavlo Molchanov
87
14
0
11 Jun 2024
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij
Felix Hofstätter
Ollie Jaffe
Samuel F. Brown
Francis Rhys Ward
ELM
91
31
0
11 Jun 2024
Scaling Large Language Model-based Multi-Agent Collaboration
Chen Qian
Zihao Xie
YiFei Wang
Wei Liu
Yufan Dang
...
Zhuoyun Du
Weize Chen
Cheng Yang
Zhiyuan Liu
Maosong Sun
AI4CE, LLMAG, LM&Ro
178
78
0
11 Jun 2024
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren
Yang Liu
Yadong Lu
Yelong Shen
Chen Liang
Weizhu Chen
Mamba
185
69
0
11 Jun 2024
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
David Wadden
Kejian Shi
Jacob Morrison
Aakanksha Naik
Shruti Singh
...
Luca Soldaini
Shannon Zejiang Shen
Doug Downey
Hannaneh Hajishirzi
Arman Cohan
134
15
0
10 Jun 2024
Language Models are Alignable Decision-Makers: Dataset and Application to the Medical Triage Domain
Brian Hu
Bill Ray
Alice Leung
Amy Summerville
David Joy
Christopher Funk
Arslan Basharat
88
6
0
10 Jun 2024
MedExQA: Medical Question Answering Benchmark with Multiple Explanations
Yunsoo Kim
Jinge Wu
Yusuf Abdulle
Honghan Wu
ELM
109
24
0
10 Jun 2024
PowerInfer-2: Fast Large Language Model Inference on a Smartphone
Zhenliang Xue
Yixin Song
Zeyu Mi
Le Chen
Yubin Xia
Haibo Chen
126
52
0
10 Jun 2024
LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages
Andrew M. Bean
Simi Hellsten
Harry Mayne
Jabez Magomere
Ethan A. Chi
Ryan A. Chi
Scott A. Hale
Hannah Rose Kirk
ELM, LRM
97
12
0
10 Jun 2024
Can I understand what I create? Self-Knowledge Evaluation of Large Language Models
Zhiquan Tan
Lai Wei
Jindong Wang
Xing Xie
Weiran Huang
ELM, LRM
69
5
0
10 Jun 2024
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
Haoran You
Yipin Guo
Yichao Fu
Wei Zhou
Huihong Shi
Xiaofan Zhang
Souvik Kundu
Amir Yazdanbakhsh
Y. Lin
KELM
119
11
0
10 Jun 2024
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
David Romero
Chenyang Lyu
Haryo Akbarianto Wibowo
Teresa Lynn
Injy Hamed
...
Oana Ignat
Joan Nwatu
Rada Mihalcea
Thamar Solorio
Alham Fikri Aji
117
43
0
10 Jun 2024
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters
Yixin Song
Haotong Xie
Zhengyan Zhang
Bo Wen
Li Ma
Zeyu Mi
Haibo Chen
MoE
168
25
0
10 Jun 2024
Safety Alignment Should Be Made More Than Just a Few Tokens Deep
Xiangyu Qi
Ashwinee Panda
Kaifeng Lyu
Xiao Ma
Subhrajit Roy
Ahmad Beirami
Prateek Mittal
Peter Henderson
120
142
0
10 Jun 2024