ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2009.03300
  4. Cited By
Measuring Massive Multitask Language Understanding
v1v2v3 (latest)

Measuring Massive Multitask Language Understanding

7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
    ELMRALM
ArXiv (abs)PDFHTML

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 3,408 papers shown
Title
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Neeloy Chakraborty
Melkior Ornik
Katherine Driggs-Campbell
LRM
252
12
0
25 Mar 2024
Risk-Calibrated Human-Robot Interaction via Set-Valued Intent Prediction
Risk-Calibrated Human-Robot Interaction via Set-Valued Intent Prediction
Justin Lidard
Hang Pham
Ariel Bachman
Bryan Boateng
Anirudha Majumdar
198
5
0
23 Mar 2024
Understanding Emergent Abilities of Language Models from the Loss Perspective
Understanding Emergent Abilities of Language Models from the Loss Perspective
Zhengxiao Du
Aohan Zeng
Yuxiao Dong
Jie Tang
UQCVLRM
166
56
0
23 Mar 2024
Cost-Efficient Large Language Model Serving for Multi-turn Conversations
  with CachedAttention
Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention
Bin Gao
Zhuomin He
Puru Sharma
Qingxuan Kang
Djordje Jevdjic
Junbo Deng
Xingkun Yang
Zhou Yu
Pengfei Zuo
141
56
0
23 Mar 2024
Comprehensive Reassessment of Large-Scale Evaluation Outcomes in LLMs: A
  Multifaceted Statistical Approach
Comprehensive Reassessment of Large-Scale Evaluation Outcomes in LLMs: A Multifaceted Statistical Approach
Kun Sun
Rong Wang
Anders Sogaard
52
3
0
22 Mar 2024
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
Nicholas Lee
Thanakul Wattanawong
Sehoon Kim
K. Mangalam
Sheng Shen
Gopala Anumanchipalli
Michael W. Mahoney
Kurt Keutzer
A. Gholami
115
53
0
22 Mar 2024
A Picture Is Worth a Graph: Blueprint Debate on Graph for Multimodal
  Reasoning
A Picture Is Worth a Graph: Blueprint Debate on Graph for Multimodal Reasoning
Changmeng Zheng
Dayong Liang
Wengyu Zhang
Xiao Wei
Tat-Seng Chua
Qing Li
88
1
0
22 Mar 2024
Extending Token Computation for LLM Reasoning
Extending Token Computation for LLM Reasoning
Bingli Liao
Danilo Vasconcellos Vargas
LRM
41
2
0
22 Mar 2024
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual
  Math Problems?
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Renrui Zhang
Dongzhi Jiang
Yichi Zhang
Haokun Lin
Ziyu Guo
...
Aojun Zhou
Pan Lu
Kai-Wei Chang
Peng Gao
Hongsheng Li
107
253
0
21 Mar 2024
ChainLM: Empowering Large Language Models with Improved Chain-of-Thought
  Prompting
ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting
Xiaoxue Cheng
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
LRMAI4CEReLM
91
9
0
21 Mar 2024
LayoutLLM: Large Language Model Instruction Tuning for Visually Rich
  Document Understanding
LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding
Masato Fujitake
MLLM
62
15
0
21 Mar 2024
RakutenAI-7B: Extending Large Language Models for Japanese
RakutenAI-7B: Extending Large Language Models for Japanese
Rakuten Group
Aaron Levine
Connie Huang
Chenguang Wang
Eduardo Batista
...
Ting Cai
Wei-Te Chen
Yandi Xia
Yuki Nakayama
Yutaka Higashiyama
66
9
0
21 Mar 2024
Reverse Training to Nurse the Reversal Curse
Reverse Training to Nurse the Reversal Curse
O. Yu. Golovneva
Zeyuan Allen-Zhu
Jason Weston
Sainbayar Sukhbaatar
116
38
0
20 Mar 2024
RoleInteract: Evaluating the Social Interaction of Role-Playing Agents
RoleInteract: Evaluating the Social Interaction of Role-Playing Agents
Hongzhan Chen
Hehong Chen
Mingshi Yan
Wenshen Xu
Xing Gao
...
Xiaojun Quan
Chenliang Li
Ji Zhang
Fei Huang
Jingren Zhou
70
22
0
20 Mar 2024
AgentGroupChat: An Interactive Group Chat Simulacra For Better Eliciting
  Emergent Behavior
AgentGroupChat: An Interactive Group Chat Simulacra For Better Eliciting Emergent Behavior
Zhouhong Gu
Xiaoxuan Zhu
Haoran Guo
Lin Zhang
Yin Cai
...
Yifei Dai
Yan Gao
Yao Hu
Hongwei Feng
Yanghua Xiao
AI4CE
85
2
0
20 Mar 2024
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Yaowei Zheng
Richong Zhang
Junhao Zhang
Yanhan Ye
Zheyan Luo
Zhangchi Feng
Yongqiang Ma
159
559
0
20 Mar 2024
Hyacinth6B: A large language model for Traditional Chinese
Hyacinth6B: A large language model for Traditional Chinese
Chih-Wei Song
Yin-Te Tsai
107
0
0
20 Mar 2024
LeanReasoner: Boosting Complex Logical Reasoning with Lean
LeanReasoner: Boosting Complex Logical Reasoning with Lean
Dongwei Jiang
Marcio Fonseca
Shay B. Cohen
LRM
61
19
0
20 Mar 2024
Arcee's MergeKit: A Toolkit for Merging Large Language Models
Arcee's MergeKit: A Toolkit for Merging Large Language Models
Charles Goddard
Shamane Siriwardhana
Malikeh Ehghaghi
Luke Meyers
Vladimir Karpukhin
Brian Benedict
Mark McQuade
Jacob Solawetz
MoMeKELM
180
103
0
20 Mar 2024
Pretraining Language Models Using Translationese
Pretraining Language Models Using Translationese
Meet Doshi
Raj Dabre
Pushpak Bhattacharyya
SyDa
99
2
0
20 Mar 2024
Toward Sustainable GenAI using Generation Directives for Carbon-Friendly
  Large Language Model Inference
Toward Sustainable GenAI using Generation Directives for Carbon-Friendly Large Language Model Inference
Baolin Li
Yankai Jiang
V. Gadepally
Devesh Tiwari
101
15
0
19 Mar 2024
Towards Multimodal In-Context Learning for Vision & Language Models
Towards Multimodal In-Context Learning for Vision & Language Models
Sivan Doveh
Shaked Perek
M. Jehanzeb Mirza
Wei Lin
Amit Alfassy
Assaf Arbelle
S. Ullman
Leonid Karlinsky
VLM
188
18
0
19 Mar 2024
Pragmatic Competence Evaluation of Large Language Models for Korean
Pragmatic Competence Evaluation of Large Language Models for Korean
Dojun Park
Jiwoo Lee
Hyeyun Jeong
Seohyun Park
Sungeun Lee
ELM
63
2
0
19 Mar 2024
LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation
  Benchmark for Chinese Large Language Models
LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models
Chuang Liu
Renren Jin
Yuqi Ren
Deyi Xiong
ELM
120
0
0
19 Mar 2024
VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
Yongshuo Zong
Ondrej Bohdal
Timothy M. Hospedales
97
7
0
19 Mar 2024
OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and
  Safety
OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety
Chuang Liu
Linhao Yu
Jiaxuan Li
Renren Jin
Yufei Huang
...
Tao Liu
Jinwang Song
Hongying Zan
Sun Li
Deyi Xiong
ELM
100
7
0
18 Mar 2024
RouterBench: A Benchmark for Multi-LLM Routing System
RouterBench: A Benchmark for Multi-LLM Routing System
Qitian Jason Hu
Jacob Bieker
Xiuyu Li
Nan Jiang
Benjamin Keigwin
Gaurav Ranganath
Kurt Keutzer
Shriyash Kaustubh Upadhyay
113
54
0
18 Mar 2024
Metaphor Understanding Challenge Dataset for LLMs
Metaphor Understanding Challenge Dataset for LLMs
Xiaoyu Tong
Rochelle Choenni
Martha Lewis
Ekaterina Shutova
71
12
0
18 Mar 2024
Let's Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large
  Language Model
Let's Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large Language Model
Haoyun Xu
Runzhe Zhan
Derek F. Wong
Lidia S. Chao
77
3
0
18 Mar 2024
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient
  LLMs Under Compression
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Junyuan Hong
Jinhao Duan
Chenhui Zhang
Zhangheng Li
Chulin Xie
...
B. Kailkhura
Dan Hendrycks
Dawn Song
Zhangyang Wang
Yue Liu
110
28
0
18 Mar 2024
BEnQA: A Question Answering and Reasoning Benchmark for Bengali and
  English
BEnQA: A Question Answering and Reasoning Benchmark for Bengali and English
H. M. Q. H. Sheikh Shafayat
Rishav Hada
Isaac Cowhey
Rifki Afina
Jerry Tworek
Lorie De Leon
69
3
0
16 Mar 2024
EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for
  Evaluating Vision Language Models
EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models
Rocktim Jyoti Das
Simeon Emilov Hristov
Haonan Li
Dimitar Iliyanov Dimitrov
Ivan Koychev
Preslav Nakov
CoGeELM
116
17
0
15 Mar 2024
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
Piotr Nawrot
Adrian Lañcucki
Marcin Chochowski
David Tarjan
Edoardo Ponti
100
56
0
14 Mar 2024
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text
  Transformation
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation
Yunhao Gou
Kai Chen
Zhili Liu
Lanqing Hong
Hang Xu
Zhenguo Li
Dit-Yan Yeung
James T. Kwok
Yu Zhang
MLLM
125
52
0
14 Mar 2024
Laying the Foundation First? Investigating the Generalization from
  Atomic Skills to Complex Reasoning Tasks
Laying the Foundation First? Investigating the Generalization from Atomic Skills to Complex Reasoning Tasks
Yuncheng Huang
Qi He
Yipei Xu
Jiaqing Liang
Yanghua Xiao
LRM
83
1
0
14 Mar 2024
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
Zhiqing Sun
Longhui Yu
Yikang Shen
Weiyang Liu
Yiming Yang
Sean Welleck
Chuang Gan
93
69
0
14 Mar 2024
Dial-insight: Fine-tuning Large Language Models with High-Quality
  Domain-Specific Data Preventing Capability Collapse
Dial-insight: Fine-tuning Large Language Models with High-Quality Domain-Specific Data Preventing Capability Collapse
Jianwei Sun
Chaoyang Mei
Linlin Wei
Kaiyu Zheng
Na Liu
Ming Cui
Tianyi Li
ALM
95
4
0
14 Mar 2024
Meta-Cognitive Analysis: Evaluating Declarative and Procedural Knowledge
  in Datasets and Large Language Models
Meta-Cognitive Analysis: Evaluating Declarative and Procedural Knowledge in Datasets and Large Language Models
Zhuoqun Li
Hongyu Lin
Yaojie Lu
Hao Xiang
Xianpei Han
Le Sun
53
1
0
14 Mar 2024
Meaningful Learning: Advancing Abstract Reasoning in Large Language
  Models via Generic Fact Guidance
Meaningful Learning: Advancing Abstract Reasoning in Large Language Models via Generic Fact Guidance
Kai Xiong
Xiao Ding
Ting Liu
Bing Qin
Dongliang Xu
Qing Yang
Hongtao Liu
Yixin Cao
LRM
74
7
0
14 Mar 2024
AraTrust: An Evaluation of Trustworthiness for LLMs in Arabic
AraTrust: An Evaluation of Trustworthiness for LLMs in Arabic
Emad A. Alghamdi
Reem I. Masoud
Deema Alnuhait
Afnan Y. Alomairi
Ahmed Ashraf
Mohamed Zaytoon
84
6
0
14 Mar 2024
Ethos: Rectifying Language Models in Orthogonal Parameter Space
Ethos: Rectifying Language Models in Orthogonal Parameter Space
Lei Gao
Yue Niu
Tingting Tang
A. Avestimehr
Murali Annavaram
MU
83
12
0
13 Mar 2024
Simple and Scalable Strategies to Continually Pre-train Large Language
  Models
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Adam Ibrahim
Benjamin Thérien
Kshitij Gupta
Mats L. Richter
Quentin Anthony
Timothée Lesort
Eugene Belilovsky
Irina Rish
KELMCLL
109
63
0
13 Mar 2024
SOTOPIA-$π$: Interactive Learning of Socially Intelligent Language
  Agents
SOTOPIA-πππ: Interactive Learning of Socially Intelligent Language Agents
Ruiyi Wang
Haofei Yu
W. Zhang
Zhengyang Qi
Maarten Sap
Graham Neubig
Yonatan Bisk
Hao Zhu
LLMAG
121
44
0
13 Mar 2024
Language models scale reliably with over-training and on downstream
  tasks
Language models scale reliably with over-training and on downstream tasks
S. Gadre
Georgios Smyrnis
Vaishaal Shankar
Suchin Gururangan
Mitchell Wortsman
...
Y. Carmon
Achal Dave
Reinhard Heckel
Niklas Muennighoff
Ludwig Schmidt
ALMELMLRM
183
48
0
13 Mar 2024
Automatic Interactive Evaluation for Large Language Models with State
  Aware Patient Simulator
Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator
Yusheng Liao
Yutong Meng
Yuhao Wang
Hongcheng Liu
Yanfeng Wang
Yu Wang
LM&MAELM
69
9
0
13 Mar 2024
SMART: Submodular Data Mixture Strategy for Instruction Tuning
SMART: Submodular Data Mixture Strategy for Instruction Tuning
Kowndinya Renduchintala
S. Bhatia
Ganesh Ramakrishnan
92
5
0
13 Mar 2024
CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large
  Language Model
CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model
Cheng Chen
Sitong Su
Xu Luo
Hengtao Shen
Lianli Gao
Jingkuan Song
CLL
66
19
0
13 Mar 2024
HRLAIF: Improvements in Helpfulness and Harmlessness in Open-domain
  Reinforcement Learning From AI Feedback
HRLAIF: Improvements in Helpfulness and Harmlessness in Open-domain Reinforcement Learning From AI Feedback
Ang Li
Qiugen Xiao
Peng Cao
Jian Tang
Yi Yuan
...
Weidong Guo
Yukang Gan
Jeffrey Xu Yu
D. Wang
Ying Shan
VLMALM
93
10
0
13 Mar 2024
Gemma: Open Models Based on Gemini Research and Technology
Gemma: Open Models Based on Gemini Research and Technology
Gemma Team
Gemma Team Thomas Mesnard
Cassidy Hardin
Robert Dadashi
Surya Bhupatiraju
...
Armand Joulin
Noah Fiedel
Evan Senter
Alek Andreev
Kathleen Kenealy
VLMLLMAG
245
515
0
13 Mar 2024
Mastering Text, Code and Math Simultaneously via Fusing Highly
  Specialized Language Models
Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models
Ning Ding
Yulin Chen
Ganqu Cui
Xingtai Lv
Weilin Zhao
Ruobing Xie
Bowen Zhou
Zhiyuan Liu
Maosong Sun
ALMMoMeAI4CE
152
7
0
13 Mar 2024
Previous
123...474849...676869
Next