Measuring Massive Multitask Language Understanding

7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 3,408 papers shown
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
Anselm Paulus
Arman Zharmagambetov
Chuan Guo
Brandon Amos
Yuandong Tian
AAML
145
67
0
21 Apr 2024
Beyond Accuracy: Investigating Error Types in GPT-4 Responses to USMLE Questions
Soumyadeep Roy
A. Khatua
Fatemeh Ghoochani
Uwe Hadler
Wolfgang Nejdl
Niloy Ganguly
ELM, LM&MA
86
11
0
20 Apr 2024
Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration
Yi-Chong Huang
Xiaocheng Feng
Baohang Li
Yang Xiang
Hui Wang
Bing Qin
Ting Liu
FedML
97
30
0
19 Apr 2024
RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation
Chao Jin
Zili Zhang
Xuanlin Jiang
Fangyue Liu
Xin Liu
Xuanzhe Liu
Xin Jin
118
47
0
18 Apr 2024
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Aitor Ormazabal
Che Zheng
Cyprien de Masson d'Autume
Dani Yogatama
Deyu Fu
...
Yazheng Yang
Yi Tay
Yuqi Wang
Zhongkai Zhu
Zhihui Xie
LRM, VLM, ReLM
98
52
0
18 Apr 2024
Large Language Models in Targeted Sentiment Analysis
Nicolay Rusnachenko
A. Golubev
Natalia Loukachevitch
LRM
70
3
0
18 Apr 2024
From Form(s) to Meaning: Probing the Semantic Depths of Language Models Using Multisense Consistency
Xenia Ohmer
Elia Bruni
Dieuwke Hupkes
AI4CE
113
7
0
18 Apr 2024
CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment
Geyu Lin
Bin Wang
Zhengyuan Liu
Nancy F. Chen
148
8
0
18 Apr 2024
AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective Intelligence
Minbeom Kim
Hwanhee Lee
Joonsuk Park
Hwaran Lee
Kyomin Jung
122
3
0
18 Apr 2024
The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey
Tula Masterman
Sandi Besen
Mason Sawtell
Alex Chao
LM&Ro, LLMAG
114
58
0
17 Apr 2024
Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization
Costas Mavromatis
Petros Karypis
George Karypis
MoMe
79
30
0
17 Apr 2024
Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models
Yue Zhou
Yada Zhu
Diego Antognini
Yoon Kim
Yang Zhang
ReLM, LRM
41
3
0
17 Apr 2024
AgentKit: Flow Engineering with Graphs, not Coding
Yue Wu
Yewen Fan
So Yeon Min
Shrimai Prabhumoye
Stephen Marcus McAleer
Yonatan Bisk
Ruslan Salakhutdinov
Yuanzhi Li
Tom Michael Mitchell
AI4CE
104
1
0
17 Apr 2024
ViLLM-Eval: A Comprehensive Evaluation Suite for Vietnamese Large Language Models
Trong-Hieu Nguyen
Anh-Cuong Le
Viet-Cuong Nguyen
60
1
0
17 Apr 2024
A Survey on Retrieval-Augmented Text Generation for Large Language Models
Yizheng Huang
Jimmy X. Huang
3DV, RALM
154
51
0
17 Apr 2024
SuRe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs
Jaehyung Kim
Jaehyun Nam
Sangwoo Mo
Jongjin Park
Sang-Woo Lee
Minjoon Seo
Jung-Woo Ha
Jinwoo Shin
AIFin, RALM, ELM
121
51
0
17 Apr 2024
Self-playing Adversarial Language Game Enhances LLM Reasoning
Pengyu Cheng
Tianhao Hu
Han Xu
Zhisong Zhang
Yong Dai
Lei Han
Nan Du
Xiaolong Li
SyDa, LRM, ReLM
193
38
0
16 Apr 2024
HLAT: High-quality Large Language Model Pre-trained on AWS Trainium
Haozheng Fan
Hao Zhou
Guangtai Huang
Parameswaran Raman
Xinwei Fu
Gaurav Gupta
Dhananjay Ram
Yida Wang
Jun Huan
81
6
0
16 Apr 2024
DESTEIN: Navigating Detoxification of Language Models via Universal Steering Pairs and Head-wise Activation Fusion
Yu Li
Zhihua Wei
Han Jiang
Chuanyang Gong
LLMSV
84
3
0
16 Apr 2024
Compression Represents Intelligence Linearly
Yuzhen Huang
Jinghan Zhang
Zifei Shan
Junxian He
82
29
0
15 Apr 2024
Resilience of Large Language Models for Noisy Instructions
Bin Wang
Chengwei Wei
Zhengyuan Liu
Geyu Lin
Nancy F. Chen
142
15
0
15 Apr 2024
Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model
Hyunsoo Cho
ALM
31
0
0
15 Apr 2024
LoRA Dropout as a Sparsity Regularizer for Overfitting Control
Yang Lin
Xinyu Ma
Xu Chu
Yujie Jin
Zhibang Yang
Yasha Wang
Hong-yan Mei
97
27
0
15 Apr 2024
Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models
Siyan Zhao
Daniel Israel
Guy Van den Broeck
Aditya Grover
KELM, VLM
73
6
0
15 Apr 2024
Learn Your Reference Model for Real Good Alignment
Alexey Gorbatovski
Boris Shaposhnikov
Alexey Malakhov
Nikita Surnachev
Yaroslav Aksenov
Ian Maksimov
Nikita Balagansky
Daniil Gavrilov
OffRL
131
35
0
15 Apr 2024
LLeMpower: Understanding Disparities in the Control and Access of Large Language Models
Vishwas Sathish
Hannah Lin
Aditya K Kamath
Anish Nyayachavadi
86
5
0
14 Apr 2024
Unveiling LLM Evaluation Focused on Metrics: Challenges and Solutions
Taojun Hu
Xiao-Hua Zhou
ELM
88
18
0
14 Apr 2024
Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation
Ruixin Yang
Dheeraj Rajagopal
S. Hayati
Bin Hu
Dongyeop Kang
LLMAG
136
7
0
14 Apr 2024
MING-MOE: Enhancing Medical Multi-Task Learning in Large Language Models with Sparse Mixture of Low-Rank Adapter Experts
Yusheng Liao
Shuyang Jiang
Yu Wang
Yanfeng Wang
MoE
116
5
0
13 Apr 2024
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Xuezhe Ma
Xiaomeng Yang
Wenhan Xiong
Beidi Chen
Lili Yu
Hao Zhang
Jonathan May
Luke Zettlemoyer
Omer Levy
Chunting Zhou
97
33
0
12 Apr 2024
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think
Xinpeng Wang
Chengzhi Hu
Bolei Ma
Paul Röttger
Barbara Plank
OOD
95
6
0
12 Apr 2024
Do Large Language Models Learn Human-Like Strategic Preferences?
Jesse Roberts
Kyle Moore
Douglas H. Fisher
57
5
0
11 Apr 2024
MSciNLI: A Diverse Benchmark for Scientific Natural Language Inference
Mobashir Sadat
Cornelia Caragea
87
5
0
11 Apr 2024
Rho-1: Not All Tokens Are What You Need
Zheng-Wen Lin
Zhibin Gou
Yeyun Gong
Xiao Liu
Yelong Shen
...
Chen Lin
Yujiu Yang
Jian Jiao
Nan Duan
Weizhu Chen
CLL
160
75
0
11 Apr 2024
Post-Hoc Reversal: Are We Selecting Models Prematurely?
Rishabh Ranjan
Saurabh Garg
Mrigank Raman
Carlos Guestrin
Zachary Chase Lipton
78
0
0
11 Apr 2024
UltraEval: A Lightweight Platform for Flexible and Comprehensive Evaluation for LLMs
Chaoqun He
Renjie Luo
Shengding Hu
Yuanqian Zhao
Jie Zhou
Hanghao Wu
Jiajie Zhang
Xu Han
Zhiyuan Liu
Maosong Sun
ELM
62
17
0
11 Apr 2024
MM-PhyQA: Multimodal Physics Question-Answering With Multi-Image CoT Prompting
Avinash Anand
Janak Kapuriya
Apoorv Singh
Jay Saraf
Naman Lal
Astha Verma
Rushali Gupta
R. Shah
LRM
46
15
0
11 Apr 2024
Scalable Language Model with Generalized Continual Learning
Bohao Peng
Zhuotao Tian
Shu Liu
Mingchang Yang
Jiaya Jia
ALM, CLL, KELM
89
18
0
11 Apr 2024
CQIL: Inference Latency Optimization with Concurrent Computation of Quasi-Independent Layers
Longwei Zou
Qingyang Wang
Han Zhao
Jiangang Kong
Yi Yang
Yangdong Deng
107
0
0
10 Apr 2024
CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge
Yu Ying Chiu
Amirhossein Ajalloeian
Maria Antoniak
Chan Young Park
Shuyue Stella Li
Mehar Bhatia
Sahithya Ravi
Yulia Tsvetkov
Vered Shwartz
Yejin Choi
87
23
0
10 Apr 2024
Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition
Kehua Feng
Keyan Ding
Hongzhi Tan
Kede Ma
Zhihua Wang
...
Yuzhou Cheng
Ge Sun
Guozhou Zheng
Qiang Zhang
H. Chen
128
13
0
10 Apr 2024
Khayyam Challenge (PersianMMLU): Is Your LLM Truly Wise to The Persian Language?
Omid Ghahroodi
Marzia Nouri
Mohammad V. Sanian
Alireza Sahebi
D. Dastgheib
Ehsaneddin Asgari
M. Baghshah
M. Rohban
ELM, AAML
85
11
0
09 Apr 2024
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks
Chonghua Wang
Haodong Duan
Songyang Zhang
Dahua Lin
Kai-xiang Chen
ELM
82
23
0
09 Apr 2024
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Shengding Hu
Yuge Tu
Xu Han
Chaoqun He
Ganqu Cui
...
Chaochao Jia
Guoyang Zeng
Dahai Li
Zhiyuan Liu
Maosong Sun
MoE
131
347
0
09 Apr 2024
FreeEval: A Modular Framework for Trustworthy and Efficient Evaluation of Large Language Models
Zhuohao Yu
Chang Gao
Wenjin Yao
Yidong Wang
Zhengran Zeng
Wei Ye
Jindong Wang
Yue Zhang
Shikun Zhang
63
3
0
09 Apr 2024
WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents
Michael Lutz
Arth Bohra
Manvel Saroyan
Artem Harutyunyan
Giovanni Campagna
LLMAG
63
15
0
08 Apr 2024
Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge
Weikai Lu
Ziqian Zeng
Jianwei Wang
Zhengdong Lu
Zelin Chen
Huiping Zhuang
Cen Chen
MU, AAML, KELM
88
30
0
08 Apr 2024
CodecLM: Aligning Language Models with Tailored Synthetic Data
Zifeng Wang
Chun-Liang Li
Vincent Perot
Long T. Le
Jin Miao
Zizhao Zhang
Chen-Yu Lee
Tomas Pfister
SyDa, ALM
73
21
0
08 Apr 2024
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger
Fabio Pernisi
Bertie Vidgen
Dirk Hovy
ELM, KELM
167
39
0
08 Apr 2024
PORTULAN ExtraGLUE Datasets and Models: Kick-starting a Benchmark for the Neural Processing of Portuguese
T. Osório
Bernardo Leite
Henrique Lopes Cardoso
Luís Gomes
João Rodrigues
Rodrigo Santos
António Branco
96
3
0
08 Apr 2024