ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2009.03300
  4. Cited By
Measuring Massive Multitask Language Understanding

Measuring Massive Multitask Language Understanding

7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
D. Song
Jacob Steinhardt
    ELM
    RALM
ArXivPDFHTML

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 875 papers shown
Title
FLM-101B: An Open LLM and How to Train It with $100K Budget
FLM-101B: An Open LLM and How to Train It with 100KBudget100K Budget100KBudget
Xiang Li
Yiqun Yao
Xin Jiang
Xuezhi Fang
Xuying Meng
...
Li Du
Bowen Qin
Zheng-Wei Zhang
Aixin Sun
Yequan Wang
60
21
0
07 Sep 2023
Aligning Large Language Models for Clinical Tasks
Aligning Large Language Models for Clinical Tasks
Supun Manathunga
Isuru Hettigoda
LM&MA
ELM
AI4MH
36
10
0
06 Sep 2023
GPT Can Solve Mathematical Problems Without a Calculator
GPT Can Solve Mathematical Problems Without a Calculator
Zhengyuan Yang
Ming Ding
Qingsong Lv
Zhihuan Jiang
Zehai He
Yuyi Guo
Jinfeng Bai
Jie Tang
RALM
LRM
39
53
0
06 Sep 2023
Spoken Language Intelligence of Large Language Models for Language Learning
Spoken Language Intelligence of Large Language Models for Language Learning
Linkai Peng
Baorian Nuchged
Yingming Gao
ELM
64
4
0
28 Aug 2023
MedAlign: A Clinician-Generated Dataset for Instruction Following with
  Electronic Medical Records
MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
Scott L. Fleming
Alejandro Lozano
W. Haberkorn
Jenelle A. Jindal
E. Reis
...
Jonathan H. Chen
Keith Morse
Emma Brunskill
Jason Alan Fries
N. Shah
LM&MA
30
54
0
27 Aug 2023
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Jiasheng Ye
Zaixiang Zheng
Yu Bao
Lihua Qian
Quanquan Gu
DiffM
54
14
0
23 Aug 2023
Large Language Models Sensitivity to The Order of Options in
  Multiple-Choice Questions
Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions
Pouya Pezeshkpour
Estevam R. Hruschka
LRM
20
131
0
22 Aug 2023
LatEval: An Interactive LLMs Evaluation Benchmark with Incomplete
  Information from Lateral Thinking Puzzles
LatEval: An Interactive LLMs Evaluation Benchmark with Incomplete Information from Lateral Thinking Puzzles
Shulin Huang
Shirong Ma
Hai-Tao Zheng
Mengzuo Huang
Wuhe Zou
Weidong Zhang
Haitao Zheng
LLMAG
LRM
33
27
0
21 Aug 2023
GameEval: Evaluating LLMs on Conversational Games
GameEval: Evaluating LLMs on Conversational Games
Dan Qiao
Chenfei Wu
Yaobo Liang
Juntao Li
Nan Duan
ELM
LLMAG
26
20
0
19 Aug 2023
Separate the Wheat from the Chaff: Model Deficiency Unlearning via
  Parameter-Efficient Module Operation
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation
Xinshuo Hu
Dongfang Li
Baotian Hu
Zihao Zheng
Zhenyu Liu
Hao Fei
KELM
MU
35
26
0
16 Aug 2023
RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder
  Language Models
RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models
Jie Huang
Ming-Yu Liu
Peng Xu
M. Shoeybi
Kevin Chen-Chuan Chang
Bryan Catanzaro
RALM
35
33
0
15 Aug 2023
Thresh: A Unified, Customizable and Deployable Platform for Fine-Grained
  Text Evaluation
Thresh: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation
David Heineman
Yao Dou
Wei-ping Xu
32
7
0
14 Aug 2023
Evaluating the Generation Capabilities of Large Chinese Language Models
Evaluating the Generation Capabilities of Large Chinese Language Models
Hui Zeng
Jingyuan Xue
Meng Hao
Chen Sun
Bin Ning
Na Zhang
ELM
22
12
0
09 Aug 2023
In-Context Alignment: Chat with Vanilla Language Models Before
  Fine-Tuning
In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning
Xiaochuang Han
25
19
0
08 Aug 2023
Simple synthetic data reduces sycophancy in large language models
Simple synthetic data reduces sycophancy in large language models
Jerry W. Wei
Da Huang
Yifeng Lu
Denny Zhou
Quoc V. Le
33
69
0
07 Aug 2023
RecycleGPT: An Autoregressive Language Model with Recyclable Module
RecycleGPT: An Autoregressive Language Model with Recyclable Module
Yu Jiang
Qiaozhi He
Xiaomin Zhuang
Zhihua Wu
Kunpeng Wang
Wenlai Zhao
Guangwen Yang
KELM
28
3
0
07 Aug 2023
ChatMOF: An Autonomous AI System for Predicting and Generating
  Metal-Organic Frameworks
ChatMOF: An Autonomous AI System for Predicting and Generating Metal-Organic Frameworks
Y. Kang
Jihan Kim
AI4CE
LLMAG
32
12
0
01 Aug 2023
Generative Models as a Complex Systems Science: How can we make sense of
  large language model behavior?
Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?
Ari Holtzman
Peter West
Luke Zettlemoyer
AI4CE
32
14
0
31 Jul 2023
Scaling Sentence Embeddings with Large Language Models
Scaling Sentence Embeddings with Large Language Models
Ting Jiang
Shaohan Huang
Zhongzhi Luan
Deqing Wang
Fuzhen Zhuang
LRM
44
40
0
31 Jul 2023
Okapi: Instruction-tuned Large Language Models in Multiple Languages
  with Reinforcement Learning from Human Feedback
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
Viet Dac Lai
Chien Van Nguyen
Nghia Trung Ngo
Thuat Nguyen
Franck Dernoncourt
Ryan A. Rossi
Thien Huu Nguyen
ALM
42
131
0
29 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
Fahad Shahbaz Khan
VLM
38
118
0
25 Jul 2023
Instruction-following Evaluation through Verbalizer Manipulation
Instruction-following Evaluation through Verbalizer Manipulation
Shiyang Li
Jun Yan
Hai Wang
Zheng Tang
Xiang Ren
Vijay Srinivasan
Hongxia Jin
36
25
0
20 Jul 2023
DialogStudio: Towards Richest and Most Diverse Unified Dataset
  Collection for Conversational AI
DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI
Jianguo Zhang
Kun Qian
Zhiwei Liu
Shelby Heinecke
Rui Meng
Ye Liu
Zhou Yu
Huan Wang
Silvio Savarese
Caiming Xiong
39
22
0
19 Jul 2023
CValues: Measuring the Values of Chinese Large Language Models from
  Safety to Responsibility
CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility
Guohai Xu
Jiayi Liu
Mingshi Yan
Haotian Xu
Jinghui Si
...
Rong Zhang
Ji Zhang
Chao Peng
Feiyan Huang
Jingren Zhou
ALM
ELM
36
74
0
19 Jul 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
120
11,099
0
18 Jul 2023
COLLIE: Systematic Construction of Constrained Text Generation Tasks
COLLIE: Systematic Construction of Constrained Text Generation Tasks
Shunyu Yao
Howard Chen
Austin W. Hanjie
Runzhe Yang
Karthik Narasimhan
47
32
0
17 Jul 2023
Assessing the Quality of Multiple-Choice Questions Using GPT-4 and
  Rule-Based Methods
Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods
Steven Moore
H. A. Nguyen
Tianying Chen
John C. Stamper
ELM
21
33
0
16 Jul 2023
Large Language Models as Superpositions of Cultural Perspectives
Large Language Models as Superpositions of Cultural Perspectives
Grgur Kovač
Masataka Sawayama
Rémy Portelas
Cédric Colas
Peter Ford Dominey
Pierre-Yves Oudeyer
LLMAG
38
33
0
15 Jul 2023
Empowering Cross-lingual Behavioral Testing of NLP Models with
  Typological Features
Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features
Ester Hlavnova
Sebastian Ruder
35
5
0
11 Jul 2023
Becoming self-instruct: introducing early stopping criteria for minimal
  instruct tuning
Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning
Waseem Alshikh
Manhal Daaboul
K. Goddard
Brock Imel
Kiran Kamble
Parikshit Kulkarni
M. Russak
ALM
11
13
0
05 Jul 2023
Robots That Ask For Help: Uncertainty Alignment for Large Language Model
  Planners
Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
Allen Z. Ren
Anushri Dixit
Alexandra Bodrova
Sumeet Singh
Stephen Tu
...
Jacob Varley
Zhenjia Xu
Dorsa Sadigh
Andy Zeng
Anirudha Majumdar
LM&Ro
64
220
0
04 Jul 2023
Personality Traits in Large Language Models
Personality Traits in Large Language Models
Gregory Serapio-García
Mustafa Safdari
Clément Crepy
Luning Sun
Stephen Fitz
P. Romero
Marwa Abdulhai
Aleksandra Faust
Maja J. Matarić
LM&MA
LLMAG
58
119
0
01 Jul 2023
On the Exploitability of Instruction Tuning
On the Exploitability of Instruction Tuning
Manli Shu
Jiong Wang
Chen Zhu
Jonas Geiping
Chaowei Xiao
Tom Goldstein
SILM
36
92
0
28 Jun 2023
A Simple and Effective Pruning Approach for Large Language Models
A Simple and Effective Pruning Approach for Large Language Models
Mingjie Sun
Zhuang Liu
Anna Bair
J. Zico Kolter
62
359
0
20 Jun 2023
Domain-specific ChatBots for Science using Embeddings
Domain-specific ChatBots for Science using Embeddings
Kevin G. Yager
41
8
0
15 Jun 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric P. Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
51
3,854
0
09 Jun 2023
INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large
  Language Models
INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models
Yew Ken Chia
Pengfei Hong
Lidong Bing
Soujanya Poria
ELM
25
63
0
07 Jun 2023
Applying Standards to Advance Upstream & Downstream Ethics in Large
  Language Models
Applying Standards to Advance Upstream & Downstream Ethics in Large Language Models
Jose Berengueres
Marybeth Sandell
27
0
0
06 Jun 2023
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
Zechun Liu
Barlas Oğuz
Changsheng Zhao
Ernie Chang
Pierre Stock
Yashar Mehdad
Yangyang Shi
Raghuraman Krishnamoorthi
Vikas Chandra
MQ
60
190
0
29 May 2023
Beyond Positive Scaling: How Negation Impacts Scaling Trends of Language
  Models
Beyond Positive Scaling: How Negation Impacts Scaling Trends of Language Models
Yuhui Zhang
Michihiro Yasunaga
Zhengping Zhou
Jeff Z. HaoChen
James Zou
Percy Liang
Serena Yeung
47
7
0
27 May 2023
The False Promise of Imitating Proprietary LLMs
The False Promise of Imitating Proprietary LLMs
Arnav Gudibande
Eric Wallace
Charles Burton Snell
Xinyang Geng
Hao Liu
Pieter Abbeel
Sergey Levine
Dawn Song
ALM
44
198
0
25 May 2023
In-Context Impersonation Reveals Large Language Models' Strengths and
  Biases
In-Context Impersonation Reveals Large Language Models' Strengths and Biases
Leonard Salewski
Stephan Alaniz
Isabel Rio-Torto
Eric Schulz
Zeynep Akata
44
151
0
24 May 2023
Estimating Large Language Model Capabilities without Labeled Test Data
Estimating Large Language Model Capabilities without Labeled Test Data
Harvey Yiyun Fu
Qinyuan Ye
Albert Xu
Xiang Ren
Robin Jia
21
8
0
24 May 2023
Mixture-of-Experts Meets Instruction Tuning:A Winning Combination for
  Large Language Models
Mixture-of-Experts Meets Instruction Tuning:A Winning Combination for Large Language Models
Sheng Shen
Le Hou
Yan-Quan Zhou
Nan Du
Shayne Longpre
...
Vincent Zhao
Hongkun Yu
Kurt Keutzer
Trevor Darrell
Denny Zhou
ALM
MoE
38
54
0
24 May 2023
Emergent inabilities? Inverse scaling over the course of pretraining
Emergent inabilities? Inverse scaling over the course of pretraining
J. Michaelov
Benjamin Bergen
LRM
ReLM
22
3
0
24 May 2023
Improving Factuality and Reasoning in Language Models through Multiagent
  Debate
Improving Factuality and Reasoning in Language Models through Multiagent Debate
Yilun Du
Shuang Li
Antonio Torralba
J. Tenenbaum
Igor Mordatch
LLMAG
LRM
70
614
0
23 May 2023
Enhancing Chat Language Models by Scaling High-quality Instructional
  Conversations
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
Ning Ding
Yulin Chen
Bokai Xu
Yujia Qin
Zhi Zheng
Shengding Hu
Zhiyuan Liu
Maosong Sun
Bowen Zhou
ALM
45
491
0
23 May 2023
Memory-Efficient Fine-Tuning of Compressed Large Language Models via
  sub-4-bit Integer Quantization
Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization
Jeonghoon Kim
J. H. Lee
Sungdong Kim
Joonsuk Park
Kang Min Yoo
S. Kwon
Dongsoo Lee
MQ
44
99
0
23 May 2023
Knowledge of Knowledge: Exploring Known-Unknowns Uncertainty with Large
  Language Models
Knowledge of Knowledge: Exploring Known-Unknowns Uncertainty with Large Language Models
Alfonso Amayuelas
Kyle Wong
Liangming Pan
Wenhu Chen
Luu Anh Tuan
42
26
0
23 May 2023
Polyglot or Not? Measuring Multilingual Encyclopedic Knowledge in
  Foundation Models
Polyglot or Not? Measuring Multilingual Encyclopedic Knowledge in Foundation Models
Tim Schott
Daniel Furman
Shreshta Bhat
ELM
35
4
0
23 May 2023
Previous
123...15161718
Next