ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2009.03300
  4. Cited By
Measuring Massive Multitask Language Understanding
v1v2v3 (latest)

Measuring Massive Multitask Language Understanding

7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
    ELMRALM
ArXiv (abs)PDFHTML

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 3,408 papers shown
Title
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with
  Code-based Self-Verification
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
Aojun Zhou
Ke Wang
Zimu Lu
Weikang Shi
Sichun Luo
...
Shaoqing Lu
Anya Jia
Linqi Song
Mingjie Zhan
Hongsheng Li
ReLMLRM
82
159
0
15 Aug 2023
Robustness Over Time: Understanding Adversarial Examples' Effectiveness
  on Longitudinal Versions of Large Language Models
Robustness Over Time: Understanding Adversarial Examples' Effectiveness on Longitudinal Versions of Large Language Models
Yugeng Liu
Tianshuo Cong
Zhengyu Zhao
Michael Backes
Yun Shen
Yang Zhang
AAML
90
8
0
15 Aug 2023
Thresh: A Unified, Customizable and Deployable Platform for Fine-Grained
  Text Evaluation
Thresh: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation
David Heineman
Yao Dou
Wei Xu
80
7
0
14 Aug 2023
CausalLM is not optimal for in-context learning
CausalLM is not optimal for in-context learning
Nan Ding
Tomer Levinboim
Jialin Wu
Sebastian Goodman
Radu Soricut
72
26
0
14 Aug 2023
Building Trust in Conversational AI: A Comprehensive Review and Solution
  Architecture for Explainable, Privacy-Aware Systems using LLMs and Knowledge
  Graph
Building Trust in Conversational AI: A Comprehensive Review and Solution Architecture for Explainable, Privacy-Aware Systems using LLMs and Knowledge Graph
Ahtsham Zafar
V. Parthasarathy
Chan Le Van
Saad Shahid
A. khan
Arsalan Shahid
79
14
0
13 Aug 2023
Self-Alignment with Instruction Backtranslation
Self-Alignment with Instruction Backtranslation
Xian Li
Ping Yu
Chunting Zhou
Timo Schick
Omer Levy
Luke Zettlemoyer
Jason Weston
M. Lewis
SyDa
102
135
0
11 Aug 2023
Evaluating the Generation Capabilities of Large Chinese Language Models
Evaluating the Generation Capabilities of Large Chinese Language Models
Hui Zeng
Jingyuan Xue
Meng Hao
Chen Sun
Bin Ning
Na Zhang
ELM
82
12
0
09 Aug 2023
CLEVA: Chinese Language Models EVAluation Platform
CLEVA: Chinese Language Models EVAluation Platform
Yanyang Li
Jianqiao Zhao
Duo Zheng
Zi-Yuan Hu
Zhi Chen
...
Yongfeng Huang
Shijia Huang
Dahua Lin
Michael R. Lyu
Liwei Wang
ALMELM
100
11
0
09 Aug 2023
In-Context Alignment: Chat with Vanilla Language Models Before
  Fine-Tuning
In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning
Xiaochuang Han
52
19
0
08 Aug 2023
Gentopia: A Collaborative Platform for Tool-Augmented LLMs
Gentopia: A Collaborative Platform for Tool-Augmented LLMs
Binfeng Xu
Xukun Liu
Hua Shen
Zeyu Han
Yuhan Li
Murong Yue
Zhi-Ping Peng
Yuchen Liu
Ziyu Yao
Dongkuan Xu
LLMAG
97
19
0
08 Aug 2023
AgentSims: An Open-Source Sandbox for Large Language Model Evaluation
AgentSims: An Open-Source Sandbox for Large Language Model Evaluation
Jiaju Lin
Haoran Zhao
Aochi Zhang
Yiting Wu
Huqiuyue Ping
Qin Chen
ELMLLMAG
104
68
0
08 Aug 2023
Simple synthetic data reduces sycophancy in large language models
Simple synthetic data reduces sycophancy in large language models
Jerry W. Wei
Da Huang
Yifeng Lu
Denny Zhou
Quoc V. Le
114
74
0
07 Aug 2023
AgentBench: Evaluating LLMs as Agents
AgentBench: Evaluating LLMs as Agents
Xiao Liu
Hao Yu
Hanchen Zhang
Yifan Xu
Xuanyu Lei
...
Yu-Chuan Su
Huan Sun
Minlie Huang
Yuxiao Dong
Jie Tang
ELMLLMAG
152
315
0
07 Aug 2023
RecycleGPT: An Autoregressive Language Model with Recyclable Module
RecycleGPT: An Autoregressive Language Model with Recyclable Module
Yu Jiang
Qiaozhi He
Xiaomin Zhuang
Zhihua Wu
Kunpeng Wang
Wenlai Zhao
Guangwen Yang
KELM
76
3
0
07 Aug 2023
LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models
  Fine-tuning
LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning
Longteng Zhang
Lin Zhang
Shaoshuai Shi
Xiaowen Chu
Yue Liu
AI4CE
72
107
0
07 Aug 2023
Wider and Deeper LLM Networks are Fairer LLM Evaluators
Wider and Deeper LLM Networks are Fairer LLM Evaluators
Xinghua Zhang
Yu Bowen
Haiyang Yu
Yangyu Lv
Tingwen Liu
Fei Huang
Hongbo Xu
Yongbin Li
ALM
146
90
0
03 Aug 2023
ChatMOF: An Autonomous AI System for Predicting and Generating
  Metal-Organic Frameworks
ChatMOF: An Autonomous AI System for Predicting and Generating Metal-Organic Frameworks
Y. Kang
Jihan Kim
AI4CELLMAG
91
13
0
01 Aug 2023
Generative Models as a Complex Systems Science: How can we make sense of
  large language model behavior?
Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?
Ari Holtzman
Peter West
Luke Zettlemoyer
AI4CE
92
15
0
31 Jul 2023
Scaling Sentence Embeddings with Large Language Models
Scaling Sentence Embeddings with Large Language Models
Ting Jiang
Shaohan Huang
Zhongzhi Luan
Deqing Wang
Fuzhen Zhuang
LRM
115
47
0
31 Jul 2023
Do LLMs Possess a Personality? Making the MBTI Test an Amazing
  Evaluation for Large Language Models
Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models
Keyu Pan
Yawen Zeng
LLMAG
83
44
0
30 Jul 2023
Okapi: Instruction-tuned Large Language Models in Multiple Languages
  with Reinforcement Learning from Human Feedback
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
Viet Dac Lai
Chien Van Nguyen
Nghia Trung Ngo
Thuat Nguyen
Franck Dernoncourt
Ryan Rossi
Thien Huu Nguyen
ALM
133
150
0
29 Jul 2023
SuperCLUE: A Comprehensive Chinese Large Language Model Benchmark
SuperCLUE: A Comprehensive Chinese Large Language Model Benchmark
Liang Xu
Anqi Li
Lei Zhu
Han Xue
Changtai Zhu
Kangkang Zhao
Hao He
Xuanwei Zhang
Qiyue Kang
Zhenzhong Lan
RALMELMLRM
77
55
0
27 Jul 2023
TransNormerLLM: A Faster and Better Large Language Model with Improved
  TransNormer
TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer
Zhen Qin
Dong Li
Weigao Sun
Weixuan Sun
Xuyang Shen
...
Yunshen Wei
Baohong Lv
Xiao Luo
Yu Qiao
Yiran Zhong
94
18
0
27 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
146
127
0
25 Jul 2023
Evaluating Large Language Models for Radiology Natural Language
  Processing
Evaluating Large Language Models for Radiology Natural Language Processing
Zheng Liu
Tianyang Zhong
Yiwei Li
Yutong Zhang
Yirong Pan
...
Shijie Zhao
Quanzheng Li
Hongtu Zhu
Dinggang Shen
Tianming Liu
LM&MAELM
126
6
0
25 Jul 2023
ARB: Advanced Reasoning Benchmark for Large Language Models
ARB: Advanced Reasoning Benchmark for Large Language Models
Tomohiro Sawada
Daniel Paleka
Alexander Havrilla
Pranav Tadepalli
Paula Vidas
Alexander Kranias
John J. Nay
Kshitij Gupta
Aran Komatsuzaki
ELMLRM
81
39
0
25 Jul 2023
A Real-World WebAgent with Planning, Long Context Understanding, and
  Program Synthesis
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
Izzeddin Gur
Hiroki Furuta
Austin Huang
Mustafa Safdari
Yutaka Matsuo
Douglas Eck
Aleksandra Faust
LM&RoLLMAG
194
226
0
24 Jul 2023
L-Eval: Instituting Standardized Evaluation for Long Context Language
  Models
L-Eval: Instituting Standardized Evaluation for Long Context Language Models
Chen An
Shansan Gong
Ming Zhong
Xingjian Zhao
Mukai Li
Jun Zhang
Lingpeng Kong
Xipeng Qiu
ELMALM
123
156
0
20 Jul 2023
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill
  Sets
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
Seonghyeon Ye
Doyoung Kim
Sungdong Kim
Hyeonbin Hwang
Seungone Kim
Yongrae Jo
James Thorne
Juho Kim
Minjoon Seo
ALM
134
108
0
20 Jul 2023
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities
  of Large Language Models
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
Xiaoxuan Wang
Ziniu Hu
Pan Lu
Yanqiao Zhu
Jieyu Zhang
Satyen Subramaniam
Arjun R. Loomba
Shichang Zhang
Yizhou Sun
Wei Wang
ELMLRM
71
114
0
20 Jul 2023
Instruction-following Evaluation through Verbalizer Manipulation
Instruction-following Evaluation through Verbalizer Manipulation
Shiyang Li
Jun Yan
Hai Wang
Zheng Tang
Xiang Ren
Vijay Srinivasan
Hongxia Jin
112
27
0
20 Jul 2023
DialogStudio: Towards Richest and Most Diverse Unified Dataset
  Collection for Conversational AI
DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI
Jianguo Zhang
Kun Qian
Zhiwei Liu
Shelby Heinecke
Rui Meng
Ye Liu
Zhou Yu
Huan Wang
Silvio Savarese
Caiming Xiong
115
22
0
19 Jul 2023
CValues: Measuring the Values of Chinese Large Language Models from
  Safety to Responsibility
CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility
Guohai Xu
Jiayi Liu
Mingshi Yan
Haotian Xu
Jinghui Si
...
Rong Zhang
Ji Zhang
Chao Peng
Feiyan Huang
Jingren Zhou
ALMELM
93
83
0
19 Jul 2023
Does Circuit Analysis Interpretability Scale? Evidence from Multiple
  Choice Capabilities in Chinchilla
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Tom Lieberum
Matthew Rahtz
János Kramár
Neel Nanda
G. Irving
Rohin Shah
Vladimir Mikulik
103
115
0
18 Jul 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MHALM
534
12,130
0
18 Jul 2023
AlpaGasus: Training A Better Alpaca with Fewer Data
AlpaGasus: Training A Better Alpaca with Fewer Data
Lichang Chen
Shiyang Li
Jun Yan
Hai Wang
Kalpa Gunaratna
...
Zheng Tang
Vijay Srinivasan
Dinesh Manocha
Heng-Chiao Huang
Hongxia Jin
ALM
125
0
0
17 Jul 2023
COLLIE: Systematic Construction of Constrained Text Generation Tasks
COLLIE: Systematic Construction of Constrained Text Generation Tasks
Shunyu Yao
Howard Chen
Austin W. Hanjie
Runzhe Yang
Karthik Narasimhan
108
35
0
17 Jul 2023
Measuring Faithfulness in Chain-of-Thought Reasoning
Measuring Faithfulness in Chain-of-Thought Reasoning
Tamera Lanham
Anna Chen
Ansh Radhakrishnan
Benoit Steiner
Carson E. Denison
...
Zac Hatfield-Dodds
Jared Kaplan
J. Brauner
Sam Bowman
Ethan Perez
ReLMLRM
80
193
0
17 Jul 2023
Assessing the Quality of Multiple-Choice Questions Using GPT-4 and
  Rule-Based Methods
Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods
Steven Moore
H. A. Nguyen
Tianying Chen
John C. Stamper
ELM
66
35
0
16 Jul 2023
Do Emergent Abilities Exist in Quantized Large Language Models: An
  Empirical Study
Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study
Peiyu Liu
Zikang Liu
Ze-Feng Gao
Dawei Gao
Wayne Xin Zhao
Yaliang Li
Bolin Ding
Ji-Rong Wen
MQLRM
94
35
0
16 Jul 2023
Large Language Models as Superpositions of Cultural Perspectives
Large Language Models as Superpositions of Cultural Perspectives
Grgur Kovač
Masataka Sawayama
Rémy Portelas
Cédric Colas
Peter Ford Dominey
Pierre-Yves Oudeyer
LLMAG
85
37
0
15 Jul 2023
Effective Prompt Extraction from Language Models
Effective Prompt Extraction from Language Models
Yiming Zhang
Nicholas Carlini
Daphne Ippolito
MIACVSILM
105
43
0
13 Jul 2023
A Comprehensive Overview of Large Language Models
A Comprehensive Overview of Large Language Models
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Ajmal Mian
OffRL
261
624
0
12 Jul 2023
Instruction Mining: When Data Mining Meets Large Language Model
  Finetuning
Instruction Mining: When Data Mining Meets Large Language Model Finetuning
Yihan Cao
Yanbin Kang
Chi Wang
Lichao Sun
ALM
24
36
0
12 Jul 2023
Empowering Cross-lingual Behavioral Testing of NLP Models with
  Typological Features
Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features
Ester Hlavnova
Sebastian Ruder
80
5
0
11 Jul 2023
OntoChatGPT Information System: Ontology-Driven Structured Prompts for
  ChatGPT Meta-Learning
OntoChatGPT Information System: Ontology-Driven Structured Prompts for ChatGPT Meta-Learning
O. Palagin
Vladislav Kaverinskiy
Anna Litvin
Kyrylo S. Malakhov
KELM
33
25
0
11 Jul 2023
Frontier AI Regulation: Managing Emerging Risks to Public Safety
Frontier AI Regulation: Managing Emerging Risks to Public Safety
Markus Anderljung
Joslyn Barnhart
Anton Korinek
Jade Leung
Cullen O'Keefe
...
Jonas Schuett
Yonadav Shavit
Divya Siddarth
Robert F. Trager
Kevin J. Wolf
SILM
150
125
0
06 Jul 2023
A Survey on Evaluation of Large Language Models
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELMLM&MAALM
223
1,766
0
06 Jul 2023
Style Over Substance: Evaluation Biases for Large Language Models
Style Over Substance: Evaluation Biases for Large Language Models
Minghao Wu
Alham Fikri Aji
ALMELM
147
47
0
06 Jul 2023
Becoming self-instruct: introducing early stopping criteria for minimal
  instruct tuning
Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning
Waseem Alshikh
Manhal Daaboul
K. Goddard
Brock Imel
Kiran Kamble
Parikshit Kulkarni
M. Russak
ALM
13
13
0
05 Jul 2023
Previous
123...636465...676869
Next