ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Measuring Massive Multitask Language Understanding
arXiv: 2009.03300

7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 3,408 papers shown
Immunization against harmful fine-tuning attacks
Domenic Rosati
Jan Wehner
Kai Williams
Lukasz Bartoszcze
Jan Batzner
Hassan Sajjad
Frank Rudzicz
AAML
109
22
0
26 Feb 2024
Cross-domain Chinese Sentence Pattern Parsing
Yingsi Yu
Cunliang Kong
Liner Yang
Meishan Zhang
Lin Zhu
Yujie Wang
Haozhe Lin
Maosong Sun
Erhong Yang
89
0
0
26 Feb 2024
Foundation Model Transparency Reports
Rishi Bommasani
Kevin Klyman
Shayne Longpre
Betty Xiong
Sayash Kapoor
Nestor Maslej
Arvind Narayanan
Percy Liang
85
18
0
26 Feb 2024
ChatMusician: Understanding and Generating Music Intrinsically with LLM
Ti-Fen Pan
Hanfeng Lin
Yi Wang
Zeyue Tian
Shangda Wu
...
Gus Xia
Roger Dannenberg
Wei Xue
Shiyin Kang
Yike Guo
176
44
0
25 Feb 2024
PeriodicLoRA: Breaking the Low-Rank Bottleneck in LoRA Optimization
Xiangdi Meng
Damai Dai
Weiyao Luo
Zhe Yang
Shaoxiang Wu
Xiaochen Wang
Peiyi Wang
Qingxiu Dong
Liang Chen
Zhifang Sui
170
13
0
25 Feb 2024
EHRNoteQA: An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries
Sunjun Kweon
Jiyoun Kim
Heeyoung Kwak
Dongchul Cha
Hangyul Yoon
Kwanghyun Kim
Jeewon Yang
Seunghyun Won
Edward Choi
LM&MA
93
4
0
25 Feb 2024
PRoLoRA: Partial Rotation Empowers More Parameter-Efficient LoRA
Sheng Wang
Boyang Xue
Jiacheng Ye
Jiyue Jiang
Liheng Chen
Lingpeng Kong
Chuan Wu
92
15
0
24 Feb 2024
Look Before You Leap: Problem Elaboration Prompting Improves Mathematical Reasoning in Large Language Models
Haoran Liao
Jidong Tian
Shaohua Hu
Hao He
Yaohui Jin
ReLM, LRM
79
0
0
24 Feb 2024
Certifying Knowledge Comprehension in LLMs
Isha Chaudhary
Vedaant V. Jain
Gagandeep Singh
67
0
0
24 Feb 2024
Addressing Order Sensitivity of In-Context Demonstration Examples in Causal Language Models
Yanzheng Xiang
Hanqi Yan
Lin Gui
Yulan He
68
9
0
23 Feb 2024
How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries
Somnath Banerjee
Sayan Layek
Rima Hazra
Animesh Mukherjee
90
18
0
23 Feb 2024
Advancing Parameter Efficiency in Fine-tuning via Representation Editing
Muling Wu
Tianlong Li
Xiaohua Wang
Changze Lv
Zixuan Ling
Jianhao Zhu
Cenyuan Zhang
Xiaoqing Zheng
Xuanjing Huang
79
25
0
23 Feb 2024
Machine Unlearning of Pre-trained Large Language Models
Jin Yao
Eli Chien
Minxin Du
Xinyao Niu
Tianhao Wang
Zezhou Cheng
Xiang Yue
MU
152
51
0
23 Feb 2024
KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
Zhuohao Yu
Chang Gao
Wenjin Yao
Yidong Wang
Wei Ye
Jindong Wang
Xing Xie
Yue Zhang
Shikun Zhang
90
28
0
23 Feb 2024
Unintended Impacts of LLM Alignment on Global Representation
Michael Joseph Ryan
William B. Held
Diyi Yang
116
42
0
22 Feb 2024
tinyBenchmarks: evaluating LLMs with fewer examples
Felipe Maia Polo
Lucas Weber
Leshem Choshen
Yuekai Sun
Gongjun Xu
Mikhail Yurochkin
ELM
102
99
0
22 Feb 2024
Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment
Jiong Wang
Jiazhao Li
Yiquan Li
Xiangyu Qi
Junjie Hu
Yixuan Li
P. McDaniel
Muhao Chen
Bo Li
Chaowei Xiao
AAML, SILM
113
22
0
22 Feb 2024
Mirror: A Multiple-perspective Self-Reflection Method for Knowledge-rich Reasoning
Hanqi Yan
Qinglin Zhu
Xinyu Wang
Lin Gui
Yulan He
LRM, LLMAG
61
7
0
22 Feb 2024
RelayAttention for Efficient Large Language Model Serving with Long System Prompts
Lei Zhu
Xinjiang Wang
Wayne Zhang
Rynson W. H. Lau
88
8
0
22 Feb 2024
Watermarking Makes Language Models Radioactive
Tom Sander
Pierre Fernandez
Alain Durmus
Matthijs Douze
Teddy Furon
WaLM
82
20
0
22 Feb 2024
MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues
Ge Bai
Jie Liu
Xingyuan Bu
Yancheng He
Jiaheng Liu
...
Zhuoran Lin
Wenbo Su
Tiezheng Ge
Bo Zheng
Wanli Ouyang
ELM, LM&MA
125
94
0
22 Feb 2024
Chain-of-Thought Unfaithfulness as Disguised Accuracy
Oliver Bentham
Nathan Stringham
Ana Marasović
LRM, HILM
90
16
0
22 Feb 2024
Unveiling Linguistic Regions in Large Language Models
Zhihao Zhang
Jun Zhao
Qi Zhang
Tao Gui
Xuanjing Huang
104
13
0
22 Feb 2024
ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models
Yanan Wu
Jie Liu
Xingyuan Bu
Jiaheng Liu
Zhanhui Zhou
...
Haibin Chen
Tiezheng Ge
Wanli Ouyang
Wenbo Su
Bo Zheng
LRM
107
6
0
22 Feb 2024
Should We Respect LLMs? A Cross-Lingual Study on the Influence of Prompt Politeness on LLM Performance
Ziqi Yin
Hao Wang
Kaito Horio
Daisuke Kawahara
Satoshi Sekine
116
29
0
22 Feb 2024
Balanced Data Sampling for Language Model Training with Clustering
Yunfan Shao
Linyang Li
Zhaoye Fei
Hang Yan
Dahua Lin
Xipeng Qiu
95
12
0
22 Feb 2024
"My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models
Xinpeng Wang
Bolei Ma
Chengzhi Hu
Leon Weber-Genzel
Paul Röttger
Frauke Kreuter
Dirk Hovy
Barbara Plank
83
46
0
22 Feb 2024
Uncertainty-Aware Evaluation for Vision-Language Models
Vasily Kostumov
Bulat Nutfullin
Oleg Pilipenko
Eugene Ilyushin
ELM
208
9
0
22 Feb 2024
Double-I Watermark: Protecting Model Copyright for LLM Fine-tuning
Shen Li
Liuyi Yao
Jinyang Gao
Lan Zhang
Yaliang Li
123
13
0
22 Feb 2024
Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization
Xuxi Chen
Zhendong Wang
Daouda Sow
Junjie Yang
Tianlong Chen
Yingbin Liang
Mingyuan Zhou
Zhangyang Wang
88
7
0
22 Feb 2024
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming
Anisha Agarwal
Aaron Chan
Shubham Chandel
Jinu Jang
Shaun Miller
Roshanak Zilouchian Moghaddam
Yevhen Mohylevskyy
Neel Sundaresan
Michele Tufano
ELM
59
17
0
22 Feb 2024
BIRCO: A Benchmark of Information Retrieval Tasks with Complex Objectives
Xiaoyue Wang
Jianyou Wang
Weili Cao
Kaicheng Wang
R. Paturi
Leon Bergen
110
7
0
21 Feb 2024
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
Chaoqun He
Renjie Luo
Yuzhuo Bai
Shengding Hu
Zhen Leng Thai
...
Yuxiang Zhang
Jie Liu
Lei Qi
Zhiyuan Liu
Maosong Sun
ELM, AIMat
165
282
0
21 Feb 2024
Towards Building Multilingual Language Model for Medicine
Pengcheng Qiu
Chaoyi Wu
Xiaoman Zhang
Weixiong Lin
Haicheng Wang
Ya Zhang
Yanfeng Wang
Weidi Xie
LM&MA, ELM
121
90
0
21 Feb 2024
Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation
Phuc Phan
Hieu Tran
Long Phan
48
9
0
21 Feb 2024
Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models
Chenyang Lyu
Minghao Wu
Alham Fikri Aji
ELM
66
14
0
21 Feb 2024
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Yiran Ding
Li Lyna Zhang
Chengruidong Zhang
Yuanyuan Xu
Ning Shang
Jiahang Xu
Fan Yang
Mao Yang
RALM
102
164
0
21 Feb 2024
Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning
Zhaorui Yang
Tianyu Pang
Hao Feng
Han Wang
Wei Chen
Minfeng Zhu
Qian Liu
ALM
99
50
0
21 Feb 2024
KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge
Jiyoung Lee
Minwoo Kim
Seungho Kim
Junghwan Kim
Seunghyun Won
Hwaran Lee
Edward Choi
ALM
127
17
0
21 Feb 2024
LongWanjuan: Towards Systematic Measurement for Long Text Quality
Kai Lv
Xiaoran Liu
Qipeng Guo
Hang Yan
Conghui He
Xipeng Qiu
Dahua Lin
61
4
0
21 Feb 2024
Dynamic Evaluation of Large Language Models by Meta Probing Agents
Kaijie Zhu
Jindong Wang
Qinlin Zhao
Ruochen Xu
Xing Xie
108
42
0
21 Feb 2024
ARL2: Aligning Retrievers for Black-box Large Language Models via Self-guided Adaptive Relevance Labeling
Lingxi Zhang
Yue Yu
Kuan-Chieh Wang
Chao Zhang
VLM, RALM
62
5
0
21 Feb 2024
OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models
Yang Liu
Meng Xu
Shuo Wang
Liner Yang
Haoyu Wang
...
Cunliang Kong
Yun-Nung Chen
Yang Liu
Maosong Sun
Erhong Yang
ELM, LRM
97
1
0
21 Feb 2024
ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models
Chenyang Song
Xu Han
Zhengyan Zhang
Shengding Hu
Xiyu Shi
...
Chen Chen
Zhiyuan Liu
Guanglin Li
Tao Yang
Maosong Sun
157
32
0
21 Feb 2024
Retrieval-Augmented Data Augmentation for Low-Resource Domain Tasks
Minju Seo
Jinheon Baek
James Thorne
Sung Ju Hwang
RALM
69
11
0
21 Feb 2024
Benchmarking Retrieval-Augmented Generation for Medicine
Guangzhi Xiong
Qiao Jin
Zhiyong Lu
Aidong Zhang
RALM
149
199
0
20 Feb 2024
TreeEval: Benchmark-Free Evaluation of Large Language Models through Tree Planning
Xiang Li
Yunshi Lan
Chao Yang
ELM
63
11
0
20 Feb 2024
Towards an empirical understanding of MoE design choices
Dongyang Fan
Bettina Messmer
Martin Jaggi
71
10
0
20 Feb 2024
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
Haoran Li
Qingxiu Dong
Zhengyang Tang
Chaojun Wang
Xingxing Zhang
...
Wei Lu
Zhifang Sui
Benyou Wang
Wai Lam
Furu Wei
SyDa
104
63
0
20 Feb 2024
Stable Knowledge Editing in Large Language Models
Zihao Wei
Liang Pang
Hanxing Ding
Jingcheng Deng
Huawei Shen
Xueqi Cheng
KELM
119
10
0
20 Feb 2024