Measuring Massive Multitask Language Understanding

7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
D. Song
Jacob Steinhardt
    ELM
    RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 956 papers shown
Evaluating Large Language Models with Tests of Spanish as a Foreign Language: Pass or Fail?
Marina Mayor-Rocher
Nina Melero
Elena Merino-Gómez
María Grandury
Javier Conde
Pedro Reviriego
ELM
27
1
0
08 Sep 2024
Residual Stream Analysis with Multi-Layer SAEs
Tim Lawson
Lucy Farnik
Conor Houghton
Laurence Aitchison
39
3
0
06 Sep 2024
Understanding LLM Development Through Longitudinal Study: Insights from the Open Ko-LLM Leaderboard
Chanjun Park
Hyeonwoo Kim
LRM
66
1
0
05 Sep 2024
From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning
Wei Chen
Zhen Huang
Liang Xie
Binbin Lin
Houqiang Li
...
Deng Cai
Yonggang Zhang
Wenxiao Wang
Xu Shen
Jieping Ye
57
6
0
03 Sep 2024
Hyper-Compression: Model Compression via Hyperfunction
Fenglei Fan
Juntong Fan
Dayang Wang
Jingbo Zhang
Zelin Dong
Shijun Zhang
Ge Wang
Tieyong Zeng
34
0
0
01 Sep 2024
Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation
Jasper Dekoninck
Maximilian Baader
Martin Vechev
ALM
92
0
0
01 Sep 2024
Does Alignment Tuning Really Break LLMs' Internal Confidence?
Hongseok Oh
Wonseok Hwang
49
0
0
31 Aug 2024
CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models
Jonathan Bourne
62
4
0
30 Aug 2024
An Investigation of Warning Erroneous Chat Translations in Cross-lingual Communication
Yunmeng Li
Jun Suzuki
Makoto Morishita
Kaori Abe
Kentaro Inui
65
1
0
28 Aug 2024
IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
Bin Wang
Chunyu Xie
Dawei Leng
Yuhui Yin
MLLM
54
1
0
23 Aug 2024
GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models
Kunsheng Tang
Wenbo Zhou
Jie Zhang
Aishan Liu
Gelei Deng
Shuai Li
Peigui Qi
Weiming Zhang
Tianwei Zhang
Nenghai Yu
50
3
0
22 Aug 2024
MoDeGPT: Modular Decomposition for Large Language Model Compression
Chi-Heng Lin
Shangqian Gao
James Seale Smith
Abhishek Patel
Shikhar Tuli
Yilin Shen
Hongxia Jin
Yen-Chang Hsu
71
9
0
19 Aug 2024
How to Make the Most of LLMs' Grammatical Knowledge for Acceptability Judgments
Yusuke Ide
Yuto Nishida
Miyu Oba
Justin Vasselli
Hidetaka Kamigaito
Taro Watanabe
46
2
0
19 Aug 2024
CogLM: Tracking Cognitive Development of Large Language Models
Xinglin Wang
Peiwen Yuan
Shaoxiong Feng
Yiwei Li
Boyuan Pan
Heda Wang
Yao Hu
Kan Li
ELM
67
0
0
17 Aug 2024
SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models
Kaushal Kumar Maurya
KV Aditya Srivatsa
Ekaterina Kochmar
40
2
0
16 Aug 2024
LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs
Do Xuan Long
Hai Nguyen Ngoc
Tiviatis Sim
Hieu Dao
Shafiq Joty
Kenji Kawaguchi
Nancy F. Chen
Min-Yen Kan
34
8
0
16 Aug 2024
ArabLegalEval: A Multitask Benchmark for Assessing Arabic Legal Knowledge in Large Language Models
Faris Hijazi
Somayah Alharbi
Abdulaziz AlHussein
Harethah Shairah
Reem Alzahrani
Hebah Alshamlan
Omar Knio
G. Turkiyyah
AILaw
ELM
60
2
0
15 Aug 2024
Automated Design of Agentic Systems
Shengran Hu
Cong Lu
Jeff Clune
AI4CE
47
41
0
15 Aug 2024
Layerwise Recurrent Router for Mixture-of-Experts
Zihan Qiu
Zeyu Huang
Shuang Cheng
Yizhi Zhou
Zili Wang
Ivan Titov
Jie Fu
MoE
81
2
0
13 Aug 2024
MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty
Yongjin Yang
Haneul Yoo
Hwaran Lee
68
1
0
13 Aug 2024
The advantages of context specific language models: the case of the Erasmian Language Model
João Gonçalves
Nick Jelicic
Michele Murgia
Evert Stamhuis
41
0
0
13 Aug 2024
Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models
Zikai Xie
HILM
LRM
61
5
0
09 Aug 2024
Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement
Le Yu
Bowen Yu
Haiyang Yu
Fei Huang
Yongbin Li
MoMe
35
5
0
06 Aug 2024
CFBench: A Comprehensive Constraints-Following Benchmark for LLMs
Leo Micklem
Yan-Bin Shen
Wenjing Luo
Yan Zhang
Hao Liang
...
Weipeng Chen
Bin Cui
Blair Thornton
Wentao Zhang
Zenan Zhou
ELM
84
16
0
02 Aug 2024
Tamper-Resistant Safeguards for Open-Weight LLMs
Rishub Tamirisa
Bhrugu Bharathi
Long Phan
Andy Zhou
Alice Gatti
...
Andy Zou
Dawn Song
Bo Li
Dan Hendrycks
Mantas Mazeika
AAML
MU
64
45
0
01 Aug 2024
Meltemi: The first open Large Language Model for Greek
Leon Voukoutis
Dimitris Roussis
Georgios Paraskevopoulos
Sokratis Sofianopoulos
Prokopis Prokopidis
Vassilis Papavasileiou
Athanasios Katsamanis
Stelios Piperidis
Vassilis Katsouros
VLM
41
8
0
30 Jul 2024
Automated Review Generation Method Based on Large Language Models
Shican Wu
Xiao Ma
Dehui Luo
Lulu Li
Xiangcheng Shi
...
Ran Luo
Chunlei Pei
Zhi-Jian Zhao
Jinlong Gong
77
0
0
30 Jul 2024
MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning
Yupeng Chen
Senmiao Wang
Zhihang Lin
Yushun Zhang
Tian Ding
Ruoyu Sun
CLL
85
3
0
30 Jul 2024
Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks
Marco AF Pimentel
Clément Christophe
Tathagata Raha
Prateek Munjal
Praveen K Kanithi
Shadab Khan
ELM
42
2
0
29 Jul 2024
Effective Large Language Model Debugging with Best-first Tree Search
Jialin Song
Jonathan Raiman
Bryan Catanzaro
LRM
51
0
0
26 Jul 2024
A deeper look at depth pruning of LLMs
Shoaib Ahmed Siddiqui
Xin Dong
Greg Heinrich
Thomas Breuel
Jan Kautz
David M. Krueger
Pavlo Molchanov
42
7
0
23 Jul 2024
ZZU-NLP at SIGHAN-2024 dimABSA Task: Aspect-Based Sentiment Analysis with Coarse-to-Fine In-context Learning
Senbin Zhu
Hanjie Zhao
Xingren Wang
Shanhong Liu
Yuxiang Jia
Hongying Zan
51
1
0
22 Jul 2024
Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners
Yifei Gao
Jie Ou
Lei Wang
Fanhua Shang
Jaji Wu
MQ
68
0
0
22 Jul 2024
Compact Language Models via Pruning and Knowledge Distillation
Saurav Muralidharan
Sharath Turuvekere Sreenivas
Raviraj Joshi
Marcin Chochowski
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
Jan Kautz
Pavlo Molchanov
SyDa
MQ
46
38
0
19 Jul 2024
Beyond Next Token Prediction: Patch-Level Training for Large Language Models
Chenze Shao
Fandong Meng
Jie Zhou
53
1
0
17 Jul 2024
LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices
Jung Hyun Lee
Jeonghoon Kim
J. Yang
S. Kwon
Eunho Yang
Kang Min Yoo
Dongsoo Lee
MQ
36
2
0
16 Jul 2024
MSEval: A Dataset for Material Selection in Conceptual Design to Evaluate Algorithmic Models
Yash Jain
Daniele Grandi
Allin Groom
Brandon Cramer
Christopher McComb
44
0
0
12 Jul 2024
Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing
Huanqian Wang
Yang Yue
Rui Lu
Jingxin Shi
Andrew Zhao
Shenzhi Wang
Shiji Song
Gao Huang
LM&Ro
KELM
55
6
0
11 Jul 2024
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
Yonghong Tian
Wenqi Shao
Peng Xu
Jiahao Wang
Peng Gao
Kaipeng Zhang
Ping Luo
MQ
60
26
0
10 Jul 2024
Training on the Test Task Confounds Evaluation and Emergence
Ricardo Dominguez-Olmedo
Florian E. Dorner
Moritz Hardt
ELM
71
7
1
10 Jul 2024
Composable Interventions for Language Models
Arinbjorn Kolbeinsson
Kyle O'Brien
Tianjin Huang
Shanghua Gao
Shiwei Liu
...
Anurag J. Vaidya
Faisal Mahmood
Marinka Zitnik
Tianlong Chen
Thomas Hartvigsen
KELM
MU
89
4
0
09 Jul 2024
On Speeding Up Language Model Evaluation
Jin Peng Zhou
Christian K. Belardi
Ruihan Wu
Travis Zhang
Carla P. Gomes
Wen Sun
Kilian Q. Weinberger
58
1
0
08 Jul 2024
Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression
Zhichao Xu
Ashim Gupta
Tao Li
Oliver Bentham
Vivek Srikumar
54
8
0
06 Jul 2024
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Md Tahmid Rahman Laskar
Sawsan Alqahtani
M Saiful Bari
Mizanur Rahman
Mohammad Abdullah Matin Khan
...
Chee Wei Tan
Md. Rizwan Parvez
Enamul Hoque
Shafiq Joty
Jimmy Huang
ELM
ALM
33
28
0
04 Jul 2024
On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards
Zhimin Zhao
A. A. Bangash
F. Côgo
Bram Adams
Ahmed E. Hassan
67
1
0
04 Jul 2024
How Does Quantization Affect Multilingual LLMs?
Kelly Marchisio
Saurabh Dash
Hongyu Chen
Dennis Aumiller
Ahmet Üstün
Sara Hooker
Sebastian Ruder
MQ
57
9
0
03 Jul 2024
GemmAr: Enhancing LLMs Through Arabic Instruction-Tuning
Hasna Chouikhi
Manel Aloui
Cyrine Ben Hammou
Ghaith Chaabane
Haithem Kchaou
Chehir Dhaouadi
44
0
0
02 Jul 2024
To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models
Bozhong Tian
Xiaozhuan Liang
Siyuan Cheng
Qingbin Liu
Mengru Wang
Dianbo Sui
Xi Chen
Huajun Chen
Ningyu Zhang
MU
37
6
0
02 Jul 2024
Helpful assistant or fruitful facilitator? Investigating how personas affect language model behavior
Pedro Henrique Luz de Araujo
Benjamin Roth
46
3
0
02 Jul 2024
AI Agents That Matter
Sayash Kapoor
Benedikt Stroebl
Zachary S. Siegel
Nitya Nadgir
Arvind Narayanan
62
37
0
01 Jul 2024