ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2009.03300
  4. Cited By
Measuring Massive Multitask Language Understanding
v1v2v3 (latest)

Measuring Massive Multitask Language Understanding

7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
    ELMRALM
ArXiv (abs)PDFHTML

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 3,408 papers shown
Title
Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information
Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information
Joshua Harris
Fan Grayson
Felix Feldman
Timothy Laurence
Toby Nonnenmacher
...
Leo Loman
Selina Patel
Thomas Finnie
Samuel Collins
Michael Borowitz
AI4MHLM&MAELM
141
0
0
09 May 2025
Full-Parameter Continual Pretraining of Gemma2: Insights into Fluency and Domain Knowledge
Full-Parameter Continual Pretraining of Gemma2: Insights into Fluency and Domain Knowledge
Vytenis Šliogeris
Povilas Daniušis
Arturas Nakvosas
CLL
81
0
0
09 May 2025
Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation
Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation
Stefan Vasilev
Christian Herold
Baohao Liao
Seyyed Hadi Hashemi
Shahram Khadivi
Christof Monz
MU
490
0
0
09 May 2025
FloE: On-the-Fly MoE Inference on Memory-constrained GPU
FloE: On-the-Fly MoE Inference on Memory-constrained GPU
Yuxin Zhou
Zheng Li
Junxuan Zhang
Jue Wang
Yanjie Wang
Zhongle Xie
Ke Chen
Lidan Shou
MoE
156
0
0
09 May 2025
KG-HTC: Integrating Knowledge Graphs into LLMs for Effective Zero-shot Hierarchical Text Classification
KG-HTC: Integrating Knowledge Graphs into LLMs for Effective Zero-shot Hierarchical Text Classification
Qianbo Zang
Christophe Zgrzendek
Igor Tchappi
Afshin Khadangi
Johannes Sedlmeir
VLM
84
0
0
08 May 2025
Reasoning Models Don't Always Say What They Think
Reasoning Models Don't Always Say What They Think
Yanda Chen
Joe Benton
Ansh Radhakrishnan
Jonathan Uesato
Carson E. Denison
...
Vlad Mikulik
Samuel R. Bowman
Jan Leike
Jared Kaplan
E. Perez
ReLMLRM
166
51
1
08 May 2025
Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data
Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data
Yun Wang
Z. Fu
Jie Cai
Peijun Tang
Hongya Lyu
...
Jie Zhou
Guoyang Zeng
Chaojun Xiao
Xu Han
Zhiyuan Liu
129
1
0
08 May 2025
Crosslingual Reasoning through Test-Time Scaling
Crosslingual Reasoning through Test-Time Scaling
Zheng-Xin Yong
Muhammad Farid Adilazuarda
Jonibek Mansurov
Ruochen Zhang
Niklas Muennighoff
Carsten Eickhoff
Genta Indra Winata
Julia Kreutzer
Stephen H. Bach
Alham Fikri Aji
LRMELM
452
9
0
08 May 2025
Latent Preference Coding: Aligning Large Language Models via Discrete Latent Codes
Latent Preference Coding: Aligning Large Language Models via Discrete Latent Codes
Zhuocheng Gong
Jian Guan
Wei Wu
Huishuai Zhang
Dongyan Zhao
102
1
0
08 May 2025
HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics
HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics
Lennart Luettgau
Harry Coppock
Magda Dubois
Christopher Summerfield
Cozmin Ududec
107
0
0
08 May 2025
LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities
LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities
Kalyan Nakka
Jimmy Dani
Ausmit Mondal
Nitesh Saxena
AAML
69
0
0
08 May 2025
Advancing and Benchmarking Personalized Tool Invocation for LLMs
Advancing and Benchmarking Personalized Tool Invocation for LLMs
Xiaolin Huang
Yuefeng Huang
Wen Liu
Xingshan Zeng
Yijiao Wang
Ruiming Tang
Hong Xie
Defu Lian
94
0
0
07 May 2025
Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs
Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs
Yehui Tang
Yichun Yin
Yaoyuan Wang
Hang Zhou
Yu Pan
...
Zhe Liu
Zhicheng Liu
Zhuowen Tu
Zilin Ding
Zongyuan Zhan
MoE
113
2
0
07 May 2025
Large Means Left: Political Bias in Large Language Models Increases with Their Number of Parameters
Large Means Left: Political Bias in Large Language Models Increases with Their Number of Parameters
David Exler
Mark Schutera
Markus Reischl
Luca Rettenberger
96
1
0
07 May 2025
A Reasoning-Focused Legal Retrieval Benchmark
A Reasoning-Focused Legal Retrieval Benchmark
Lucia Zheng
Neel Guha
Javokhir Arifov
Sarah Zhang
Michal Skreta
Christopher D. Manning
Peter Henderson
Daniel E. Ho
AILawRALMELM
201
5
0
06 May 2025
The Steganographic Potentials of Language Models
The Steganographic Potentials of Language Models
Artem Karpov
Tinuade Adeleke
Seong Hah Cho
Natalia Perez-Campanero
71
0
0
06 May 2025
am-ELO: A Stable Framework for Arena-based LLM Evaluation
am-ELO: A Stable Framework for Arena-based LLM Evaluation
Zirui Liu
Jiatong Li
Yan Zhuang
Qiang Liu
Shuanghong Shen
Jie Ouyang
Mingyue Cheng
Shijin Wang
186
1
0
06 May 2025
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning
Tianjian Li
Daniel Khashabi
135
0
0
05 May 2025
Quantitative Analysis of Performance Drop in DeepSeek Model Quantization
Quantitative Analysis of Performance Drop in DeepSeek Model Quantization
Enbo Zhao
Yi Shen
Shuming Shi
Jieyun Huang
Z. Chen
Rongjia Du
Siqi Xiao
Jing Zhang
Ning Wang
Shiguo Lian
MQ
151
0
0
05 May 2025
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Yemin Shi
Yu Shu
Siwei Dong
Guangyi Liu
Jaward Sesay
Jingwen Li
Zhiting Hu
AuLLMVLM
98
0
0
05 May 2025
Improving Model Alignment Through Collective Intelligence of Open-Source LLMS
Improving Model Alignment Through Collective Intelligence of Open-Source LLMS
Junlin Wang
Roy Xie
Shang Zhu
Jue Wang
Ben Athiwaratkun
Bhuwan Dhingra
Shuaiwen Leon Song
Ce Zhang
James Zou
ALM
73
0
0
05 May 2025
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
Kazuki Fujii
Yukito Tajima
Sakae Mizuki
Hinari Shimada
Taihei Shiotani
...
Kakeru Hattori
Youmi Ma
Hiroya Takamura
Rio Yokota
Naoaki Okazaki
SyDa
158
1
0
05 May 2025
ReplaceMe: Network Simplification via Depth Pruning and Transformer Block Linearization
ReplaceMe: Network Simplification via Depth Pruning and Transformer Block Linearization
Dmitriy Shopkhoev
Ammar Ali
Magauiya Zhussip
Valentin Malykh
Stamatios Lefkimmiatis
N. Komodakis
Sergey Zagoruyko
VLM
506
0
0
05 May 2025
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
Meng-Hao Guo
Jiajun Xu
Yi Zhang
Jiaxi Song
Haoyang Peng
...
Yongming Rao
Houwen Peng
Han Hu
Gordon Wetzstein
Shi-Min Hu
ELMLRM
129
4
0
04 May 2025
An Empirical Study of Qwen3 Quantization
An Empirical Study of Qwen3 Quantization
Xingyu Zheng
Yuye Li
Haoran Chu
Yue Feng
Xudong Ma
Jie Luo
Jinyang Guo
Haotong Qin
Michele Magno
Xianglong Liu
MQ
84
6
0
04 May 2025
Measuring Hong Kong Massive Multi-Task Language Understanding
Measuring Hong Kong Massive Multi-Task Language Understanding
Chuxue Cao
Zhenghao Zhu
Junqi Zhu
Guoying Lu
Siyu Peng
Juntao Dai
Weijie Shi
Sirui Han
Yike Guo
ELM
453
0
0
04 May 2025
Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients
Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients
Yezhen Wang
Zhouhao Yang
Brian K Chen
Fanyi Pu
Yue Liu
Tianyu Gao
Kenji Kawaguchi
89
0
0
03 May 2025
MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance
MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance
Xing Hu
Zhixuan Chen
Dawei Yang
Zukang Xu
Chen Xu
Zhihang Yuan
Sifan Zhou
Jiangyong Yu
MoEMQ
112
2
0
02 May 2025
Always Tell Me The Odds: Fine-grained Conditional Probability Estimation
Always Tell Me The Odds: Fine-grained Conditional Probability Estimation
Liaoyaqi Wang
Zhengping Jiang
Anqi Liu
Benjamin Van Durme
155
1
0
02 May 2025
Subset Selection for Fine-Tuning: A Utility-Diversity Balanced Approach for Mathematical Domain Adaptation
Subset Selection for Fine-Tuning: A Utility-Diversity Balanced Approach for Mathematical Domain Adaptation
Madhav Kotecha
Vijendra Kumar Vaishya
Smita Gautam
Suraj Racha
123
0
0
02 May 2025
Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs
Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs
G. Wang
Zhiwen Chen
Bo Li
Haifeng Xu
434
1
0
02 May 2025
Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth
Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth
Changhai Zhou
Yuhua Zhou
Qian Qiao
Weizhong Zhang
Cheng Jin
Cheng Jin
MQ
70
1
0
02 May 2025
Ensuring Reproducibility in Generative AI Systems for General Use Cases: A Framework for Regression Testing and Open Datasets
Ensuring Reproducibility in Generative AI Systems for General Use Cases: A Framework for Regression Testing and Open Datasets
Masumi Morishige
Ryo Koshihara
ALM
52
0
0
02 May 2025
FineScope : Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation
FineScope : Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation
Chaitali Bhattacharyya
Yeseong Kim
119
0
0
01 May 2025
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Cong Xu
Wenbin Liang
Mo Yu
Anan Liu
Kai Zhang
Lizhuang Ma
Jiangming Wang
Jun Wang
Weinan Zhang
Wei Zhang
MQ
82
0
0
01 May 2025
Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation
Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation
D. Sculley
Will Cukierski
Phil Culliton
Sohier Dane
Maggie Demkin
...
Addison Howard
Paul Mooney
Walter Reade
Megan Risdal
Nate Keating
132
2
0
01 May 2025
Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA
Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA
Xuanzhao Dong
Wenhui Zhu
Hao Wang
Xiwen Chen
Peijie Qiu
Rui Yin
Yi Su
Yucheng Wang
RALMMedIm
79
0
0
30 Apr 2025
MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness
MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness
Junsheng Huang
Zhitao He
Sandeep Polisetty
Q. Wang
May Fung
KELM
110
2
0
30 Apr 2025
ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning
ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning
Jingyang Yi
Jiazheng Wang
Sida Li
ReLMOODDLRM
430
8
0
30 Apr 2025
Phi-4-reasoning Technical Report
Phi-4-reasoning Technical Report
Marah Abdin
Sahaj Agarwal
Ahmed Hassan Awadallah
Vidhisha Balachandran
Harkirat Singh Behl
...
Vaishnavi Shrivastava
Vibhav Vineet
Yue Wu
Safoora Yousefi
Guoqing Zheng
ReLMLRM
223
15
0
30 Apr 2025
COSMOS: Predictable and Cost-Effective Adaptation of LLMs
COSMOS: Predictable and Cost-Effective Adaptation of LLMs
Jiayu Wang
Aws Albarghouthi
Frederic Sala
97
0
0
30 Apr 2025
Combatting Dimensional Collapse in LLM Pre-Training Data via Diversified File Selection
Combatting Dimensional Collapse in LLM Pre-Training Data via Diversified File Selection
Ziqing Fan
Siyuan Du
Shengchao Hu
Pingjie Wang
Li Shen
Yanzhe Zhang
Dacheng Tao
Yucheng Wang
94
2
0
29 Apr 2025
UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities
UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities
Woongyeong Yeo
Kangsan Kim
Soyeong Jeong
Jinheon Baek
Sung Ju Hwang
150
1
0
29 Apr 2025
Computational Reasoning of Large Language Models
Computational Reasoning of Large Language Models
Haitao Wu
Zongbo Han
Joey Tianyi Zhou
Huaxi Huang
Changqing Zhang
ELMLRM
102
0
0
29 Apr 2025
YoChameleon: Personalized Vision and Language Generation
YoChameleon: Personalized Vision and Language Generation
Thao Nguyen
Krishna Kumar Singh
Jing Shi
Trung H. Bui
Yong Jae Lee
Yuheng Li
MLLM
190
1
0
29 Apr 2025
Model Connectomes: A Generational Approach to Data-Efficient Language Models
Model Connectomes: A Generational Approach to Data-Efficient Language Models
Klemen Kotar
Greta Tuckute
168
0
0
29 Apr 2025
Automatic Legal Writing Evaluation of LLMs
Automatic Legal Writing Evaluation of LLMs
Ramon Pires
Roseval Malaquias Junior
Rodrigo Nogueira
AILawELM
128
1
0
29 Apr 2025
Can a Crow Hatch a Falcon? Lineage Matters in Predicting Large Language Model Performance
Can a Crow Hatch a Falcon? Lineage Matters in Predicting Large Language Model Performance
Takuya Tamura
Taro Yano
Masafumi Enomoto
Masafumi Oyamada
82
0
0
28 Apr 2025
Learning to Plan Before Answering: Self-Teaching LLMs to Learn Abstract Plans for Problem Solving
Learning to Plan Before Answering: Self-Teaching LLMs to Learn Abstract Plans for Problem Solving
Junxuan Zhang
Flood Sung
Zhiyong Yang
Yang Gao
Chongjie Zhang
LLMAG
132
0
0
28 Apr 2025
Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report
Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report
Paul Kassianik
Baturay Saglam
Alexander Chen
Blaine Nelson
Anu Vellore
...
Hyrum Anderson
Kojin Oshiba
Omar Santos
Yaron Singer
Amin Karbasi
PILM
89
2
0
28 Apr 2025
Previous
123...91011...676869
Next