ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2009.03300
  4. Cited By
Measuring Massive Multitask Language Understanding

Measuring Massive Multitask Language Understanding

7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
D. Song
Jacob Steinhardt
    ELM
    RALM
ArXivPDFHTML

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 875 papers shown
Title
am-ELO: A Stable Framework for Arena-based LLM Evaluation
am-ELO: A Stable Framework for Arena-based LLM Evaluation
Zirui Liu
Jiatong Li
Yan Zhuang
Qiang Liu
Shuanghong Shen
Jie Ouyang
Mingyue Cheng
Shijin Wang
44
1
0
06 May 2025
The Steganographic Potentials of Language Models
The Steganographic Potentials of Language Models
Artem Karpov
Tinuade Adeleke
Seong Hah Cho
Natalia Perez-Campanero
37
0
0
06 May 2025
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
Kazuki Fujii
Yukito Tajima
Sakae Mizuki
Hinari Shimada
Taihei Shiotani
...
Kakeru Hattori
Youmi Ma
Hiroya Takamura
Rio Yokota
Naoaki Okazaki
SyDa
54
0
0
05 May 2025
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Yemin Shi
Yu Shu
Siwei Dong
Guangyi Liu
Jaward Sesay
Jingwen Li
Zhiting Hu
AuLLM
VLM
50
0
0
05 May 2025
Improving Model Alignment Through Collective Intelligence of Open-Source LLMS
Improving Model Alignment Through Collective Intelligence of Open-Source LLMS
Junlin Wang
Roy Xie
Shang Zhu
Jue Wang
Ben Athiwaratkun
Bhuwan Dhingra
Shuaiwen Leon Song
Ce Zhang
James Zou
ALM
38
0
0
05 May 2025
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning
Tianjian Li
Daniel Khashabi
60
0
0
05 May 2025
ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations
ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations
Dmitriy Shopkhoev
Ammar Ali
Magauiya Zhussip
Valentin Malykh
Stamatios Lefkimmiatis
N. Komodakis
Sergey Zagoruyko
VLM
184
0
0
05 May 2025
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
Meng-Hao Guo
Jiajun Xu
Yi Zhang
Jiaxi Song
Haoyang Peng
...
Yongming Rao
Houwen Peng
Han Hu
Gordon Wetzstein
Shi-Min Hu
ELM
LRM
60
2
0
04 May 2025
An Empirical Study of Qwen3 Quantization
An Empirical Study of Qwen3 Quantization
Xingyu Zheng
Yuye Li
Haoran Chu
Yue Feng
Xudong Ma
Jie Luo
Jinyang Guo
Haotong Qin
Michele Magno
Xianglong Liu
MQ
29
0
0
04 May 2025
Measuring Hong Kong Massive Multi-Task Language Understanding
Measuring Hong Kong Massive Multi-Task Language Understanding
Chuxue Cao
Zhenghao Zhu
Junqi Zhu
Guoying Lu
Siyu Peng
Juntao Dai
Weijie Shi
Sirui Han
Yike Guo
ELM
184
0
0
04 May 2025
Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients
Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients
Yezhen Wang
Zhouhao Yang
Brian K Chen
Fanyi Pu
Bo-wen Li
Tianyu Gao
Kenji Kawaguchi
46
0
0
03 May 2025
MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance
MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance
Xing Hu
Zhixuan Chen
Dawei Yang
Zukang Xu
Chen Xu
Zhihang Yuan
Sifan Zhou
Jiangyong Yu
MoE
MQ
44
0
0
02 May 2025
Subset Selection for Fine-Tuning: A Utility-Diversity Balanced Approach for Mathematical Domain Adaptation
Subset Selection for Fine-Tuning: A Utility-Diversity Balanced Approach for Mathematical Domain Adaptation
Madhav Kotecha
Vijendra Kumar Vaishya
Smita Gautam
Suraj Racha
37
0
0
02 May 2025
Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth
Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth
Changhai Zhou
Yuhua Zhou
Qian Qiao
Weizhong Zhang
Cheng Jin
MQ
27
0
0
02 May 2025
Always Tell Me The Odds: Fine-grained Conditional Probability Estimation
Always Tell Me The Odds: Fine-grained Conditional Probability Estimation
Liaoyaqi Wang
Zhengping Jiang
Anqi Liu
Benjamin Van Durme
61
0
0
02 May 2025
Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs
Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs
G. Wang
Z. Chen
Bo Li
Haifeng Xu
167
0
0
02 May 2025
Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation
Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation
D. Sculley
Will Cukierski
Phil Culliton
Sohier Dane
Maggie Demkin
...
Addison Howard
Paul Mooney
Walter Reade
Megan Risdal
Nate Keating
31
0
0
01 May 2025
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Cong Xu
Wenbin Liang
Mo Yu
Anan Liu
Kaipeng Zhang
Lizhuang Ma
Yufei Guo
Jun Wang
Weixi Zhang
MQ
57
0
0
01 May 2025
FineScope : Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation
FineScope : Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation
Chaitali Bhattacharyya
Yeseong Kim
50
0
0
01 May 2025
Phi-4-reasoning Technical Report
Phi-4-reasoning Technical Report
Marah Abdin
Sahaj Agarwal
Ahmed Hassan Awadallah
Vidhisha Balachandran
Harkirat Singh Behl
...
Vaishnavi Shrivastava
Vibhav Vineet
Yue Wu
Safoora Yousefi
Guoqing Zheng
ReLM
LRM
90
1
0
30 Apr 2025
MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness
MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness
Junsheng Huang
Zhitao He
Sandeep Polisetty
Q. Wang
May Fung
KELM
47
0
0
30 Apr 2025
ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning
ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning
Jingyang Yi
Jiazheng Wang
Sida Li
ReLM
OODD
LRM
180
2
0
30 Apr 2025
COSMOS: Predictable and Cost-Effective Adaptation of LLMs
COSMOS: Predictable and Cost-Effective Adaptation of LLMs
Jiayu Wang
Aws Albarghouthi
Frederic Sala
57
0
0
30 Apr 2025
Automatic Legal Writing Evaluation of LLMs
Automatic Legal Writing Evaluation of LLMs
Ramon Pires
Roseval Malaquias Junior
Rodrigo Nogueira
AILaw
ELM
83
0
0
29 Apr 2025
UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities
UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities
Woongyeong Yeo
Kangsan Kim
Soyeong Jeong
Jinheon Baek
Sung Ju Hwang
54
1
0
29 Apr 2025
Model Connectomes: A Generational Approach to Data-Efficient Language Models
Model Connectomes: A Generational Approach to Data-Efficient Language Models
Klemen Kotar
Greta Tuckute
54
0
0
29 Apr 2025
Computational Reasoning of Large Language Models
Computational Reasoning of Large Language Models
Haitao Wu
Zongbo Han
Joey Tianyi Zhou
Huaxi Huang
Changqing Zhang
ELM
LRM
62
0
0
29 Apr 2025
YoChameleon: Personalized Vision and Language Generation
YoChameleon: Personalized Vision and Language Generation
Thao Nguyen
Krishna Kumar Singh
Jing Shi
Trung H. Bui
Yong Jae Lee
Yuheng Li
MLLM
82
0
0
29 Apr 2025
Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report
Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report
Paul Kassianik
Baturay Saglam
Alexander Chen
Blaine Nelson
Anu Vellore
...
Hyrum Anderson
Kojin Oshiba
Omar Santos
Yaron Singer
Amin Karbasi
PILM
66
0
0
28 Apr 2025
Accelerating Mixture-of-Experts Training with Adaptive Expert Replication
Accelerating Mixture-of-Experts Training with Adaptive Expert Replication
Athinagoras Skiadopoulos
Mark Zhao
Swapnil Gandhi
Thomas Norrie
Shrijeet Mukherjee
Christos Kozyrakis
MoE
91
0
0
28 Apr 2025
Can a Crow Hatch a Falcon? Lineage Matters in Predicting Large Language Model Performance
Can a Crow Hatch a Falcon? Lineage Matters in Predicting Large Language Model Performance
Takuya Tamura
Taro Yano
Masafumi Enomoto
Masafumi Oyamada
50
0
0
28 Apr 2025
Bi-directional Model Cascading with Proxy Confidence
Bi-directional Model Cascading with Proxy Confidence
David Warren
Mark Dras
49
0
0
27 Apr 2025
Contextual Online Uncertainty-Aware Preference Learning for Human Feedback
Contextual Online Uncertainty-Aware Preference Learning for Human Feedback
Nan Lu
Ethan X. Fang
Junwei Lu
185
0
0
27 Apr 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Xuzhao Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Tianwei Zhang
ALM
ELM
91
2
0
26 Apr 2025
When2Call: When (not) to Call Tools
When2Call: When (not) to Call Tools
Hayley Ross
Ameya Sunil Mahabaleshwarkar
Yoshi Suhara
95
0
0
26 Apr 2025
Stabilizing Reasoning in Medical LLMs with Continued Pretraining and Reasoning Preference Optimization
Stabilizing Reasoning in Medical LLMs with Continued Pretraining and Reasoning Preference Optimization
Wataru Kawakami
Keita Suzuki
Junichiro Iwasawa
LRM
75
0
0
25 Apr 2025
Efficient Single-Pass Training for Multi-Turn Reasoning
Efficient Single-Pass Training for Multi-Turn Reasoning
Ritesh Goru
Shanay Mehta
Prateek Jain
LRM
32
0
0
25 Apr 2025
FLUKE: A Linguistically-Driven and Task-Agnostic Framework for Robustness Evaluation
FLUKE: A Linguistically-Driven and Task-Agnostic Framework for Robustness Evaluation
Yulia Otmakhova
Hung Thinh Truong
Rahmad Mahendra
Zenan Zhai
Rongxin Zhu
Daniel Beck
Jey Han Lau
ELM
72
0
0
24 Apr 2025
QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining
QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining
Fengze Liu
Weidong Zhou
Binbin Liu
Zhimiao Yu
Yifan Zhang
...
Yifeng Yu
Bingni Zhang
Xiaohuan Zhou
Taifeng Wang
Yong Cao
66
1
0
23 Apr 2025
UrbanPlanBench: A Comprehensive Urban Planning Benchmark for Evaluating Large Language Models
UrbanPlanBench: A Comprehensive Urban Planning Benchmark for Evaluating Large Language Models
Yu Zheng
Longyi Liu
Yuming Lin
Jie Feng
Guozhen Zhang
Depeng Jin
Yong Li
ELM
78
0
0
23 Apr 2025
Param$Δ$ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
ParamΔΔΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
Sheng Cao
Mingrui Wu
Karthik Prasad
Yuandong Tian
Zechun Liu
MoMe
85
0
0
23 Apr 2025
The Rise of Small Language Models in Healthcare: A Comprehensive Survey
The Rise of Small Language Models in Healthcare: A Comprehensive Survey
Muskan Garg
Shaina Raza
Shebuti Rayana
Xingyi Liu
Sunghwan Sohn
LM&MA
AILaw
92
0
0
23 Apr 2025
Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction
Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction
Yuxin Jiang
Yijiao Wang
Chuhan Wu
Xinyi Dai
Yan Xu
...
Yucheng Wang
Xin Jiang
Lifeng Shang
R. Tang
Luu Anh Tuan
36
0
0
22 Apr 2025
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models
Shi Qiu
Shaoyang Guo
Zhuo-Yang Song
Yizhou Sun
Zeyu Cai
...
Ming-xing Luo
Muhan Zhang
Yaodong Yang
Muhan Zhang
Hua Xing Zhu
AIMat
LRM
32
0
0
22 Apr 2025
Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism
Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism
Aviv Bick
Eric P. Xing
Albert Gu
RALM
91
0
0
22 Apr 2025
Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark
Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark
Jasper Götting
Pedro Medeiros
Jon G Sanders
Nathaniel Li
Long Phan
Karam Elabd
Lennart Justen
Dan Hendrycks
Seth Donoughe
ELM
63
2
0
21 Apr 2025
Efficient Pretraining Length Scaling
Efficient Pretraining Length Scaling
Bohong Wu
Shen Yan
Sijun Zhang
Jianqiao Lu
Yutao Zeng
Ya Wang
Xun Zhou
171
0
0
21 Apr 2025
A Self-Improving Coding Agent
A Self-Improving Coding Agent
Maxime Robeyns
Martin Szummer
Laurence Aitchison
LLMAG
46
0
0
21 Apr 2025
From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs
From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs
Jiliang Ni
Jiachen Pu
Zhongyi Yang
Kun Zhou
Hui Wang
Xiaoliang Xiao
Dakui Wang
Xin Li
Jingfeng Luo
Conggang Hu
39
0
0
18 Apr 2025
GeoSense: Evaluating Identification and Application of Geometric Principles in Multimodal Reasoning
GeoSense: Evaluating Identification and Application of Geometric Principles in Multimodal Reasoning
Liangyu Xu
Yingxiu Zhao
Jingbo Wang
Yingyao Wang
Bu Pi
...
Jihao Gu
Xinfeng Li
Xiaoyong Zhu
Jun Song
Jian Xu
LRM
207
1
0
17 Apr 2025
Previous
12345...161718
Next