ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2009.03300
  4. Cited By
Measuring Massive Multitask Language Understanding
v1v2v3 (latest)

Measuring Massive Multitask Language Understanding

7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
    ELMRALM
ArXiv (abs)PDFHTML

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 3,408 papers shown
Title
Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques
Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques
Neusha Javidnia
B. Rouhani
F. Koushanfar
554
0
0
14 Mar 2025
MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling
R. Teo
T. Nguyen
MoE
149
2
0
14 Mar 2025
X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression
X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression
Guihong Li
Mehdi Rezagholizadeh
Mingyu Yang
Vikram Appia
Emad Barsoum
VLM
106
1
0
14 Mar 2025
CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning
CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning
Hao Cui
Zahra Shamsi
Gowoon Cheon
Xuejian Ma
Shutong Li
...
Eun-Ah Kim
M. Brenner
Viren Jain
Sameera Ponda
Subhashini Venugopalan
ELMLRM
144
4
0
14 Mar 2025
GNNs as Predictors of Agentic Workflow Performances
Yanzhe Zhang
Yuchen Hou
Bohan Tang
Shuo Chen
Muhan Zhang
Xiaowen Dong
Tian Jin
LLMAGAI4CE
118
2
0
14 Mar 2025
MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation
MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation
Weihao Xuan
Rui Yang
Heli Qi
Qingcheng Zeng
Yunze Xiao
...
Edison Marrese-Taylor
Shijian Lu
Yusuke Iwasawa
Yutaka Matsuo
Irene Li
ELM
217
7
0
13 Mar 2025
From TOWER to SPIRE: Adding the Speech Modality to a Text-Only LLM
Kshitij Ambilduke
Ben Peters
Sonal Sannigrahi
Anil Keshwani
Tsz Kin Lam
Bruno Martins
Marcely Zanon Boito
André F. T. Martins
110
2
0
13 Mar 2025
Data Caricatures: On the Representation of African American Language in Pretraining Corpora
Nicholas Deas
Blake Vente
Amith Ananthram
Jessica A. Grieser
D. Patton
Shana Kleiner
James Shepard
Kathleen McKeown
79
0
0
13 Mar 2025
Numerical Error Analysis of Large Language Models
Stanislav Budzinskiy
Wenyi Fang
Longbin Zeng
Philipp Petersen
92
1
0
13 Mar 2025
Compute Optimal Scaling of Skills: Knowledge vs Reasoning
Compute Optimal Scaling of Skills: Knowledge vs Reasoning
Nicholas Roberts
Niladri S. Chatterji
Sharan Narang
Mike Lewis
Dieuwke Hupkes
117
2
0
13 Mar 2025
DarkBench: Benchmarking Dark Patterns in Large Language Models
Esben Kran
Hieu Minh "Jord" Nguyen
Akash Kundu
Sami Jawhar
Jinsuk Park
Mateusz Maria Jurewicz
105
3
0
13 Mar 2025
Efficient Safety Alignment of Large Language Models via Preference Re-ranking and Representation-based Reward Modeling
Efficient Safety Alignment of Large Language Models via Preference Re-ranking and Representation-based Reward Modeling
Qiyuan Deng
X. Bai
Kehai Chen
Yaowei Wang
Liqiang Nie
Min Zhang
OffRL
121
0
0
13 Mar 2025
MentalChat16K: A Benchmark Dataset for Conversational Mental Health Assistance
MentalChat16K: A Benchmark Dataset for Conversational Mental Health Assistance
Jia Xu
Tianyi Wei
Bojian Hou
Patryk Orzechowski
Shu Yang
Ruochen Jin
Rachael Paulbeck
Joost B. Wagenaar
George Demiris
Li Shen
AI4MH
81
1
0
13 Mar 2025
OR-LLM-Agent: Automating Modeling and Solving of Operations Research Optimization Problem with Reasoning Large Language Model
Bowen Zhang
Pengcheng Luo
LRMAI4CELLMAG
130
2
0
13 Mar 2025
Reinforcement Learning is all You Need
Yongsheng Lian
ReLMOffRLLRM
97
0
0
12 Mar 2025
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Bowen Jin
Hansi Zeng
Zhenrui Yue
Dong Wang
Sercan O. Arik
Dong Wang
Hamed Zamani
Jiawei Han
RALMReLMKELMOffRLAI4TSLRM
228
122
0
12 Mar 2025
Probabilistic Reasoning with LLMs for k-anonymity Estimation
Probabilistic Reasoning with LLMs for k-anonymity Estimation
Jonathan Zheng
Sauvik Das
Alan Ritter
Wei Xu
132
0
0
12 Mar 2025
SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models
SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models
Chuan Qin
Xiusi Chen
Chengrui Wang
Pengmin Wu
Xi Chen
...
Han Wu
Chong Li
Yuanchun Zhou
H. Xiong
Hengshu Zhu
ELM
91
2
0
12 Mar 2025
Token Weighting for Long-Range Language Modeling
Falko Helm
Nico Daheim
Iryna Gurevych
109
1
0
12 Mar 2025
Teaching LLMs How to Learn with Contextual Fine-Tuning
Younwoo Choi
Muhammad Adil Asif
Ziwen Han
John Willes
Rahul G. Krishnan
LRM
99
2
0
12 Mar 2025
Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
Iván Arcuschin
Jett Janiak
Robert Krzyzanowski
Senthooran Rajamanoharan
Neel Nanda
Arthur Conmy
ReLMLRM
188
20
0
11 Mar 2025
Mellow: a small audio language model for reasoning
Soham Deshmukh
Satvik Dixit
Rita Singh
Bhiksha Raj
AuLLMReLMLRM
113
4
0
11 Mar 2025
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees
Zhiyuan Zeng
Yizhong Wang
Hannaneh Hajishirzi
Pang Wei Koh
ELM
107
8
0
11 Mar 2025
Multi-Cue Adaptive Visual Token Pruning for Large Vision-Language Models
Bozhi Luan
Wengang Zhou
Hao Feng
Zhe Wang
Xiaosong Li
Haoyang Li
VLM
131
0
0
11 Mar 2025
Accurate INT8 Training Through Dynamic Block-Level Fallback
Accurate INT8 Training Through Dynamic Block-Level Fallback
Pengle Zhang
Jia Wei
Jintao Zhang
Jun-Jie Zhu
Jianfei Chen
MQ
171
9
0
11 Mar 2025
RigoChat 2: an adapted language model to Spanish using a bounded dataset and reduced hardware
Gonzalo Santamaría Gómez
Guillem García Subies
Pablo Gutiérrez Ruiz
Mario González Valero
Natàlia Fuertes
...
Nuria Aldama García
David Betancur Sánchez
Kateryna Sushkova
Marta Guerrero Nieto
Á. Jiménez
122
0
0
11 Mar 2025
DAFE: LLM-Based Evaluation Through Dynamic Arbitration for Free-Form Question-Answering
Sher Badshah
Hassan Sajjad
134
1
0
11 Mar 2025
Datasets, Documents, and Repetitions: The Practicalities of Unequal Data Quality
Alex Fang
Hadi Pouransari
Matt Jordan
Alexander Toshev
Vaishaal Shankar
Ludwig Schmidt
Tom Gunter
109
0
0
10 Mar 2025
ALLVB: All-in-One Long Video Understanding Benchmark
ALLVB: All-in-One Long Video Understanding Benchmark
Xichen Tan
Yuanjing Luo
Yunfan Ye
Fang Liu
Zhiping Cai
MLLMVLM
169
1
0
10 Mar 2025
Benchmarking Chinese Medical LLMs: A Medbench-based Analysis of Performance Gaps and Hierarchical Optimization Strategies
Luyi Jiang
Jiasi Chen
Lu Lu
Xinwei Peng
Lihao Liu
Junjun He
Jie Xu
ELMLM&MA
80
0
0
10 Mar 2025
SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models
Xun Liang
Hanyu Wang
Huayi Lai
Pengnian Qi
Shichao Song
Jiawei Yang
Jihao Zhao
Feiyu Xiong
Simin Niu
Zhiyu Li
VLM
86
0
0
10 Mar 2025
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning
Xiangru Tang
Daniel Shao
Jiwoong Sohn
Jiapeng Chen
Jiayi Zhang
...
Yilun Zhao
Chenglin Wu
Wenqi Shi
Arman Cohan
Mark B. Gerstein
AI4MHLRMELMLM&MA
130
10
0
10 Mar 2025
Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM
Yongqiang Yao
Jingru Tan
Kaihuan Liang
Feizhao Zhang
Yazhe Niu
Jiahao Hu
Ruihao Gong
Dahua Lin
Ningyi Xu
98
0
0
10 Mar 2025
RePO: ReLU-based Preference Optimization
Junkang Wu
Kexin Huang
Xue Wang
Jinyang Gao
Bolin Ding
Jiancan Wu
Xiangnan He
Xiang Wang
108
1
0
10 Mar 2025
Small Vision-Language Models: A Survey on Compact Architectures and Techniques
Nitesh Patnaik
Navdeep Nayak
Himani Bansal Agrawal
Moinak Chinmoy Khamaru
Gourav Bal
Saishree Smaranika Panda
Rishi Raj
Vishal Meena
Kartheek Vadlamani
VLM
97
0
0
09 Mar 2025
Effectiveness of Zero-shot-CoT in Japanese Prompts
Shusuke Takayama
Ian Frank
LRM
76
0
0
09 Mar 2025
Evaluating and Aligning Human Economic Risk Preferences in LLMs
Qingbin Liu
Yi Yang
Kar Yan Tam
123
0
0
09 Mar 2025
MoFE: Mixture of Frozen Experts Architecture
Jean Seo
Jaeyoon Kim
Hyopil Shin
MoE
500
0
0
09 Mar 2025
WildIFEval: Instruction Following in the Wild
Gili Lior
Asaf Yehudai
Ariel Gera
L. Ein-Dor
143
0
0
09 Mar 2025
Research on Superalignment Should Advance Now with Parallel Optimization of Competence and Conformity
HyunJin Kim
Xiaoyuan Yi
Jing Yao
Muhua Huang
Jinyeong Bak
James Evans
Xing Xie
95
0
0
08 Mar 2025
SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant?
Xudong Lu
Haohao Gao
Renshou Wu
Shuai Ren
Xiaoxin Chen
Hongsheng Li
Fangyuan Li
ELM
107
0
0
08 Mar 2025
GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices
Xudong Lu
Yinghao Chen
Renshou Wu
Haohao Gao
Xi Chen
...
Fangyuan Li
Yafei Wen
Xiaoxin Chen
Shuai Ren
Hongsheng Li
165
0
0
08 Mar 2025
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs
Zhongzhan Huang
Guoming Ling
Vincent S. Liang
Yupei Lin
Yandong Chen
Shanshan Zhong
Hefeng Wu
LRM
220
7
0
08 Mar 2025
Text-Speech Language Models with Improved Cross-Modal Transfer by Aligning Abstraction Levels
Santiago Cuervo
Adel Moumen
Yanis Labrak
Sameer Khurana
Antoine Laurent
Mickael Rouvier
R. Marxer
138
1
0
08 Mar 2025
IteRABRe: Iterative Recovery-Aided Block Reduction
Haryo Akbarianto Wibowo
Haiyue Song
Hideki Tanaka
Masao Utiyama
Alham Fikri Aji
Raj Dabre
89
1
0
08 Mar 2025
The Society of HiveMind: Multi-Agent Optimization of Foundation Model Swarms to Unlock the Potential of Collective Intelligence
Noah Mamie
Susie Xi Rao
LLMAGAI4CE
136
1
0
07 Mar 2025
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs
Ling Team
B. Zeng
Chenyu Huang
Chao Zhang
Changxin Tian
...
Zhaoxin Huan
Zujie Wen
Zhenhang Sun
Zhuoxuan Du
Z. He
MoEALM
196
5
0
07 Mar 2025
RocketEval: Efficient Automated LLM Evaluation via Grading Checklist
Tianjun Wei
Wei Wen
Ruizhi Qiao
Xing Sun
Jianghong Ma
ALMELM
75
2
0
07 Mar 2025
Shifting Perspectives: Steering Vector Ensembles for Robust Bias Mitigation in LLMs
Zara Siddique
Irtaza Khalid
Liam D. Turner
Luis Espinosa-Anke
LLMSV
161
2
0
07 Mar 2025
IDEA Prune: An Integrated Enlarge-and-Prune Pipeline in Generative Language Model Pretraining
Yixiao Li
Xianzhi Du
Ajay Jaiswal
Tao Lei
T. Zhao
Chong-Jun Wang
Jianyu Wang
77
1
0
07 Mar 2025
Previous
123...141516...676869
Next