ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2009.03300
  4. Cited By
Measuring Massive Multitask Language Understanding
v1v2v3 (latest)

Measuring Massive Multitask Language Understanding

7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
    ELMRALM
ArXiv (abs)PDFHTML

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 3,408 papers shown
Title
Evaluating Multimodal Generative AI with Korean Educational Standards
Evaluating Multimodal Generative AI with Korean Educational Standards
Sangkwon Park
Geewook Kim
AI4EdELM
116
0
0
24 Feb 2025
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
Zhenheng Tang
Xiang Liu
Qian Wang
Peijie Dong
Bingsheng He
Xiaowen Chu
Bo Li
LRM
122
2
0
24 Feb 2025
Large Language Models and Mathematical Reasoning Failures
Large Language Models and Mathematical Reasoning Failures
Johan Boye
Birger Moell
ELMLRM
85
4
0
24 Feb 2025
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer
Marthe Ballon
Andres Algaba
Vincent Ginis
LRMReLM
104
17
0
24 Feb 2025
Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Rylan Schaeffer
Punit Singh Koura
Binh Tang
R. Subramanian
Aaditya K. Singh
...
Vedanuj Goswami
Sergey Edunov
Dieuwke Hupkes
Sanmi Koyejo
Sharan Narang
ALM
151
1
0
24 Feb 2025
Model Lakes
Model Lakes
Koyena Pal
David Bau
Renée J. Miller
179
2
0
24 Feb 2025
Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks
Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks
Andrei Chernov
MoE
85
0
0
24 Feb 2025
Proactive Privacy Amnesia for Large Language Models: Safeguarding PII with Negligible Impact on Model Utility
Proactive Privacy Amnesia for Large Language Models: Safeguarding PII with Negligible Impact on Model Utility
Martin Kuo
Jingyang Zhang
Jianyi Zhang
Minxue Tang
Louis DiValentin
...
William Chen
Amin Hass
Tianlong Chen
Yuxiao Chen
Haoyang Li
MUKELM
123
4
0
24 Feb 2025
PiCO: Peer Review in LLMs based on the Consistency Optimization
PiCO: Peer Review in LLMs based on the Consistency Optimization
Kun-Peng Ning
Shuo Yang
Yu-Yang Liu
Jia-Yu Yao
Zhen-Hui Liu
Yu Wang
Ming Pang
Li Yuan
ALM
212
9
0
24 Feb 2025
Forecasting Rare Language Model Behaviors
Erik Jones
Meg Tong
Jesse Mu
Mohammed Mahfoud
Jan Leike
Roger C. Grosse
Jared Kaplan
William Fithian
Ethan Perez
Mrinank Sharma
99
1
0
24 Feb 2025
GuidedBench: Measuring and Mitigating the Evaluation Discrepancies of In-the-wild LLM Jailbreak Methods
GuidedBench: Measuring and Mitigating the Evaluation Discrepancies of In-the-wild LLM Jailbreak Methods
Ruixuan Huang
Xunguang Wang
Zongjie Li
Daoyuan Wu
Shuai Wang
ALMELM
139
0
0
24 Feb 2025
LightThinker: Thinking Step-by-Step Compression
LightThinker: Thinking Step-by-Step Compression
Jintian Zhang
Yuqi Zhu
Mengshu Sun
Yujie Luo
Shuofei Qiao
Lun Du
Da Zheng
Ningyu Zhang
N. Zhang
LRMLLMAG
140
34
0
24 Feb 2025
Speed and Conversational Large Language Models: Not All Is About Tokens per Second
Speed and Conversational Large Language Models: Not All Is About Tokens per Second
Javier Conde
Miguel González
Pedro Reviriego
Zhen Gao
Shanshan Liu
Fabrizio Lombardi
101
4
0
23 Feb 2025
Entropy-Lens: The Information Signature of Transformer Computations
Entropy-Lens: The Information Signature of Transformer Computations
Riccardo Ali
Francesco Caso
Christopher Irwin
Pietro Lio
108
3
0
23 Feb 2025
WildLong: Synthesizing Realistic Long-Context Instruction Data at Scale
WildLong: Synthesizing Realistic Long-Context Instruction Data at Scale
Jiaxi Li
Xingxing Zhang
Xun Wang
Xiaolong Huang
Li Dong
Liang Wang
Si-Qing Chen
Wei Lu
Furu Wei
SyDa
476
1
0
23 Feb 2025
Recent Advances in Large Langauge Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation
Recent Advances in Large Langauge Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation
Simin Chen
Yiming Chen
Zexin Li
Yifan Jiang
Zhongwei Wan
...
Dezhi Ran
Tianle Gu
Haoyang Li
Tao Xie
Baishakhi Ray
95
6
0
23 Feb 2025
Multilingual != Multicultural: Evaluating Gaps Between Multilingual Capabilities and Cultural Alignment in LLMs
Multilingual != Multicultural: Evaluating Gaps Between Multilingual Capabilities and Cultural Alignment in LLMs
Jonathan Rystrøm
Hannah Rose Kirk
Scott A. Hale
96
7
0
23 Feb 2025
IPO: Your Language Model is Secretly a Preference Classifier
IPO: Your Language Model is Secretly a Preference Classifier
Shivank Garg
Ayush Singh
Shweta Singh
Paras Chopra
469
1
0
22 Feb 2025
LegalBench.PT: A Benchmark for Portuguese Law
LegalBench.PT: A Benchmark for Portuguese Law
Beatriz Canaverde
Telmo Pessoa Pires
Leonor Melo Ribeiro
Andre F. T. Martins
AILawELM
81
0
0
22 Feb 2025
Moving Beyond Medical Exam Questions: A Clinician-Annotated Dataset of Real-World Tasks and Ambiguity in Mental Healthcare
Moving Beyond Medical Exam Questions: A Clinician-Annotated Dataset of Real-World Tasks and Ambiguity in Mental Healthcare
Max Lamparth
Declan Grabb
Amy Franks
Scott Gershan
Kaitlyn N. Kunstman
...
Monika Drummond Roots
Manu Sharma
Aryan Shrivastava
N. Vasan
Colleen Waickman
137
2
0
22 Feb 2025
A generative approach to LLM harmfulness detection with special red flag tokens
A generative approach to LLM harmfulness detection with special red flag tokens
Sophie Xhonneux
David Dobre
Mehrnaz Mohfakhami
Leo Schwinn
Gauthier Gidel
184
2
0
22 Feb 2025
Self-Taught Agentic Long Context Understanding
Self-Taught Agentic Long Context Understanding
Yufan Zhuang
Xiaodong Yu
Jialian Wu
Xingwu Sun
Zihan Wang
Jiang Liu
Yusheng Su
Jingbo Shang
Zicheng Liu
Emad Barsoum
LRM
87
0
0
21 Feb 2025
Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation
Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation
Shuo Tang
Xianghe Pang
Zexi Liu
Bohan Tang
Guangyi Liu
Xiaowen Dong
Yanjie Wang
Yanfeng Wang
Tian Jin
SyDaLLMAG
233
7
0
21 Feb 2025
Mind the Gap! Static and Interactive Evaluations of Large Audio Models
Mind the Gap! Static and Interactive Evaluations of Large Audio Models
Minzhi Li
William B. Held
Michael Joseph Ryan
Kunat Pipatanakul
Potsawee Manakul
Hao Zhu
Diyi Yang
AuLLMALM
95
2
0
21 Feb 2025
C3AI: Crafting and Evaluating Constitutions for Constitutional AI
Yara Kyrychenko
Ke Zhou
Edyta Bogucka
Daniele Quercia
ELM
95
5
0
21 Feb 2025
Sparsity May Be All You Need: Sparse Random Parameter Adaptation
Sparsity May Be All You Need: Sparse Random Parameter Adaptation
Jesus Rios
Pierre Dognin
Ronny Luss
Karthikeyan N. Ramamurthy
209
1
0
21 Feb 2025
R-LoRA: Randomized Multi-Head LoRA for Efficient Multi-Task Learning
R-LoRA: Randomized Multi-Head LoRA for Efficient Multi-Task Learning
Jinda Liu
Yi-Ju Chang
Yuan Wu
162
0
0
21 Feb 2025
Understand User Opinions of Large Language Models via LLM-Powered In-the-Moment User Experience Interviews
Understand User Opinions of Large Language Models via LLM-Powered In-the-Moment User Experience Interviews
Mengqiao Liu
Tevin Wang
Cassandra A. Cohen
Sarah Li
Chenyan Xiong
LRM
120
0
0
21 Feb 2025
A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics
A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics
Ting-Ruen Wei
Haowei Liu
Xuyang Wu
Yi Fang
LRMAI4CEReLMKELM
412
3
0
21 Feb 2025
Federated Fine-Tuning of Large Language Models: Kahneman-Tversky vs. Direct Preference Optimization
Federated Fine-Tuning of Large Language Models: Kahneman-Tversky vs. Direct Preference Optimization
Fernando Spadea
Oshani Seneviratne
80
1
0
21 Feb 2025
Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture
Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture
John Burden
Marko Tesic
Lorenzo Pacchiardi
José Hernández-Orallo
77
1
0
21 Feb 2025
KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse
KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse
Jingbo Yang
Bairu Hou
Wei Wei
Yujia Bao
Shiyu Chang
VLM
188
3
0
21 Feb 2025
Evaluating Large Language Models for Public Health Classification and Extraction Tasks
Evaluating Large Language Models for Public Health Classification and Extraction Tasks
Joshua Harris
Timothy Laurence
Leo Loman
Fan Grayson
Toby Nonnenmacher
...
Hamish Mohammed
Thomas Finnie
Luke Hounsome
Michael Borowitz
Steven Riley
LM&MAAI4MH
148
5
0
20 Feb 2025
Simplify RLHF as Reward-Weighted SFT: A Variational Method
Simplify RLHF as Reward-Weighted SFT: A Variational Method
Yuhao Du
Zehan Li
Pengyu Cheng
Zhihong Chen
Yuejiao Xie
Xiang Wan
Anningzhe Gao
115
1
0
20 Feb 2025
Sens-Merging: Sensitivity-Guided Parameter Balancing for Merging Large Language Models
Sens-Merging: Sensitivity-Guided Parameter Balancing for Merging Large Language Models
Shuqi Liu
Han Wu
Bowei He
Xiongwei Han
Mingxuan Yuan
Linqi Song
MoMe
136
3
0
20 Feb 2025
Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models
Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models
M. Russinovich
Ahmed Salem
CLLMU
145
3
0
20 Feb 2025
Multilingual Language Model Pretraining using Machine-translated Data
Multilingual Language Model Pretraining using Machine-translated Data
Jiayi Wang
Yao Lu
Maurice Weber
Max Ryabinin
David Ifeoluwa Adelani
Yihong Chen
Raphael Tang
Pontus Stenetorp
LRM
130
5
0
20 Feb 2025
LUME: LLM Unlearning with Multitask Evaluations
LUME: LLM Unlearning with Multitask Evaluations
Anil Ramakrishna
Yixin Wan
Xiaomeng Jin
Kai-Wei Chang
Zhiqi Bu
Bhanukiran Vinzamuri
Volkan Cevher
Mingyi Hong
Rahul Gupta
CLLMU
192
14
0
20 Feb 2025
Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models
Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models
Seonil Son
Ju-Min Oh
Heegon Jin
Cheolhun Jang
Jeongbeom Jeong
Kuntae Kim
149
1
0
20 Feb 2025
Analyze the Neurons, not the Embeddings: Understanding When and Where LLM Representations Align with Humans
Analyze the Neurons, not the Embeddings: Understanding When and Where LLM Representations Align with Humans
Masha Fedzechkina
Eleonora Gualdoni
Sinead Williamson
Katherine Metcalf
Skyler Seto
B. Theobald
90
1
0
20 Feb 2025
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
Anton Razzhigaev
Matvey Mikhalchuk
Temurbek Rahmatullaev
Elizaveta Goncharova
Polina Druzhinina
Ivan Oseledets
Andrey Kuznetsov
125
5
0
20 Feb 2025
Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment
Faster WIND: Accelerating Iterative Best-of-NNN Distillation for LLM Alignment
Tong Yang
Jincheng Mei
H. Dai
Zixin Wen
Shicong Cen
Dale Schuurmans
Yuejie Chi
Bo Dai
120
4
0
20 Feb 2025
Rare Disease Differential Diagnosis with Large Language Models at Scale: From Abdominal Actinomycosis to Wilson's Disease
Rare Disease Differential Diagnosis with Large Language Models at Scale: From Abdominal Actinomycosis to Wilson's Disease
Elliot Schumacher
Dhruv Naik
Anitha Kannan
LM&MA
66
0
0
20 Feb 2025
LESA: Learnable LLM Layer Scaling-Up
LESA: Learnable LLM Layer Scaling-Up
Yifei Yang
Zouying Cao
Xinbei Ma
Yao Yao
L. Qin
Zhongfu Chen
Hai Zhao
179
0
0
20 Feb 2025
An explainable transformer circuit for compositional generalization
An explainable transformer circuit for compositional generalization
Cheng Tang
Brenden Lake
Mehrdad Jazayeri
LRM
149
3
0
19 Feb 2025
Megrez-Omni Technical Report
Boxun Li
Yadong Li
Zehan Li
Congyi Liu
Weilin Liu
...
Dong Zhou
Yueqing Zhuang
Shengen Yan
Guohao Dai
Yansen Wang
83
0
0
19 Feb 2025
RGAR: Recurrence Generation-augmented Retrieval for Factual-aware Medical Question Answering
RGAR: Recurrence Generation-augmented Retrieval for Factual-aware Medical Question Answering
Sichu Liang
Linhai Zhang
Hongyu Zhu
Wenwen Wang
Yulan He
Deyu Zhou
RALM
101
0
0
19 Feb 2025
GneissWeb: Preparing High Quality Data for LLMs at Scale
GneissWeb: Preparing High Quality Data for LLMs at Scale
Hajar Emami-Gohari
S. Kadhe
Syed Yousaf Shah. Constantin Adam
Abdulhamid A. Adebayo
Praneet Adusumilli
...
Issei Yoshida
Syed Zawad
Petros Zerfos
Yi Zhou
Bishwaranjan Bhattacharjee
68
1
0
19 Feb 2025
PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models
PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models
Jiaqi Zhao
Miao Zhang
Ming Wang
Yuzhang Shang
Kaihao Zhang
Weili Guan
Yaowei Wang
Min Zhang
MQ
114
1
0
18 Feb 2025
Baichuan-M1: Pushing the Medical Capability of Large Language Models
Binghai Wang
Haizhou Zhao
Huozhi Zhou
Liang Song
Mingyu Xu
...
Yan Zhang
Yifei Duan
Yuyan Zhou
Zhi-Ming Ma
Zhikai Wu
LM&MAELMAI4MH
123
10
0
18 Feb 2025
Previous
123...171819...676869
Next