ResearchTrend.AI
Measuring Massive Multitask Language Understanding

7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 3,408 papers shown
MedBrowseComp: Benchmarking Medical Deep Research and Computer Use
Shan Chen
Pedro Moreira
Yuxin Xiao
Sam Schmidgall
J. Warner
Hugo J. W. L. Aerts
Thomas Hartvigsen
Jack Gallifant
Danielle S. Bitterman
ELM
65
0
0
20 May 2025
sudoLLM : On Multi-role Alignment of Language Models
Soumadeep Saha
Akshay Chaturvedi
Joy Mahapatra
Utpal Garain
45
0
0
20 May 2025
s3: You Don't Need That Much Data to Train a Search Agent via RL
Pengcheng Jiang
Xueqiang Xu
Jiacheng Lin
Jinfeng Xiao
Zifeng Wang
Jimeng Sun
Jiawei Han
OffRL, RALM, AI4TS, LRM
113
1
0
20 May 2025
From Unaligned to Aligned: Scaling Multilingual LLMs with Multi-Way Parallel Corpora
Yingli Shen
Wen Lai
Shuo Wang
Kangyang Luo
Alexander Fraser
Maosong Sun
84
0
0
20 May 2025
Incorporating Token Usage into Prompting Strategy Evaluation
Chris Sypherd
Sergei Petrov
Sonny George
Vaishak Belle
LLMAG
56
0
0
20 May 2025
InfiFPO: Implicit Model Fusion via Preference Optimization in Large Language Models
Yanggan Gu
Zhaoyi Yan
Yuanyi Wang
Yiming Zhang
Qi Zhou
Leilei Gan
Hongxia Yang
72
0
0
20 May 2025
SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment
Wonje Jeung
Sangyeon Yoon
Minsuk Kahng
Albert No
LRM, LLMSV
198
1
0
20 May 2025
Cross-Lingual Optimization for Language Transfer in Large Language Models
Jungseob Lee
Seongtae Hong
Hyeonseok Moon
Heuiseok Lim
64
0
0
20 May 2025
ProMind-LLM: Proactive Mental Health Care via Causal Reasoning with Sensor Data
Xinzhe Zheng
Sijie Ji
Jiawei Sun
Ruoxin Chen
Wei Gao
Mani Srivastava
AI4MH, LRM
58
0
0
20 May 2025
DiagnosisArena: Benchmarking Diagnostic Reasoning for Large Language Models
Yakun Zhu
Zhongzhen Huang
Linjie Mu
Yutong Huang
Wei Nie
Jiaji Liu
Shaoting Zhang
Pengfei Liu
Xiaofan Zhang
LM&MA, ELM, LRM
167
0
0
20 May 2025
Enhancing LLMs via High-Knowledge Data Selection
Feiyu Duan
Xuemiao Zhang
Sirui Wang
Haoran Que
Yuqi Liu
Wenge Rong
Xunliang Cai
237
0
0
20 May 2025
Dual Precision Quantization for Efficient and Accurate Deep Neural Networks Inference
Tomer Gafni
Asaf Karnieli
Yair Hanani
MQ
74
0
0
20 May 2025
The Energy Cost of Reasoning: Analyzing Energy Usage in LLMs with Test-time Compute
Yunho Jin
Gu-Yeon Wei
David Brooks
LRM
115
0
0
20 May 2025
Fragments to Facts: Partial-Information Fragment Inference from LLMs
Lucas Rosenblatt
Bin Han
Robert Wolfe
Bill Howe
AAML
61
0
0
20 May 2025
Safety Alignment Can Be Not Superficial With Explicit Safety Signals
Jianwei Li
Jung-Eun Kim
AAML
189
1
0
19 May 2025
Learnware of Language Models: Specialized Small Language Models Can Do Big
Zhi-Hao Tan
Zi-Chen Zhao
Hao-Yu Shi
Xin-Yu Zhang
Peng Tan
Yang Yu
Zhi Zhou
137
0
0
19 May 2025
Incentivizing Truthful Language Models via Peer Elicitation Games
Baiting Chen
Tong Zhu
Jiale Han
Lexin Li
Gang Li
Xiaowu Dai
124
0
0
19 May 2025
Shadow-FT: Tuning Instruct via Base
Taiqiang Wu
Runming Yang
Jiayi Li
Pengfei Hu
Ngai Wong
Yujiu Yang
239
0
0
19 May 2025
An Empirical Study of Many-to-Many Summarization with Large Language Models
Jiaan Wang
Fandong Meng
Zengkui Sun
Yunlong Liang
Yuxuan Cao
Jiarong Xu
Haoxiang Shi
Jie Zhou
47
0
0
19 May 2025
LEXam: Benchmarking Legal Reasoning on 340 Law Exams
Yu Fan
Jingwei Ni
Jakob Merane
Etienne Salimbeni
Yang Tian
...
Mrinmaya Sachan
Alexander Stremitzer
Christoph Engel
Elliott Ash
Joel Niklaus
AILaw, ELM
128
0
0
19 May 2025
Improving Multilingual Language Models by Aligning Representations through Steering
Omar Mahmoud
B. L. Semage
Thommen George Karimpanal
Santu Rana
LLMSV
84
0
0
19 May 2025
Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning
Debarpan Bhattacharya
Apoorva Kulkarni
Sriram Ganapathy
84
0
0
19 May 2025
ProDS: Preference-oriented Data Selection for Instruction Tuning
Wenya Guo
Zhengkun Zhang
Xumeng Liu
Ying Zhang
Ziyu Lu
Haoze Zhu
Xubo Liu
Ruxue Yan
119
0
0
19 May 2025
CoT-Kinetics: A Theoretical Modeling Assessing LRM Reasoning Process
Jinhe Bi
Danqi Yan
Yifan Wang
Wenke Huang
Haokun Chen
...
Mang Ye
Xun Xiao
Hinrich Schuetze
Volker Tresp
Yunpu Ma
LRM
116
9
0
19 May 2025
R3: Robust Rubric-Agnostic Reward Models
David Anugraha
Zilu Tang
Lester James V. Miranda
Hanyang Zhao
Mohammad Rifqi Farhansyah
Garry Kuwanto
Derry Wijaya
Genta Indra Winata
215
1
0
19 May 2025
Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings
Safal Shrestha
Minwu Kim
Aadim Nepal
Anubhav Shrestha
Keith Ross
OffRL, ReLM, LRM
79
0
0
19 May 2025
Krikri: Advancing Open Large Language Models for Greek
Dimitris Roussis
Leon Voukoutis
Georgios Paraskevopoulos
Sokratis Sofianopoulos
Prokopis Prokopidis
Vassilis Papavasileiou
Athanasios Katsamanis
Stelios Piperidis
Vassilis Katsouros
ALM
98
1
0
19 May 2025
Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference
Shuqing Luo
Pingzhi Li
Jie Peng
Hanrui Wang
Yang Zhao
Yu Cheng
Tianlong Chen
MoE
96
0
0
19 May 2025
Relation Extraction or Pattern Matching? Unravelling the Generalisation Limits of Language Models for Biographical RE
Varvara Arzt
Allan Hanbury
Michael Wiegand
Gábor Recski
Terra Blevins
70
0
0
18 May 2025
ExpertSteer: Intervening in LLMs through Expert Knowledge
Weixuan Wang
Minghao Wu
Barry Haddow
Alexandra Birch
LLMSV
181
0
0
18 May 2025
SGDPO: Self-Guided Direct Preference Optimization for Language Model Alignment
Wenqiao Zhu
Ji Liu
Lulu Wang
Jun Wu
Yulun Zhang
106
0
0
18 May 2025
Truth Neurons
Haohang Li
Yupeng Cao
Yangyang Yu
Jordan W. Suchow
Zining Zhu
HILM, MILM, KELM
73
0
0
18 May 2025
HybridServe: Efficient Serving of Large AI Models with Confidence-Based Cascade Routing
Leyang Xue
Yao Fu
Luo Mai
Mahesh K. Marina
136
0
0
18 May 2025
PSC: Extending Context Window of Large Language Models via Phase Shift Calibration
Wenqiao Zhu
Chao Xu
Lulu Wang
Jun Wu
107
1
0
18 May 2025
UFO-RL: Uncertainty-Focused Optimization for Efficient Reinforcement Learning Data Selection
Yang Zhao
Kai Xiong
Xiao Ding
Li Du
YangouOuyang
...
Wentao Zhang
Bin Liu
Dong Hu
Bing Qin
Ting Liu
OffRL
85
0
0
18 May 2025
HBO: Hierarchical Balancing Optimization for Fine-Tuning Large Language Models
Weixuan Wang
Minghao Wu
Barry Haddow
Alexandra Birch
146
0
0
18 May 2025
Improving LLM Outputs Against Jailbreak Attacks with Expert Model Integration
Tatia Tsmindashvili
Ana Kolkhidashvili
Dachi Kurtskhalia
Nino Maghlakelidze
Elene Mekvabishvili
Guram Dentoshvili
Orkhan Shamilov
Zaal Gachechiladze
Steven Saporta
David Dachi Choladze
183
0
0
18 May 2025
Wisdom from Diversity: Bias Mitigation Through Hybrid Human-LLM Crowds
Axel Abels
Tom Lenaerts
56
0
0
18 May 2025
Towards Budget-Friendly Model-Agnostic Explanation Generation for Large Language Models
Junhao Liu
Haonan Yu
Xin Zhang
LRM
185
0
0
18 May 2025
Teach2Eval: An Indirect Evaluation Method for LLM by Judging How It Teaches
Yuhang Zhou
Xutian Chen
Yixin Cao
Yuchen Ni
Yu He
...
Xiang Liu
Jian Zhang
Chuanjun Ji
Guangnan Ye
Xipeng Qiu
ELM
61
0
0
18 May 2025
Mutual-Taught for Co-adapting Policy and Reward Models
Tianyuan Shi
Canbin Huang
Fanqi Wan
Longguang Zhong
Ziyi Yang
Weizhou Shen
Xiaojun Quan
Ming Yan
36
0
0
17 May 2025
OMAC: A Broad Optimization Framework for LLM-Based Multi-Agent Collaboration
Shijun Li
Hilaf Hasson
Joydeep Ghosh
LLMAG
112
0
0
17 May 2025
Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors
Jing Huang
Junyi Tao
Thomas Icard
Diyi Yang
Christopher Potts
OODD
87
0
0
17 May 2025
Evaluating the Logical Reasoning Abilities of Large Reasoning Models
Hanmeng Liu
Yiran Ding
Zhizhang Fu
Chaoli Zhang
Xiaozhang Liu
Yue Zhang
ELM, LRM
71
1
0
17 May 2025
MoL for LLMs: Dual-Loss Optimization to Enhance Domain Expertise While Preserving General Capabilities
Jingxue Chen
Qingkun Tang
Qianchun Lu
Siyuan Fang
98
0
0
17 May 2025
Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation
Vincent Koc
LM&MA
82
0
0
17 May 2025
Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning
Puning Yang
Qizhou Wang
Zhuo Huang
Tongliang Liu
Chengqi Zhang
Bo Han
MU
120
0
0
17 May 2025
Why Not Act on What You Know? Unleashing Safety Potential of LLMs via Self-Aware Guard Enhancement
Peng Ding
Jun Kuang
Zongyu Wang
Xuezhi Cao
Xunliang Cai
Jiajun Chen
Shujian Huang
99
0
0
17 May 2025
Safe Delta: Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets
Ning Lu
Shengcai Liu
Jiahao Wu
Weiyu Chen
Zhirui Zhang
Yew-Soon Ong
Qi Wang
Ke Tang
106
3
0
17 May 2025
The AI Gap: How Socioeconomic Status Affects Language Technology Interactions
Elisa Bassignana
Amanda Cercas Curry
Dirk Hovy
68
0
0
17 May 2025