ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2009.03300
  4. Cited By
Measuring Massive Multitask Language Understanding
v1v2v3 (latest)

Measuring Massive Multitask Language Understanding

7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
    ELMRALM
ArXiv (abs)PDFHTML

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 3,408 papers shown
Title
Policy-to-Language: Train LLMs to Explain Decisions with Flow-Matching Generated Rewards
Policy-to-Language: Train LLMs to Explain Decisions with Flow-Matching Generated Rewards
Xinyi Yang
Liang Zeng
Heng Dong
Chao Yu
Xiaojun Wu
H. Yang
Yu Wang
Milind Tambe
Tonghan Wang
143
4
0
18 Feb 2025
Baichuan-M1: Pushing the Medical Capability of Large Language Models
Binghai Wang
Haizhou Zhao
Huozhi Zhou
Liang Song
Mingyu Xu
...
Yan Zhang
Yifei Duan
Yuyan Zhou
Zhi-Ming Ma
Zhikai Wu
LM&MAELMAI4MH
123
10
0
18 Feb 2025
Language Models Can Predict Their Own Behavior
Language Models Can Predict Their Own Behavior
Dhananjay Ashok
Jonathan May
ReLMAI4TSLRM
124
2
0
18 Feb 2025
SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering?
SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering?
Yucheng Shi
Tianze Yang
Canyu Chen
Quanzheng Li
Tianming Liu
Xuzhao Li
Ninghao Liu
MedIm
141
4
0
18 Feb 2025
SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings
SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings
Weikai Lu
Hao Peng
Huiping Zhuang
Cen Chen
Huiping Zhuang
84
0
0
18 Feb 2025
Lean-ing on Quality: How High-Quality Data Beats Diverse Multilingual Data in AutoFormalization
Lean-ing on Quality: How High-Quality Data Beats Diverse Multilingual Data in AutoFormalization
Willy Chan
Michael Souliman
Jakob Nordhagen
Alycia Lee
Elyas Obbad
Kai Fronsdal Sanmi Koyejo
62
0
0
18 Feb 2025
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
Jiaqi Zhao
Ming Wang
Miao Zhang
Yuzhang Shang
Xuebo Liu
Yaowei Wang
Min Zhang
Liqiang Nie
MQ
248
2
0
18 Feb 2025
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions
Weizhe Yuan
Jane Dwivedi-Yu
Song Jiang
Karthik Padthe
Yang Li
...
Ilia Kulikov
Kyunghyun Cho
Yuandong Tian
Jason Weston
Xian Li
ReLMLRM
162
20
0
18 Feb 2025
Portable Reward Tuning: Towards Reusable Fine-Tuning across Different Pretrained Models
Portable Reward Tuning: Towards Reusable Fine-Tuning across Different Pretrained Models
Daiki Chijiwa
Taku Hasegawa
Kyosuke Nishida
Kuniko Saito
Susumu Takeuchi
133
0
0
18 Feb 2025
Stress Testing Generalization: How Minor Modifications Undermine Large Language Model Performance
Stress Testing Generalization: How Minor Modifications Undermine Large Language Model Performance
Guangxiang Zhao
Saier Hu
Xiaoqi Jian
Jinzhu Wu
Yuhan Wu
Change Jia
Lin Sun
Xiangzheng Zhang
175
1
0
18 Feb 2025
Beyond the Singular: The Essential Role of Multiple Generations in Effective Benchmark Evaluation and Analysis
Beyond the Singular: The Essential Role of Multiple Generations in Effective Benchmark Evaluation and Analysis
Wenbo Zhang
Hengrui Cai
Wenyu Chen
110
1
0
17 Feb 2025
Which Retain Set Matters for LLM Unlearning? A Case Study on Entity Unlearning
Which Retain Set Matters for LLM Unlearning? A Case Study on Entity Unlearning
Hwan Chang
Hwanhee Lee
MU
87
1
0
17 Feb 2025
PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection
PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection
Jinhe Bi
Yifan Wang
Danqi Yan
Xun Xiao
Artur Hecker
Volker Tresp
Yunpu Ma
VLM
236
16
0
17 Feb 2025
Designing Role Vectors to Improve LLM Inference Behaviour
Designing Role Vectors to Improve LLM Inference Behaviour
Daniele Potertì
Andrea Seveso
Fabio Mercorio
LLMSV
98
1
0
17 Feb 2025
PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
Xinyu Zhang
Yuxuan Dong
Yongpeng Wu
Jiaxing Huang
Chengyou Jia
Basura Fernando
Mike Zheng Shou
Lingling Zhang
Jun Liu
AIMatReLMLRM
114
13
0
17 Feb 2025
Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models
Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models
Yingshui Tan
Yilei Jiang
Yongbin Li
Qingbin Liu
Xingyuan Bu
Wenbo Su
Xiangyu Yue
Xiaoyong Zhu
Bo Zheng
ALM
161
6
0
17 Feb 2025
Improve LLM-as-a-Judge Ability as a General Ability
Improve LLM-as-a-Judge Ability as a General Ability
Jiachen Yu
Shaoning Sun
Xiaohui Hu
Jiaxu Yan
Kaidong Yu
Xuelong Li
ELM
155
7
0
17 Feb 2025
Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception
Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception
Shiyu Ni
Keping Bi
Jiafeng Guo
Lulu Yu
Baolong Bi
Xueqi Cheng
92
5
0
17 Feb 2025
Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree?
Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree?
Xueru Wen
Jie Lou
Yaojie Lu
Hongyu Lin
Xing Yu
Xinyu Lu
Xianpei Han
Jia Zheng
Debing Zhang
Le Sun
ALM
125
7
0
17 Feb 2025
Exploring Translation Mechanism of Large Language Models
Exploring Translation Mechanism of Large Language Models
Hongbin Zhang
Kehai Chen
Xuefeng Bai
Xiucheng Li
Yang Xiang
Min Zhang
147
1
0
17 Feb 2025
Leveraging Uncertainty Estimation for Efficient LLM Routing
Leveraging Uncertainty Estimation for Efficient LLM Routing
Tuo Zhang
Asal Mehradfar
Dimitrios Dimitriadis
Salman Avestimehr
144
1
0
16 Feb 2025
TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages
TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages
Jafar Isbarov
Arofat Akhundjanova
Mammad Hajili
Kavsar Huseynova
Dmitry Gaynullin
...
Amina Alisheva
Aizirek Turdubaeva
Abdullatif Köksal
Samir Rustamov
Duygu Ataman
ELM
78
0
0
16 Feb 2025
TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking
TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking
Shahriar Kabir Nahin
R. N. Nandi
Sagor Sarker
Quazi Sarwar Muhtaseem
Md. Kowsher
Apu Chandraw Shill
Md Ibrahim
Mehadi Hasan Menon
Tareq Al Muntasir
Firoj Alam
189
0
0
16 Feb 2025
Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training
Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training
Yao-Ching Yu
Tsun-Han Chiang
Cheng-Wei Tsai
Chien-Ming Huang
Wen-Kwang Tsao
119
7
0
16 Feb 2025
Smoothing Out Hallucinations: Mitigating LLM Hallucination with Smoothed Knowledge Distillation
Smoothing Out Hallucinations: Mitigating LLM Hallucination with Smoothed Knowledge Distillation
Hieu Nguyen
Zihao He
Shoumik Atul Gandre
Ujjwal Pasupulety
Sharanya Kumari Shivakumar
Kristina Lerman
HILM
130
2
0
16 Feb 2025
Soteria: Language-Specific Functional Parameter Steering for Multilingual Safety Alignment
Soteria: Language-Specific Functional Parameter Steering for Multilingual Safety Alignment
Somnath Banerjee
Sayan Layek
Pratyush Chatterjee
Animesh Mukherjee
Rima Hazra
LLMSV
149
1
0
16 Feb 2025
Valuable Hallucinations: Realizable Non-realistic Propositions
Valuable Hallucinations: Realizable Non-realistic Propositions
Qiucheng Chen
Bo Wang
LRM
138
0
0
16 Feb 2025
Unlocking the Power of Function Vectors for Characterizing and Mitigating Catastrophic Forgetting in Continual Instruction Tuning
Unlocking the Power of Function Vectors for Characterizing and Mitigating Catastrophic Forgetting in Continual Instruction Tuning
Gangwei Jiang
Caigao Jiang
Zhaoyi Li
Siqiao Xue
Jun-ping Zhou
Linqi Song
Defu Lian
Yin Wei
CLLMU
168
2
0
16 Feb 2025
Quantifying the Capability Boundary of DeepSeek Models: An Application-Driven Performance Analysis
Quantifying the Capability Boundary of DeepSeek Models: An Application-Driven Performance Analysis
Kaikai Zhao
Zhaoxiang Liu
Xuejiao Lei
Rongjia Du
Zhenhong Long
...
Minjie Hua
Kai Wang
Wen Liu
Ning Wang
Kai Wang
ELMLRM
109
1
0
16 Feb 2025
An Empirical Analysis of Uncertainty in Large Language Model Evaluations
An Empirical Analysis of Uncertainty in Large Language Model Evaluations
Qiujie Xie
Qingqiu Li
Zhuohao Yu
Yuejie Zhang
Yue Zhang
Linyi Yang
ELM
132
5
0
15 Feb 2025
1bit-Merging: Dynamic Quantized Merging for Large Language Models
1bit-Merging: Dynamic Quantized Merging for Large Language Models
Shuqi Liu
Yuxuan Yao
Bowei He
Zehua Liu
Xiongwei Han
Mingxuan Yuan
Han Wu
Linqi Song
MoMeMQ
155
2
0
15 Feb 2025
Large Language Diffusion Models
Large Language Diffusion Models
Shen Nie
Fengqi Zhu
Zebin You
Xiaolu Zhang
Jingyang Ou
Jun Hu
Jun Zhou
Yankai Lin
Ji-Rong Wen
Chongxuan Li
280
55
0
14 Feb 2025
Mind the Gap! Choice Independence in Using Multilingual LLMs for Persuasive Co-Writing Tasks in Different Languages
Mind the Gap! Choice Independence in Using Multilingual LLMs for Persuasive Co-Writing Tasks in Different Languages
Shreyan Biswas
Alexander Erlei
U. Gadiraju
166
4
0
13 Feb 2025
LoXR: Performance Evaluation of Locally Executing LLMs on XR Devices
LoXR: Performance Evaluation of Locally Executing LLMs on XR Devices
Dawar Khan
Xinyu Liu
Omar Mena
Donggang Jia
Alexandre Kouyoumdjian
I. Viola
86
0
0
13 Feb 2025
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models
Xu Huang
Wenhao Zhu
Hanxu Hu
Zeang Sheng
Lei Li
Shujian Huang
Fei Yuan
ELM
180
4
0
11 Feb 2025
Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding
Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding
Ziyi Wang
Muneeza Azmart
Ang Li
R. Horesh
Mikhail Yurochkin
220
2
0
11 Feb 2025
When More is Less: Understanding Chain-of-Thought Length in LLMs
When More is Less: Understanding Chain-of-Thought Length in LLMs
Yuyang Wu
Yifei Wang
Tianqi Du
Stefanie Jegelka
Yisen Wang
Yisen Wang
LRM
158
51
0
11 Feb 2025
LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation
LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation
Zican Dong
Junyi Li
Jinhao Jiang
Mingyu Xu
Wayne Xin Zhao
Bin Wang
Xin Wu
VLM
371
5
0
11 Feb 2025
JBShield: Defending Large Language Models from Jailbreak Attacks through Activated Concept Analysis and Manipulation
JBShield: Defending Large Language Models from Jailbreak Attacks through Activated Concept Analysis and Manipulation
Shenyi Zhang
Yuchen Zhai
Keyan Guo
Hongxin Hu
Shengnan Guo
Zheng Fang
Lingchen Zhao
Chao Shen
Cong Wang
Qian Wang
AAML
143
4
0
11 Feb 2025
Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon
Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon
Nurit Cohen-Inger
Yehonatan Elisha
Bracha Shapira
Lior Rokach
Seffi Cohen
ELM
157
1
0
11 Feb 2025
Unbiased Evaluation of Large Language Models from a Causal Perspective
Unbiased Evaluation of Large Language Models from a Causal Perspective
Meilin Chen
Jian Tian
Liang Ma
Di Xie
Weijie Chen
Jiang Zhu
ALMELM
166
0
0
10 Feb 2025
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation
Maria Eriksson
Erasmo Purificato
Arman Noroozian
Joao Vinagre
Guillaume Chaslot
Emilia Gomez
David Fernandez-Llorca
ELM
292
6
0
10 Feb 2025
SeaExam and SeaBench: Benchmarking LLMs with Local Multilingual Questions in Southeast Asia
SeaExam and SeaBench: Benchmarking LLMs with Local Multilingual Questions in Southeast Asia
Chaoqun Liu
Wenxuan Zhang
Jiahao Ying
Mahani Aljunied
Anh Tuan Luu
Lidong Bing
ELM
142
4
0
10 Feb 2025
EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models
EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models
Xingrun Xing
Zheng Liu
Shitao Xiao
Boyan Gao
Yiming Liang
Wanpeng Zhang
Haokun Lin
Guoqi Li
Jiajun Zhang
LRM
270
2
0
10 Feb 2025
The Curse of Depth in Large Language Models
The Curse of Depth in Large Language Models
Wenfang Sun
Xinyuan Song
Pengxiang Li
Lu Yin
Yefeng Zheng
Shiwei Liu
136
7
0
09 Feb 2025
Reinforced Lifelong Editing for Language Models
Reinforced Lifelong Editing for Language Models
Zherui Li
Houcheng Jiang
Hao Chen
Baolong Bi
Zhenhong Zhou
Fei Sun
Sihang Li
Xinze Wang
KELM
157
8
0
09 Feb 2025
Delta - Contrastive Decoding Mitigates Text Hallucinations in Large Language Models
Delta - Contrastive Decoding Mitigates Text Hallucinations in Large Language Models
Cheng Peng Huang
Hao-Yuan Chen
HILM
143
1
0
09 Feb 2025
LM2: Large Memory Models
LM2: Large Memory Models
Jikun Kang
Wenqi Wu
Filippos Christianos
Alex J. Chan
Fraser Greenlee
George Thomas
Marvin Purtorab
Andy Toulis
KELM
199
0
0
09 Feb 2025
Refining Positive and Toxic Samples for Dual Safety Self-Alignment of LLMs with Minimal Human Interventions
Refining Positive and Toxic Samples for Dual Safety Self-Alignment of LLMs with Minimal Human Interventions
Jingxin Xu
Guoshun Nan
Sheng Guan
Sicong Leng
Yang Liu
Zixiao Wang
Yuyang Ma
Zhili Zhou
Yanzhao Hou
Xiaofeng Tao
LM&MA
116
0
0
08 Feb 2025
Evaluating Vision-Language Models for Emotion Recognition
Evaluating Vision-Language Models for Emotion Recognition
Sree Bhattacharyya
James Z. Wang
VLM
145
2
0
08 Feb 2025
Previous
123...181920...676869
Next