Measuring Massive Multitask Language Understanding

7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 3,408 papers shown
BioHopR: A Benchmark for Multi-Hop, Multi-Answer Reasoning in Biomedical Domain
Yunsoo Kim
Yusuf Abdulle
Honghan Wu
LRM
29
0
0
28 May 2025
Resolving Knowledge Conflicts in Domain-specific Data Selection: A Case Study on Medical Instruction-tuning
Qihuang Zhong
Liang Ding
Fei Liao
Juhua Liu
Bo Du
Dacheng Tao
45
0
0
28 May 2025
Zero-Shot Vision Encoder Grafting via LLM Surrogates
Kaiyu Yue
Vasu Singla
Menglin Jia
John Kirchenbauer
Rifaa Qadri
Zikui Cai
A. Bhatele
Furong Huang
Tom Goldstein
VLM
66
0
0
28 May 2025
Multi-MLLM Knowledge Distillation for Out-of-Context News Detection
Yimeng Gu
Zhao Tong
Ignacio Castro
Shu Wu
Gareth Tyson
20
0
0
28 May 2025
GitGoodBench: A Novel Benchmark For Evaluating Agentic Performance On Git
Tobias Lindenbauer
Egor Bogomolov
Yaroslav Zharov
31
0
0
28 May 2025
BLUR: A Benchmark for LLM Unlearning Robust to Forget-Retain Overlap
Shengyuan Hu
Neil Kale
Pratiksha Thaker
Yiwei Fu
Steven Wu
Virginia Smith
MU, AAML, CLL
14
0
0
28 May 2025
Precise In-Parameter Concept Erasure in Large Language Models
Yoav Gur-Arieh
Clara Suslik
Yihuai Hong
Fazl Barez
Mor Geva
KELM, MU
92
0
0
28 May 2025
THINK-Bench: Evaluating Thinking Efficiency and Chain-of-Thought Quality of Large Reasoning Models
Zhiyuan Li
Yi-Ju Chang
Yuan Wu
LLMAG, LRM
74
0
0
28 May 2025
EnsemW2S: Enhancing Weak-to-Strong Generalization with Large Language Model Ensembles
Aakriti Agrawal
Mucong Ding
Zora Che
Chenghao Deng
Anirudh Satheesh
Bang An
Bayan Bruss
John Langford
Furong Huang
59
0
0
28 May 2025
Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts
Xue Zhang
Yunlong Liang
Fandong Meng
Songming Zhang
Yufeng Chen
Jinan Xu
Jie Zhou
MoE, CLL
38
0
0
28 May 2025
GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning
Qingchen Yu
Zifan Zheng
Ding Chen
Simin Niu
Bo Tang
Feiyu Xiong
Zhiyu Li
ELM, LRM
44
0
0
28 May 2025
MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Chatbots and Dialogue Evaluators
John Mendonça
A. Lavie
Isabel Trancoso
53
0
0
28 May 2025
Large Language Models Often Know When They Are Being Evaluated
Joe Needham
Giles Edkins
Govind Pimpale
Henning Bartsch
Marius Hobbhahn
LLMAG, ELM, ALM
21
0
0
28 May 2025
Pre-Training Curriculum for Multi-Token Prediction in Language Models
Ansar Aynetdinov
Alan Akbik
LRM
42
0
0
28 May 2025
Stratified Selective Sampling for Instruction Tuning with Dedicated Scoring Strategy
Paramita Mirza
Lucas Weber
Fabian Küch
49
0
0
28 May 2025
Found in Translation: Measuring Multilingual LLM Consistency as Simple as Translate then Evaluate
Ashim Gupta
Maitrey Mehta
Zhichao Xu
Vivek Srikumar
49
0
0
28 May 2025
Efficient Ensemble for Fine-tuning Language Models on Multiple Datasets
Dongyue Li
Ziniu Zhang
Lu Wang
Hongyang R. Zhang
39
1
0
28 May 2025
Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset
Fakhraddin Alwajih
Samar Magdy
Abdellah El Mekki
Omer Nacar
Youssef Nafea
...
Mohamedou Cheikh Tourad
Ismail Berrada
Mustafa Jarrar
Shady Shehata
Muhammad Abdul-Mageed
VLM
72
0
0
28 May 2025
Understanding (Un)Reliability of Steering Vectors in Language Models
Joschka Braun
Carsten Eickhoff
David M. Krueger
Seyed Ali Bahrainian
Dmitrii Krasheninnikov
LLMSV
78
1
0
28 May 2025
Read Your Own Mind: Reasoning Helps Surface Self-Confidence Signals in LLMs
Jakub Podolak
Rajeev Verma
ReLM, LRM
25
0
0
28 May 2025
Advancing Expert Specialization for Better MoE
Hongcan Guo
Haolang Lu
Guoshun Nan
Bolun Chu
Jialin Zhuang
Yuan Yang
Wenhao Che
Sicong Leng
Qimei Cui
Xudong Jiang
MoE, MoMe
97
0
0
28 May 2025
Are Language Models Consequentialist or Deontological Moral Reasoners?
Keenan Samway
Max Kleiman-Weiner
David Guzman Piedrahita
Rada Mihalcea
Bernhard Schölkopf
Zhijing Jin
ELM, LRM
26
0
0
27 May 2025
Trans-EnV: A Framework for Evaluating the Linguistic Robustness of LLMs Against English Varieties
Jiyoung Lee
Seungho Kim
Jieun Han
Jun-Min Lee
Kitaek Kim
Alice Oh
E. Choi
59
0
0
27 May 2025
The Multilingual Divide and Its Impact on Global AI Safety
Aidan Peppin
Julia Kreutzer
Alice Schoenauer Sebag
Kelly Marchisio
Beyza Ermis
...
Wei-Yin Ko
Ahmet Üstün
Matthias Gallé
Marzieh Fadaee
Sara Hooker
ELM
75
1
0
27 May 2025
Efficient Large Language Model Inference with Neural Block Linearization
Mete Erdogan
F. Tonin
Volkan Cevher
78
0
0
27 May 2025
Research Community Perspectives on "Intelligence" and Large Language Models
Bertram Højer
Terne Sasha Thorn Jakobsen
Anna Rogers
Stefan Heinrich
47
0
0
27 May 2025
Do We Know What LLMs Don't Know? A Study of Consistency in Knowledge Probing
Raoyuan Zhao
Abdullatif Köksal
Ali Modarressi
Michael A. Hedderich
Hinrich Schütze
46
0
0
27 May 2025
Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation
Tharindu Kumarage
Ninareh Mehrabi
Anil Ramakrishna
Xinyan Zhao
R. Zemel
Kai-Wei Chang
Aram Galstyan
Rahul Gupta
Charith Peris
LRM
30
0
0
27 May 2025
SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge
Fengqing Jiang
Fengbo Ma
Zhangchen Xu
Yuetai Li
Bhaskar Ramasubramanian
Luyao Niu
Bo Li
Xianyan Chen
Zhen Xiang
Radha Poovendran
ALM, ELM
70
1
0
27 May 2025
SHE-LoRA: Selective Homomorphic Encryption for Federated Tuning with Heterogeneous LoRA
Jianmin Liu
Li Yan
Borui Li
Lei Yu
Chao Shen
31
0
0
27 May 2025
Why Do More Experts Fail? A Theoretical Analysis of Model Merging
Zijing Wang
Xingle Xu
Yongkang Liu
Yiqun Zhang
Peiqin Lin
Shi Feng
Xiaocui Yang
Daling Wang
Hinrich Schütze
MoMe
43
0
0
27 May 2025
Explainability of Large Language Models using SMILE: Statistical Model-agnostic Interpretability with Local Explanations
Zeinab Dehghani
Koorosh Aslansefat
Adil Khan
Mohammed Naveed Akram
MILM, LRM
134
0
0
27 May 2025
Towards Objective Fine-tuning: How LLMs' Prior Knowledge Causes Potential Poor Calibration?
Ziming Wang
Zeyu Shi
Haoyi Zhou
Shiqi Gao
Qingyun Sun
Jianxin Li
26
0
0
27 May 2025
LLMs Think, But Not In Your Flow: Reasoning-Level Personalization for Black-Box Large Language Models
Jieyong Kim
Tongyoung Kim
Soojin Yoon
Jaehyung Kim
Dongha Lee
LRM
78
0
0
27 May 2025
Hardware-Efficient Attention for Fast Decoding
Ted Zadouri
Hubert Strauss
Tri Dao
68
2
0
27 May 2025
Beyond Templates: Dynamic Adaptation of Reasoning Demonstrations via Feasibility-Aware Exploration
Yong Wu
Weihang Pan
Ke Li
Chen Binhui
Ping Li
Binbin Lin
LRM
73
0
0
27 May 2025
Agentic Predictor: Performance Prediction for Agentic Workflows via Multi-View Encoding
Patara Trirat
Wonyong Jeong
Sung Ju Hwang
89
0
0
26 May 2025
Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking
Pengxiang Li
Shilin Yan
Joey Tsai
Renrui Zhang
Ruichuan An
Ziyu Guo
Xiaowei Gao
63
1
0
26 May 2025
StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs
J. Yang
Dongfu Jiang
Lipeng He
Sherman Siu
Yuxuan Zhang
...
Yi Lu
Quy Duc Do
Ziyan Jiang
Ping Nie
Wenhu Chen
35
0
0
26 May 2025
EuroCon: Benchmarking Parliament Deliberation for Political Consensus Finding
Zhaowei Zhang
Minghua Yi
Mengmeng Wang
Fengshuo Bai
Zilong Zheng
Yipeng Kang
Yaodong Yang
63
1
0
26 May 2025
Uncertainty-Aware Attention Heads: Efficient Unsupervised Uncertainty Quantification for LLMs
Artem Vazhentsev
Lyudmila Rvanova
Gleb Kuzmin
Ekaterina Fadeeva
Ivan Lazichny
...
Maxim Panov
Timothy Baldwin
Mrinmaya Sachan
Preslav Nakov
Artem Shelmanov
EDL, HILM
84
0
0
26 May 2025
The Birth of Knowledge: Emergent Features across Time, Space, and Scale in Large Language Models
Shashata Sawmya
Micah Adler
Nir Shavit
MILM
31
0
0
26 May 2025
FinLoRA: Benchmarking LoRA Methods for Fine-Tuning LLMs on Financial Datasets
Dannong Wang
Jaisal Patel
Daochen Zha
Steve Yang
Xiao-Yang Liu
31
0
0
26 May 2025
CP-Router: An Uncertainty-Aware Router Between LLM and LRM
Jiayuan Su
Fulin Lin
Zhaopeng Feng
Han Zheng
Teng Wang
Zhenyu Xiao
Xinlong Zhao
Zuozhu Liu
Lu Cheng
Hongwei Wang
54
0
0
26 May 2025
Inconsistent Tokenizations Cause Language Models to be Perplexed by Japanese Grammar
Andrew Gambardella
Takeshi Kojima
Yusuke Iwasawa
Yutaka Matsuo
20
0
0
26 May 2025
Lifelong Safety Alignment for Language Models
Haoyu Wang
Zeyu Qin
Yifei Zhao
C. Du
Min Lin
Xueqian Wang
Tianyu Pang
KELM, CLL
70
1
0
26 May 2025
Token-Importance Guided Direct Preference Optimization
Yang Ning
Lin Hai
Liu Yibo
Tian Baoliang
Liu Guoqing
Zhang Haijun
71
0
0
26 May 2025
FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models
Hao Kang
Zichun Yu
Chenyan Xiong
MoE
76
0
0
26 May 2025
Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning
Jaehun Jung
Seungju Han
Ximing Lu
Skyler Hallinan
David Acuna
Shrimai Prabhumoye
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
Yejin Choi
SyDa
19
1
0
26 May 2025
Interleaved Reasoning for Large Language Models via Reinforcement Learning
Roy Xie
David Qiu
Deepak Gopinath
Dong Lin
Yanchao Sun
Chong-Jun Wang
Saloni Potdar
Bhuwan Dhingra
KELM, LRM
73
0
0
26 May 2025