Measuring Massive Multitask Language Understanding

7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 3,408 papers shown
MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation
Shen Yuan
Yin Zheng
Taifeng Wang
Binbin Liu
Hongteng Xu
MoMe
44
0
0
01 Jul 2025
Tower+: Bridging Generality and Translation Specialization in Multilingual LLMs
Ricardo Rei
Nuno M. Guerreiro
José P. Pombal
Joao Alves
Pedro Teixeirinha
Amin Farajian
André F. T. Martins
ALM, LRM
10
0
0
20 Jun 2025
Revisiting LoRA through the Lens of Parameter Redundancy: Spectral Encoding Helps
Jiashun Cheng
Aochuan Chen
Nuo Chen
Ziqi Gao
Yuhan Li
Jia Li
Fugee Tsung
17
0
0
20 Jun 2025
The Role of Model Confidence on Bias Effects in Measured Uncertainties
Xinyi Liu
Weiguang Wang
Hangfeng He
12
0
0
20 Jun 2025
LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation
Tongtian Yue
Longteng Guo
Yepeng Tang
Zijia Zhao
Xinxin Zhu
Hua Huang
Jing Liu
MLLM, VLM
16
0
0
20 Jun 2025
Chain-of-Thought Prompting Obscures Hallucination Cues in Large Language Models: An Empirical Evaluation
Jiahao Cheng
Tiancheng Su
Jia Yuan
Guoxiu He
Jiawei Liu
Xinqi Tao
Jingwen Xie
Huaxia Li
HILM, LRM
16
0
0
20 Jun 2025
Arch-Router: Aligning LLM Routing with Human Preferences
Co Tran
Salman Paracha
Adil Hafeez
Shuguang Chen
20
0
0
19 Jun 2025
Mr. Snuffleupagus at SemEval-2025 Task 4: Unlearning Factual Knowledge from LLMs Using Adaptive RMU
Arjun Dosajh
Mihika Sanghi
MU
12
0
0
19 Jun 2025
Learning-Time Encoding Shapes Unlearning in LLMs
Ruihan Wu
Konstantin Garov
Kamalika Chaudhuri
MU
22
0
0
18 Jun 2025
RE-IMAGINE: Symbolic Benchmark Synthesis for Reasoning Evaluation
Xinnuo Xu
Rachel Lawrence
Kshitij Dubey
Atharva Pandey
Risa Ueno
Fabian Falck
A. Nori
Rahul Sharma
Amit Sharma
Javier González
LRM
28
0
0
18 Jun 2025
SLR: An Automated Synthesis Framework for Scalable Logical Reasoning
Lukas Helff
Ahmad Omar
Felix Friedrich
Wolfgang Stammer
Antonia Wüst
Tim Woydt
Rupert Mitchell
P. Schramowski
Kristian Kersting
LRM
25
0
0
18 Jun 2025
ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
Feng He
Zijun Chen
Xinnian Liang
Tingting Ma
Yunqi Qiu
Shuangzhi Wu
Junchi Yan
LRM
71
0
0
18 Jun 2025
RATTENTION: Towards the Minimal Sliding Window Size in Local-Global Attention Models
Bailin Wang
Chang Lan
Chong-Jun Wang
Ruoming Pang
15
0
0
18 Jun 2025
Finance Language Model Evaluation (FLaME)
Glenn Matlin
Mika Okamoto
Huzaifa Pardawala
Yang Yang
Sudheer Chava
AIFin, LRM
30
0
0
18 Jun 2025
AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models
Ads Dawson
Rob Mulla
Nick Landers
Shane Caldwell
ELM
26
0
0
17 Jun 2025
Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality
Yuto Harada
Yusuke Yamauchi
Yusuke Oda
Yohei Oseki
Yusuke Miyao
Yu Takagi
ALM
31
0
0
17 Jun 2025
Essential-Web v1.0: 24T tokens of organized web data
Essential AI
Andrew Hojel
Michael Pust
Tim Romanski
Yash Vanjani
...
Platon Mazarakis
Saad Jamal
Saurabh Srivastava
Somanshu Singla
Ashish Vaswani
24
0
0
17 Jun 2025
Improving LoRA with Variational Learning
Bai Cong
Nico Daheim
Yuesong Shen
Rio Yokota
Mohammad Emtiyaz Khan
Thomas Möllenhoff
33
0
0
17 Jun 2025
SFT-GO: Supervised Fine-Tuning with Group Optimization for Large Language Models
Gyuhak Kim
Sumiran Thakur
Su Min Park
Wei Wei
Yujia Bao
17
0
0
17 Jun 2025
Align-then-Unlearn: Embedding Alignment for LLM Unlearning
Philipp Spohn
Leander Girrbach
Jessica Bader
Zeynep Akata
MU
16
0
0
16 Jun 2025
ROSAQ: Rotation-based Saliency-Aware Weight Quantization for Efficiently Compressing Large Language Models
Junho Yoon
Geom Lee
Donghyeon Jeon
Inho Kang
Seung-Hoon Na
MQ, VLM
32
0
0
16 Jun 2025
Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzles
Antara Raaghavi Bhattacharya
Isabel Papadimitriou
Kathryn Davidson
David Alvarez-Melis
LRM
16
0
0
16 Jun 2025
Capability Salience Vector: Fine-grained Alignment of Loss and Capabilities for Downstream Task Scaling Law
Qiming Ge
Shuhao Xing
Songyang Gao
Yunhua Zhou
Yicheng Zou
...
Zhi Chen
Hang Yan
Qi Zhang
Q. Guo
Kai Chen
30
0
0
16 Jun 2025
Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization
Badr AlKhamissi
C. Nicolò De Sabbata
Zeming Chen
Martin Schrimpf
Antoine Bosselut
MoE, LRM
24
0
0
16 Jun 2025
A Practical Guide for Evaluating LLMs and LLM-Reliant Systems
Ethan M. Rudd
Christopher Andrews
Philip Tully
ELM
37
0
0
16 Jun 2025
Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study
Zhengyu Hu
Jianxun Lian
Zheyuan Xiao
Seraphina Zhang
Tianfu Wang
Nicholas Jing Yuan
Xing Xie
Hui Xiong
ELM, LRM
27
0
0
16 Jun 2025
Mitigating Safety Fallback in Editing-based Backdoor Injection on LLMs
Houcheng Jiang
Zetong Zhao
Junfeng Fang
Haokai Ma
Ruipeng Wang
Yang Deng
Xiang Wang
Xiangnan He
KELM, AAML
27
0
0
16 Jun 2025
xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations
Kaiyuan Chen
Y. Ren
Yang Liu
Xiaobo Hu
Haotong Tian
...
Yuan Jiang
Zexuan Liu
Zihan Yin
Zijian Ma
Zhiwen Mo
32
0
0
16 Jun 2025
BOW: Bottlenecked Next Word Exploration
Ming Shen
Zhikun Xu
Xiao Ye
Jacob Dineen
Ben Zhou
OffRL, LRM
30
0
0
16 Jun 2025
Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills
Changsheng Wang
Chongyu Fan
Yihua Zhang
Jinghan Jia
Dennis Wei
Parikshit Ram
Nathalie Baracaldo
Sijia Liu
MU, KELM, LRM
45
0
0
15 Jun 2025
Domain Specific Benchmarks for Evaluating Multimodal Large Language Models
Khizar Anjuma
Muhammad Arbab Arshad
Kadhim Hayawi
Efstathios Polyzos
A. Tariq
...
Nishith Reddy Mannuru
Ravi Varma Kumar Bevara
Taslim Mahbub
Muhammad Zeeshan Akram
Sakib Shahriar
ELM, LRM
52
0
0
15 Jun 2025
Assessing the Role of Data Quality in Training Bilingual Language Models
Skyler Seto
Maartje ter Hoeve
Maureen de Seyssel
David Grangier
17
0
0
15 Jun 2025
MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on Large Language Models
Yan Sun
Qixin Zhang
Zhiyuan Yu
Xikun Zhang
Li Shen
Dacheng Tao
25
0
0
15 Jun 2025
Universal Jailbreak Suffixes Are Strong Attention Hijackers
Matan Ben-Tov
Mor Geva
Mahmood Sharif
29
0
0
15 Jun 2025
TagRouter: Learning Route to LLMs through Tags for Open-Domain Text Generation Tasks
Zhou Chen
Zhiqiang Wei
Yuqi Bai
Xue Xiong
Jianmin Wu
3DV
16
0
0
14 Jun 2025
Bridging the Digital Divide: Small Language Models as a Pathway for Physics and Photonics Education in Underdeveloped Regions
Asghar Ghorbani
Hanieh Fattahi
18
0
0
14 Jun 2025
OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics
Vineeth Dorna
Anmol Mekala
Wenlong Zhao
Andrew McCallum
Zachary Chase Lipton
J. Zico Kolter
Pratyush Maini
MU, ELM
27
0
0
14 Jun 2025
Robust LLM Unlearning with MUDMAN: Meta-Unlearning with Disruption Masking And Normalization
Filip Sondej
Yushi Yang
Mikołaj Kniejski
Marcel Windys
MU
44
0
0
14 Jun 2025
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search
Zhenyu Hou
Ziniu Hu
Yujiang Li
Rui Lu
Jie Tang
Yuxiao Dong
OffRL, LRM
13
0
0
13 Jun 2025
Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards
Jaehoon Yun
Jiwoong Sohn
Jungwoo Park
Hyunjae Kim
Xiangru Tang
...
Minhyeok Ko
Qingyu Chen
Mark B. Gerstein
Michael Moor
Jaewoo Kang
LRM, LM&MA
20
0
0
13 Jun 2025
Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index
Hao Xu
Jiacheng Liu
Yejin Choi
Noah A. Smith
Hannaneh Hajishirzi
20
0
0
13 Jun 2025
Curriculum-Guided Layer Scaling for Language Model Pretraining
Karanpartap Singh
Neil Band
Ehsan Adeli
ALM, LRM
37
0
0
13 Jun 2025
Feedback Friction: LLMs Struggle to Fully Incorporate External Feedback
Dongwei Jiang
Alvin Zhang
Andrew Wang
Nicholas Andrews
Daniel Khashabi
LRM
27
0
0
13 Jun 2025
Improving Large Language Model Safety with Contrastive Representation Learning
Samuel Simko
Mrinmaya Sachan
Bernhard Schölkopf
Zhijing Jin
AAML
15
0
0
13 Jun 2025
OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems
Xiaozhe Li
Jixuan Chen
Xinyu Fang
Shengyuan Ding
Haodong Duan
Qingwen Liu
Kai-xiang Chen
LLMAG, LRM
106
0
0
12 Jun 2025
"Check My Work?": Measuring Sycophancy in a Simulated Educational Context
Chuck Arvin
115
0
0
12 Jun 2025
Surface Fairness, Deep Bias: A Comparative Study of Bias in Language Models
Aleksandra Sorokovikova
Pavel Chizhov
Iuliia Eremenko
Ivan P. Yamshchikov
98
0
0
12 Jun 2025
One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers
Diana Abagyan
Alejandro Salamanca
Andres Felipe Cruz-Salinas
Kris Cao
Hangyu Lin
Acyr Locatelli
Marzieh Fadaee
Ahmet Üstün
Sara Hooker
CLL
131
0
0
12 Jun 2025
Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning
Jikai Jin
Vasilis Syrgkanis
Sham Kakade
Hanlin Zhang
ELM
122
1
0
12 Jun 2025
Defensive Adversarial CAPTCHA: A Semantics-Driven Framework for Natural Adversarial Example Generation
Xia Du
Xiaoyuan Liu
Jizhe Zhou
Zheng Lin
Chi-Man Pun
Zhe Chen
Tao Li
Zhe Chen
Wei Ni
Jun Luo
AAML
126
0
0
12 Jun 2025