ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2009.03300
  4. Cited By
Measuring Massive Multitask Language Understanding

Measuring Massive Multitask Language Understanding

7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
D. Song
Jacob Steinhardt
    ELM
    RALM
ArXivPDFHTML

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 875 papers shown
Title
Improving Multilingual Language Models by Aligning Representations through Steering
Improving Multilingual Language Models by Aligning Representations through Steering
Omar Mahmoud
B. L. Semage
Thommen George Karimpanal
Santu Rana
LLMSV
2
0
0
19 May 2025
LEXam: Benchmarking Legal Reasoning on 340 Law Exams
LEXam: Benchmarking Legal Reasoning on 340 Law Exams
Yu Fan
Jingwei Ni
Jakob Merane
Etienne Salimbeni
Yang Tian
...
Mrinmaya Sachan
Alexander Stremitzer
Christoph Engel
Elliott Ash
Joel Niklaus
AILaw
ELM
26
0
0
19 May 2025
An Empirical Study of Many-to-Many Summarization with Large Language Models
An Empirical Study of Many-to-Many Summarization with Large Language Models
Jiaan Wang
Fandong Meng
Zengkui Sun
Yunlong Liang
Yuxuan Cao
Jiarong Xu
Haoxiang Shi
Jie Zhou
17
0
0
19 May 2025
Towards Budget-Friendly Model-Agnostic Explanation Generation for Large Language Models
Towards Budget-Friendly Model-Agnostic Explanation Generation for Large Language Models
Junhao Liu
Haonan Yu
Xin Zhang
LRM
4
0
0
18 May 2025
UFO-RL: Uncertainty-Focused Optimization for Efficient Reinforcement Learning Data Selection
UFO-RL: Uncertainty-Focused Optimization for Efficient Reinforcement Learning Data Selection
Yang Zhao
Kai Xiong
Xiao Ding
Li Du
YangouOuyang
...
Feiyu Xiong
Bin Liu
Dong Hu
Bing Qin
Ting Liu
OffRL
4
0
0
18 May 2025
PSC: Extending Context Window of Large Language Models via Phase Shift Calibration
PSC: Extending Context Window of Large Language Models via Phase Shift Calibration
Wenqiao Zhu
Chao Xu
Lulu Wang
Jun Wu
2
1
0
18 May 2025
Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation
Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation
Vincent Koc
LM&MA
7
0
0
17 May 2025
Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning
Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning
Puning Yang
Qizhou Wang
Zhuo Huang
Tongliang Liu
Chengqi Zhang
Bo Han
MU
21
0
0
17 May 2025
MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models
MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models
Xiaomin Li
Mingye Gao
Yuexing Hao
Taoran Li
Guangya Wan
Zihan Wang
Yijun Wang
LM&MA
ELM
AI4MH
12
0
0
16 May 2025
GenKnowSub: Improving Modularity and Reusability of LLMs through General Knowledge Subtraction
GenKnowSub: Improving Modularity and Reusability of LLMs through General Knowledge Subtraction
Mohammadtaha Bagherifard
Sahar Rajabi
Ali Edalat
Yadollah Yaghoobzadeh
KELM
32
0
0
16 May 2025
HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages
HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages
Zhigang Wang
Jiaqi Zeng
Olivier Delalleau
Hoo-Chang Shin
Felipe Soares
Alexander Bukharin
Ellie Evans
Yi Dong
Oleksii Kuchaiev
22
0
0
16 May 2025
Evaluations at Work: Measuring the Capabilities of GenAI in Use
Evaluations at Work: Measuring the Capabilities of GenAI in Use
Brandon Lepine
Gawesha Weerantunga
Juho Kim
Pamela Mishkin
Matthew Beane
17
0
0
15 May 2025
On the Evaluation of Engineering Artificial General Intelligence
On the Evaluation of Engineering Artificial General Intelligence
Sandeep Neema
Susmit Jha
Adam Nagel
Ethan Lew
Chandrasekar Sureshkumar
Aleksa Gordic
Chase Shimmin
Hieu Nguygen
Paul Eremenko
ELM
22
0
0
15 May 2025
Mining Hidden Thoughts from Texts: Evaluating Continual Pretraining with Synthetic Data for LLM Reasoning
Mining Hidden Thoughts from Texts: Evaluating Continual Pretraining with Synthetic Data for LLM Reasoning
Yoichi Ishibashi
Taro Yano
Masafumi Oyamada
SyDa
LRM
44
0
0
15 May 2025
Parallel Scaling Law for Language Models
Parallel Scaling Law for Language Models
Mouxiang Chen
Binyuan Hui
Zeyu Cui
Jiaxi Yang
Dayiheng Liu
Jianling Sun
Junyang Lin
Zhongxin Liu
MoE
LRM
37
0
0
15 May 2025
Qwen3 Technical Report
Qwen3 Technical Report
An Yang
A. Li
Baosong Yang
Beichen Zhang
Binyuan Hui
...
Zekun Wang
Zeyu Cui
Zhenru Zhang
Zhenhong Zhou
Zihan Qiu
LLMAG
OSLM
LRM
45
0
0
14 May 2025
Reproducibility Study of "Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents"
Reproducibility Study of "Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents"
Pedro M. P. Curvo
Mara Dragomir
Salvador Torpes
Mohammadmahdi Rahimi
LLMAG
21
0
0
14 May 2025
WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models
WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models
Abdullah Mushtaq
Imran Taj
Rafay Naeem
Ibrahim Ghaznavi
Junaid Qadir
26
0
0
14 May 2025
Layered Unlearning for Adversarial Relearning
Layered Unlearning for Adversarial Relearning
Timothy Qian
Vinith Suriyakumar
Ashia Wilson
Dylan Hadfield-Menell
MU
28
0
0
14 May 2025
Evaluating LLM Metrics Through Real-World Capabilities
Evaluating LLM Metrics Through Real-World Capabilities
Justin K Miller
Wenjia Tang
ELM
ALM
47
0
0
13 May 2025
DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models
DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models
Xiaoyang Chen
Xinan Dai
Yu Du
Qian Feng
Naixu Guo
...
J. Xu
Yiyang Yu
Zhengyuan Yang
Hongji Zha
Ruichong Zhang
LRM
39
1
0
13 May 2025
Small but Significant: On the Promise of Small Language Models for Accessible AIED
Small but Significant: On the Promise of Small Language Models for Accessible AIED
Yumou Wei
Paulo Carvalho
John Stamper
SyDa
45
0
0
13 May 2025
DeltaEdit: Enhancing Sequential Editing in Large Language Models by Controlling Superimposed Noise
DeltaEdit: Enhancing Sequential Editing in Large Language Models by Controlling Superimposed Noise
Ding Cao
Yuchen Cai
Rongxi Guo
Xiaoxiao He
Guiquan Liu
KELM
43
0
0
12 May 2025
Assessing the Chemical Intelligence of Large Language Models
Assessing the Chemical Intelligence of Large Language Models
Nicholas T. Runcie
Charlotte M. Deane
Fergus Imrie
ELM
LRM
40
0
0
12 May 2025
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning
Zhehao Zhang
Weijie Xu
Fanyou Wu
Chandan K. Reddy
29
0
0
12 May 2025
LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning
LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning
Xiaotian Lin
Yanlin Qi
Yizhang Zhu
Themis Palpanas
Chengliang Chai
Nan Tang
Yuyu Luo
26
0
0
12 May 2025
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection
Kai Hua
Steven Wu
Ge Zhang
Ke Shen
LRM
28
0
0
12 May 2025
Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models
Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models
Rei Higuchi
Taiji Suzuki
36
0
0
12 May 2025
GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance
GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance
Jinuk Kim
Marwa El Halabi
W. Park
Clemens JS Schaefer
Deokjae Lee
Yeonhong Park
Jae W. Lee
Hyun Oh Song
MQ
34
0
0
11 May 2025
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Zihan Qiu
Zhaoxiang Wang
Bo Zheng
Zeyu Huang
Kaiyue Wen
...
Fei Huang
Suozhi Huang
Dayiheng Liu
Jingren Zhou
Junyang Lin
MoE
40
0
0
10 May 2025
Attention Is Not All You Need: The Importance of Feedforward Networks in Transformer Models
Attention Is Not All You Need: The Importance of Feedforward Networks in Transformer Models
Isaac Gerber
34
0
0
10 May 2025
QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration
QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration
HamidReza Imani
Jiaxin Peng
Peiman Mohseni
Abdolah Amirany
Tarek A. El-Ghazawi
MoE
31
0
0
10 May 2025
xGen-small Technical Report
xGen-small Technical Report
Erik Nijkamp
Bo Pang
Egor Pakhomov
Akash Gokul
Jin Qu
Silvio Savarese
Yingbo Zhou
Caiming Xiong
LLMAG
58
0
0
10 May 2025
Improving Block-Wise LLM Quantization by 4-bit Block-Wise Optimal Float (BOF4): Analysis and Variations
Improving Block-Wise LLM Quantization by 4-bit Block-Wise Optimal Float (BOF4): Analysis and Variations
Patrick Blumenberg
Thomas Graave
Tim Fingscheidt
MQ
24
0
0
10 May 2025
LLMs Outperform Experts on Challenging Biology Benchmarks
LLMs Outperform Experts on Challenging Biology Benchmarks
Lennart Justen
ELM
30
0
0
09 May 2025
Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2
Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2
Vytenis Šliogeris
Povilas Daniušis
Arturas Nakvosas
CLL
40
0
0
09 May 2025
Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation
Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation
Stefan Vasilev
Christian Herold
Baohao Liao
Seyyed Hadi Hashemi
Shahram Khadivi
Christof Monz
MU
191
0
0
09 May 2025
Query-driven Document-level Scientific Evidence Extraction from Biomedical Studies
Query-driven Document-level Scientific Evidence Extraction from Biomedical Studies
Massimiliano Pronesti
Joao Bettencourt-Silva
Paul Flanagan
Alessandra Pascale
Oisin Redmond
Anya Belz
Yufang Hou
38
0
0
09 May 2025
A Scaling Law for Token Efficiency in LLM Fine-Tuning Under Fixed Compute Budgets
A Scaling Law for Token Efficiency in LLM Fine-Tuning Under Fixed Compute Budgets
Ryan Lagasse
Aidan Kiernans
Avijit Ghosh
Shiri Dori-Hacohen
31
0
0
09 May 2025
Stability in Single-Peaked Strategic Resource Selection Games
Stability in Single-Peaked Strategic Resource Selection Games
Henri Zeiler
32
0
0
09 May 2025
Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information
Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information
Joshua Harris
Fan Grayson
Felix Feldman
Timothy Laurence
Toby Nonnenmacher
...
Leo Loman
Selina Patel
Thomas Finnie
Samuel Collins
Michael Borowitz
AI4MH
LM&MA
ELM
54
0
0
09 May 2025
FloE: On-the-Fly MoE Inference on Memory-constrained GPU
FloE: On-the-Fly MoE Inference on Memory-constrained GPU
Yuxin Zhou
Zheng Li
J. Zhang
Jue Wang
Yanjie Wang
Zhongle Xie
Ke Chen
Lidan Shou
MoE
54
0
0
09 May 2025
KG-HTC: Integrating Knowledge Graphs into LLMs for Effective Zero-shot Hierarchical Text Classification
KG-HTC: Integrating Knowledge Graphs into LLMs for Effective Zero-shot Hierarchical Text Classification
Qianbo Zang
Christophe Zgrzendek
Igor Tchappi
Afshin Khadangi
Johannes Sedlmeir
VLM
35
0
0
08 May 2025
HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics
HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics
Lennart Luettgau
Harry Coppock
Magda Dubois
Christopher Summerfield
Cozmin Ududec
31
0
0
08 May 2025
Reasoning Models Don't Always Say What They Think
Reasoning Models Don't Always Say What They Think
Yanda Chen
Joe Benton
Ansh Radhakrishnan
Jonathan Uesato
Carson E. Denison
...
Vlad Mikulik
Samuel R. Bowman
Jan Leike
Jared Kaplan
E. Perez
ReLM
LRM
68
14
1
08 May 2025
LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities
LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities
Kalyan Nakka
Jimmy Dani
Ausmit Mondal
Nitesh Saxena
AAML
35
0
0
08 May 2025
Crosslingual Reasoning through Test-Time Scaling
Crosslingual Reasoning through Test-Time Scaling
Zheng-Xin Yong
Muhammad Farid Adilazuarda
Jonibek Mansurov
Ruochen Zhang
Niklas Muennighoff
Carsten Eickhoff
Genta Indra Winata
Julia Kreutzer
Stephen H. Bach
Alham Fikri Aji
LRM
ELM
187
0
0
08 May 2025
RICo: Refined In-Context Contribution for Automatic Instruction-Tuning Data Selection
RICo: Refined In-Context Contribution for Automatic Instruction-Tuning Data Selection
Yixin Yang
Qingxiu Dong
Linli Yao
Fangwei Zhu
Zhifang Sui
48
0
0
08 May 2025
Large Means Left: Political Bias in Large Language Models Increases with Their Number of Parameters
Large Means Left: Political Bias in Large Language Models Increases with Their Number of Parameters
David Exler
Mark Schutera
Markus Reischl
Luca Rettenberger
56
0
0
07 May 2025
A Reasoning-Focused Legal Retrieval Benchmark
A Reasoning-Focused Legal Retrieval Benchmark
Lucia Zheng
Neel Guha
Javokhir Arifov
Sarah Zhang
Michal Skreta
Christopher D. Manning
Peter Henderson
Daniel E. Ho
AILaw
RALM
ELM
102
3
0
06 May 2025
1234...161718
Next