Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2009.03300
Cited By
v1
v2
v3 (latest)
Measuring Massive Multitask Language Understanding
7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Measuring Massive Multitask Language Understanding"
50 / 3,408 papers shown
Title
Fuzzy Speculative Decoding for a Tunable Accuracy-Runtime Tradeoff
Maximilian Holsman
Yukun Huang
Bhuwan Dhingra
128
0
0
28 Feb 2025
The Power of Personality: A Human Simulation Perspective to Investigate Large Language Model Agents
Yifan Duan
Yihong Tang
Xuefeng Bai
Kehai Chen
Junlin Li
Min Zhang
LLMAG
532
2
0
28 Feb 2025
BixBench: a Comprehensive Benchmark for LLM-based Agents in Computational Biology
Ludovico Mitchener
Jon M. Laurent
Benjamin Tenmann
Siddharth Narayanan
Geemi P Wellawatte
A. White
Lorenzo Sani
Samuel G. Rodriques
LLMAG
LM&MA
ELM
131
9
0
28 Feb 2025
Triple Phase Transitions: Understanding the Learning Dynamics of Large Language Models from a Neuroscience Perspective
Yuko Nakagi
Keigo Tada
Sota Yoshino
Shinji Nishimoto
Yu Takagi
LRM
128
0
0
28 Feb 2025
Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks
Hanjiang Hu
Alexander Robey
Changliu Liu
AAML
LLMSV
105
2
0
28 Feb 2025
A Pilot Empirical Study on When and How to Use Knowledge Graphs as Retrieval Augmented Generation
Xujie Yuan
Yongxu Liu
Shimin Di
Shiwen Wu
Libin Zheng
Rui Meng
Lei Chen
Xiaofang Zhou
Jian Yin
148
0
0
28 Feb 2025
PaCA: Partial Connection Adaptation for Efficient Fine-Tuning
Sunghyeon Woo
Sol Namkung
Sunwoo Lee
Inho Jeong
Beomseok Kim
Dongsuk Jeon
117
1
0
28 Feb 2025
Contextualizing biological perturbation experiments through language
Menghua Wu
Russell Littman
Jacob Levine
Lin Qiu
Tommaso Biancalani
David Richmond
Jan-Christian Huetter
71
0
0
28 Feb 2025
SCORE: Systematic COnsistency and Robustness Evaluation for Large Language Models
Grigor Nalbandyan
Rima Shahbazyan
Evelina Bakhturina
ELM
71
2
0
28 Feb 2025
Plan2Align: Predictive Planning Based Test-Time Preference Alignment for Large Language Models
Kuang-Da Wang
Teng-Ruei Chen
Yu-Heng Hung
Shuoyang Ding
Yueh-Hua Wu
Yu-Chun Wang
Chao-Han Huck Yang
Chao-Han Huck Yang
Wen-Chih Peng
Ping-Chun Hsieh
118
0
0
28 Feb 2025
Digital Player: Evaluating Large Language Models based Human-like Agent in Games
Jinqiao Wang
Kai Wang
Shaojie Lin
Runze Wu
Bihan Xu
...
Zhipeng Hu
Z. Fan
Le Li
Tangjie Lyu
Changjie Fan
LLMAG
ELM
AI4CE
137
1
0
28 Feb 2025
Collective Reasoning Among LLMs: A Framework for Answer Validation Without Ground Truth
Seyed Pouyan Mousavi Davoudi
Alireza Shafiee Fard
Alireza Amiri-Margavi
Mahdi Jafari
LRM
113
0
0
28 Feb 2025
PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation
Albert Gong
Kamilė Stankevičiūtė
Chao-gang Wan
Anmol Kabra
Raphael Thesmar
Johann Lee
Julius Klenke
Carla P. Gomes
Kilian Q. Weinberger
LRM
RALM
119
0
0
27 Feb 2025
PolyPrompt: Automating Knowledge Extraction from Multilingual Language Models with Dynamic Prompt Generation
Nathan Roll
153
1
0
27 Feb 2025
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving
Guizhen Chen
Weiwen Xu
Hao Zhang
Hou Pong Chan
Chaoqun Liu
Lidong Bing
Deli Zhao
Anh Tuan Luu
Yu Rong
ReLM
LRM
107
4
0
27 Feb 2025
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
Shalev Lifshitz
Sheila A. McIlraith
Yilun Du
LRM
136
8
0
27 Feb 2025
Similarity-Distance-Magnitude Universal Verification
Allen Schmaltz
UQCV
AAML
552
0
0
27 Feb 2025
EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants
Franck Cappello
Sandeep Madireddy
Robert Underwood
N. Getty
Nicholas Chia
...
M. Rafique
Eliu A. Huerta
Yangqiu Song
Ian Foster
Rick L. Stevens
130
1
0
27 Feb 2025
Learning to Generate Structured Output with Schema Reinforcement Learning
Yaojie Lu
Haolun Li
Xin Cong
Zhong Zhang
Yesai Wu
Yankai Lin
Zhiyuan Liu
Fangming Liu
Maosong Sun
93
1
0
26 Feb 2025
Kanana: Compute-efficient Bilingual Language Models
Kanana LLM Team
Yunju Bak
Hojin Lee
Minho Ryu
Jiyeon Ham
...
Daniel Lee
Minchul Lee
MinHyung Lee
Shinbok Lee
Gaeun Seo
177
1
0
26 Feb 2025
ZEBRA: Leveraging Model-Behavioral Knowledge for Zero-Annotation Preference Dataset Construction
Jeesu Jung
Chanjun Park
Sangkeun Jung
111
0
0
26 Feb 2025
ANPMI: Assessing the True Comprehension Capabilities of LLMs for Multiple Choice Questions
Gyeongje Cho
Yeonkyoung So
Jaejin Lee
ELM
126
0
0
26 Feb 2025
Norm Growth and Stability Challenges in Localized Sequential Knowledge Editing
Akshat Gupta
Christine Fang
Atahan Ozdemir
Maochuan Lu
Ahmed Alaa
Thomas Hartvigsen
Gopala Anumanchipalli
KELM
123
0
0
26 Feb 2025
Low-Confidence Gold: Refining Low-Confidence Samples for Efficient Instruction Tuning
Hongyi Cal
Jie Li
Wenzhen Dong
99
0
0
26 Feb 2025
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Hao Peng
Yunjia Qi
Xiaozhi Wang
Zijun Yao
Bin Xu
Lei Hou
Juanzi Li
ALM
LRM
101
7
0
26 Feb 2025
Shh, don't say that! Domain Certification in LLMs
Cornelius Emde
Alasdair Paren
Preetham Arvind
Maxime Kayser
Tom Rainforth
Thomas Lukasiewicz
Guohao Li
Philip Torr
Adel Bibi
116
2
0
26 Feb 2025
Self-rewarding correction for mathematical reasoning
Wei Xiong
Hanning Zhang
Chenlu Ye
Lichang Chen
Nan Jiang
Tong Zhang
ReLM
KELM
LRM
162
22
0
26 Feb 2025
Stay Focused: Problem Drift in Multi-Agent Debate
Jonas Becker
Lars Benedikt Kaesberg
Andreas Stephan
Jan Philip Wahle
Terry Ruas
Bela Gipp
145
2
0
26 Feb 2025
BIG-Bench Extra Hard
Mehran Kazemi
Bahare Fatemi
Hritik Bansal
John Palowitch
Chrysovalantis Anastasiou
...
Kate Olszewska
Yi Tay
Vinh Q. Tran
Quoc V. Le
Orhan Firat
ELM
LRM
302
13
0
26 Feb 2025
Can LLMs Help Uncover Insights about LLMs? A Large-Scale, Evolving Literature Analysis of Frontier LLMs
Jungsoo Park
Junmo Kang
Gabriel Stanovsky
Alan Ritter
126
0
0
26 Feb 2025
Do LLMs exhibit demographic parity in responses to queries about Human Rights?
Rafiya Javed
Jackie Kay
David Yanni
Abdullah Zaini
Anushe Sheikh
Maribeth Rauh
Iason Gabriel
Laura Weidinger
114
0
0
26 Feb 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-WeiChang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
218
4
0
26 Feb 2025
Voting or Consensus? Decision-Making in Multi-Agent Debate
Lars Benedikt Kaesberg
Jonas Becker
Jan Philip Wahle
Terry Ruas
Bela Gipp
148
7
0
26 Feb 2025
A General Framework to Enhance Fine-tuning-based LLM Unlearning
J. Ren
Zhenwei Dai
Xianfeng Tang
Hui Liu
Jingying Zeng
...
R. Goutam
Suhang Wang
Yue Xing
Qi He
Hui Liu
MU
278
3
0
25 Feb 2025
MergeIT: From Selection to Merging for Efficient Instruction Tuning
Hongyi Cai
Yuqian Fu
Hongming Fu
Bo Zhao
MoMe
123
0
0
25 Feb 2025
Faster, Cheaper, Better: Multi-Objective Hyperparameter Optimization for LLM and RAG Systems
Matthew Barker
Andrew Bell
Evan Thomas
James Carr
Thomas Andrews
Umang Bhatt
167
2
0
25 Feb 2025
RefuteBench 2.0 -- Agentic Benchmark for Dynamic Evaluation of LLM Responses to Refutation Instruction
Jianhao Yan
Yun Luo
Yue Zhang
LLMAG
109
2
0
25 Feb 2025
What Makes the Preferred Thinking Direction for LLMs in Multiple-choice Questions?
Yizhe Zhang
Richard He Bai
Zijin Gu
Ruixiang Zhang
Jiatao Gu
Emmanuel Abbe
Samy Bengio
Navdeep Jaitly
BDL
LRM
152
1
0
25 Feb 2025
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models
Jake Poznanski
Aman Rangapur
Jon Borchardt
Jason Dunkelberger
Regan Huff
Daniel Lin
Aman Rangapur
Christopher Wilhelm
Kyle Lo
Luca Soldaini
174
7
0
25 Feb 2025
WiCkeD: A Simple Method to Make Multiple Choice Benchmarks More Challenging
Ahmed Elhady
Eneko Agirre
Mikel Artetxe
SyDa
79
1
0
25 Feb 2025
Compressing Language Models for Specialized Domains
Miles Williams
G. Chrysostomou
Vitor Jeronymo
Nikolaos Aletras
MQ
118
0
0
25 Feb 2025
Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support
G. Wang
Minyu Gao
Shuai Yang
Ya Zhang
Lizhi He
...
Yexuan Zhang
Wanyue Li
Lu Chen
Jintao Fei
Xin Li
407
2
0
25 Feb 2025
Beyond In-Distribution Success: Scaling Curves of CoT Granularity for Language Model Generalization
Ru Wang
Wei Huang
Selena Song
Haoyu Zhang
Yusuke Iwasawa
Y. Matsuo
Jiaxian Guo
OODD
LRM
130
3
0
25 Feb 2025
Discriminative Finetuning of Generative Large Language Models without Reward Models and Human Preference Data
Siqi Guo
Ilgee Hong
Vicente Balmaseda
Changlong Yu
Liang Qiu
Xin Liu
Haoming Jiang
Tuo Zhao
Tianbao Yang
102
0
0
25 Feb 2025
Evaluating Multimodal Generative AI with Korean Educational Standards
Sangkwon Park
Geewook Kim
AI4Ed
ELM
116
0
0
24 Feb 2025
Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks
Andrei Chernov
MoE
85
0
0
24 Feb 2025
Proactive Privacy Amnesia for Large Language Models: Safeguarding PII with Negligible Impact on Model Utility
Martin Kuo
Jingyang Zhang
Jianyi Zhang
Minxue Tang
Louis DiValentin
...
William Chen
Amin Hass
Tianlong Chen
Yuxiao Chen
Haoyang Li
MU
KELM
123
4
0
24 Feb 2025
Forecasting Rare Language Model Behaviors
Erik Jones
Meg Tong
Jesse Mu
Mohammed Mahfoud
Jan Leike
Roger C. Grosse
Jared Kaplan
William Fithian
Ethan Perez
Mrinank Sharma
99
1
0
24 Feb 2025
LongAttn: Selecting Long-context Training Data via Token-level Attention
Longyun Wu
Dawei Zhu
Guangxiang Zhao
Zhuocheng Yu
Junfeng Ran
Xiangyu Wong
Lin Sun
Sujian Li
108
2
0
24 Feb 2025
Evaluating the Effectiveness of Large Language Models in Automated News Article Summarization
Lionel Richy Panlap Houamegni
Fatih Gedikli
67
0
0
24 Feb 2025
Previous
1
2
3
...
16
17
18
...
67
68
69
Next