arXiv:2311.12022
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
20 November 2023
David Rein
Betty Li Hou
Asa Cooper Stickland
Jackson Petty
Richard Yuanzhe Pang
Julien Dirani
Julian Michael
Samuel R. Bowman
AI4MH
ELM
Papers citing
"GPQA: A Graduate-Level Google-Proof Q&A Benchmark"
50 / 139 papers shown
Title
Detection and Mitigation of Hallucination in Large Reasoning Models: A Mechanistic Perspective
Zhongxiang Sun
Qipeng Wang
Haoyu Wang
Xiao Zhang
Jun Xu
HILM
LRM
9
0
0
19 May 2025
R1dacted: Investigating Local Censorship in DeepSeek's R1 Language Model
Ali Naseh
Harsh Chaudhari
Jaechul Roh
Mingshi Wu
Alina Oprea
Amir Houmansadr
AAML
ELM
12
0
0
19 May 2025
Disentangling Reasoning and Knowledge in Medical Large Language Models
Rahul Thapa
Qingyang Wu
Kevin Wu
Harrison Zhang
Angela Zhang
...
Joseph Boen
Shriya Reddy
Ben Athiwaratkun
Shuaiwen Leon Song
James Zou
ELM
AI4MH
LM&MA
LRM
25
0
0
16 May 2025
Mining Hidden Thoughts from Texts: Evaluating Continual Pretraining with Synthetic Data for LLM Reasoning
Yoichi Ishibashi
Taro Yano
Masafumi Oyamada
SyDa
LRM
44
0
0
15 May 2025
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Zhiyuan Hu
Yansen Wang
Hanze Dong
Yuhui Xu
Amrita Saha
Caiming Xiong
Bryan Hooi
Junnan Li
LRM
24
0
0
15 May 2025
How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference
Nidhal Jegham
Marwen Abdelatti
Lassad Elmoubarki
Abdeltawab Hendawi
26
0
0
14 May 2025
Qwen3 Technical Report
An Yang
A. Li
Baosong Yang
Beichen Zhang
Binyuan Hui
...
Zekun Wang
Zeyu Cui
Zhenru Zhang
Zhenhong Zhou
Zihan Qiu
LLMAG
OSLM
LRM
45
0
0
14 May 2025
Evaluating LLM Metrics Through Real-World Capabilities
Justin K Miller
Wenjia Tang
ELM
ALM
47
0
0
13 May 2025
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection
Kai Hua
Steven Wu
Ge Zhang
Ke Shen
LRM
28
0
0
12 May 2025
Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
Xuechen Zhang
Zijian Huang
Chenshun Ni
Ziyang Xiong
Jiacheng Chen
Samet Oymak
ReLM
LRM
45
0
0
12 May 2025
Learning from Peers in Reasoning Models
Tongxu Luo
Wenyu Du
Jiaxi Bi
Stephen Chung
Zhengyang Tang
Hao Yang
M. Zhang
Benyou Wang
LRM
41
0
0
12 May 2025
Assessing the Chemical Intelligence of Large Language Models
Nicholas T. Runcie
Charlotte M. Deane
Fergus Imrie
ELM
LRM
40
0
0
12 May 2025
Semantic Retention and Extreme Compression in LLMs: Can We Have Both?
Stanislas Laborde
Martin Cousseau
Antoun Yaacoub
Lionel Prevost
MQ
23
0
0
12 May 2025
xGen-small Technical Report
Erik Nijkamp
Bo Pang
Egor Pakhomov
Akash Gokul
Jin Qu
Silvio Savarese
Yingbo Zhou
Caiming Xiong
LLMAG
58
0
0
10 May 2025
LLMs Outperform Experts on Challenging Biology Benchmarks
Lennart Justen
ELM
30
0
0
09 May 2025
Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information
Joshua Harris
Fan Grayson
Felix Feldman
Timothy Laurence
Toby Nonnenmacher
...
Leo Loman
Selina Patel
Thomas Finnie
Samuel Collins
Michael Borowitz
AI4MH
LM&MA
ELM
54
0
0
09 May 2025
Stability in Single-Peaked Strategic Resource Selection Games
Henri Zeiler
32
0
0
09 May 2025
RICo: Refined In-Context Contribution for Automatic Instruction-Tuning Data Selection
Yixin Yang
Qingxiu Dong
Linli Yao
Fangwei Zhu
Zhifang Sui
48
0
0
08 May 2025
ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning
Ziqing Qiao
Yongheng Deng
Jiali Zeng
Dong Wang
Lai Wei
Fandong Meng
Jie Zhou
Ju Ren
Yaoxue Zhang
LRM
54
0
0
08 May 2025
Reasoning Models Don't Always Say What They Think
Yanda Chen
Joe Benton
Ansh Radhakrishnan
Jonathan Uesato
Carson E. Denison
...
Vlad Mikulik
Samuel R. Bowman
Jan Leike
Jared Kaplan
E. Perez
ReLM
LRM
68
14
1
08 May 2025
Improving Model Alignment Through Collective Intelligence of Open-Source LLMS
Junlin Wang
Roy Xie
Shang Zhu
Jue Wang
Ben Athiwaratkun
Bhuwan Dhingra
Shuaiwen Leon Song
Ce Zhang
James Zou
ALM
38
0
0
05 May 2025
Exploring the Potential of Offline RL for Reasoning in LLMs: A Preliminary Study
Xiaoyu Tian
Sitong Zhao
Haotian Wang
Shuaiting Chen
Yiping Peng
Yunjie Ji
Han Zhao
Xiangang Li
OffRL
LRM
37
0
0
04 May 2025
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
Meng-Hao Guo
Jiajun Xu
Yi Zhang
Jiaxi Song
Haoyang Peng
...
Yongming Rao
Houwen Peng
Han Hu
Gordon Wetzstein
Shi-Min Hu
ELM
LRM
60
2
0
04 May 2025
Understanding LLM Scientific Reasoning through Promptings and Model's Explanation on the Answers
Alice Rueda
Mohammed S. Hassan
Argyrios Perivolaris
Bazen G. Teferra
Reza Samavi
...
Y. Wu
Wenjie Qu
Bo Cao
Divya Sharma
Sridhar Krishnan Venkat Bhat
ELM
LRM
58
0
0
02 May 2025
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
D. Jiang
Ziyu Guo
Renrui Zhang
Zhuofan Zong
Hao Li
Le Zhuo
Shilin Yan
Pheng-Ann Heng
Yiming Li
LRM
72
2
0
01 May 2025
Phi-4-reasoning Technical Report
Marah Abdin
Sahaj Agarwal
Ahmed Hassan Awadallah
Vidhisha Balachandran
Harkirat Singh Behl
...
Vaishnavi Shrivastava
Vibhav Vineet
Yue Wu
Safoora Yousefi
Guoqing Zheng
ReLM
LRM
90
1
0
30 Apr 2025
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
X. Li
Jiajie Jin
Guanting Dong
Hongjin Qian
Yutao Zhu
Yongkang Wu
Ji-Rong Wen
Zhicheng Dou
LLMAG
LRM
100
2
0
30 Apr 2025
ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning
Jingyang Yi
Jiazheng Wang
Sida Li
ReLM
OODD
LRM
183
2
0
30 Apr 2025
Automatic Legal Writing Evaluation of LLMs
Ramon Pires
Roseval Malaquias Junior
Rodrigo Nogueira
AILaw
ELM
86
0
0
29 Apr 2025
Computational Reasoning of Large Language Models
Haitao Wu
Zongbo Han
Joey Tianyi Zhou
Huaxi Huang
Changqing Zhang
ELM
LRM
62
0
0
29 Apr 2025
Security Steerability is All You Need
Itay Hazan
Idan Habler
Ron Bitton
Itsik Mantin
AAML
80
0
0
28 Apr 2025
Can a Crow Hatch a Falcon? Lineage Matters in Predicting Large Language Model Performance
Takuya Tamura
Taro Yano
Masafumi Enomoto
Masafumi Oyamada
50
0
0
28 Apr 2025
EvidenceBench: A Benchmark for Extracting Evidence from Biomedical Papers
Jie Wang
Weili Cao
Kaicheng Wang
Xiaoyue Wang
Ashish Dalvi
...
David E. Neal
Maxim Khan
Christopher D. Rosin
R. Paturi
Leon Bergen
33
0
0
25 Apr 2025
DeepDistill: Enhancing LLM Reasoning Capabilities via Large-Scale Difficulty-Graded Data Training
Xiaoyu Tian
Sitong Zhao
Haotian Wang
Shuaiting Chen
Yiping Peng
Yunjie Ji
Han Zhao
Xiangang Li
LRM
59
1
0
24 Apr 2025
ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
Sheng Cao
Mingrui Wu
Karthik Prasad
Yuandong Tian
Zechun Liu
MoMe
85
0
0
23 Apr 2025
Process Reward Models That Think
Muhammad Khalifa
Rishabh Agarwal
Lajanugen Logeswaran
Jaekyeom Kim
Hao Peng
Moontae Lee
Honglak Lee
Lu Wang
OffRL
ALM
LRM
44
1
0
23 Apr 2025
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models
Shi Qiu
Shaoyang Guo
Zhuo-Yang Song
Yizhou Sun
Zeyu Cai
...
Ming-xing Luo
Muhan Zhang
Yaodong Yang
Hua Xing Zhu
AIMat
LRM
32
0
0
22 Apr 2025
Dynamic Early Exit in Reasoning Models
Chenxu Yang
Qingyi Si
Yongjie Duan
Zheliang Zhu
Chenyu Zhu
Zheng Lin
Li Cao
Weiping Wang
ReLM
LRM
48
0
0
22 Apr 2025
Synergizing RAG and Reasoning: A Systematic Review
Yunfan Gao
Yun Xiong
Yijie Zhong
Yuxi Bi
Ming Xue
Haoyu Wang
LRM
AI4CE
141
2
0
22 Apr 2025
A Self-Improving Coding Agent
Maxime Robeyns
Martin Szummer
Laurence Aitchison
LLMAG
46
0
0
21 Apr 2025
Trillion 7B Technical Report
Sungjun Han
Juyoung Suk
Suyeong An
Hyungguk Kim
Kyuseok Kim
Wonsuk Yang
Seungtaek Choi
Jamin Shin
155
1
0
21 Apr 2025
Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark
Jasper Götting
Pedro Medeiros
Jon G Sanders
Nathaniel Li
Long Phan
Karam Elabd
Lennart Justen
Dan Hendrycks
Seth Donoughe
ELM
63
2
0
21 Apr 2025
Learning to Reason under Off-Policy Guidance
Jianhao Yan
Yafu Li
Zican Hu
Zhi Wang
Ganqu Cui
Xiaoye Qu
Yu Cheng
Yue Zhang
OffRL
LRM
44
0
0
21 Apr 2025
Efficient Pretraining Length Scaling
Bohong Wu
Shen Yan
Sijun Zhang
Jianqiao Lu
Yutao Zeng
Ya Wang
Xun Zhou
174
0
0
21 Apr 2025
Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods
Junlin Wang
Shang Zhu
Jon Saad-Falcon
Ben Athiwaratkun
Qingyang Wu
Jue Wang
Shuaiwen Leon Song
Ce Zhang
Bhuwan Dhingra
James Y. Zou
LRM
53
1
0
18 Apr 2025
Reimagining Urban Science: Scaling Causal Inference with Large Language Models
Yutong Xia
Ao Qu
Yunhan Zheng
Yihong Tang
Dingyi Zhuang
...
Cathy Wu
Roger Zimmermann
Lijun Sun
Jinhua Zhao
AI4CE
111
0
0
15 Apr 2025
Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning
Syeda Nahida Akter
Shrimai Prabhumoye
Matvei Novikov
Seungju Han
Ying Lin
...
Eric Nyberg
Yejin Choi
M. Patwary
M. Shoeybi
Bryan Catanzaro
ReLM
OffRL
LRM
200
0
1
15 Apr 2025
S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models
Wenyuan Zhang
Shuaiyi Nie
Xinghua Zhang
Zefeng Zhang
Tingwen Liu
ELM
LRM
49
2
0
14 Apr 2025
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Haozhe Wang
C. Qu
Zuming Huang
Wei Chu
Fangzhen Lin
Wenhu Chen
OffRL
ReLM
SyDa
LRM
VLM
77
1
0
10 Apr 2025
FuseRL: Dense Preference Optimization for Heterogeneous Model Fusion
Longguang Zhong
Fanqi Wan
Ziyi Yang
Guosheng Liang
Tianyuan Shi
Xiaojun Quan
MoMe
57
0
0
09 Apr 2025