Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2311.12022
Cited By
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
20 November 2023
David Rein
Betty Li Hou
Asa Cooper Stickland
Jackson Petty
Richard Yuanzhe Pang
Julien Dirani
Julian Michael
Samuel R. Bowman
AI4MH
ELM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"GPQA: A Graduate-Level Google-Proof Q&A Benchmark"
50 / 289 papers shown
Title
Process Reward Models That Think
Muhammad Khalifa
Rishabh Agarwal
Lajanugen Logeswaran
Jaekyeom Kim
Hao Peng
Moontae Lee
Honglak Lee
Lu Wang
OffRL
ALM
LRM
143
9
0
23 Apr 2025
Dynamic Early Exit in Reasoning Models
Chenxu Yang
Qingyi Si
Yongjie Duan
Zheliang Zhu
Chenyu Zhu
Zheng Lin
Zheng Lin
Li Cao
Weiping Wang
ReLM
LRM
182
22
0
22 Apr 2025
Synergizing RAG and Reasoning: A Systematic Review
Yunfan Gao
Yun Xiong
Yijie Zhong
Yuxi Bi
Ming Xue
Haoyu Wang
LRM
AI4CE
452
7
0
22 Apr 2025
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models
Shi Qiu
Shaoyang Guo
Zhuo-Yang Song
Yizhou Sun
Zeyu Cai
...
Ming-xing Luo
Muhan Zhang
Yaodong Yang
Muhan Zhang
Hua Xing Zhu
AIMat
LRM
134
9
0
22 Apr 2025
Tina: Tiny Reasoning Models via LoRA
Shangshang Wang
Julian Asilis
Ömer Faruk Akgül
Enes Burak Bilgin
Ollie Liu
Willie Neiswanger
OffRL
LRM
137
9
0
22 Apr 2025
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo
Kaiyan Zhang
Li Sheng
Li Sheng
Xuekai Zhu
...
Youbang Sun
Zhiyuan Ma
Lifan Yuan
Ning Ding
Bowen Zhou
OffRL
421
31
0
22 Apr 2025
Efficient Pretraining Length Scaling
Bohong Wu
Shen Yan
Sijun Zhang
Jianqiao Lu
Yutao Zeng
Ya Wang
Xun Zhou
477
0
0
21 Apr 2025
A Self-Improving Coding Agent
Maxime Robeyns
Martin Szummer
Laurence Aitchison
LLMAG
138
1
0
21 Apr 2025
Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark
Jasper Götting
Pedro Medeiros
Jon G Sanders
Nathaniel Li
Long Phan
Karam Elabd
Lennart Justen
Dan Hendrycks
Seth Donoughe
ELM
139
5
0
21 Apr 2025
Learning to Reason under Off-Policy Guidance
Jianhao Yan
Yafu Li
Zican Hu
Zhi Wang
Ganqu Cui
Xiaoye Qu
Yu Cheng
Yue Zhang
OffRL
LRM
178
17
0
21 Apr 2025
Reimagining Urban Science: Scaling Causal Inference with Large Language Models
Yutong Xia
Ao Qu
Yunhan Zheng
Yihong Tang
Dingyi Zhuang
...
Cathy Wu
Roger Zimmermann
Lijun Sun
Roger Zimmermann
Jinhua Zhao
AI4CE
398
2
0
15 Apr 2025
Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning
Syeda Nahida Akter
Shrimai Prabhumoye
Matvei Novikov
Seungju Han
Ying Lin
...
Eric Nyberg
Yejin Choi
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
ReLM
OffRL
LRM
468
4
1
15 Apr 2025
Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning
Can Jin
Hongwu Peng
Qixin Zhang
Yujin Tang
Dimitris N. Metaxas
Tong Che
LLMAG
LRM
437
9
0
14 Apr 2025
RealSafe-R1: Safety-Aligned DeepSeek-R1 without Compromising Reasoning Capability
Yuanhang Zhang
Zihao Zeng
Dongbai Li
Yao Huang
Zhijie Deng
Yinpeng Dong
LRM
101
10
0
14 Apr 2025
S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models
Wenyuan Zhang
Jiawei Sheng
Xinghua Zhang
Zefeng Zhang
Tingwen Liu
ELM
LRM
127
5
0
14 Apr 2025
Improving Multilingual Capabilities with Cultural and Local Knowledge in Large Language Models While Enhancing Native Performance
Ram Mohan Rao Kadiyala
Siddartha Pullakhandam
Siddhant Gupta
Drishti Sharma
Jebish Purbey
Kanwal Mehreen
Muhammad Arham
Hamza Farooq
132
0
0
13 Apr 2025
A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis
Xin Gao
Qizhi Pei
Zinan Tang
Yongqian Li
Honglin Lin
Jiang Wu
Conghui He
Lijun Wu
SyDa
108
0
0
11 Apr 2025
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Haozhe Wang
Chao Qu
Zuming Huang
Wei Chu
Fangzhen Lin
Wenhu Chen
OffRL
ReLM
SyDa
LRM
VLM
161
40
0
10 Apr 2025
FuseRL: Dense Preference Optimization for Heterogeneous Model Fusion
Longguang Zhong
Fanqi Wan
Ziyi Yang
Guosheng Liang
Tianyuan Shi
Xiaojun Quan
MoMe
130
1
0
09 Apr 2025
Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations
Zican Dong
Han Peng
Peiyu Liu
Wayne Xin Zhao
Dong Wu
Feng Xiao
Ziyi Wang
MoE
86
2
0
09 Apr 2025
Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization
Qingyang Zhang
Haitao Wu
Changqing Zhang
Peilin Zhao
Yatao Bian
ReLM
LRM
186
19
0
08 Apr 2025
SEA-LION: Southeast Asian Languages in One Network
Raymond Ng
Thanh Ngan Nguyen
Yuli Huang
Ngee Chia Tai
Wai Yi Leong
...
David Ong Tat-Wee
B. Liu
William-Chandra Tjhi
Min Zhang
Leslie Teo
131
14
0
08 Apr 2025
Universal Collection of Euclidean Invariants between Pairs of Position-Orientations
Gijs Bellaard
B. Smets
R. Duits
133
0
0
04 Apr 2025
Style over Substance: Distilled Language Models Reason Via Stylistic Replication
Philip Lippmann
Jie Yang
LRM
173
0
0
02 Apr 2025
Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design
Mohan Zhang
Pingzhi Li
Jie Peng
Mufan Qiu
Tianlong Chen
MoE
215
0
0
02 Apr 2025
Z1: Efficient Test-time Scaling with Code
Zhaojian Yu
Yinghao Wu
Yilun Zhao
Arman Cohan
Xiao-Ping Zhang
LRM
115
14
0
01 Apr 2025
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
José P. Pombal
Nuno M. Guerreiro
Ricardo Rei
André F. T. Martins
ALM
139
2
0
01 Apr 2025
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Kai Yan
Yufei Xu
Zhengyin Du
Xuesong Yao
Ziyi Wang
Xiaowen Guo
Jiecao Chen
ReLM
ELM
LRM
205
5
0
01 Apr 2025
Efficient Inference for Large Reasoning Models: A Survey
Yi Liu
Jiaying Wu
Yufei He
Hongcheng Gao
Hongyu Chen
Baolong Bi
Jiaheng Zhang
Zhiqi Huang
Bryan Hooi
Bryan Hooi
LLMAG
LRM
179
17
0
29 Mar 2025
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
Weinan Zhang
Mengna Wang
Gangao Liu
Xu Huixin
Yiwei Jiang
...
Hang Zhang
Xin Li
Weiming Lu
Peng Li
Yueting Zhuang
LM&Ro
LRM
190
9
0
27 Mar 2025
MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search
Yunhai Hu
Yilun Zhao
Chen Zhao
Arman Cohan
ReLM
LRM
134
1
0
26 Mar 2025
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking
Xiaoyu Tian
Sitong Zhao
Haotian Wang
Shuaiting Chen
Yunjie Ji
Yiping Peng
Han Zhao
Xiangang Li
ReLM
ELM
LRM
117
10
0
25 Mar 2025
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng
Yuzhen Huang
Qian Liu
Wei Liu
Keqing He
Zejun Ma
Junxian He
OffRL
ReLM
LRM
202
137
0
24 Mar 2025
OmniScience: A Domain-Specialized LLM for Scientific Reasoning and Discovery
Vignesh Prabhakar
Md Amirul Islam
Adam Atanas
Yansen Wang
J. N. Han
...
Rucha Apte
Robert Clark
Kang Xu
Zihan Wang
Kai Liu
LRM
230
5
0
22 Mar 2025
Uncertainty Quantification and Confidence Calibration in Large Language Models: A Survey
Xiaoou Liu
Tiejin Chen
Longchao Da
Chacha Chen
Zhen Lin
Hua Wei
HILM
146
8
0
20 Mar 2025
Command R7B Arabic: A Small, Enterprise Focused, Multilingual, and Culturally Aware Arabic LLM
Yazeed Alnumay
Alexandre Barbet
Anna Bialas
William Darling
Shaan Desai
...
Stephanie Howe
Olivia Lasche
Justin Lee
Anirudh Shrinivason
Jennifer Tracey
126
0
0
18 Mar 2025
Safety Evaluation and Enhancement of DeepSeek Models in Chinese Contexts
Wenjing Zhang
Xuejiao Lei
Zhaoxiang Liu
Limin Han
Jiaojiao Zhao
...
Beibei Huang
Rongjia Du
Ning Wang
Kai Wang
Shiguo Lian
ELM
117
1
0
18 Mar 2025
The Amazon Nova Family of Models: Technical Report and Model Card
Amazon AGI
Aaron Langford
A. Shah
Abhanshu Gupta
Abhimanyu Bhatter
...
Benjamin Biggs
Benjamin Ott
Bhanu Vinzamuri
Bharath Venkatesh
Bhavana Ganesh
26
21
0
17 Mar 2025
CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning
Hao Cui
Zahra Shamsi
Gowoon Cheon
Xuejian Ma
Shutong Li
...
Eun-Ah Kim
M. Brenner
Viren Jain
Sameera Ponda
Subhashini Venugopalan
ELM
LRM
144
4
0
14 Mar 2025
Unlocking General Long Chain-of-Thought Reasoning Capabilities of Large Language Models via Representation Engineering
Xinyu Tang
Xiaolei Wang
Zhihao Lv
Yingqian Min
Wayne Xin Zhao
Binbin Hu
Ziqi Liu
Qing Cui
LRM
157
9
0
14 Mar 2025
SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models
Chuan Qin
Xiusi Chen
Chengrui Wang
Pengmin Wu
Xi Chen
...
Han Wu
Chong Li
Yuanchun Zhou
H. Xiong
Hengshu Zhu
ELM
91
2
0
12 Mar 2025
Implicit Reasoning in Transformers is Reasoning through Shortcuts
Tianhe Lin
Jian Xie
Siyu Yuan
Deqing Yang
ReLM
LRM
153
3
0
10 Mar 2025
WildIFEval: Instruction Following in the Wild
Gili Lior
Asaf Yehudai
Ariel Gera
L. Ein-Dor
143
0
0
09 Mar 2025
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
Yuchen Yan
Yongliang Shen
Yuhang Liu
Jin Jiang
Hao Fei
Jian Shao
Yueting Zhuang
LRM
ReLM
144
10
0
09 Mar 2025
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs
Zhongzhan Huang
Guoming Ling
Vincent S. Liang
Yupei Lin
Yandong Chen
Shanshan Zhong
Hefeng Wu
LRM
229
7
0
08 Mar 2025
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs
Ling Team
B. Zeng
Chenyu Huang
Chao Zhang
Changxin Tian
...
Zhaoxin Huan
Zujie Wen
Zhenhang Sun
Zhuoxuan Du
Z. He
MoE
ALM
198
5
0
07 Mar 2025
LLMs Can Generate a Better Answer by Aggregating Their Own Responses
Zichong Li
Xinyu Feng
Yuheng Cai
Zixuan Zhang
Tianyi Liu
Chen Liang
Weizhu Chen
Haoyu Wang
Tiejun Zhao
LRM
117
2
0
06 Mar 2025
DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models
Yi Shen
Jing Zhang
Jieyun Huang
Shuming Shi
Wenjing Zhang
Jiangze Yan
Rongjia Du
Ning Wang
Kai Wang
Shiguo Lian
LRM
132
54
0
06 Mar 2025
Efficient Algorithms for Verifying Kruskal Rank in Sparse Linear Regression and Related Applications
Fengqin Zhou
121
6
0
06 Mar 2025
Are Large Language Models Good In-context Learners for Financial Sentiment Analysis?
Xinyu Wei
Luojia Liu
AIFin
127
2
0
06 Mar 2025
Previous
1
2
3
4
5
6
Next