GPQA: A Graduate-Level Google-Proof Q&A Benchmark


20 November 2023
David Rein
Betty Li Hou
Asa Cooper Stickland
Jackson Petty
Richard Yuanzhe Pang
Julien Dirani
Julian Michael
Samuel R. Bowman
AI4MH, ELM

Papers citing "GPQA: A Graduate-Level Google-Proof Q&A Benchmark"

50 / 289 papers shown
MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation
Shen Yuan
Yin Zheng
Taifeng Wang
Binbin Liu
Hongteng Xu
MoMe
51
0
0
01 Jul 2025
BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning
Xuechen Zhang
Zijian Huang
Yingcong Li
Chenshun Ni
Jiasi Chen
Samet Oymak
OffRL, MoE, LRM
34
0
0
20 Jun 2025
Arch-Router: Aligning LLM Routing with Human Preferences
Co Tran
Salman Paracha
Adil Hafeez
Shuguang Chen
29
0
0
19 Jun 2025
DynScaling: Efficient Verifier-free Inference Scaling via Dynamic and Integrated Sampling
Fei Wang
Xingchen Wan
Ruoxi Sun
Jiefeng Chen
Sercan Ö. Arık
LRM
24
0
0
19 Jun 2025
Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute
Sheng Liu
Tianlang Chen
Pan Lu
Haotian Ye
Yizheng Chen
Lei Xing
James Zou
ReLM, LRM
19
0
0
18 Jun 2025
SLR: An Automated Synthesis Framework for Scalable Logical Reasoning
Lukas Helff
Ahmad Omar
Felix Friedrich
Wolfgang Stammer
Antonia Wüst
Tim Woydt
Rupert Mitchell
P. Schramowski
Kristian Kersting
LRM
32
0
0
18 Jun 2025
Optimizing Length Compression in Large Reasoning Models
Zhengxiang Cheng
Dongping Chen
Mingyang Fu
Tianyi Zhou
OffRL, MQ, LRM
35
0
0
17 Jun 2025
xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations
Kaiyuan Chen
Y. Ren
Yang Liu
Xiaobo Hu
Haotong Tian
...
Yuan Jiang
Zexuan Liu
Zihan Yin
Zijian Ma
Zhiwen Mo
42
0
0
16 Jun 2025
Domain Specific Benchmarks for Evaluating Multimodal Large Language Models
Khizar Anjuma
Muhammad Arbab Arshad
Kadhim Hayawi
Efstathios Polyzos
A. Tariq
...
Nishith Reddy Mannuru
Ravi Varma Kumar Bevara
Taslim Mahbub
Muhammad Zeeshan Akram
Sakib Shahriar
ELM, LRM
54
0
0
15 Jun 2025
Efficient Reasoning Through Suppression of Self-Affirmation Reflections in Large Reasoning Models
Kaiyuan Liu
Chen Shen
Zhanwei Zhang
Junjie Liu
Xiaosong Yuan
Jieping Ye
ReLM, LRM
46
0
0
14 Jun 2025
Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index
Hao Xu
Jiacheng Liu
Yejin Choi
Noah A. Smith
Hannaneh Hajishirzi
27
0
0
13 Jun 2025
Feedback Friction: LLMs Struggle to Fully Incorporate External Feedback
Dongwei Jiang
Alvin Zhang
Andrew Wang
Nicholas Andrews
Daniel Khashabi
LRM
27
0
0
13 Jun 2025
How Well Can Reasoning Models Identify and Recover from Unhelpful Thoughts?
Sohee Yang
Sang-Woo Lee
Nora Kassner
Daniela Gottesman
Sebastian Riedel
Mor Geva
LRM
117
0
0
12 Jun 2025
Code Execution as Grounded Supervision for LLM Reasoning
Dongwon Jung
Wenxuan Zhou
Muhao Chen
OffRL, LRM
98
0
0
12 Jun 2025
ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization
Zhensheng Jin
Xinze Li
Yifan Ji
Chunyi Peng
Zhenghao Liu
Qi Shi
Y. Yan
Shuo Wang
Furong Peng
Ge Yu
LRM
108
0
0
12 Jun 2025
Mitigating Spurious Correlations in LLMs via Causality-Aware Post-Training
Shurui Gui
Shuiwang Ji
LRM
70
0
0
11 Jun 2025
TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games
Prakamya Mishra
Jiang-Long Liu
Jialian Wu
Xiaodong Yu
Zicheng Liu
Emad Barsoum
LRM
58
0
0
11 Jun 2025
RePO: Replay-Enhanced Policy Optimization
Siheng Li
Zhanhui Zhou
W. Lam
Chao Yang
Chaochao Lu
OffRL
85
0
0
11 Jun 2025
Improved Supervised Fine-Tuning for Large Language Models to Mitigate Catastrophic Forgetting
Fei Ding
Baiqiao Wang
CLL
91
0
0
11 Jun 2025
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
Xinyu Yang
Yuwei An
Hongyi Liu
Tianqi Chen
Beidi Chen
SyDa, LRM
150
0
0
11 Jun 2025
Med-REFL: Medical Reasoning Enhancement via Self-Corrected Fine-grained Reflection
Zongxian Yang
Jiayu Qian
Zegao Peng
Haoyu Zhang
Z. Huang
LRM
25
0
0
11 Jun 2025
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions
Polina Kirichenko
Mark Ibrahim
Kamalika Chaudhuri
Samuel J. Bell
LRM
25
0
0
10 Jun 2025
Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency
Chenlong Wang
Yuanning Feng
Dongping Chen
Zhaoyang Chu
Ranjay Krishna
Tianyi Zhou
LRM
30
0
0
10 Jun 2025
Reinforcement Learning Teachers of Test Time Scaling
Edoardo Cetin
Tianyu Zhao
Yujin Tang
OffRL, ReLM, LRM
55
0
0
10 Jun 2025
LEANN: A Low-Storage Vector Index
Yichuan Wang
Shu Liu
Zhifei Li
Yongji Wu
Ziming Mao
...
Yang Zhou
Ion Stoica
Sewon Min
Matei A. Zaharia
Joseph E. Gonzalez
25
0
0
09 Jun 2025
HAIBU-ReMUD: Reasoning Multimodal Ultrasound Dataset and Model Bridging to General Specific Domains
Shijie Wang
Yilun Zhang
Zeyu Lai
Dexing Kong
30
0
0
09 Jun 2025
From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?
Zhanke Zhou
Xiao Feng
Zhaocheng Zhu
Jiangchao Yao
Sanmi Koyejo
Bo Han
LRM
22
0
0
09 Jun 2025
Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models
Mickel Liu
L. Jiang
Yancheng Liang
S. Du
Yejin Choi
Tim Althoff
Natasha Jaques
AAML, LRM
30
0
0
09 Jun 2025
Prompting Science Report 2: The Decreasing Value of Chain of Thought in Prompting
Lennart Meincke
Ethan R. Mollick
Lilach Mollick
Dan Shapiro
LRM
14
0
0
08 Jun 2025
dots.llm1 Technical Report
Bi Huo
Bin Tu
Cheng Qin
Da Zheng
Debing Zhang
...
Yuqiu Ji
Ze Wen
Zhenhai Liu
Zichao Li
Zilong Liao
MoE
61
0
0
06 Jun 2025
Token Signature: Predicting Chain-of-Thought Gains with Token Decoding Feature in Large Language Models
Peijie Liu
Fengli Xu
Yong Li
LRM
58
0
0
06 Jun 2025
Large Language Models are Demonstration Pre-Selectors for Themselves
Jiarui Jin
Yuwei Wu
Haoxuan Li
Xiaoting He
Weinan Zhang
Y. Yang
Yong Yu
Jun Wang
Mengyue Yang
70
0
0
06 Jun 2025
Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design
Lin Sun
Weihong Lin
Jinzhu Wu
Yongfu Zhu
Xiaoqi Jian
...
Change Jia
Linglin Zhang
Sai-er Hu
Yuhan Wu
Xiangzheng Zhang
ELM, LRM
131
0
0
05 Jun 2025
Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification
Chengwu Liu
Ye Yuan
Yichun Yin
Yan Xu
Xin Xu
Zaoyu Chen
Yasheng Wang
Lifeng Shang
Qun Liu
Ming Zhang
LRM
148
0
0
05 Jun 2025
Quantifying Cross-Modality Memorization in Vision-Language Models
Yuxin Wen
Yangsibo Huang
Tom Goldstein
Ravi Kumar
Badih Ghazi
Chiyuan Zhang
115
0
0
05 Jun 2025
Dissecting Long Reasoning Models: An Empirical Study
Yongyu Mu
Jiali Zeng
Bei Li
Xinyan Guan
Fandong Meng
Jie Zhou
Tong Xiao
Jingbo Zhu
OffRL, LRM
113
0
0
05 Jun 2025
Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning
Yunhao Gou
Kai Chen
Zhili Liu
Lanqing Hong
Xin Jin
Zhenguo Li
James T. Kwok
Yu Zhang
LRM
108
0
0
05 Jun 2025
MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos
Kejian Zhu
Zhuoran Jin
Hongbang Yuan
Jiachun Li
Shangqing Tu
Pengfei Cao
Yubo Chen
Kang Liu
Jun Zhao
VLM, LRM
82
0
0
04 Jun 2025
MiMo-VL Technical Report
Xiaomi LLM-Core Team
Zihao Yue
Zhenru Lin
Yifan Song
Weikun Wang
...
Di Zhang
Chong Ma
Chang Liu
Can Cai
Bingquan Xia
OffRL, MoE, VLM, LRM
89
0
0
04 Jun 2025
Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration
Junqi Gao
Zhichang Guo
Dazhi Zhang
Dong Li
Runze Liu
Pengfei Li
Kai Tian
Biqing Qi
26
0
0
04 Jun 2025
EPiC: Towards Lossless Speedup for Reasoning Training through Edge-Preserving CoT Condensation
Jinghan Jia
Hadi Reisizadeh
Chongyu Fan
Nathalie Baracaldo
Mingyi Hong
Sijia Liu
LRM
133
0
0
04 Jun 2025
EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving
Shihan Dou
Ming Zhang
Chenhao Huang
Jiayi Chen
F. Chen
...
Wei Chengzhi
Lin Yan
Qi Zhang
Xuanjing Huang
ELM
88
0
0
03 Jun 2025
One Missing Piece for Open-Source Reasoning Models: A Dataset to Mitigate Cold-Starting Short CoT LLMs in RL
Hyungjoo Chae
Dongjin Kang
J. Kim
Beong-woo Kwak
Sunghyun Park
Haeju Park
Jinyoung Yeo
M. Lee
Kyungjae Lee
ReLM, LRM
57
0
0
03 Jun 2025
Answer Convergence as a Signal for Early Stopping in Reasoning
Xin Liu
Lu Wang
LRM
68
0
0
03 Jun 2025
DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation
Jennifer Chen
Aidar Myrzakhan
Yaxin Luo
Hassaan Muhammad Khan
Sondos Mahmoud Bsharat
Zhiqiang Shen
VLM
56
0
0
02 Jun 2025
Representations of Fact, Fiction and Forecast in Large Language Models: Epistemics and Attitudes
Meng Li
Michael Vrazitulis
David Schlangen
63
0
0
02 Jun 2025
ChemAU: Harness the Reasoning of LLMs in Chemical Research with Adaptive Uncertainty Estimation
Xinyi Liu
Lipeng Ma
Yixuan Li
Weidong Yang
Qingyuan Zhou
Jiayi Song
Shuhao Li
Ben Fei
LRM
55
0
0
01 Jun 2025
LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning
Zihang Liu
Tianyu Pang
Oleg Balabanov
Chaoqun Yang
Tianjin Huang
L. Yin
Yaoqing Yang
Shiwei Liu
LRM
55
1
0
01 Jun 2025
SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models
Thinh Pham
Nguyen Nguyen
Pratibha Zunjare
Weiyuan Chen
Yu-Min Tseng
Tu Vu
RALM, ReLM, ELM, ALM, LRM
96
0
0
01 Jun 2025
SATA-BENCH: Select All That Apply Benchmark for Multiple Choice Questions
Weijie Xu
Shixian Cui
Xi Fang
Chi Xue
Stephanie Eckman
Chandan K. Reddy
ELM
39
0
0
31 May 2025