ResearchTrend.AI


arXiv:2502.19414
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation

26 February 2025
Shiven Sinha, Shashwat Goel, Ponnurangam Kumaraguru, Jonas Geiping, Matthias Bethge, Ameya Prabhu
ReLM, ELM, LRM

Papers citing "Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation"

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, ..., Shiyu Wang, S. Yu, Shunfeng Zhou, Shuting Pan, S.S. Li
ReLM, VLM, OffRL, AI4TS, LRM
22 Jan 2025
Leveraging Print Debugging to Improve Code Generation in Large Language Models
Xueyu Hu, Kun Kuang, Jiankai Sun, Hongxia Yang, Leilei Gan
10 Jan 2024
Towards Understanding Sycophancy in Language Models
Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, ..., Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez
20 Oct 2023
PentestGPT: An LLM-empowered Automatic Penetration Testing Tool
Gelei Deng, Yi Liu, Víctor Mayoral-Vilches, Peng Liu, Yuekang Li, Yuan Xu, Tianwei Zhang, Yang Liu, M. Pinzger, Stefan Rass
LLMAG
13 Aug 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, ..., Dacheng Li, Eric Xing, Haotong Zhang, Joseph E. Gonzalez, Ion Stoica
ALM, OSLM, ELM
09 Jun 2023
Measuring Progress on Scalable Oversight for Large Language Models
Sam Bowman, Jeeyoon Hyun, Ethan Perez, Edwin Chen, Craig Pettit, ..., Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Benjamin Mann, Jared Kaplan
ALM, ELM
04 Nov 2022
Red Teaming Language Models with Language Models
Ethan Perez, Saffron Huang, Francis Song, Trevor Cai, Roman Ring, John Aslanides, Amelia Glaese, Nat McAleese, G. Irving
AAML
07 Feb 2022
A Survey on Automated Fact-Checking
Zhijiang Guo, Michael Schlichtkrull, Andreas Vlachos
26 Aug 2021
CounterExample Guided Neural Synthesis
Elizabeth Polgreen, Ralph Abboud, Daniel Kroening
NAI
25 Jan 2020
FEVER: a large-scale dataset for Fact Extraction and VERification
James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal
HILM
14 Mar 2018