Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2502.19414
Cited By
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation
26 February 2025
Shiven Sinha
Shashwat Goel
Ponnurangam Kumaraguru
Jonas Geiping
Matthias Bethge
Ameya Prabhu
ReLM
ELM
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation"
10 / 10 papers shown
Title
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
218
1,503
0
22 Jan 2025
Leveraging Print Debugging to Improve Code Generation in Large Language Models
Xueyu Hu
Kun Kuang
Jiankai Sun
Hongxia Yang
Leilei Gan
24
11
0
10 Jan 2024
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
David Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
257
211
0
20 Oct 2023
PentestGPT: An LLM-empowered Automatic Penetration Testing Tool
Gelei Deng
Yi Liu
Víctor Mayoral-Vilches
Peng Liu
Yuekang Li
Yuan Xu
Tianwei Zhang
Yang Liu
M. Pinzger
Stefan Rass
LLMAG
37
87
0
13 Aug 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
236
4,186
0
09 Jun 2023
Measuring Progress on Scalable Oversight for Large Language Models
Sam Bowman
Jeeyoon Hyun
Ethan Perez
Edwin Chen
Craig Pettit
...
Tristan Hume
Yuntao Bai
Zac Hatfield-Dodds
Benjamin Mann
Jared Kaplan
ALM
ELM
56
125
0
04 Nov 2022
Red Teaming Language Models with Language Models
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
44
627
0
07 Feb 2022
A Survey on Automated Fact-Checking
Zhijiang Guo
Michael Schlichtkrull
Andreas Vlachos
67
479
0
26 Aug 2021
CounterExample Guided Neural Synthesis
Elizabeth Polgreen
Ralph Abboud
Daniel Kroening
NAI
21
9
0
25 Jan 2020
FEVER: a large-scale dataset for Fact Extraction and VERification
James Thorne
Andreas Vlachos
Christos Christodoulopoulos
Arpit Mittal
HILM
113
1,633
0
14 Mar 2018
1