Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2411.13323
Cited By
v1
v2
v3 (latest)
Are Large Language Models Memorizing Bug Benchmarks?
20 November 2024
Daniel Ramos
Claudia Mamede
Kush Jain
Paulo Canelas
Catarina Gamboa
Claire Le Goues
PILM
ELM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Are Large Language Models Memorizing Bug Benchmarks?"
30 / 30 papers shown
Title
The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason
Shanchao Liang
Spandan Garg
Roshanak Zilouchian Moghaddam
17
1
0
14 Jun 2025
Rethinking the effects of data contamination in Code Intelligence
Zhen Yang
Hongyi Lin
Yifan He
Jie Xu
Zeyu Sun
Shuo Liu
P. Wang
Zhongxing Yu
Qingyuan Liang
45
0
0
03 Jun 2025
Do LLMs Consider Security? An Empirical Study on Responses to Programming Questions
Amirali Sajadi
Binh Le
A. Nguyen
Kostadin Damevski
Preetha Chatterjee
104
3
0
20 Feb 2025
RepairBench: Leaderboard of Frontier Models for Program Repair
André Silva
Martin Monperrus
KELM
60
9
0
27 Sep 2024
Gemma 2: Improving Open Language Models at a Practical Size
Gemma Team
Gemma Team Morgane Riviere
Shreya Pathak
Pier Giuseppe Sessa
Cassidy Hardin
...
Noah Fiedel
Armand Joulin
Kathleen Kenealy
Robert Dadashi
Alek Andreev
VLM
MoE
OSLM
149
922
0
31 Jul 2024
Agentless: Demystifying LLM-based Software Engineering Agents
Chunqiu Steven Xia
Yinlin Deng
Soren Dunn
Lingming Zhang
LLMAG
116
121
0
01 Jul 2024
CodeGemma: Open Code Models Based on Gemma
CodeGemma Team
Heri Zhao
Jeffrey Hui
Joshua Howland
Nam Nguyen
...
Ale Jakse Hartman
Bin Ni
Kathy Korevec
Kelly Schaefer
Scott Huffman
VLM
113
129
0
17 Jun 2024
Benchmarking Benchmark Leakage in Large Language Models
Ruijie Xu
Zengzhi Wang
Run-Ze Fan
Pengfei Liu
129
54
0
29 Apr 2024
Concerned with Data Contamination? Assessing Countermeasures in Code Language Model
Jialun Cao
Wuqi Zhang
Shing-Chi Cheung
60
20
0
25 Mar 2024
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Naman Jain
King Han
Alex Gu
Wen-Ding Li
Fanjia Yan
Tianjun Zhang
Sida I. Wang
Armando Solar-Lezama
Koushik Sen
Ion Stoica
ELM
148
448
0
12 Mar 2024
StarCoder 2 and The Stack v2: The Next Generation
Anton Lozhkov
Raymond Li
Loubna Ben Allal
Federico Cassano
J. Lamy-Poirier
...
Sean M. Hughes
Thomas Wolf
Arjun Guha
Leandro von Werra
H. D. Vries
OSLM
ELM
83
362
0
29 Feb 2024
RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair
André Silva
Sen Fang
Martin Monperrus
MoMe
KELM
133
51
0
25 Dec 2023
Detecting Pretraining Data from Large Language Models
Weijia Shi
Anirudh Ajith
Mengzhou Xia
Yangsibo Huang
Daogao Liu
Terra Blevins
Danqi Chen
Luke Zettlemoyer
MIALM
122
201
0
25 Oct 2023
Mistral 7B
Albert Q. Jiang
Alexandre Sablayrolles
A. Mensch
Chris Bamford
Devendra Singh Chaplot
...
Teven Le Scao
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoE
LRM
151
2,260
0
10 Oct 2023
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Carlos E. Jimenez
John Yang
Alexander Wettig
Shunyu Yao
Kexin Pei
Ofir Press
Karthik Narasimhan
ELM
138
647
0
10 Oct 2023
Large Language Models for Test-Free Fault Localization
Aidan Z. H. Yang
Ruben Martins
Claire Le Goues
Vincent J. Hellendoorn
LRM
81
100
0
03 Oct 2023
Estimating Contamination via Perplexity: Quantifying Memorisation in Language Model Evaluation
Yucheng Li
89
35
0
19 Sep 2023
Code Llama: Open Foundation Models for Code
Baptiste Rozière
Jonas Gehring
Fabian Gloeckle
Sten Sootla
Itai Gat
...
Hugo Touvron
Louis Martin
Nicolas Usunier
Thomas Scialom
Gabriel Synnaeve
ELM
ALM
140
2,110
0
24 Aug 2023
An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning
Yun Luo
Zhen Yang
Fandong Meng
Yafu Li
Jie Zhou
Yue Zhang
CLL
KELM
201
319
0
17 Aug 2023
Keep the Conversation Going: Fixing 162 out of 337 bugs for
0.42
e
a
c
h
u
s
i
n
g
C
h
a
t
G
P
T
0.42 each using ChatGPT
0.42
e
a
c
h
u
s
in
g
C
ha
tGPT
Chun Xia
Lingming Zhang
KELM
LRM
118
121
0
01 Apr 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
1.6K
13,533
0
27 Feb 2023
The Stack: 3 TB of permissively licensed source code
Denis Kocetkov
Raymond Li
Loubna Ben Allal
Jia Li
Chenghao Mou
...
Sean M. Hughes
Thomas Wolf
Dzmitry Bahdanau
Leandro von Werra
H. D. Vries
109
339
0
20 Nov 2022
Leakage and the Reproducibility Crisis in ML-based Science
Sayash Kapoor
Arvind Narayanan
73
180
0
14 Jul 2022
Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
Kushal Tirumala
Aram H. Markosyan
Luke Zettlemoyer
Armen Aghajanyan
TDI
125
197
0
22 May 2022
CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
Erik Nijkamp
Bo Pang
Hiroaki Hayashi
Lifu Tu
Haiquan Wang
Yingbo Zhou
Silvio Savarese
Caiming Xiong
ELM
177
1,054
0
25 Mar 2022
Program Synthesis with Large Language Models
Jacob Austin
Augustus Odena
Maxwell Nye
Maarten Bosma
Henryk Michalewski
...
Ellen Jiang
Carrie J. Cai
Michael Terry
Quoc V. Le
Charles Sutton
ELM
AIMat
ReCod
ALM
218
2,023
0
16 Aug 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
268
5,695
0
07 Jul 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
Basel Alomair
Jacob Steinhardt
ELM
AIMat
ALM
300
712
0
20 May 2021
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
207
4,580
0
07 Sep 2020
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar
Jian Zhang
Konstantin Lopyrev
Percy Liang
RALM
320
8,189
0
16 Jun 2016
1