2306.03091
Cited By
RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems
5 June 2023
Tianyang Liu
Canwen Xu
Julian McAuley
ALM
Papers citing
"RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems"
50 / 110 papers shown
Title
LongCodeBench: Evaluating Coding LLMs at 1M Context Windows
Stefano Rando
Luca Romani
Alessio Sampieri
Yuta Kyuragi
Luca Franco
Fabio Galasso
Tatsunori Hashimoto
John Yang
LLMAG
45
0
0
12 May 2025
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks
Kai Xu
YiWei Mao
XinYi Guan
ZiLong Feng
45
0
0
12 May 2025
SweRank: Software Issue Localization with Code Ranking
R. Reddy
Tarun Suresh
JaeHyeok Doo
Yong-Jin Liu
Xuan-Phi Nguyen
Yingbo Zhou
Semih Yavuz
Caiming Xiong
Heng Ji
Chenyu You
31
0
0
07 May 2025
YABLoCo: Yet Another Benchmark for Long Context Code Generation
Aidar Valeev
Roman Garaev
Vadim Lomshakov
Irina Piontkovskaya
Vladimir Ivanov
Israel Adewuyi
50
0
0
07 May 2025
WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch
Zimu Lu
Yiran Yang
Houxing Ren
Haotian Hou
Han Xiao
Ke Wang
Weikang Shi
Aojun Zhou
Mingjie Zhan
Yiming Li
LLMAG
47
0
0
06 May 2025
SecRepoBench: Benchmarking LLMs for Secure Code Generation in Real-World Repositories
Connor Dilgren
Purva Chiniya
Luke Griffith
Yu Ding
Yizheng Chen
52
1
0
29 Apr 2025
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Minju Seo
Jinheon Baek
Seongyun Lee
Sung Ju Hwang
AI4CE
44
1
0
24 Apr 2025
CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation
Anirudh Khatry
Robert Zhang
Jia Pan
Ziteng Wang
Qiaochu Chen
Greg Durrett
Isil Dillig
39
0
0
21 Apr 2025
RTLRepoCoder: Repository-Level RTL Code Completion through the Combination of Fine-Tuning and Retrieval Augmentation
Peiyang Wu
Nan Guo
Junliang Lv
Xiao Xiao
Mingyu Yan
42
1
0
11 Apr 2025
Safe Screening Rules for Group OWL Models
Runxue Bao
Quanchao Lu
Yanfu Zhang
43
0
0
04 Apr 2025
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving
Daoguang Zan
Zhirong Huang
Wei Liu
Hanwu Chen
L. Zhang
...
Jing Su
Tianyu Liu
Rui Long
Kai Shen
Liang Xiang
53
2
0
03 Apr 2025
CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis
Anjiang Wei
Tarun Suresh
Jiannan Cao
Naveen Kannan
Yuheng Wu
Kai Yan
Diyi Yang
Alex Aiken
ELM
LRM
46
0
0
29 Mar 2025
L0-Reasoning Bench: Evaluating Procedural Correctness in Language Models via Simple Program Execution
Simeng Sun
Cheng-Ping Hsieh
Faisal Ladhak
Erik Arakelyan
Santiago Akle Serano
Boris Ginsburg
ReLM
ELM
LRM
220
0
0
28 Mar 2025
GPU-Accelerated Motion Planning of an Underactuated Forestry Crane in Cluttered Environments
M. Vu
Gerald Ebmer
Alexander Watcher
Marc-Philip Ecker
Giang Nguyen
Tobias Glueck
77
0
0
18 Mar 2025
Landscape Complexity for the Empirical Risk of Generalized Linear Models: Discrimination between Structured Data
Theodoros G. Tsironis
Aris L. Moustakas
66
0
0
18 Mar 2025
A Survey on Transformer Context Extension: Approaches and Evaluation
Yijun Liu
Jinzheng Yu
Yang Xu
Zhongyang Li
Qingfu Zhu
LLMAG
85
1
0
17 Mar 2025
Compute Optimal Scaling of Skills: Knowledge vs Reasoning
Nicholas Roberts
Niladri S. Chatterji
Sharan Narang
Mike Lewis
Dieuwke Hupkes
54
2
0
13 Mar 2025
RefactorBench: Evaluating Stateful Reasoning in Language Agents Through Code
Dhruv Gautam
Spandan Garg
Jinu Jang
Neel Sundaresan
Roshanak Zilouchian Moghaddam
LLMAG
LRM
78
2
0
10 Mar 2025
DependEval: Benchmarking LLMs for Repository Dependency Understanding
Junjia Du
Yadi Liu
Hongcheng Guo
Jiawei Wang
Haojian Huang
Yunyi Ni
Zhiyu Li
51
1
0
09 Mar 2025
FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation
Wei Li
Xin Zhang
Zhongxin Guo
Shaoguang Mao
Wen Luo
Guangyue Peng
Yangyu Huang
Houfeng Wang
Scarlett Li
57
0
0
09 Mar 2025
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol
Roham Koohestani
Philippe de Bekker
Maliheh Izadi
VLM
52
0
0
07 Mar 2025
Transferable Foundation Models for Geometric Tasks on Point Cloud Representations: Geometric Neural Operators
Blaine Quackenbush
P. Atzberger
3DPC
AI4CE
73
0
0
06 Mar 2025
SolBench: A Dataset and Benchmark for Evaluating Functional Correctness in Solidity Code Completion and Repair
Zaoyu Chen
Haoran Qin
Nuo Chen
Xiangyu Zhao
Lei Xue
Xiapu Luo
Xiao-Ming Wu
53
0
0
03 Mar 2025
Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs
Dayu Yang
Tianyang Liu
Daoan Zhang
Antoine Simoulin
Xiaoyi Liu
...
Zhaopu Teng
Xin Qian
Grey Yang
Jiebo Luo
Julian McAuley
ReLM
OffRL
LRM
91
4
0
26 Feb 2025
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation
K. Yan
Hongcheng Guo
Xuanqing Shi
Jinfeng Xu
Yaonan Gu
Zehan Li
ALM
99
0
0
26 Feb 2025
CodeSwift: Accelerating LLM Inference for Efficient Code Generation
Qianhui Zhao
L. Zhang
Fang Liu
Xiaoli Lian
Qiaoyuanhe Meng
Ziqian Jiao
Zetong Zhou
Borui Zhang
Runlin Guo
Jia Li
43
0
0
24 Feb 2025
LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification
Penghui Yang
Cunxiao Du
Fengzhuo Zhang
Haonan Wang
Tianyu Pang
Chao Du
Bo An
RALM
47
0
0
24 Feb 2025
Code Summarization Beyond Function Level
Vladimir Makharev
Vladimir Ivanov
45
0
0
23 Feb 2025
SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors
Bohan Lyu
Siqiao Huang
Zichen Liang
Qi-An Sun
Jiaming Zhang
ELM
LRM
62
0
0
16 Feb 2025
CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories
Yijia Xiao
Runhui Wang
Luyang Kong
Davor Golac
Wei Wang
LLMAG
228
0
0
10 Feb 2025
Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference
Weizhi Fei
Xueyan Niu
Guoqing Xie
Yingqing Liu
Bo Bai
Wei Han
38
1
0
22 Jan 2025
How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs
Jialun Cao
Yuk-Kit Chan
Zixuan Ling
Wenxuan Wang
Shuqing Li
...
Pinjia He
Shuai Wang
Zibin Zheng
Michael R. Lyu
Shing-Chi Cheung
ALM
71
1
0
18 Jan 2025
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation
Zhaojian Yu
Yilun Zhao
Arman Cohan
Xiao-Ping Zhang
LRM
41
5
0
03 Jan 2025
LoL-PIM: Long-Context LLM Decoding with Scalable DRAM-PIM System
Hyucksung Kwon
Kyungmo Koo
Janghyeon Kim
W. Lee
Minjae Lee
...
Yongkee Kwon
Ilkon Kim
Euicheol Lim
John Kim
Jungwook Choi
74
4
0
28 Dec 2024
Modality-Projection Universal Model for Comprehensive Full-Body Medical Imaging Segmentation
Yixin Chen
Lin Gao
Yajuan Gao
Rui Wang
Jingge Lian
...
Y. Duan
Leiying Chai
Hongbin Han
Zhaoping Cheng
Zhaoheng Xie
50
0
0
26 Dec 2024
CodeXEmbed: A Generalist Embedding Model Family for Multiligual and Multi-task Code Retrieval
Yongxu Liu
Rui Meng
Chenyu You
Silvio Savarese
Caiming Xiong
Yingbo Zhou
Semih Yavuz
96
3
0
19 Nov 2024
AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference
Janghwan Lee
Jiwoong Park
Jinseok Kim
Yongjik Kim
Jungju Oh
Jinwook Oh
Jungwook Choi
44
2
0
15 Nov 2024
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Siming Huang
Tianhao Cheng
J.K. Liu
Jiaran Hao
L. Song
...
Ge Zhang
Zili Wang
Yuan Qi
Yinghui Xu
Wei Chu
ALM
83
18
0
07 Nov 2024
Crystal: Illuminating LLM Abilities on Language and Code
Tianhua Tao
Junbo Li
Bowen Tan
Hongyi Wang
William Marshall
...
Joel Hestness
Natalia Vassilieva
Zhiqiang Shen
Eric P. Xing
Zhengzhong Liu
54
4
0
06 Nov 2024
GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models
Nizar Islah
Justine Gehring
Diganta Misra
Eilif B. Muller
Irina Rish
Terry Yue Zhuo
Massimo Caccia
SyDa
50
1
0
05 Nov 2024
LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators
Krishna Teja Chitty-Venkata
Siddhisanket Raskar
B. Kale
Farah Ferdaus
Aditya Tanikanti
Ken Raffenetti
Valerie Taylor
M. Emani
V. Vishwanath
49
7
0
31 Oct 2024
Can Language Models Replace Programmers? REPOCOD Says 'Not Yet'
Shanchao Liang
Yiran Hu
Nan Jiang
Lin Tan
ALM
ELM
32
2
0
29 Oct 2024
M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation
Qingbin Liu
Ken Deng
Congnan Liu
Jian Yang
Shukai Liu
...
Zekun Wang
Guoan Zhang
Bangyu Xiang
Wenbo Su
Jian Xu
75
4
0
28 Oct 2024
LLM The Genius Paradox: A Linguistic and Math Expert's Struggle with Simple Word-based Counting Problems
Nan Xu
Xuezhe Ma
LRM
59
4
0
18 Oct 2024
Agent-as-a-Judge: Evaluate Agents with Agents
Mingchen Zhuge
Changsheng Zhao
Dylan R. Ashley
Wenyi Wang
Dmitrii Khizbullin
...
Raghuraman Krishnamoorthi
Yuandong Tian
Yangyang Shi
Vikas Chandra
Jürgen Schmidhuber
ELM
65
35
0
14 Oct 2024
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
John Yang
Carlos E. Jimenez
Alex Zhang
K. Lieret
Joyce Yang
...
Gabriel Synnaeve
Karthik Narasimhan
Diyi Yang
Sida I. Wang
Ofir Press
41
24
0
04 Oct 2024
Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion?
Zhenyu Pan
Rongyu Cao
Yongchang Cao
Yingwei Ma
Binhua Li
Fei Huang
Han Liu
Yongbin Li
50
4
0
02 Oct 2024
SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories
Ben Bogin
Kejuan Yang
Shashank Gupta
Kyle Richardson
Erin Bransom
Peter Clark
Ashish Sabharwal
Tushar Khot
ELM
LRM
49
10
0
11 Sep 2024
Retrieval Or Holistic Understanding? Dolce: Differentiate Our Long Context Evaluation Tasks
Zi Yang
33
0
0
10 Sep 2024
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale
Huy N. Phan
Phong X. Nguyen
Nghi D. Q. Bui
LLMAG
33
12
0
09 Sep 2024