Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code

28 January 2025
Shahin Honarvar
Mark van der Wilk
Alastair Donaldson

Papers citing "Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code"

48 / 48 papers shown
Enhancing LLM Robustness to Perturbed Instructions: An Empirical Study
Aryan Agrawal
Lisa Alazraki
Shahin Honarvar
Marek Rei
118
2
0
03 Apr 2025
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol
Roham Koohestani
Philippe de Bekker
Maliheh Izadi
VLM
94
0
0
07 Mar 2025
Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence
Norbert Tihanyi
Tamás Bisztray
Richard A. Dubniczky
Rebeka Tóth
B. Borsos
...
Ryan Marinelli
Lucas C. Cordeiro
Merouane Debbah
Vasileios Mavroeidis
Audun Josang
72
5
0
20 Oct 2024
Large Language Models for Secure Code Assessment: A Multi-Language Empirical Study
Kohei Dozono
Tiago Gasiba
Andrea Stocco
ELM
65
2
0
12 Aug 2024
Knowledge-based Consistency Testing of Large Language Models
Sai Sathiesh Rajan
E. Soremekun
Sudipta Chattopadhyay
72
6
0
03 Jul 2024
Test Code Generation for Telecom Software Systems using Two-Stage Generative Model
Mohamad Nabeel
Doumitrou Daniil Nimara
Tahar Zanouda
65
3
0
14 Apr 2024
Reasoning Runtime Behavior of a Program with LLM: How Far Are We?
Junkai Chen
Zhiyuan Pan
Xing Hu
Zhenhao Li
Ge Li
Xin Xia
LRM
92
28
0
25 Mar 2024
Bugs in Large Language Models Generated Code: An Empirical Study
Florian Tambon
Arghavan Moradi Dakhel
Amin Nikanjam
Foutse Khomh
Michel C. Desmarais
G. Antoniol
ELM
79
35
0
13 Mar 2024
Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models
Terry Yue Zhuo
A. Zebaze
Nitchakarn Suppattarachai
Leandro von Werra
H. D. Vries
Qian Liu
Niklas Muennighoff
ALM
84
18
0
01 Jan 2024
ConDefects: A New Dataset to Address the Data Leakage Concern for LLM-based Fault Localization and Program Repair
Yonghao Wu
Zheng Li
Jie Zhang
Yong Liu
82
13
0
25 Oct 2023
Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain
Marcus J. Min
Yangruibo Ding
Luca Buratti
Saurabh Pujar
Gail E. Kaiser
Suman Jana
Baishakhi Ray
LRM, HILM
60
21
0
21 Oct 2023
Instruction Tuning for Large Language Models: A Survey
Shengyu Zhang
Linfeng Dong
Xiaoya Li
Sen Zhang
Xiaofei Sun
...
Jiwei Li
Runyi Hu
Tianwei Zhang
Leilei Gan
Guoyin Wang
LM&MA
98
606
0
21 Aug 2023
An Empirical Study on Using Large Language Models to Analyze Software Supply Chain Security Failures
Tanmay Singla
Dharun Anandayuvaraj
Kelechi G. Kalu
Taylor R. Schorlemmer
James C. Davis
127
13
0
09 Aug 2023
ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation
Xueying Du
Wentai Deng
Kaixin Wang
Hanlin Wang
Junwei Liu
Yixuan Chen
Jiayi Feng
Chaofeng Sha
Xin Peng
ELM, ALM
67
149
0
03 Aug 2023
Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation
Zhiqiang Yuan
Junwei Liu
Qiancheng Zi
Wentai Deng
Xin Peng
ALM, ELM, LRM
82
80
0
02 Aug 2023
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM, LM&MA, ALM
156
1,723
0
06 Jul 2023
Exploring the Robustness of Large Language Models for Solving Programming Problems
Atsushi Shirafuji
Yutaka Watanobe
Takumi Ito
Makoto Morishita
Yuki Nakamura
Yusuke Oda
Jun Suzuki
ELM
82
21
0
26 Jun 2023
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations
Lifan Yuan
Yangyi Chen
Ganqu Cui
Hongcheng Gao
Fangyuan Zou
Xingyi Cheng
Heng Ji
Zhiyuan Liu
Maosong Sun
117
83
0
07 Jun 2023
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Jiawei Liu
Chun Xia
Yuyao Wang
Lingming Zhang
ELM, ALM
253
955
0
02 May 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG, MLLM
1.5K
14,761
0
15 Mar 2023
The Programmer's Assistant: Conversational Interaction with a Large Language Model for Software Development
Steven I. Ross
Fernando Martinez
Stephanie Houde
Michael J. Muller
Justin D. Weisz
86
221
0
14 Feb 2023
Code Difference Guided Adversarial Example Generation for Deep Code Models
Zhao Tian
Junjie Chen
Zhi Jin
AAML
81
22
0
06 Jan 2023
ReCode: Robustness Evaluation of Code Generation Models
Shiqi Wang
Zheng Li
Haifeng Qian
Cheng Yang
Zijian Wang
...
Parminder Bhatia
Ramesh Nallapati
M. K. Ramanathan
Dan Roth
Bing Xiang
63
89
0
20 Dec 2022
Benchmarking Large Language Models for Automated Verilog RTL Code Generation
Shailja Thakur
Baleegh Ahmad
Zhenxing Fan
Hammond Pearce
Benjamin Tan
Ramesh Karri
Brendan Dolan-Gavitt
S. Garg
57
141
0
13 Dec 2022
CLAWSAT: Towards Both Robust and Accurate Code Models
Jinghan Jia
Shashank Srikant
Tamara Mitrovska
Chuang Gan
Shiyu Chang
Sijia Liu
Una-May O’Reilly
AAML
108
11
0
21 Nov 2022
Do Users Write More Insecure Code with AI Assistants?
Neil Perry
Megha Srivastava
Deepak Kumar
Dan Boneh
ELM, AAML
73
178
0
07 Nov 2022
Piloting Copilot, Codex, and StarCoder2: Hot Temperature, Cold Prompts, or Black Magic?
Jean-Baptiste Döderlein
Mathieu Acher
D. Khelladi
B. Combemale
85
35
0
26 Oct 2022
Language Models are Multilingual Chain-of-Thought Reasoners
Freda Shi
Mirac Suzgun
Markus Freitag
Xuezhi Wang
Suraj Srivats
...
Yi Tay
Sebastian Ruder
Denny Zhou
Dipanjan Das
Jason W. Wei
ReLM, LRM
247
369
0
06 Oct 2022
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
Abulhair Saparov
He He
ELM, LRM, ReLM
258
315
0
03 Oct 2022
CoditT5: Pretraining for Source Code and Natural Language Editing
Jiyang Zhang
Sheena Panthaplackel
Pengyu Nie
Junyi Jessy Li
Miloš Gligorić
KELM
82
91
0
10 Aug 2022
GitHub Copilot AI pair programmer: Asset or Liability?
Arghavan Moradi Dakhel
Vahid Majdinasab
Amin Nikanjam
Foutse Khomh
Michel C. Desmarais
Zhen Ming (Jack) Jiang
93
357
0
30 Jun 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM, LRM
532
4,077
0
24 May 2022
A Systematic Evaluation of Large Language Models of Code
Frank F. Xu
Uri Alon
Graham Neubig
Vincent J. Hellendoorn
ELM, ALM
235
655
0
26 Feb 2022
What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code
Yao Wan
Wei Zhao
Hongyu Zhang
Yulei Sui
Guandong Xu
Hairong Jin
88
110
0
14 Feb 2022
Competition-Level Code Generation with AlphaCode
Yujia Li
David Choi
Junyoung Chung
Nate Kushman
Julian Schrittwieser
...
Esme Sutherland Robson
Pushmeet Kohli
Nando de
Koray Kavukcuoglu
Oriol Vinyals
148
1,425
0
08 Feb 2022
Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei
Maarten Bosma
Vincent Zhao
Kelvin Guu
Adams Wei Yu
Brian Lester
Nan Du
Andrew M. Dai
Quoc V. Le
ALM, UQCV
246
3,789
0
03 Sep 2021
Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions
Hammond Pearce
Baleegh Ahmad
Benjamin Tan
Brendan Dolan-Gavitt
Ramesh Karri
SILM
84
424
0
20 Aug 2021
Program Synthesis with Large Language Models
Jacob Austin
Augustus Odena
Maxwell Nye
Maarten Bosma
Henryk Michalewski
...
Ellen Jiang
Carrie J. Cai
Michael Terry
Quoc V. Le
Charles Sutton
ELM, AIMat, ReCod, ALM
216
2,009
0
16 Aug 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM, ALM
236
5,665
0
07 Jul 2021
A Survey of Uncertainty in Deep Neural Networks
J. Gawlikowski
Cedrique Rovile Njieutcheu Tassi
Mohsin Ali
Jongseo Lee
Matthias Humt
...
R. Roscher
Muhammad Shahzad
Wen Yang
R. Bamler
Xiaoxiang Zhu
BDL, UQCV, OOD
235
1,164
0
07 Jul 2021
On Adversarial Robustness of Synthetic Code Generation
Mrinal Anand
Pratik Kayal
M. Singh
122
5
0
22 Jun 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
Basel Alomair
Jacob Steinhardt
ELM, AIMat, ALM
272
704
0
20 May 2021
Perfection Not Required? Human-AI Partnerships in Code Translation
Justin D. Weisz
Michael J. Muller
Stephanie Houde
John T. Richards
Steven I. Ross
Fernando Martinez
Mayank Agarwal
Kartik Talamadupula
61
130
0
08 Apr 2021
Generating Adversarial Computer Programs using Optimized Obfuscations
Shashank Srikant
Sijia Liu
Tamara Mitrovska
Shiyu Chang
Quanfu Fan
Gaoyuan Zhang
Una-May O’Reilly
AAML
99
46
0
18 Mar 2021
Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
Laria Reynolds
Kyle McDonell
114
918
0
15 Feb 2021
Adversarial Examples for Models of Code
Noam Yefet
Uri Alon
Eran Yahav
SILM, AAML, MLAU
102
168
0
15 Oct 2019
Unifying Human and Statistical Evaluation for Natural Language Generation
Tatsunori B. Hashimoto
Hugh Zhang
Percy Liang
85
225
0
04 Apr 2019
Language GANs Falling Short
Massimo Caccia
Lucas Caccia
W. Fedus
Hugo Larochelle
Joelle Pineau
Laurent Charlin
222
218
0
06 Nov 2018