2505.12938
Cited By
v1
v2 (latest)
Leveraging LLM Inconsistency to Boost Pass@k Performance
19 May 2025
Uri Dalal
Meirav Segal
Zvika Ben-Haim
Dan Lahav
Omer Nevo
Papers citing "Leveraging LLM Inconsistency to Boost Pass@k Performance"
18 / 18 papers shown
A Framework for Evaluating Emerging Cyberattack Capabilities of AI
Mikel Rodriguez
Raluca Ada Popa
Four Flynn
Lihao Liang
Allan Dafoe
Anna Wang
ELM
132
8
0
14 Mar 2025
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
Jingming Zhuo
Shanghang Zhang
Xinyu Fang
Haodong Duan
Dahua Lin
Kai Chen
72
28
0
16 Oct 2024
POSIX: A Prompt Sensitivity Index For Large Language Models
Anwoy Chatterjee
H. S. V. N. S. K. Renduchintala
S. Bhatia
Tanmoy Chakraborty
AAML
71
10
0
03 Oct 2024
PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations
Jiatong Li
Renjun Hu
Kunzhe Huang
Zhuang Yan
Qi Liu
Mengxiao Zhu
Xing Shi
Wei Lin
KELM
99
7
0
30 May 2024
When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards
Norah A. Alzahrani
H. A. Alyahya
Sultan Yazeed Alnumay
Muhtasim Tahmid
Shaykhah Alsubaie
...
Saleh Soltan
Nathan Scales
Marie-Anne Lachaux
Samuel R. Bowman
Haidar Khan
ELM
121
79
0
01 Feb 2024
Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements
Anton Voronov
Lena Wolf
Max Ryabinin
60
52
0
12 Jan 2024
State of What Art? A Call for Multi-Prompt LLM Evaluation
Moran Mizrahi
Guy Kaplan
Daniel Malkin
Rotem Dror
Dafna Shahaf
Gabriel Stanovsky
ELM
91
147
0
31 Dec 2023
The language of prompting: What linguistic properties make a prompt successful?
Alina Leidinger
R. Rooij
Ekaterina Shutova
74
44
0
03 Nov 2023
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting
Melanie Sclar
Yejin Choi
Yulia Tsvetkov
Alane Suhr
93
352
0
17 Oct 2023
Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions
Pouya Pezeshkpour
Estevam R. Hruschka
LRM
53
144
0
22 Aug 2023
Evaluating the Zero-shot Robustness of Instruction-tuned Language Models
Jiu Sun
Chantal Shaib
Byron C. Wallace
ALM
50
50
0
20 Jun 2023
Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
Or Honovich
Thomas Scialom
Omer Levy
Timo Schick
ALM
124
375
0
19 Dec 2022
Demystifying Prompts in Language Models via Perplexity Estimation
Hila Gonen
Srini Iyer
Terra Blevins
Noah A. Smith
Luke Zettlemoyer
LRM
124
213
0
08 Dec 2022
Robustness of Learning from Task Instructions
Jiasheng Gu
Hongyu Zhao
Hanzi Xu
Liang Nie
Hongyuan Mei
Wenpeng Yin
OOD
83
34
0
07 Dec 2022
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
233
5,635
0
07 Jul 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
Basel Alomair
Jacob Steinhardt
ELM
AIMat
ALM
257
703
0
20 May 2021
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu
Max Bartolo
Alastair Moore
Sebastian Riedel
Pontus Stenetorp
AILaw
LRM
406
1,193
0
18 Apr 2021
SPoC: Search-based Pseudocode to Code
Sumith Kulal
Panupong Pasupat
Kartik Chandra
Mina Lee
Oded Padon
A. Aiken
Percy Liang
62
226
0
12 Jun 2019