(Why) Is My Prompt Getting Worse? Rethinking Regression Testing for Evolving LLM APIs

18 November 2023

Chenyang Yang

Christian Kästner

Papers citing "(Why) Is My Prompt Getting Worse? Rethinking Regression Testing for Evolving LLM APIs"

The Hitchhiker's Guide to Production-ready Trustworthy Foundation Model powered Software (FMware)
    Kirill Vasilevski, Benjamin Rombaut, Gopi Krishnan Rajbahadur, G. Oliva, Keheliya Gallaba, ..., Bouyan Chen, Kishanthan Thangarajah, Ahmed E. Hassan, Zhen Ming Jiang
    15 May 2025

SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models
    Z. He, Haiyan Zhao, Yiran Qiao, Fan Yang, Ali Payani, Jing Ma, Jundong Li
    Tags: LLMSV
    17 Feb 2025

From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap
    Gopi Krishnan Rajbahadur, G. Oliva, Dayi Lin, Ahmed E. Hassan
    28 Jan 2025

Prompts Are Programs Too! Understanding How Developers Build Software Containing Prompts
    Jenny T Liang, Melissa Lin, Nikitha Rao, Brad A. Myers
    19 Sep 2024

Toxicity Detection with Generative Prompt-based Inference
    Yau-Shian Wang, Y. Chang
    24 May 2022

Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
    Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, Pontus Stenetorp
    Tags: AILaw, LRM
    18 Apr 2021