Title
XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs | Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera, V. P. | 122 0 0 | 30 Apr 2025
Inference-Time Scaling for Generalist Reward Modeling | Zijun Liu, P. Wang, Ran Xu, Shirong Ma, Chong Ruan, Ziwei Sun, Yang Liu, Y. Wu | OffRL, LRM | 137 40 0 | 03 Apr 2025
Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans? | Jeremy Barnes, Naiara Perez, Alba Bonet-Jover, Begoña Altuna | 86 2 0 | 21 Mar 2025
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings | Austin Xu, Srijan Bansal, Yifei Ming, Semih Yavuz, Shafiq Joty | ELM | 128 3 0 | 19 Mar 2025
InfoQuest: Evaluating Multi-Turn Dialogue Agents for Open-Ended Conversations with Hidden Context | Bryan L. M. de Oliveira, Luana G. B. Martins, Bruno Brandão, Luckeciano C. Melo | ELM | 365 1 0 | 17 Feb 2025