Ethical and social risks of harm from Language Models

8 December 2021
Laura Weidinger, John F. J. Mellor, Maribeth Rauh, Conor Griffin, J. Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, Zachary Kenton, S. Brown, Will Hawkins, T. Stepleton, Courtney Biles, Abeba Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William S. Isaac, Sean Legassick, G. Irving, Iason Gabriel
PILM
arXiv: 2112.04359 (abs · PDF · HTML)

Papers citing "Ethical and social risks of harm from Language Models"

Showing 50 of 634 citing papers.
RE-IMAGINE: Symbolic Benchmark Synthesis for Reasoning Evaluation
Xinnuo Xu, Rachel Lawrence, Kshitij Dubey, Atharva Pandey, Risa Ueno, Fabian Falck, A. Nori, Rahul Sharma, Amit Sharma, Javier González
LRM · 17 · 0 · 0 · 18 Jun 2025

Deflating Deflationism: A Critical Perspective on Debunking Arguments Against LLM Mentality
Alex Grzankowski, Geoff Keeling, Henry Shevlin, Winnie Street
11 · 0 · 0 · 16 Jun 2025

Reinforcement Learning from Human Feedback with High-Confidence Safety Constraints
Yaswanth Chittepu, Blossom Metevier, Will Schwarzer, Austin Hoag, S. Niekum, Philip S. Thomas
20 · 0 · 0 · 09 Jun 2025

Joint-GCG: Unified Gradient-Based Poisoning Attacks on Retrieval-Augmented Generation Systems
Haowei Wang, Rupeng Zhang, Junjie Wang, Mingyang Li, Yuekai Huang, Dandan Wang, Qing Wang
SILM, AAML · 46 · 0 · 0 · 06 Jun 2025

Scenarios in Computing Research: A Systematic Review of the Use of Scenario Methods for Exploring the Future of Computing Technologies in Society
Julia Barnett, Kimon Kieslich, J. Sinchai, Nicholas Diakopoulos
AI4TS · 30 · 0 · 0 · 05 Jun 2025

Do Language Models Think Consistently? A Study of Value Preferences Across Varying Response Lengths
Inderjeet Nair, Lu Wang
47 · 0 · 0 · 03 Jun 2025

Deontological Keyword Bias: The Impact of Modal Expressions on Normative Judgments of Language Models
Bumjin Park, Jinsil Lee, Jaesik Choi
13 · 0 · 0 · 01 Jun 2025

HADA: Human-AI Agent Decision Alignment Architecture
Tapio Pitkäranta, Leena Pitkäranta
17 · 0 · 0 · 01 Jun 2025

COSMIC: Generalized Refusal Direction Identification in LLM Activations
Vincent Siu, Nicholas Crispino, Zihao Yu, Sam Pan, Zhun Wang, Yang Liu, Dawn Song, Chenguang Wang
LLMSV · 25 · 0 · 0 · 30 May 2025

Whose Name Comes Up? Auditing LLM-Based Scholar Recommendations
Daniele Barolo, Chiara Valentin, Fariba Karimi, Luis Galárraga, Gonzalo G. Méndez, Lisette Espín-Noboa
20 · 0 · 0 · 29 May 2025

Probing Politico-Economic Bias in Multilingual Large Language Models: A Cultural Analysis of Low-Resource Pakistani Languages
Afrozah Nadeem, Mark Dras, Usman Naseem
17 · 0 · 0 · 29 May 2025

LLM Agents for Bargaining with Utility-based Feedback
Jihwan Oh, Murad Aghazada, Se-Young Yun, Taehyeon Kim
LLMAG · 21 · 0 · 0 · 29 May 2025

The Multilingual Divide and Its Impact on Global AI Safety
Aidan Peppin, Julia Kreutzer, Alice Schoenauer Sebag, Kelly Marchisio, Beyza Ermis, ..., Wei-Yin Ko, Ahmet Üstün, Matthias Gallé, Marzieh Fadaee, Sara Hooker
ELM · 75 · 1 · 0 · 27 May 2025

Language Models Surface the Unwritten Code of Science and Society
Honglin Bao, Siyang Wu, Jiwoong Choi, Yingrong Mao, James A. Evans
50 · 0 · 0 · 25 May 2025

Does Chain-of-Thought Reasoning Really Reduce Harmfulness from Jailbreaking?
Chengda Lu, Xiaoyu Fan, Yu Huang, Rongwu Xu, Jijie Li, Wei Xu
LRM · 66 · 0 · 0 · 23 May 2025

Accidental Misalignment: Fine-Tuning Language Models Induces Unexpected Vulnerability
Punya Syon Pandey, Samuel Simko, Kellin Pelrine, Zhijing Jin
AAML · 52 · 0 · 0 · 22 May 2025

Breaking mBad! Supervised Fine-tuning for Cross-Lingual Detoxification
Himanshu Beniwal, Y. Kim, Maarten Sap, Soham Dan, Thomas Hartvigsen
CLL · 85 · 0 · 0 · 22 May 2025

After Retrieval, Before Generation: Enhancing the Trustworthiness of Large Language Models in RAG
Xinbang Dai, Huikang Hu, Yuncheng Hua, Jiaqi Li, Yongrui Chen, Rihui Jin, Nan Hu, Guilin Qi
RALM, 3DV · 67 · 0 · 0 · 21 May 2025

A Risk Taxonomy for Evaluating AI-Powered Psychotherapy Agents
Ian Steenstra, Timothy W. Bickmore
58 · 0 · 0 · 21 May 2025

Unraveling Interwoven Roles of Large Language Models in Authorship Privacy: Obfuscation, Mimicking, and Verification
Tuc Nguyen, Yifan Hu, Thai Le
DeLMO · 83 · 0 · 0 · 20 May 2025

Attributional Safety Failures in Large Language Models under Code-Mixed Perturbations
Somnath Banerjee, Pratyush Chatterjee, Shanu Kumar, Sayan Layek, Parag Agrawal, Rima Hazra, Animesh Mukherjee
AAML · 195 · 0 · 0 · 20 May 2025

Trust Me, I Can Handle It: Self-Generated Adversarial Scenario Extrapolation for Robust Language Models
Md Rafi Ur Rashid, Vishnu Asutosh Dasu, Ye Wang, Gang Tan, Shagufta Mehnaz
AAML, ELM · 109 · 0 · 0 · 20 May 2025

Pairwise Calibrated Rewards for Pluralistic Alignment
Daniel Halpern, Evi Micha, Ariel D. Procaccia, Itai Shapira
18 · 0 · 0 · 17 May 2025

Creating General User Models from Computer Use
Omar Shaikh, Shardul Sapkota, Shan Rizvi, Eric Horvitz, Joon Sung Park, Diyi Yang, Michael S. Bernstein
HAI · 134 · 0 · 0 · 16 May 2025

Reproducibility Study of "Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents"
Pedro M. P. Curvo, Mara Dragomir, Salvador Torpes, Mohammadmahdi Rahimi
LLMAG · 107 · 0 · 0 · 14 May 2025

SecReEvalBench: A Multi-turned Security Resilience Evaluation Benchmark for Large Language Models
Huining Cui, Wei Liu
AAML, ELM · 116 · 0 · 0 · 12 May 2025

Real-World Gaps in AI Governance Research
Ilan Strauss, Isobel Moure, Tim O'Reilly, Sruly Rosenblat
156 · 1 · 0 · 30 Apr 2025

SAGE: A Generic Framework for LLM Safety Evaluation
Madhur Jindal, Hari Shrawgi, Parag Agrawal, Sandipan Dandapat
ELM · 86 · 0 · 0 · 28 Apr 2025

Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors
Ren-Wei Liang, Chin-Ting Hsu, Chan-Hung Yu, Saransh Agrawal, Shih-Cheng Huang, Shang-Tse Chen, Kuan-Hao Huang, Shao-Hua Sun
167 · 0 · 0 · 27 Apr 2025

AI Ethics and Social Norms: Exploring ChatGPT's Capabilities From What to How
Omid Veisi, Sasan Bahrami, Roman Englert, Claudia Müller
330 · 0 · 0 · 25 Apr 2025

Evaluating and Mitigating Bias in AI-Based Medical Text Generation
Xiuying Chen, Tairan Wang, Juexiao Zhou, Zirui Song, Xin Gao, Wei Wei
MedIm · 79 · 3 · 0 · 24 Apr 2025

(Im)possibility of Automated Hallucination Detection in Large Language Models
Amin Karbasi, Omar Montasser, John Sous, Grigoris Velegkas
HILM · 101 · 0 · 0 · 23 Apr 2025

aiXamine: Simplified LLM Safety and Security
Fatih Deniz, Dorde Popovic, Yazan Boshmaf, Euisuh Jeong, M. Ahmad, Sanjay Chawla, Issa M. Khalil
ELM · 337 · 0 · 0 · 21 Apr 2025

Exploring Collaborative GenAI Agents in Synchronous Group Settings: Eliciting Team Perceptions and Design Considerations for the Future of Work
Janet G. Johnson, Macarena Peralta, Mansanjam Kaur, Ruijie Sophia Huang, Sheng Zhao, Ruijia Guan, Shwetha Rajaram, Michael Nebeling
LLMAG · 94 · 0 · 0 · 21 Apr 2025

Jailbreak Detection in Clinical Training LLMs Using Feature-Based Predictive Models
Tri Nguyen, Lohith Srikanth Pentapalli, Magnus Sieverding, Laurah Turner, Seth Overla, ..., Michael Gharib, Matt Kelleher, Michael Shukis, Cameron Pawlik, Kelly Cohen
101 · 0 · 0 · 21 Apr 2025

Mind the Language Gap: Automated and Augmented Evaluation of Bias in LLMs for High- and Low-Resource Languages
Alessio Buscemi, Cedric Lothritz, Sergio Morales, Marcos Gomez-Vazquez, Robert Clarisó, Jordi Cabot, German Castignani
57 · 0 · 0 · 19 Apr 2025

Demo: ViolentUTF as An Accessible Platform for Generative AI Red Teaming
Tam N. Nguyen
103 · 0 · 0 · 14 Apr 2025

Feature-Aware Malicious Output Detection and Mitigation
Weilong Dong, Peiguang Li, Yu Tian, Xinyi Zeng, Fengdi Li, Sirui Wang
AAML · 47 · 0 · 0 · 12 Apr 2025

Open Problems and a Hypothetical Path Forward in LLM Knowledge Paradigms
Xiaotian Ye, Hao Fei, Shu Wu
KELM, ELM · 133 · 0 · 0 · 09 Apr 2025

NLP Security and Ethics, in the Wild
Heather Lent, Erick Galinkin, Yiyi Chen, Jens Myrup Pedersen, Leon Derczynski, Johannes Bjerva
SILM · 135 · 0 · 0 · 09 Apr 2025

Navigating the Rabbit Hole: Emergent Biases in LLM-Generated Attack Narratives Targeting Mental Health Groups
Rijul Magu, Arka Dutta, Sean Kim, Ashiqur R. KhudaBukhsh, Munmun De Choudhury
110 · 0 · 0 · 08 Apr 2025

A Survey of Social Cybersecurity: Techniques for Attack Detection, Evaluations, Challenges, and Future Prospects
Aos Mulahuwaish, Basheer Qolomany, Kevin Gyorick, Jacques Bou Abdo, Mohammed Aledhari, Junaid Qadir, Kathleen Carley, Ala I. Al-Fuqaha
56 · 1 · 0 · 06 Apr 2025

Locations of Characters in Narratives: Andersen and Persuasion Datasets
Batuhan Ozyurt, Roya Arkhmammadova, Deniz Yuret
75 · 2 · 0 · 04 Apr 2025

Increasing happiness through conversations with artificial intelligence
Joseph Heffner, Chongyu Qin, Martin Chadwick, Chris Knutsen, Christopher Summerfield, Zeb Kurth-Nelson, Robb B. Rutledge
AI4MH · 86 · 0 · 0 · 02 Apr 2025

Encrypted Prompt: Securing LLM Applications Against Unauthorized Actions
Shih-Han Chan
AAML · 83 · 1 · 0 · 29 Mar 2025

FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models
Dahyun Jung, Seungyoon Lee, Hyeonseok Moon, Chanjun Park, Heuiseok Lim
AAML, ALM, ELM · 106 · 3 · 0 · 25 Mar 2025

Gemma 3 Technical Report
Gemma Team, Aishwarya B Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, ..., Harshal Tushar Lehri, Hussein Hazimeh, Ian Ballantyne, Idan Szpektor, Ivan Nardini
VLM · 193 · 136 · 0 · 25 Mar 2025

Understanding the Effects of RLHF on the Quality and Detectability of LLM-Generated Texts
Beining Xu, Arkaitz Zubiaga
DeLMO · 119 · 0 · 0 · 23 Mar 2025

AgentRxiv: Towards Collaborative Autonomous Research
Samuel Schmidgall, Michael Moor
181 · 8 · 0 · 23 Mar 2025

Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying Probes
Sharan Maiya, Yinhong Liu, Ramit Debnath, Anna Korhonen
74 · 0 · 0 · 22 Mar 2025