Title |
---|
![]() Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse Reinforcement Learning Jared Joselowitz Arjun Jagota Satyapriya Krishna Sonali Parbhoo Nyal Patel Satyapriya Krishna Sonali Parbhoo |
![]() Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in
Red Teaming GenAI Ambrish Rawat Stefan Schoepf Giulio Zizzo Giandomenico Cornacchia Muhammad Zaid Hameed ...Elizabeth M. Daly Mark Purcell P. Sattigeri Pin-Yu Chen Kush R. Varshney |