
Purple-teaming LLMs with Adversarial Defender Training
Papers citing "Purple-teaming LLMs with Adversarial Defender Training"
Title |
---|
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations. Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, ..., Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, Madian Khabsa |
Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis. Kai Chen, Chunwei Wang, Kuo Yang, Jianhua Han, Lanqing Hong, ..., Zhenguo Li, Dit-Yan Yeung, Lifeng Shang, Xin Jiang, Qun Liu |