
Revealing the Intrinsic Ethical Vulnerability of Aligned Large Language Models
Papers citing "Revealing the Intrinsic Ethical Vulnerability of Aligned Large Language Models"
33 / 33 papers shown
Title |
---|
![]() Transfer Q Star: Principled Decoding for LLM Alignment Souradip Chakraborty Soumya Suvra Ghosal Ming Yin Dinesh Manocha Mengdi Wang Amrit Singh Bedi Furong Huang |
![]() Mistral 7B Albert Q. Jiang Alexandre Sablayrolles A. Mensch Chris Bamford Devendra Singh Chaplot ...Teven Le Scao Thibaut Lavril Thomas Wang Timothée Lacroix William El Sayed |
![]() Qwen Technical Report Jinze Bai Shuai Bai Yunfei Chu Zeyu Cui Kai Dang ...Zhenru Zhang Chang Zhou Jingren Zhou Xiaohuan Zhou Tianhang Zhu |
![]() Llama 2: Open Foundation and Fine-Tuned Chat Models Hugo Touvron Louis Martin Kevin R. Stone Peter Albert Amjad Almahairi ...Sharan Narang Aurelien Rodriguez Robert Stojnic Sergey Edunov Thomas Scialom |