2310.00892
Cited By
No Offense Taken: Eliciting Offensiveness from Language Models
2 October 2023
Anugya Srivastava
Rahul Ahuja
Rohith Mukku
Papers citing
"No Offense Taken: Eliciting Offensiveness from Language Models"
5 / 5 papers shown
Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations
Tarun Raheja
Nilay Pochhi
AAML
51
1
0
09 Oct 2024
Red-Teaming for Generative AI: Silver Bullet or Security Theater?
Michael Feffer
Anusha Sinha
Wesley Hanwen Deng
Zachary Chase Lipton
Hoda Heidari
AAML
42
68
0
29 Jan 2024
Analyzing Dynamic Adversarial Training Data in the Limit
Eric Wallace
Adina Williams
Robin Jia
Douwe Kiela
200
30
0
16 Oct 2021
Internet-Augmented Dialogue Generation
M. Komeili
Kurt Shuster
Jason Weston
RALM
244
281
0
15 Jul 2021
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
301
1,616
0
18 Sep 2019