ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2310.00892
  4. Cited By
No Offense Taken: Eliciting Offensiveness from Language Models

No Offense Taken: Eliciting Offensiveness from Language Models

2 October 2023
Anugya Srivastava
Rahul Ahuja
Rohith Mukku
ArXivPDFHTML

Papers citing "No Offense Taken: Eliciting Offensiveness from Language Models"

5 / 5 papers shown
Title
Recent advancements in LLM Red-Teaming: Techniques, Defenses, and
  Ethical Considerations
Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations
Tarun Raheja
Nilay Pochhi
AAML
51
1
0
09 Oct 2024
Red-Teaming for Generative AI: Silver Bullet or Security Theater?
Red-Teaming for Generative AI: Silver Bullet or Security Theater?
Michael Feffer
Anusha Sinha
Wesley Hanwen Deng
Zachary Chase Lipton
Hoda Heidari
AAML
42
68
0
29 Jan 2024
Analyzing Dynamic Adversarial Training Data in the Limit
Analyzing Dynamic Adversarial Training Data in the Limit
Eric Wallace
Adina Williams
Robin Jia
Douwe Kiela
200
30
0
16 Oct 2021
Internet-Augmented Dialogue Generation
Internet-Augmented Dialogue Generation
M. Komeili
Kurt Shuster
Jason Weston
RALM
244
281
0
15 Jul 2021
Fine-Tuning Language Models from Human Preferences
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
301
1,616
0
18 Sep 2019
1