ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2304.11220
  4. Cited By
Learn What NOT to Learn: Towards Generative Safety in Chatbots

Learn What NOT to Learn: Towards Generative Safety in Chatbots

21 April 2023
Leila Khalatbari
Yejin Bang
Dan Su
Willy Chung
Saeedeh Ghadimi
Hossein Sameti
Pascale Fung
ArXivPDFHTML

Papers citing "Learn What NOT to Learn: Towards Generative Safety in Chatbots"

8 / 8 papers shown
Title
Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models
Alberto Purpura
Sahil Wadhwa
Jesse Zymet
Akshay Gupta
Andy Luo
Melissa Kazemi Rad
Swapnil Shinde
Mohammad Sorower
AAML
185
0
0
03 Mar 2025
"Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak
"Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak
Lingrui Mei
Shenghua Liu
Yiwei Wang
Baolong Bi
Jiayi Mao
Xueqi Cheng
AAML
47
9
0
17 Jun 2024
High-Dimension Human Value Representation in Large Language Models
High-Dimension Human Value Representation in Large Language Models
Samuel Cahyawijaya
Delong Chen
Yejin Bang
Leila Khalatbari
Bryan Wilie
Ziwei Ji
Etsuko Ishii
Pascale Fung
71
5
0
11 Apr 2024
SaGE: Evaluating Moral Consistency in Large Language Models
SaGE: Evaluating Moral Consistency in Large Language Models
Vamshi Krishna Bonagiri
Sreeram Vennam
Priyanshul Govil
Ponnurangam Kumaraguru
Manas Gaur
ELM
56
0
0
21 Feb 2024
Tackling Bias in Pre-trained Language Models: Current Trends and
  Under-represented Societies
Tackling Bias in Pre-trained Language Models: Current Trends and Under-represented Societies
Vithya Yogarajan
Gillian Dobbie
Te Taka Keegan
R. Neuwirth
ALM
43
11
0
03 Dec 2023
Bias and Fairness in Large Language Models: A Survey
Bias and Fairness in Large Language Models: A Survey
Isabel O. Gallegos
Ryan A. Rossi
Joe Barrow
Md Mehrab Tanjim
Sungchul Kim
Franck Dernoncourt
Tong Yu
Ruiyi Zhang
Nesreen Ahmed
AILaw
26
490
0
02 Sep 2023
Improving alignment of dialogue agents via targeted human judgements
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trkebacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
227
502
0
28 Sep 2022
Text Detoxification using Large Pre-trained Neural Models
Text Detoxification using Large Pre-trained Neural Models
David Dale
Anton Voronov
Daryna Dementieva
V. Logacheva
Olga Kozlova
Nikita Semenov
Alexander Panchenko
39
71
0
18 Sep 2021
1