SaFeRDialogues: Taking Feedback Gracefully after Conversational Safety Failures

14 October 2021
Megan Ung, Jing Xu, Y-Lan Boureau

Papers citing "SaFeRDialogues: Taking Feedback Gracefully after Conversational Safety Failures"

31 / 31 papers shown
Chained Tuning Leads to Biased Forgetting
Megan Ung, Alicia Sun, Samuel J. Bell, Bhaktipriya Radharapu, Levent Sagun, Adina Williams
CLL, KELM · 89 · 0 · 0 · 21 Dec 2024

Ethics Whitepaper: Whitepaper on Ethical Research into Large Language Models
Eddie L. Ungless, Nikolas Vitsakis, Zeerak Talat, James Garforth, Bjorn Ross, Arno Onken, Atoosa Kasirzadeh, Alexandra Birch
30 · 1 · 0 · 17 Oct 2024

Weak-to-Strong Generalization beyond Accuracy: a Pilot Study in Safety, Toxicity, and Legal Reasoning
Ruimeng Ye, Yang Xiao, Bo Hui
ALM, ELM, OffRL · 29 · 2 · 0 · 16 Oct 2024

Purple-teaming LLMs with Adversarial Defender Training
Jingyan Zhou, Kun Li, Junan Li, Jiawen Kang, Minda Hu, Xixin Wu, Helen Meng
AAML · 36 · 1 · 0 · 01 Jul 2024

CoSafe: Evaluating Large Language Model Safety in Multi-Turn Dialogue Coreference
Erxin Yu, Jing Li, Ming Liao, Siqi Wang, Zuchen Gao, Fei Mi, Lanqing Hong
ELM, LRM · 33 · 9 · 0 · 25 Jun 2024

Mitigating Social Biases in Language Models through Unlearning
O. Dige, Diljot Singh, Tsz Fung Yau, Qixuan Zhang, Borna Bolandraftar, Xiaodan Zhu, Faiza Khan Khattak
MoMe, MU · 26 · 1 · 0 · 19 Jun 2024

The Life Cycle of Large Language Models: A Review of Biases in Education
Jinsook Lee, Yann Hicke, Renzhe Yu, Christopher A. Brooks, René F. Kizilcec
AI4Ed · 39 · 1 · 0 · 03 Jun 2024

Unifying Bias and Unfairness in Information Retrieval: A Survey of Challenges and Opportunities with Large Language Models
Sunhao Dai, Chen Xu, Shicheng Xu, Liang Pang, Zhenhua Dong, Jun Xu
48 · 59 · 0 · 17 Apr 2024

SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy
ELM, KELM · 58 · 31 · 0 · 08 Apr 2024

NLP for Counterspeech against Hate: A Survey and How-To Guide
Helena Bonaldi, Yi-Ling Chung, Gavin Abercrombie, Marco Guerini
AAML · 31 · 13 · 0 · 29 Mar 2024

Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
Zhichen Dong, Zhanhui Zhou, Chao Yang, Jing Shao, Yu Qiao
ELM · 52 · 55 · 0 · 14 Feb 2024

GrounDial: Human-norm Grounded Safe Dialog Response Generation
Siwon Kim, Shuyang Dai, Mohammad Kachuee, Shayan Ray, Tara Taghavi, Sungroh Yoon
14 · 0 · 0 · 14 Feb 2024

Improving Dialog Safety using Socially Aware Contrastive Learning
Souvik Das, R. Srihari
16 · 1 · 0 · 01 Feb 2024

CESAR: Automatic Induction of Compositional Instructions for Multi-turn Dialogs
Taha İbrahim Aksu, Devamanyu Hazarika, Shikib Mehri, Seokhwan Kim, Dilek Z. Hakkani-Tür, Yang Liu, Mahdi Namazifar
30 · 2 · 0 · 29 Nov 2023

Safer-Instruct: Aligning Language Models with Automated Preference Data
Taiwei Shi, Kai Chen, Jieyu Zhao
ALM, SyDa · 27 · 21 · 0 · 15 Nov 2023

Learning From Free-Text Human Feedback -- Collect New Datasets Or Extend Existing Ones?
Dominic Petrak, N. Moosavi, Ye Tian, Nikolai Rozanov, Iryna Gurevych
6 · 5 · 0 · 24 Oct 2023

Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation
M. Boubdir, Edward Kim, B. Ermiş, Marzieh Fadaee, Sara Hooker
ALM · 31 · 18 · 0 · 22 Oct 2023

Bias and Fairness in Large Language Models: A Survey
Isabel O. Gallegos, Ryan A. Rossi, Joe Barrow, Md Mehrab Tanjim, Sungchul Kim, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, Nesreen Ahmed
AILaw · 24 · 490 · 0 · 02 Sep 2023

Ask an Expert: Leveraging Language Models to Improve Strategic Reasoning in Goal-Oriented Dialogue Models
Qiang Zhang, Jason Naradowsky, Yusuke Miyao
ELM · 26 · 32 · 0 · 29 May 2023

Healing Unsafe Dialogue Responses with Weak Supervision Signals
Zi Liang, Pinghui Wang, Ruofei Zhang, Shuo Zhang, Xiaofan Ye, Yi Huang, Junlan Feng
29 · 1 · 0 · 25 May 2023

Effortless Integration of Memory Management into Open-Domain Conversation Systems
Eunbi Choi, Kyoung-Woon On, Gunsoo Han, Sungwoong Kim, D. W. Nam, DaeJin Jo, Seungeun Rho, Taehwan Kwon, Minjoon Seo
19 · 3 · 0 · 23 May 2023

Using In-Context Learning to Improve Dialogue Safety
Nicholas Meade, Spandana Gella, Devamanyu Hazarika, Prakhar Gupta, Di Jin, Siva Reddy, Yang Liu, Dilek Z. Hakkani-Tür
30 · 38 · 0 · 02 Feb 2023

DialGuide: Aligning Dialogue Model Behavior with Developer Guidelines
Prakhar Gupta, Yang Liu, Di Jin, Behnam Hedayatnia, Spandana Gella, Sijia Liu, P. Lange, Julia Hirschberg, Dilek Z. Hakkani-Tür
30 · 5 · 0 · 20 Dec 2022

Sources of Noise in Dialogue and How to Deal with Them
Derek Chen, Zhou Yu
13 · 2 · 0 · 06 Dec 2022

Bipartite-play Dialogue Collection for Practical Automatic Evaluation of Dialogue Systems
Shiki Sato, Yosuke Kishinami, Hiroaki Sugiyama, Reina Akama, Ryoko Tokuhisa, Jun Suzuki
15 · 2 · 0 · 19 Nov 2022

Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots
Waiman Si, Michael Backes, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, Savvas Zannettou, Yang Zhang
36 · 58 · 0 · 07 Sep 2022

BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage
Kurt Shuster, Jing Xu, M. Komeili, Da Ju, Eric Michael Smith, ..., Naman Goyal, Arthur Szlam, Y-Lan Boureau, Melanie Kambadur, Jason Weston
LM&Ro, KELM · 35 · 233 · 0 · 05 Aug 2022

ProsocialDialog: A Prosocial Backbone for Conversational Agents
Hyunwoo J. Kim, Youngjae Yu, Liwei Jiang, Ximing Lu, Daniel Khashabi, Gunhee Kim, Yejin Choi, Maarten Sap
22 · 118 · 0 · 25 May 2022

InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning
Prakhar Gupta, Cathy Jiao, Yi-Ting Yeh, Shikib Mehri, M. Eskénazi, Jeffrey P. Bigham
ALM · 38 · 47 · 0 · 25 May 2022

OPT: Open Pre-trained Transformer Language Models
Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, ..., Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer
VLM, OSLM, AI4CE · 59 · 3,488 · 0 · 02 May 2022

Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks
Jingyan Zhou, Deng Jiawen, Fei Mi, Yitong Li, Yasheng Wang, Minlie Huang, Xin Jiang, Qun Liu, Helen Meng
27 · 31 · 0 · 16 Feb 2022