Mitigating Covertly Unsafe Text within Natural Language Systems

17 October 2022
Alex Mei, Anisha Kabir, Sharon Levy, Melanie Subbiah, Emily Allaway, J. Judge, D. Patton, Bruce Bimber, Kathleen McKeown, William Yang Wang

Papers citing "Mitigating Covertly Unsafe Text within Natural Language Systems"

16 papers

Purple-teaming LLMs with Adversarial Defender Training
Jingyan Zhou, Kun Li, Junan Li, Jiawen Kang, Minda Hu, Xixin Wu, Helen Meng
AAML · 01 Jul 2024

Unlearning Climate Misinformation in Large Language Models
Michael Fore, Simranjit Singh, Chaehong Lee, Amritanshu Pandey, Antonios Anastasopoulos, Dimitrios Stamoulis
MU · 29 May 2024

Risks and Opportunities of Open-Source Generative AI
Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder de Witt, Fabio Pizzati, ..., Matthew Jackson, Philip H. S. Torr, Trevor Darrell, Y. Lee, Jakob N. Foerster
14 May 2024

Near to Mid-term Risks and Opportunities of Open-Source Generative AI
Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder de Witt, Fabio Pizzati, ..., Paul Röttger, Philip H. S. Torr, Trevor Darrell, Y. Lee, Jakob N. Foerster
25 Apr 2024

Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations
Swapnaja Achintalwar, Adriana Alvarado Garcia, Ateret Anaby-Tavor, Ioana Baldini, Sara E. Berger, ..., Aashka Trivedi, Kush R. Varshney, Dennis L. Wei, Shalisha Witherspoon, Marcel Zalmanovici
09 Mar 2024

A Survey of the Evolution of Language Model-Based Dialogue Systems
Hongru Wang, Lingzhi Wang, Yiming Du, Liang Chen, Jing Zhou, Yufei Wang, Kam-Fai Wong
LRM · 28 Nov 2023

ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models
Alex Mei, Sharon Levy, William Yang Wang
AAML · 14 Oct 2023

Training Data Extraction From Pre-trained Language Models: A Survey
Shotaro Ishihara
25 May 2023

Users are the North Star for AI Transparency
Alex Mei, Michael Stephen Saxon, Shiyu Chang, Zachary Chase Lipton, William Yang Wang
09 Mar 2023

Foveate, Attribute, and Rationalize: Towards Physically Safe and Trustworthy AI
Alex Mei, Sharon Levy, William Yang Wang
19 Dec 2022

SafeText: A Benchmark for Exploring Physical Safety in Language Models
Sharon Levy, Emily Allaway, Melanie Subbiah, Lydia B. Chilton, D. Patton, Kathleen McKeown, William Yang Wang
18 Oct 2022

Large Language Models are Zero-Shot Reasoners
Takeshi Kojima, S. Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa
ReLM · LRM · 24 May 2022

Open-Domain Question-Answering for COVID-19 and Other Emergent Domains
Sharon Levy, Kevin Mo, Wenhan Xiong, W. Wang
OOD · LRM · 13 Oct 2021

Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick, Sahana Udupa, Hinrich Schütze
28 Feb 2021

Constrained Abstractive Summarization: Preserving Factual Consistency with Constrained Generation
Yuning Mao, Xiang Ren, Heng Ji, Jiawei Han
HILM · 24 Oct 2020

Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving
ALM · 18 Sep 2019