Red-Teaming for Generative AI: Silver Bullet or Security Theater?

29 January 2024
Michael Feffer
Anusha Sinha
Wesley Hanwen Deng
Zachary Chase Lipton
Hoda Heidari
    AAML

Papers citing "Red-Teaming for Generative AI: Silver Bullet or Security Theater?"

46 papers shown
Red Teaming Large Language Models for Healthcare
Vahid Balazadeh
Michael Cooper
David Pellow
Atousa Assadi
Jennifer Bell
...
Syed Ahmar Shah
Babak Taati
Balagopal Unnikrishnan
Stephanie Williams
Rahul G. Krishnan
LM&MA
31
0
0
01 May 2025
Generative AI in Financial Institution: A Global Survey of Opportunities, Threats, and Regulation
Bikash Saha
Nanda Rani
Sandeep K. Shukla
148
0
0
30 Apr 2025
When Testing AI Tests Us: Safeguarding Mental Health on the Digital Frontlines
Sachin R. Pendse
Darren Gergle
Rachel Kornfield
J. Meyerhoff
David C. Mohr
Jina Suh
Annie Wescott
Casey Williams
J. Schleider
39
0
0
29 Apr 2025
AI-Based Crypto Tokens: The Illusion of Decentralized AI?
Rischan Mafrur
26
0
0
29 Apr 2025
RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models
Bang An
Shiyue Zhang
Mark Dredze
61
0
0
25 Apr 2025
From job titles to jawlines: Using context voids to study generative AI systems
Shahan Ali Memon
Soham De
Sungha Kang
Riyan Mujtaba
Bedoor K. AlShebli
Katie Davis
Jaime Snyder
Jevin D. West
17
0
0
16 Apr 2025
What Makes an Evaluation Useful? Common Pitfalls and Best Practices
Gil Gekker
Meirav Segal
Dan Lahav
Omer Nevo
ELM
43
0
0
30 Mar 2025
Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models
Alberto Purpura
Sahil Wadhwa
Jesse Zymet
Akshay Gupta
Andy Luo
Melissa Kazemi Rad
Swapnil Shinde
Mohammad Sorower
AAML
167
0
0
03 Mar 2025
A Guide to Failure in Machine Learning: Reliability and Robustness from Foundations to Practice
Eric Heim
Oren Wright
David Shriver
OOD
FaML
63
0
0
01 Mar 2025
Forecasting Rare Language Model Behaviors
Erik Jones
Meg Tong
Jesse Mu
Mohammed Mahfoud
Jan Leike
Roger C. Grosse
Jared Kaplan
William Fithian
Ethan Perez
Mrinank Sharma
47
2
0
24 Feb 2025
Robustness and Cybersecurity in the EU Artificial Intelligence Act
Henrik Nolte
Miriam Rateike
Michèle Finck
38
1
0
22 Feb 2025
The Pitfalls of "Security by Obscurity" And What They Mean for Transparent AI
Peter Hall
Olivia Mundahl
Sunoo Park
74
0
0
30 Jan 2025
Lessons From Red Teaming 100 Generative AI Products
Blake Bullwinkel
Amanda Minnich
Shiven Chawla
Gary Lopez
Martin Pouliot
...
Pete Bryan
Ram Shankar Siva Kumar
Yonatan Zunger
Chang Kawaguchi
Mark Russinovich
AAML
VLM
37
4
0
13 Jan 2025
Towards Effective Discrimination Testing for Generative AI
Thomas P. Zollo
Nikita Rajaneesh
Richard Zemel
Talia B. Gillis
Emily Black
30
1
0
31 Dec 2024
OpenAI o1 System Card
OpenAI:
Aaron Jaech
Adam Tauman Kalai
Adam Lerer
...
Yuchen He
Yuchen Zhang
Yunyun Wang
Zheng Shao
Zhuohan Li
ELM
LRM
AI4CE
77
1
0
21 Dec 2024
Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice
A. Feder Cooper
Christopher A. Choquette-Choo
Miranda Bogen
Matthew Jagielski
Katja Filippova
...
Abigail Z. Jacobs
Andreas Terzis
Hanna M. Wallach
Nicolas Papernot
Katherine Lee
AILaw
MU
93
10
0
09 Dec 2024
BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices
Anka Reuel
Amelia F. Hardy
Chandler Smith
Max Lamparth
Malcolm Hardy
Mykel J. Kochenderfer
ELM
78
17
0
20 Nov 2024
AURA: Amplifying Understanding, Resilience, and Awareness for Responsible AI Content Work
Alice Qian Zhang
Judith Amores
Mary L. Gray
Mary Czerwinski
J. Suh
48
4
0
03 Nov 2024
A Formal Framework for Assessing and Mitigating Emergent Security Risks in Generative AI Models: Bridging Theory and Dynamic Risk Mitigation
Aviral Srivastava
Sourav Panda
29
0
0
15 Oct 2024
From Transparency to Accountability and Back: A Discussion of Access and Evidence in AI Auditing
Sarah H. Cen
Rohan Alur
29
1
0
07 Oct 2024
HiddenGuard: Fine-Grained Safe Generation with Specialized Representation Router
Lingrui Mei
Shenghua Liu
Yiwei Wang
Baolong Bi
Ruibin Yuan
Xueqi Cheng
35
4
0
03 Oct 2024
Exploring Empty Spaces: Human-in-the-Loop Data Augmentation
Catherine Yeh
Donghao Ren
Yannick Assogba
Dominik Moritz
Fred Hohman
36
0
0
01 Oct 2024
What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing
Chenyang Yang
Yining Hong
Grace A. Lewis
Tongshuang Wu
Christian Kastner
38
1
0
14 Sep 2024
Safe Generative Chats in a WhatsApp Intelligent Tutoring System
Zachary Levonian
Owen Henkel
KELM
31
0
0
06 Jul 2024
STAR: SocioTechnical Approach to Red Teaming Language Models
Laura Weidinger
John F. J. Mellor
Bernat Guillen Pegueroles
Nahema Marchal
Ravin Kumar
...
Mark Diaz
Stevie Bergman
Mikel Rodriguez
Verena Rieser
William S. Isaac
VLM
39
7
0
17 Jun 2024
Improving Alignment and Robustness with Circuit Breakers
Andy Zou
Long Phan
Justin Wang
Derek Duenas
Maxwell Lin
Maksym Andriushchenko
Rowan Wang
Zico Kolter
Matt Fredrikson
Dan Hendrycks
AAML
39
71
0
06 Jun 2024
A Comprehensive Overview of Large Language Models (LLMs) for Cyber Defences: Opportunities and Directions
Mohammed Hassanin
Nour Moustafa
36
26
0
23 May 2024
Mitigating Exaggerated Safety in Large Language Models
Ruchi Bhalani
Ruchira Ray
29
1
0
08 May 2024
Holistic Safety and Responsibility Evaluations of Advanced AI Models
Laura Weidinger
Joslyn Barnhart
Jenny Brennan
Christina Butterfield
Susie Young
...
Sebastian Farquhar
Lewis Ho
Iason Gabriel
Allan Dafoe
William S. Isaac
ELM
34
8
0
22 Apr 2024
The Minimum Information about CLinical Artificial Intelligence Checklist for Generative Modeling Research (MI-CLAIM-GEN)
Brenda Y. Miao
Irene Y. Chen
C. Y. Williams
Jaysón M. Davidson
Augusto Garcia-Agundez
...
Bin Yu
Milena Gianfrancesco
A. Butte
Beau Norgeot
Madhumita Sushil
VLM
34
2
0
05 Mar 2024
Dialect prejudice predicts AI decisions about people's character, employability, and criminality
Valentin Hofmann
Pratyusha Kalluri
Dan Jurafsky
Sharese King
83
40
0
01 Mar 2024
Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science
Xiangru Tang
Qiao Jin
Kunlun Zhu
Tongxin Yuan
Yichi Zhang
...
Jian Tang
Zhuosheng Zhang
Arman Cohan
Zhiyong Lu
Mark B. Gerstein
LLMAG
ELM
17
40
0
06 Feb 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
34
78
0
25 Jan 2024
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee
Xiaoyan Bai
Itamar Pres
Martin Wattenberg
Jonathan K. Kummerfeld
Rada Mihalcea
68
95
0
03 Jan 2024
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
Yichen Gong
Delong Ran
Jinyuan Liu
Conglei Wang
Tianshuo Cong
Anyu Wang
Sisi Duan
Xiaoyun Wang
MLLM
129
117
0
09 Nov 2023
Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases
Rishabh Bhardwaj
Soujanya Poria
ALM
49
14
0
22 Oct 2023
Probing LLMs for hate speech detection: strengths and vulnerabilities
Sarthak Roy
Ashish Harshavardhan
Animesh Mukherjee
Punyajoy Saha
63
32
0
19 Oct 2023
Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
Erfan Shayegani
Md Abdullah Al Mamun
Yu Fu
Pedram Zaree
Yue Dong
Nael B. Abu-Ghazaleh
AAML
147
145
0
16 Oct 2023
Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li
Yulin Chen
Jinglong Luo
Yan Kang
Xiaojin Zhang
Qi Hu
Chunkit Chan
Yangqiu Song
PILM
42
41
0
16 Oct 2023
ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models
Alex Mei
Sharon Levy
William Yang Wang
AAML
34
7
0
14 Oct 2023
Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations
Zeming Wei
Yifei Wang
Ang Li
Yichuan Mo
Yisen Wang
40
235
0
10 Oct 2023
The Participatory Turn in AI Design: Theoretical Foundations and the Current State of Practice
Fernando Delgado
Stephen Yang
Michael A. Madaio
Qian Yang
73
100
0
02 Oct 2023
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
115
300
0
19 Sep 2023
On the Adversarial Robustness of Multi-Modal Foundation Models
Christian Schlarmann
Matthias Hein
AAML
110
85
0
21 Aug 2023
Red-Teaming the Stable Diffusion Safety Filter
Javier Rando
Daniel Paleka
David Lindner
Lennard Heim
Florian Tramèr
DiffM
124
183
0
03 Oct 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
225
444
0
23 Aug 2022