ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2402.09154
  4. Cited By
Attacking Large Language Models with Projected Gradient Descent

Attacking Large Language Models with Projected Gradient Descent

14 February 2024
Simon Geisler
Tom Wollschlager
M. H. I. Abdalla
Johannes Gasteiger
Stephan Günnemann
    AAML
    SILM
ArXivPDFHTML

Papers citing "Attacking Large Language Models with Projected Gradient Descent"

29 / 29 papers shown
Title
REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective
Simon Geisler
Tom Wollschlager
M. H. I. Abdalla
Vincent Cohen-Addad
Johannes Gasteiger
Stephan Günnemann
AAML
93
2
0
24 Feb 2025
Endless Jailbreaks with Bijection Learning
Endless Jailbreaks with Bijection Learning
Brian R. Y. Huang
Maximilian Li
Leonard Tang
AAML
103
5
0
02 Oct 2024
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko
Francesco Croce
Nicolas Flammarion
AAML
116
186
0
02 Apr 2024
Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
Leo Schwinn
David Dobre
Sophie Xhonneux
Gauthier Gidel
Stephan Gunnemann
AAML
72
39
0
14 Feb 2024
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming
  and Robust Refusal
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Mantas Mazeika
Long Phan
Xuwang Yin
Andy Zou
Zifan Wang
...
Nathaniel Li
Steven Basart
Bo Li
David A. Forsyth
Dan Hendrycks
AAML
55
369
0
06 Feb 2024
Gradient-Based Language Model Red Teaming
Gradient-Based Language Model Red Teaming
Nevan Wichers
Carson E. Denison
Ahmad Beirami
40
26
0
30 Jan 2024
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Anay Mehrotra
Manolis Zampetakis
Paul Kassianik
Blaine Nelson
Hyrum Anderson
Yaron Singer
Amin Karbasi
58
239
0
04 Dec 2023
The Falcon Series of Open Language Models
The Falcon Series of Open Language Models
Ebtesam Almazrouei
Hamza Alobeidli
Abdulaziz Alshamsi
Alessandro Cappelli
Ruxandra-Aimée Cojocaru
...
Quentin Malartic
Daniele Mazzotta
Badreddine Noune
B. Pannier
Guilherme Penedo
AI4TS
ALM
124
420
0
28 Nov 2023
Adversarial Attacks and Defenses in Large Language Models: Old and New
  Threats
Adversarial Attacks and Defenses in Large Language Models: Old and New Threats
Leo Schwinn
David Dobre
Stephan Günnemann
Gauthier Gidel
AAML
ELM
52
41
0
30 Oct 2023
Jailbreaking Black Box Large Language Models in Twenty Queries
Jailbreaking Black Box Large Language Models in Twenty Queries
Patrick Chao
Alexander Robey
Yan Sun
Hamed Hassani
George J. Pappas
Eric Wong
AAML
70
642
0
12 Oct 2023
Open Sesame! Universal Black Box Jailbreaking of Large Language Models
Open Sesame! Universal Black Box Jailbreaking of Large Language Models
Raz Lapid
Ron Langberg
Moshe Sipper
AAML
62
111
0
04 Sep 2023
Universal and Transferable Adversarial Attacks on Aligned Language
  Models
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
160
1,376
0
27 Jul 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
201
11,636
0
18 Jul 2023
Adversarial Training for Graph Neural Networks: Pitfalls, Solutions, and
  New Directions
Adversarial Training for Graph Neural Networks: Pitfalls, Solutions, and New Directions
Lukas Gosch
Simon Geisler
Daniel Sturm
Bertrand Charpentier
Daniel Zügner
Stephan Günnemann
AAML
GNN
47
29
0
27 Jun 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
228
4,186
0
09 Jun 2023
Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt
  Tuning and Discovery
Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery
Yuxin Wen
Neel Jain
John Kirchenbauer
Micah Goldblum
Jonas Geiping
Tom Goldstein
VLM
DiffM
61
265
1
07 Feb 2023
TextGrad: Advancing Robustness Evaluation in NLP by Gradient-Driven
  Optimization
TextGrad: Advancing Robustness Evaluation in NLP by Gradient-Driven Optimization
Bairu Hou
Jinghan Jia
Yihua Zhang
Guanhua Zhang
Yang Zhang
Sijia Liu
Shiyu Chang
SILM
AAML
39
21
0
19 Dec 2022
Gradient-Based Constrained Sampling from Language Models
Gradient-Based Constrained Sampling from Language Models
Sachin Kumar
Biswajit Paria
Yulia Tsvetkov
BDL
57
55
0
25 May 2022
Red Teaming Language Models with Language Models
Red Teaming Language Models with Language Models
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
42
627
0
07 Feb 2022
Robustness of Graph Neural Networks at Scale
Robustness of Graph Neural Networks at Scale
Simon Geisler
Tobias Schmidt
Hakan cSirin
Daniel Zügner
Aleksandar Bojchevski
Stephan Günnemann
AAML
55
127
0
26 Oct 2021
Generalization of Neural Combinatorial Solvers Through the Lens of
  Adversarial Robustness
Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness
Simon Geisler
Johanna Sommer
Jan Schuchardt
Aleksandar Bojchevski
Stephan Günnemann
AAML
29
39
0
21 Oct 2021
Gradient-based Adversarial Attacks against Text Transformers
Gradient-based Adversarial Attacks against Text Transformers
Chuan Guo
Alexandre Sablayrolles
Hervé Jégou
Douwe Kiela
SILM
134
234
0
15 Apr 2021
On Adaptive Attacks to Adversarial Example Defenses
On Adaptive Attacks to Adversarial Example Defenses
Florian Tramèr
Nicholas Carlini
Wieland Brendel
Aleksander Madry
AAML
188
827
0
19 Feb 2020
Universal Adversarial Triggers for Attacking and Analyzing NLP
Universal Adversarial Triggers for Attacking and Analyzing NLP
Eric Wallace
Shi Feng
Nikhil Kandpal
Matt Gardner
Sameer Singh
AAML
SILM
87
856
0
20 Aug 2019
Topology Attack and Defense for Graph Neural Networks: An Optimization
  Perspective
Topology Attack and Defense for Graph Neural Networks: An Optimization Perspective
Kaidi Xu
Hongge Chen
Sijia Liu
Pin-Yu Chen
Tsui-Wei Weng
Mingyi Hong
Xue Lin
AAML
47
450
0
10 Jun 2019
Towards Deep Learning Models Resistant to Adversarial Attacks
Towards Deep Learning Models Resistant to Adversarial Attacks
Aleksander Madry
Aleksandar Makelov
Ludwig Schmidt
Dimitris Tsipras
Adrian Vladu
SILM
OOD
227
11,962
0
19 Jun 2017
Categorical Reparameterization with Gumbel-Softmax
Categorical Reparameterization with Gumbel-Softmax
Eric Jang
S. Gu
Ben Poole
BDL
219
5,323
0
03 Nov 2016
SGDR: Stochastic Gradient Descent with Warm Restarts
SGDR: Stochastic Gradient Descent with Warm Restarts
I. Loshchilov
Frank Hutter
ODL
229
8,030
0
13 Aug 2016
Intriguing properties of neural networks
Intriguing properties of neural networks
Christian Szegedy
Wojciech Zaremba
Ilya Sutskever
Joan Bruna
D. Erhan
Ian Goodfellow
Rob Fergus
AAML
166
14,831
1
21 Dec 2013
1