Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP

19 October 2022
Yangyi Chen, Hongcheng Gao, Ganqu Cui, Fanchao Qi, Longtao Huang, Zhiyuan Liu, Maosong Sun
SILM

Papers citing "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP"

41 / 41 papers shown
Title
Q-FAKER: Query-free Hard Black-box Attack via Controlled Generation
  CheolWon Na, YunSeok Choi, Jee-Hyong Lee · AAML · 18 Apr 2025

Tit-for-Tat: Safeguarding Large Vision-Language Models Against Jailbreak Attacks via Adversarial Defense
  Shuyang Hao, Y. Wang, Bryan Hooi, Ming Yang, Jiaheng Liu, Chengcheng Tang, Zi Huang, Yujun Cai · AAML · 14 Mar 2025

GuidedBench: Equipping Jailbreak Evaluation with Guidelines
  Ruixuan Huang, Xunguang Wang, Zongjie Li, Daoyuan Wu, Shuai Wang · ALM, ELM · 24 Feb 2025

Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context
  Nilanjana Das, Edward Raff, Manas Gaur · AAML · 20 Dec 2024

Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI
  Ambrish Rawat, Stefan Schoepf, Giulio Zizzo, Giandomenico Cornacchia, Muhammad Zaid Hameed, ..., Elizabeth M. Daly, Mark Purcell, P. Sattigeri, Pin-Yu Chen, Kush R. Varshney · AAML · 23 Sep 2024

CERT-ED: Certifiably Robust Text Classification for Edit Distance
  Zhuoqun Huang, Yipeng Wang, Seunghee Shin, Benjamin I. P. Rubinstein · AAML · 01 Aug 2024

Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context
  Nilanjana Das, Edward Raff, Manas Gaur · AAML · 19 Jul 2024

Knowledge-to-Jailbreak: One Knowledge Point Worth One Attack
  Shangqing Tu, Zhuoran Pan, Wenxuan Wang, Zhexin Zhang, Yuliang Sun, Jifan Yu, Hongning Wang, Lei Hou, Juanzi Li · ALM · 17 Jun 2024

On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept
  Guangliang Liu, Haitao Mao, Bochuan Cao, Zhiyu Xue, K. Johnson, Jiliang Tang, Rongrong Wang · LRM · 04 Jun 2024

TAIA: Large Language Models are Out-of-Distribution Data Learners
  Shuyang Jiang, Yusheng Liao, Ya-Qin Zhang, Yu Wang, Yanfeng Wang · 30 May 2024

Efficient Adversarial Training in LLMs with Continuous Attacks
  Sophie Xhonneux, Alessandro Sordoni, Stephan Günnemann, Gauthier Gidel, Leo Schwinn · AAML · 24 May 2024

Watch Out for Your Guidance on Generation! Exploring Conditional Backdoor Attacks against Large Language Models
  Jiaming He, Wenbo Jiang, Guanyu Hou, Wenshu Fan, Rui Zhang, Hongwei Li · AAML · 23 Apr 2024

Uncovering Safety Risks of Large Language Models through Concept Activation Vector
  Zhihao Xu, Ruixuan Huang, Changyu Chen, Shuai Wang, Xiting Wang · LLMSV · 18 Apr 2024

A StrongREJECT for Empty Jailbreaks
  Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, ..., Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer · 15 Feb 2024

In-Context Learning Can Re-learn Forbidden Tasks
  Sophie Xhonneux, David Dobre, Jian Tang, Gauthier Gidel, Dhanya Sridhar · 08 Feb 2024

Rapid Optimization for Jailbreaking LLMs via Subconscious Exploitation and Echopraxia
  Guangyu Shen, Shuyang Cheng, Kai-xian Zhang, Guanhong Tao, Shengwei An, Lu Yan, Zhuo Zhang, Shiqing Ma, Xiangyu Zhang · 08 Feb 2024

HQA-Attack: Toward High Quality Black-Box Hard-Label Adversarial Attack on Text
  Han Liu, Zhi Xu, Xiaotong Zhang, Feng Zhang, Fenglong Ma, Hongyang Chen, Hong Yu, Xianchao Zhang · AAML · 02 Feb 2024

TARGET: Template-Transferable Backdoor Attack Against Prompt-based NLP Models via GPT4
  Zihao Tan, Qingliang Chen, Yongjian Huang, Chen Liang · SILM, AAML · 29 Nov 2023

Generating Valid and Natural Adversarial Examples with Large Language Models
  Zimu Wang, Wei Wang, Qi Chen, Qiufeng Wang, Anh Nguyen · AAML · 20 Nov 2023

Guiding LLM to Fool Itself: Automatically Manipulating Machine Reading Comprehension Shortcut Triggers
  Mosh Levy, Shauli Ravfogel, Yoav Goldberg · 24 Oct 2023

CT-GAT: Cross-Task Generative Adversarial Attack based on Transferability
  Minxuan Lv, Chengwei Dai, Kun Li, Wei Zhou, Song Hu · AAML · 22 Oct 2023

ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models
  Alex Mei, Sharon Levy, William Yang Wang · AAML · 14 Oct 2023

Certifying LLM Safety against Adversarial Prompting
  Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, S. Feizi, Himabindu Lakkaraju · AAML · 06 Sep 2023

Making Pre-trained Language Models both Task-solvers and Self-calibrators
  Yangyi Chen, Xingyao Wang, Heng Ji · 21 Jul 2023

Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks
  Hongcheng Gao, Hao Zhang, Yinpeng Dong, Zhijie Deng · AAML · 16 Jun 2023

COVER: A Heuristic Greedy Adversarial Attack on Prompt-based Learning in Language Models
  Zihao Tan, Qingliang Chen, Wenbin Zhu, Yongjian Huang · AAML, SILM · 09 Jun 2023

Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations
  Lifan Yuan, Yangyi Chen, Ganqu Cui, Hongcheng Gao, Fangyuan Zou, Xingyi Cheng, Heng Ji, Zhiyuan Liu, Maosong Sun · 07 Jun 2023

From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework
  Yangyi Chen, Hongcheng Gao, Ganqu Cui, Lifan Yuan, Dehan Kong, ..., Longtao Huang, H. Xue, Zhiyuan Liu, Maosong Sun, Heng Ji · AAML, ELM · 29 May 2023

How do humans perceive adversarial text? A reality check on the validity and naturalness of word-based adversarial attacks
  Salijona Dyrmishi, Salah Ghamizi, Maxime Cordy · AAML · 24 May 2023

Translate your gibberish: black-box adversarial attack on machine translation systems
  Andrei Chertkov, Olga Tsymboi, Mikhail Aleksandrovich Pautov, Ivan V. Oseledets · AAML · 20 Mar 2023

Verifying the Robustness of Automatic Credibility Assessment
  Piotr Przybyła, A. Shvets, Horacio Saggion · DeLMO, AAML · 14 Mar 2023

On the Security Vulnerabilities of Text-to-SQL Models
  Xutan Peng, Yipeng Zhang, Jingfeng Yang, Mark Stevenson · SILM · 28 Nov 2022

A Close Look into the Calibration of Pre-trained Language Models
  Yangyi Chen, Lifan Yuan, Ganqu Cui, Zhiyuan Liu, Heng Ji · 31 Oct 2022

Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer
  Fanchao Qi, Yangyi Chen, Xurui Zhang, Mukai Li, Zhiyuan Liu, Maosong Sun · AAML, SILM · 14 Oct 2021

Types of Out-of-Distribution Texts and How to Detect Them
  Udit Arora, William Huang, He He · OODD · 14 Sep 2021

Robustness Gym: Unifying the NLP Evaluation Landscape
  Karan Goel, Nazneen Rajani, Jesse Vig, Samson Tan, Jason M. Wu, Stephan Zheng, Caiming Xiong, Joey Tianyi Zhou, Christopher Ré · AAML, OffRL, OOD · 13 Jan 2021

Robust Encodings: A Framework for Combating Adversarial Typos
  Erik Jones, Robin Jia, Aditi Raghunathan, Percy Liang · AAML · 04 May 2020

FreeLB: Enhanced Adversarial Training for Natural Language Understanding
  Chen Zhu, Yu Cheng, Zhe Gan, S. Sun, Tom Goldstein, Jingjing Liu · AAML · 25 Sep 2019

Certified Robustness to Adversarial Word Substitutions
  Robin Jia, Aditi Raghunathan, Kerem Göksel, Percy Liang · AAML · 03 Sep 2019

Generating Natural Language Adversarial Examples
  M. Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani B. Srivastava, Kai-Wei Chang · AAML · 21 Apr 2018

Adversarial Example Generation with Syntactically Controlled Paraphrase Networks
  Mohit Iyyer, John Wieting, Kevin Gimpel, Luke Zettlemoyer · AAML, GAN · 17 Apr 2018