2004.05887
Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples
13 April 2020
Maximilian Mozes
Pontus Stenetorp
Bennett Kleinberg
Lewis D. Griffin
AAML
Papers citing
"Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples"
46 / 46 papers shown
Title
Exploring Gradient-Guided Masked Language Model to Detect Textual Adversarial Attacks
Xiaomei Zhang
Zhaoxi Zhang
Yanjun Zhang
Xufei Zheng
L. Zhang
Shengshan Hu
Shirui Pan
AAML
58
0
0
08 Apr 2025
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang
Jiawei Kong
Wenbo Yu
Bin Chen
Jiawei Li
Hao Wu
Shu-Tao Xia
Ke Xu
AAML
VLM
131
13
0
08 Jun 2024
Reversible Jump Attack to Textual Classifiers with Modification Reduction
Mingze Ni
Zhensu Sun
Wei Liu
AAML
56
0
0
21 Mar 2024
RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions
Yuan Zhang
Xiao Wang
Zhiheng Xi
Han Xia
Tao Gui
Qi Zhang
Xuanjing Huang
88
4
0
26 Feb 2024
Fast Adversarial Training against Textual Adversarial Attacks
Yichen Yang
Xin Liu
Kun He
AAML
47
4
0
23 Jan 2024
ROIC-DM: Robust Text Inference and Classification via Diffusion Model
Shilong Yuan
Wei Yuan
Hongzhi Yin
Tieke He
DiffM
96
3
0
07 Jan 2024
Improving the Robustness of Transformer-based Large Language Models with Dynamic Attention
Lujia Shen
Yuwen Pu
Shouling Ji
Changjiang Li
Xuhong Zhang
Chunpeng Ge
Ting Wang
AAML
69
6
0
29 Nov 2023
Toward Stronger Textual Attack Detectors
Pierre Colombo
Marine Picot
Nathan Noiry
Guillaume Staerman
Pablo Piantanida
561
5
0
21 Oct 2023
The Trickle-down Impact of Reward (In-)consistency on RLHF
Lingfeng Shen
Sihao Chen
Linfeng Song
Lifeng Jin
Baolin Peng
Haitao Mi
Daniel Khashabi
Dong Yu
93
23
0
28 Sep 2023
What Learned Representations and Influence Functions Can Tell Us About Adversarial Examples
Shakila Mahjabin Tonni
Mark Dras
TDI
AAML
GAN
60
0
0
19 Sep 2023
Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities
Maximilian Mozes
Xuanli He
Bennett Kleinberg
Lewis D. Griffin
87
87
0
24 Aug 2023
Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks
Xinyu Zhang
Hanbin Hong
Yuan Hong
Peng Huang
Binghui Wang
Zhongjie Ba
Kui Ren
SILM
129
25
0
31 Jul 2023
Interpretability and Transparency-Driven Detection and Transformation of Textual Adversarial Examples (IT-DT)
Bushra Sabir
Muhammad Ali Babar
Sharif Abuadbba
SILM
74
10
0
03 Jul 2023
On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection
Songyang Gao
Shihan Dou
Qi Zhang
Xuanjing Huang
Jin Ma
Yingchun Shan
AAML
55
3
0
27 Jun 2023
VoteTRANS: Detecting Adversarial Text without Training by Voting on Hard Labels of Transformations
Hoang-Quoc Nguyen-Son
Seira Hidano
Kazuhide Fukushima
S. Kiyomoto
Isao Echizen
57
0
0
02 Jun 2023
From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework
Yangyi Chen
Hongcheng Gao
Ganqu Cui
Lifan Yuan
Dehan Kong
...
Longtao Huang
H. Xue
Zhiyuan Liu
Maosong Sun
Heng Ji
AAML
ELM
101
6
0
29 May 2023
The Best Defense is Attack: Repairing Semantics in Textual Adversarial Examples
Heng Yang
Ke Li
AAML
114
3
0
06 May 2023
Masked Language Model Based Textual Adversarial Example Detection
Xiaomei Zhang
Zhaoxi Zhang
Qi Zhong
Xufei Zheng
Yanjun Zhang
Shengshan Hu
L. Zhang
AAML
101
2
0
18 Apr 2023
TextDefense: Adversarial Text Detection based on Word Importance Entropy
Lujia Shen
Xuhong Zhang
S. Ji
Yuwen Pu
Chunpeng Ge
Xing Yang
Yanghe Feng
AAML
59
8
0
12 Feb 2023
Less is More: Understanding Word-level Textual Adversarial Attack via n-gram Frequency Descend
Ning Lu
Shengcai Liu
Zhirui Zhang
Qi Wang
Haifeng Liu
Jiaheng Zhang
AAML
152
8
0
06 Feb 2023
TextShield: Beyond Successfully Detecting Adversarial Sentences in Text Classification
Lingfeng Shen
Ze Zhang
Haiyun Jiang
Ying-Cong Chen
AAML
113
5
0
03 Feb 2023
Disentangled Text Representation Learning with Information-Theoretic Perspective for Adversarial Robustness
Jiahao Zhao
Wenji Mao
DRL
OOD
61
3
0
26 Oct 2022
ADDMU: Detection of Far-Boundary Adversarial Examples with Data and Model Uncertainty Estimation
Fan Yin
Yao Li
Cho-Jui Hsieh
Kai-Wei Chang
AAML
93
4
0
22 Oct 2022
TCAB: A Large-Scale Text Classification Attack Benchmark
Kalyani Asthana
Zhouhang Xie
Wencong You
Adam Noack
Jonathan Brophy
Sameer Singh
Daniel Lowd
119
3
0
21 Oct 2022
Identifying Human Strategies for Generating Word-Level Adversarial Examples
Maximilian Mozes
Bennett Kleinberg
Lewis D. Griffin
AAML
116
2
0
20 Oct 2022
Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP
Yangyi Chen
Hongcheng Gao
Ganqu Cui
Fanchao Qi
Longtao Huang
Zhiyuan Liu
Maosong Sun
SILM
62
56
0
19 Oct 2022
Textwash -- automated open-source text anonymisation
Bennett Kleinberg
Toby P Davies
Maximilian Mozes
64
13
0
27 Aug 2022
Rethinking Textual Adversarial Defense for Pre-trained Language Models
Jiayi Wang
Rongzhou Bao
Zhuosheng Zhang
Hai Zhao
AAML
SILM
56
11
0
21 Jul 2022
Towards Explainability in NLP: Analyzing and Calculating Word Saliency through Word Properties
Jialiang Dong
Zhitao Guan
Longfei Wu
Zijian Zhang
Xiaojiang Du
XAI
AAML
FAtt
MILM
88
2
0
17 Jul 2022
Detecting Textual Adversarial Examples Based on Distributional Characteristics of Data Representations
Na Liu
Mark Dras
Wei Emma Zhang
AAML
44
6
0
29 Apr 2022
Residue-Based Natural Language Adversarial Attack Detection
Vyas Raina
Mark Gales
AAML
72
12
0
17 Apr 2022
"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks
Edoardo Mosca
Shreyash Agarwal
Javier Rando
Georg Groh
AAML
95
31
0
10 Apr 2022
Adversarial Training for Improving Model Robustness? Look at Both Prediction and Interpretation
Hanjie Chen
Yangfeng Ji
OOD
AAML
VLM
101
21
0
23 Mar 2022
Input-specific Attention Subnetworks for Adversarial Detection
Emil Biju
Anirudh Sriram
Pratyush Kumar
Mitesh M Khapra
AAML
40
5
0
23 Mar 2022
A Prompting-based Approach for Adversarial Example Generation and Robustness Enhancement
Yuting Yang
Pei Huang
Juan Cao
Jintao Li
Yun Lin
Jin Song Dong
Feifei Ma
Jian Zhang
AAML
SILM
96
13
0
21 Mar 2022
Detection of Word Adversarial Examples in Text Classification: Benchmark and Baseline via Robust Density Estimation
Kiyoon Yoo
Jangho Kim
Jiho Jang
Nojun Kwak
225
41
0
03 Mar 2022
Identifying Adversarial Attacks on Text Classifiers
Zhouhang Xie
Jonathan Brophy
Adam Noack
Wencong You
Kalyani Asthana
Carter Perkins
Sabrina Reis
Sameer Singh
Daniel Lowd
AAML
84
10
0
21 Jan 2022
Detecting Textual Adversarial Examples through Randomized Substitution and Vote
Xiaosen Wang
Yifeng Xiong
Kun He
AAML
59
11
0
13 Sep 2021
TREATED: Towards Universal Defense against Textual Adversarial Attacks
Bin Zhu
Zhaoquan Gu
Le Wang
Zhihong Tian
AAML
45
8
0
13 Sep 2021
Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification
Maximilian Mozes
Max Bartolo
Pontus Stenetorp
Bennett Kleinberg
Lewis D. Griffin
DeLMO
AAML
SILM
47
7
0
09 Sep 2021
Defending Pre-trained Language Models from Adversarial Word Substitutions Without Performance Sacrifice
Rongzhou Bao
Jiayi Wang
Hai Zhao
AAML
56
43
0
30 May 2021
Adversarial Examples Detection with Bayesian Neural Network
Yao Li
Tongyi Tang
Cho-Jui Hsieh
T. C. Lee
GAN
AAML
58
3
0
18 May 2021
Achieving Model Robustness through Discrete Adversarial Training
Maor Ivgi
Jonathan Berant
AAML
71
28
0
11 Apr 2021
Enhancing Pre-trained Language Model with Lexical Simplification
Rongzhou Bao
Jiayi Wang
Zhuosheng Zhang
Hai Zhao
39
2
0
30 Dec 2020
SHIELD: Defending Textual Neural Networks against Multiple Black-Box Adversarial Attacks with Stochastic Multi-Expert Patcher
Thai Le
Noseong Park
Dongwon Lee
AAML
46
21
0
17 Nov 2020
Manipulating emotions for ground truth emotion analysis
Bennett Kleinberg
20
2
0
16 Jun 2020