Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples

13 April 2020

Papers citing "Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples"

46 / 46 papers shown

Title
Exploring Gradient-Guided Masked Language Model to Detect Textual Adversarial Attacks Xiaomei Zhang Zhaoxi Zhang Yanjun Zhang Xufei Zheng L. Zhang Shengshan Hu Shirui Pan AAML 58 0 0 08 Apr 2025
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models Hao Fang Jiawei Kong Wenbo Yu Bin Chen Jiawei Li Hao Wu Ke Xu Ke Xu AAML VLM 131 13 0 08 Jun 2024
Reversible Jump Attack to Textual Classifiers with Modification Reduction Mingze Ni Zhensu Sun Wei Liu AAML 56 0 0 21 Mar 2024
RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions Yuan Zhang Xiao Wang Zhiheng Xi Han Xia Tao Gui Qi Zhang Xuanjing Huang 88 4 0 26 Feb 2024
Fast Adversarial Training against Textual Adversarial Attacks Yichen Yang Xin Liu Kun He AAML 47 4 0 23 Jan 2024
ROIC-DM: Robust Text Inference and Classification via Diffusion Model Shilong Yuan Wei Yuan Hongzhi Yin Tieke He DiffM 96 3 0 07 Jan 2024
Improving the Robustness of Transformer-based Large Language Models with Dynamic Attention Lujia Shen Yuwen Pu Shouling Ji Changjiang Li Xuhong Zhang Chunpeng Ge Ting Wang AAML 69 6 0 29 Nov 2023
Toward Stronger Textual Attack Detectors Pierre Colombo Marine Picot Nathan Noiry Guillaume Staerman Pablo Piantanida 561 5 0 21 Oct 2023
The Trickle-down Impact of Reward (In-)consistency on RLHF Lingfeng Shen Sihao Chen Linfeng Song Lifeng Jin Baolin Peng Haitao Mi Daniel Khashabi Dong Yu 93 23 0 28 Sep 2023
What Learned Representations and Influence Functions Can Tell Us About Adversarial Examples Shakila Mahjabin Tonni Mark Dras TDI AAML GAN 60 0 0 19 Sep 2023
Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities Maximilian Mozes Xuanli He Bennett Kleinberg Lewis D. Griffin 87 87 0 24 Aug 2023
Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks Xinyu Zhang Hanbin Hong Yuan Hong Peng Huang Binghui Wang Zhongjie Ba Kui Ren SILM 129 25 0 31 Jul 2023
Interpretability and Transparency-Driven Detection and Transformation of Textual Adversarial Examples (IT-DT) Bushra Sabir Muhammad Ali Babar Sharif Abuadbba SILM 74 10 0 03 Jul 2023
On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection Songyang Gao Shihan Dou Qi Zhang Xuanjing Huang Jin Ma Yingchun Shan AAML 55 3 0 27 Jun 2023
VoteTRANS: Detecting Adversarial Text without Training by Voting on Hard Labels of Transformations Hoang-Quoc Nguyen-Son Seira Hidano Kazuhide Fukushima S. Kiyomoto Isao Echizen 57 0 0 02 Jun 2023
From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework Yangyi Chen Hongcheng Gao Ganqu Cui Lifan Yuan Dehan Kong ... Longtao Huang H. Xue Zhiyuan Liu Maosong Sun Heng Ji AAML ELM 101 6 0 29 May 2023
The Best Defense is Attack: Repairing Semantics in Textual Adversarial Examples Heng Yang Ke Li AAML 114 3 0 06 May 2023
Masked Language Model Based Textual Adversarial Example Detection Xiaomei Zhang Zhaoxi Zhang Qi Zhong Xufei Zheng Yanjun Zhang Shengshan Hu L. Zhang AAML 101 2 0 18 Apr 2023
TextDefense: Adversarial Text Detection based on Word Importance Entropy Lujia Shen Xuhong Zhang S. Ji Yuwen Pu Chunpeng Ge Xing Yang Yanghe Feng AAML 59 8 0 12 Feb 2023
Less is More: Understanding Word-level Textual Adversarial Attack via n-gram Frequency Descend Ning Lu Shengcai Liu Zhirui Zhang Qi Wang Haifeng Liu Jiaheng Zhang AAML 152 8 0 06 Feb 2023
TextShield: Beyond Successfully Detecting Adversarial Sentences in Text Classification Lingfeng Shen Ze Zhang Haiyun Jiang Ying-Cong Chen AAML 113 5 0 03 Feb 2023
Disentangled Text Representation Learning with Information-Theoretic Perspective for Adversarial Robustness Jiahao Zhao Wenji Mao DRL OOD 61 3 0 26 Oct 2022
ADDMU: Detection of Far-Boundary Adversarial Examples with Data and Model Uncertainty Estimation Fan Yin Yao Li Cho-Jui Hsieh Kai-Wei Chang AAML 93 4 0 22 Oct 2022
TCAB: A Large-Scale Text Classification Attack Benchmark Kalyani Asthana Zhouhang Xie Wencong You Adam Noack Jonathan Brophy Sameer Singh Daniel Lowd 119 3 0 21 Oct 2022
Identifying Human Strategies for Generating Word-Level Adversarial Examples Maximilian Mozes Bennett Kleinberg Lewis D. Griffin AAML 116 2 0 20 Oct 2022
Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP Yangyi Chen Hongcheng Gao Ganqu Cui Fanchao Qi Longtao Huang Zhiyuan Liu Maosong Sun SILM 62 56 0 19 Oct 2022
Textwash -- automated open-source text anonymisation Bennett Kleinberg Toby P Davies Maximilian Mozes 64 13 0 27 Aug 2022
Rethinking Textual Adversarial Defense for Pre-trained Language Models Jiayi Wang Rongzhou Bao Zhuosheng Zhang Hai Zhao AAML SILM 56 11 0 21 Jul 2022
Towards Explainability in NLP: Analyzing and Calculating Word Saliency through Word Properties Jialiang Dong Zhitao Guan Longfei Wu Zijian Zhang Xiaojiang Du XAI AAML FAtt MILM 88 2 0 17 Jul 2022
Detecting Textual Adversarial Examples Based on Distributional Characteristics of Data Representations Na Liu Mark Dras Wei Emma Zhang AAML 44 6 0 29 Apr 2022
Residue-Based Natural Language Adversarial Attack Detection Vyas Raina Mark Gales AAML 72 12 0 17 Apr 2022
"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks Edoardo Mosca Shreyash Agarwal Javier Rando Georg Groh AAML 95 31 0 10 Apr 2022
Adversarial Training for Improving Model Robustness? Look at Both Prediction and Interpretation Hanjie Chen Yangfeng Ji OOD AAML VLM 101 21 0 23 Mar 2022
Input-specific Attention Subnetworks for Adversarial Detection Emil Biju Anirudh Sriram Pratyush Kumar Mitesh M Khapra AAML 40 5 0 23 Mar 2022
A Prompting-based Approach for Adversarial Example Generation and Robustness Enhancement Yuting Yang Pei Huang Juan Cao Jintao Li Yun Lin Jin Song Dong Feifei Ma Jian Zhang AAML SILM 96 13 0 21 Mar 2022
Detection of Word Adversarial Examples in Text Classification: Benchmark and Baseline via Robust Density Estimation Kiyoon Yoo Jangho Kim Jiho Jang Nojun Kwak 225 41 0 03 Mar 2022
Identifying Adversarial Attacks on Text Classifiers Zhouhang Xie Jonathan Brophy Adam Noack Wencong You Kalyani Asthana Carter Perkins Sabrina Reis Sameer Singh Daniel Lowd AAML 84 10 0 21 Jan 2022
Detecting Textual Adversarial Examples through Randomized Substitution and Vote Xiaosen Wang Yifeng Xiong Kun He AAML 59 11 0 13 Sep 2021
TREATED: Towards Universal Defense against Textual Adversarial Attacks Bin Zhu Zhaoquan Gu Le Wang Zhihong Tian AAML 45 8 0 13 Sep 2021
Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification Maximilian Mozes Max Bartolo Pontus Stenetorp Bennett Kleinberg Lewis D. Griffin DeLMO AAML SILM 47 7 0 09 Sep 2021
Defending Pre-trained Language Models from Adversarial Word Substitutions Without Performance Sacrifice Rongzhou Bao Jiayi Wang Hai Zhao AAML 56 43 0 30 May 2021
Adversarial Examples Detection with Bayesian Neural Network Yao Li Tongyi Tang Cho-Jui Hsieh T. C. Lee GAN AAML 58 3 0 18 May 2021
Achieving Model Robustness through Discrete Adversarial Training Maor Ivgi Jonathan Berant AAML 71 28 0 11 Apr 2021
Enhancing Pre-trained Language Model with Lexical Simplification Rongzhou Bao Jiayi Wang Zhuosheng Zhang Hai Zhao 39 2 0 30 Dec 2020
SHIELD: Defending Textual Neural Networks against Multiple Black-Box Adversarial Attacks with Stochastic Multi-Expert Patcher Thai Le Noseong Park Dongwon Lee AAML 46 21 0 17 Nov 2020
Manipulating emotions for ground truth emotion analysis Bennett Kleinberg 20 2 0 16 Jun 2020