"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks

10 April 2022

Papers citing ""That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks"

21 / 21 papers shown

Title
Q-FAKER: Query-free Hard Black-box Attack via Controlled Generation CheolWon Na YunSeok Choi Jee-Hyong Lee AAML 37 0 0 18 Apr 2025
Exploring Gradient-Guided Masked Language Model to Detect Textual Adversarial Attacks Xiaomei Zhang Zhaoxi Zhang Yanjun Zhang Xufei Zheng L. Zhang Shengshan Hu Shirui Pan AAML 27 0 0 08 Apr 2025
Learning on LLM Output Signatures for gray-box LLM Behavior Analysis Guy Bar-Shalom Fabrizio Frasca Derek Lim Yoav Gelberg Yftah Ziser Ran El-Yaniv Gal Chechik Haggai Maron 67 0 0 18 Mar 2025
Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges Jonas Becker Jan Philip Wahle Bela Gipp Terry Ruas 31 9 0 24 May 2024
SemRoDe: Macro Adversarial Training to Learn Representations That are Robust to Word-Level Attacks Brian Formento Wenjie Feng Chuan-Sheng Foo Anh Tuan Luu See-Kiong Ng AAML 34 7 0 27 Mar 2024
ROIC-DM: Robust Text Inference and Classification via Diffusion Model Shilong Yuan Wei Yuan Hongzhi Yin Tieke He DiffM 33 2 0 07 Jan 2024
Why do universal adversarial attacks work on large language models?: Geometry might be the answer Varshini Subhash Anna Bialas Weiwei Pan Finale Doshi-Velez AAML 22 10 0 01 Sep 2023
Interpretability and Transparency-Driven Detection and Transformation of Textual Adversarial Examples (IT-DT) Bushra Sabir Muhammad Ali Babar Sharif Abuadbba SILM 42 8 0 03 Jul 2023
On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection Songyang Gao Shihan Dou Qi Zhang Xuanjing Huang Jin Ma Yingchun Shan AAML 13 3 0 27 Jun 2023
VoteTRANS: Detecting Adversarial Text without Training by Voting on Hard Labels of Transformations Hoang-Quoc Nguyen-Son Seira Hidano Kazuhide Fukushima S. Kiyomoto Isao Echizen 28 0 0 02 Jun 2023
Masked Language Model Based Textual Adversarial Example Detection Xiaomei Zhang Zhaoxi Zhang Qi Zhong Xufei Zheng Yanjun Zhang Shengshan Hu L. Zhang AAML 28 0 0 18 Apr 2023
IFAN: An Explainability-Focused Interaction Framework for Humans and NLP Models Edoardo Mosca Daryna Dementieva Tohid Ebrahim Ajdari Maximilian Kummeth Kirill Gringauz Yutong Zhou Georg Groh 24 8 0 06 Mar 2023
TextShield: Beyond Successfully Detecting Adversarial Sentences in Text Classification Lingfeng Shen Ze Zhang Haiyun Jiang Ying-Cong Chen AAML 41 5 0 03 Feb 2023
ADDMU: Detection of Far-Boundary Adversarial Examples with Data and Model Uncertainty Estimation Fan Yin Yao Li Cho-Jui Hsieh Kai-Wei Chang AAML 67 4 0 22 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review Dieuwke Hupkes Mario Giulianelli Verna Dankers Mikel Artetxe Yanai Elazar ... Leila Khalatbari Maria Ryskina Rita Frieske Ryan Cotterell Zhijing Jin 121 94 0 06 Oct 2022
An Interpretability Evaluation Benchmark for Pre-trained Language Models Ya-Ming Shen Lijie Wang Ying-Cong Chen Xinyan Xiao Jing Liu Hua Wu 37 4 0 28 Jul 2022
Exploring Adversarial Attacks and Defenses in Vision Transformers trained with DINO Javier Rando Nasib Naimi Thomas Baumann Max Mathys AAML 20 5 0 14 Jun 2022
Certified Robustness to Adversarial Word Substitutions Robin Jia Aditi Raghunathan Kerem Göksel Percy Liang AAML 183 291 0 03 Sep 2019
Generating Natural Language Adversarial Examples M. Alzantot Yash Sharma Ahmed Elgohary Bo-Jhang Ho Mani B. Srivastava Kai-Wei Chang AAML 245 915 0 21 Apr 2018
Adversarial Example Generation with Syntactically Controlled Paraphrase Networks Mohit Iyyer John Wieting Kevin Gimpel Luke Zettlemoyer AAML GAN 205 712 0 17 Apr 2018
Adversarial examples in the physical world Alexey Kurakin Ian Goodfellow Samy Bengio SILM AAML 287 5,837 0 08 Jul 2016