Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2204.04636
Cited By
"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks
10 April 2022
Edoardo Mosca
Shreyash Agarwal
Javier Rando
Georg Groh
AAML
Re-assign community
ArXiv
PDF
HTML
Papers citing
""That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks"
21 / 21 papers shown
Title
Q-FAKER: Query-free Hard Black-box Attack via Controlled Generation
CheolWon Na
YunSeok Choi
Jee-Hyong Lee
AAML
37
0
0
18 Apr 2025
Exploring Gradient-Guided Masked Language Model to Detect Textual Adversarial Attacks
Xiaomei Zhang
Zhaoxi Zhang
Yanjun Zhang
Xufei Zheng
L. Zhang
Shengshan Hu
Shirui Pan
AAML
27
0
0
08 Apr 2025
Learning on LLM Output Signatures for gray-box LLM Behavior Analysis
Guy Bar-Shalom
Fabrizio Frasca
Derek Lim
Yoav Gelberg
Yftah Ziser
Ran El-Yaniv
Gal Chechik
Haggai Maron
67
0
0
18 Mar 2025
Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges
Jonas Becker
Jan Philip Wahle
Bela Gipp
Terry Ruas
31
9
0
24 May 2024
SemRoDe: Macro Adversarial Training to Learn Representations That are Robust to Word-Level Attacks
Brian Formento
Wenjie Feng
Chuan-Sheng Foo
Anh Tuan Luu
See-Kiong Ng
AAML
34
7
0
27 Mar 2024
ROIC-DM: Robust Text Inference and Classification via Diffusion Model
Shilong Yuan
Wei Yuan
Hongzhi Yin
Tieke He
DiffM
33
2
0
07 Jan 2024
Why do universal adversarial attacks work on large language models?: Geometry might be the answer
Varshini Subhash
Anna Bialas
Weiwei Pan
Finale Doshi-Velez
AAML
22
10
0
01 Sep 2023
Interpretability and Transparency-Driven Detection and Transformation of Textual Adversarial Examples (IT-DT)
Bushra Sabir
Muhammad Ali Babar
Sharif Abuadbba
SILM
42
8
0
03 Jul 2023
On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection
Songyang Gao
Shihan Dou
Qi Zhang
Xuanjing Huang
Jin Ma
Yingchun Shan
AAML
13
3
0
27 Jun 2023
VoteTRANS: Detecting Adversarial Text without Training by Voting on Hard Labels of Transformations
Hoang-Quoc Nguyen-Son
Seira Hidano
Kazuhide Fukushima
S. Kiyomoto
Isao Echizen
28
0
0
02 Jun 2023
Masked Language Model Based Textual Adversarial Example Detection
Xiaomei Zhang
Zhaoxi Zhang
Qi Zhong
Xufei Zheng
Yanjun Zhang
Shengshan Hu
L. Zhang
AAML
28
0
0
18 Apr 2023
IFAN: An Explainability-Focused Interaction Framework for Humans and NLP Models
Edoardo Mosca
Daryna Dementieva
Tohid Ebrahim Ajdari
Maximilian Kummeth
Kirill Gringauz
Yutong Zhou
Georg Groh
24
8
0
06 Mar 2023
TextShield: Beyond Successfully Detecting Adversarial Sentences in Text Classification
Lingfeng Shen
Ze Zhang
Haiyun Jiang
Ying-Cong Chen
AAML
41
5
0
03 Feb 2023
ADDMU: Detection of Far-Boundary Adversarial Examples with Data and Model Uncertainty Estimation
Fan Yin
Yao Li
Cho-Jui Hsieh
Kai-Wei Chang
AAML
67
4
0
22 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Ryan Cotterell
Zhijing Jin
121
94
0
06 Oct 2022
An Interpretability Evaluation Benchmark for Pre-trained Language Models
Ya-Ming Shen
Lijie Wang
Ying-Cong Chen
Xinyan Xiao
Jing Liu
Hua Wu
37
4
0
28 Jul 2022
Exploring Adversarial Attacks and Defenses in Vision Transformers trained with DINO
Javier Rando
Nasib Naimi
Thomas Baumann
Max Mathys
AAML
20
5
0
14 Jun 2022
Certified Robustness to Adversarial Word Substitutions
Robin Jia
Aditi Raghunathan
Kerem Göksel
Percy Liang
AAML
183
291
0
03 Sep 2019
Generating Natural Language Adversarial Examples
M. Alzantot
Yash Sharma
Ahmed Elgohary
Bo-Jhang Ho
Mani B. Srivastava
Kai-Wei Chang
AAML
245
915
0
21 Apr 2018
Adversarial Example Generation with Syntactically Controlled Paraphrase Networks
Mohit Iyyer
John Wieting
Kevin Gimpel
Luke Zettlemoyer
AAML
GAN
205
712
0
17 Apr 2018
Adversarial examples in the physical world
Alexey Kurakin
Ian Goodfellow
Samy Bengio
SILM
AAML
287
5,837
0
08 Jul 2016
1