arXiv:1907.11932
Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment
Di Jin, Zhijing Jin, Joey Tianyi Zhou, Peter Szolovits
SILM · AAML · 27 July 2019
Papers citing
"Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment"
50 / 196 papers shown
IM-BERT: Enhancing Robustness of BERT through the Implicit Euler Method
Mihyeon Kim, Juhyoung Park, Youngbin Kim
11 May 2025

Adversarial Attacks in Multimodal Systems: A Practitioner's Survey
Shashank Kapoor, Sanjay Surendranath Girija, Lakshit Arora, Dipen Pradhan, Ankit Shetgaonkar, Aman Raj
AAML · 06 May 2025

MatMMFuse: Multi-Modal Fusion model for Material Property Prediction
Abhiroop Bhattacharya, Sylvain G. Cloutier
AI4CE · 30 Apr 2025

aiXamine: Simplified LLM Safety and Security
Fatih Deniz, Dorde Popovic, Yazan Boshmaf, Euisuh Jeong, M. Ahmad, Sanjay Chawla, Issa M. Khalil
ELM · 21 Apr 2025

CheatAgent: Attacking LLM-Empowered Recommender Systems via LLM Agent
Liang-bo Ning, Shijie Wang, Wenqi Fan, Qing Li, Xin Xu, Hao Chen, Feiran Huang
AAML · 13 Apr 2025
Adversarial Training of Reward Models
Alexander Bukharin, Haifeng Qian, Shengyang Sun, Adithya Renduchintala, Soumye Singhal, Zihan Wang, Oleksii Kuchaiev, Olivier Delalleau, T. Zhao
AAML · 08 Apr 2025

Get the Agents Drunk: Memory Perturbations in Autonomous Agent-based Recommender Systems
Shiyi Yang, Zhibo Hu, Chen Wang, Tong Yu, Xiwei Xu, Liming Zhu, Lina Yao
AAML · 31 Mar 2025

TH-Bench: Evaluating Evading Attacks via Humanizing AI Text on Machine-Generated Text Detectors
Jingyi Zheng, Junfeng Wang, Zhen Sun, Wenhan Dong, Yule Liu, Xinlei He
AAML · 10 Mar 2025

Single-pass Detection of Jailbreaking Input in Large Language Models
Leyla Naz Candogan, Yongtao Wu, Elias Abad Rocamora, Grigorios G. Chrysos, V. Cevher
AAML · 24 Feb 2025

SEA: Shareable and Explainable Attribution for Query-based Black-box Attacks
Yue Gao, Ilia Shumailov, Kassem Fawaz
AAML · 21 Feb 2025
Confidence Elicitation: A New Attack Vector for Large Language Models
Brian Formento, Chuan-Sheng Foo, See-Kiong Ng
AAML · 07 Feb 2025

On Adversarial Robustness of Language Models in Transfer Learning
Bohdan Turbal, Anastasiia Mazur, Jiaxu Zhao, Mykola Pechenizkiy
AAML · 03 Jan 2025

Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context
Nilanjana Das, Edward Raff, Manas Gaur
AAML · 20 Dec 2024

Adversarial Prompt Distillation for Vision-Language Models
Lin Luo, Xin Wang, Bojia Zi, Shihao Zhao, Xingjun Ma, Yu-Gang Jiang
AAML · VLM · 22 Nov 2024

IAE: Irony-based Adversarial Examples for Sentiment Analysis Systems
Xiaoyin Yi, Jiacheng Huang
AAML · 12 Nov 2024

DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios
Junchao Wu, Runzhe Zhan, Derek F. Wong, Shu Yang, Xinyi Yang, Yulin Yuan, Lidia S. Chao
DeLMO · 31 Oct 2024
ID-Free Not Risk-Free: LLM-Powered Agents Unveil Risks in ID-Free Recommender Systems
Zehua Wang, Min Gao, Junliang Yu, Xinyi Gao, Quoc Viet Hung Nguyen, S. Sadiq, Hongzhi Yin
AAML · 18 Sep 2024

Contextual Breach: Assessing the Robustness of Transformer-based QA Models
Asir Saadat, Nahian Ibn Asad, Md Farhan Ishmam
AAML · 17 Sep 2024

Enhancing adversarial robustness in Natural Language Inference using explanations
Alexandros Koulakos, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou
SILM · AAML · 11 Sep 2024

Adversarial Attacks on Data Attribution
Xinhe Wang, Pingbang Hu, Junwei Deng, Jiaqi W. Ma
TDI · 09 Sep 2024

CERT-ED: Certifiably Robust Text Classification for Edit Distance
Zhuoqun Huang, Yipeng Wang, Seunghee Shin, Benjamin I. P. Rubinstein
AAML · 01 Aug 2024
Jailbreaking Text-to-Image Models with LLM-Based Agents
Yingkai Dong, Zheng Li, Xiangtao Meng, Ning Yu, Shanqing Guo
LLMAG · 01 Aug 2024

Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context
Nilanjana Das, Edward Raff, Manas Gaur
AAML · 19 Jul 2024

DiffuseDef: Improved Robustness to Adversarial Attacks via Iterative Denoising
Zhenhao Li, Huichi Zhou, Marek Rei, Lucia Specia
DiffM · 28 Jun 2024

Spiking Convolutional Neural Networks for Text Classification
Changze Lv, Jianhan Xu, Xiaoqing Zheng
27 Jun 2024

Obfuscating IoT Device Scanning Activity via Adversarial Example Generation
Haocong Li, Yaxin Zhang, Long Cheng, Wenjia Niu, Haining Wang, Qiang Li
AAML · 17 Jun 2024
MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
Guanjie Chen, Xinyu Zhao, Tianlong Chen, Yu Cheng
MoE · 17 Jun 2024

It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
Taiming Lu, Lingfeng Shen, Xinyu Yang, Weiting Tan, Beidi Chen, Huaxiu Yao
12 Jun 2024

One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang, Jiawei Kong, Wenbo Yu, Bin Chen, Jiawei Li, Hao Wu, Ke Xu
AAML · VLM · 08 Jun 2024

Exploiting the Layered Intrinsic Dimensionality of Deep Models for Practical Adversarial Training
Enes Altinisik, Safa Messaoud, Husrev Taha Sencar, Hassan Sajjad, Sanjay Chawla
AAML · 27 May 2024
A Comprehensive Survey on Data Augmentation
Zaitian Wang, Pengfei Wang, Kunpeng Liu, Pengyang Wang, Yanjie Fu, Chang-Tien Lu, Charu Aggarwal, Jian Pei, Yuanchun Zhou
ViT · 15 May 2024

Revisiting character-level adversarial attacks
Elias Abad Rocamora, Yongtao Wu, Fanghui Liu, Grigorios G. Chrysos, V. Cevher
AAML · 07 May 2024

Advancing the Robustness of Large Language Models through Self-Denoised Smoothing
Jiabao Ji, Bairu Hou, Zhen Zhang, Guanhua Zhang, Wenqi Fan, Qing Li, Yang Zhang, Gaowen Liu, Sijia Liu, Shiyu Chang
AAML · 18 Apr 2024

VertAttack: Taking advantage of Text Classifiers' horizontal vision
Jonathan Rusert
AAML · 12 Apr 2024
Monotonic Paraphrasing Improves Generalization of Language Model Prompting
Qin Liu, Fei Wang, Nan Xu, Tianyi Yan, Tao Meng, Muhao Chen
LRM · 24 Mar 2024

Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
Mahdi Karami, Ali Ghodsi
VLM · 28 Feb 2024

RITFIS: Robust input testing framework for LLMs-based intelligent software
Ming-Ming Xiao, Yan Xiao, Hai Dong, Shunhui Ji, Pengcheng Zhang
AAML · 21 Feb 2024

Contrastive Instruction Tuning
Tianyi Yan, Fei Wang, James Y. Huang, Wenxuan Zhou, Fan Yin, Aram Galstyan, Wenpeng Yin, Muhao Chen
ALM · 17 Feb 2024
Comprehensive Assessment of Jailbreak Attacks Against LLMs
Junjie Chu, Yugeng Liu, Ziqing Yang, Xinyue Shen, Michael Backes, Yang Zhang
AAML · 08 Feb 2024

Adversarial Text Purification: A Large Language Model Approach for Defense
Raha Moraffah, Shubh Khandelwal, Amrita Bhattacharjee, Huan Liu
DeLMO · AAML · 05 Feb 2024

ALISON: Fast and Effective Stylometric Authorship Obfuscation
Eric Xing, Saranya Venkatraman, Thai V. Le, Dongwon Lee
DeLMO · 01 Feb 2024

Cross-lingual Offensive Language Detection: A Systematic Review of Datasets, Transfer Approaches and Challenges
Aiqi Jiang, A. Zubiaga
AAML · 17 Jan 2024

IndoRobusta: Towards Robustness Against Diverse Code-Mixed Indonesian Local Languages
Muhammad Farid Adilazuarda, Samuel Cahyawijaya, Genta Indra Winata, Pascale Fung, Ayu Purwarianti
21 Nov 2023
Towards Effective Paraphrasing for Information Disguise
Anmol Agarwal, Shrey Gupta, Vamshi Krishna Bonagiri, Manas Gaur, Joseph M. Reagle, Ponnurangam Kumaraguru
08 Nov 2023

Quantifying Uncertainty in Natural Language Explanations of Large Language Models
Sree Harsha Tanneru, Chirag Agarwal, Himabindu Lakkaraju
LRM · 06 Nov 2023

Robustifying Language Models with Test-Time Adaptation
Noah T. McDermott, Junfeng Yang, Chengzhi Mao
29 Oct 2023

Toward Stronger Textual Attack Detectors
Pierre Colombo, Marine Picot, Nathan Noiry, Guillaume Staerman, Pablo Piantanida
21 Oct 2023

The Trickle-down Impact of Reward (In-)consistency on RLHF
Lingfeng Shen, Sihao Chen, Linfeng Song, Lifeng Jin, Baolin Peng, Haitao Mi, Daniel Khashabi, Dong Yu
28 Sep 2023
Are Large Language Models Really Robust to Word-Level Perturbations?
Haoyu Wang, Guozheng Ma, Cong Yu, Ning Gui, Linrui Zhang, ..., Sen Zhang, Li Shen, Xueqian Wang, Peilin Zhao, Dacheng Tao
KELM · 20 Sep 2023

A Classification-Guided Approach for Adversarial Attacks against Neural Machine Translation
Sahar Sadrizadeh, Ljiljana Dolamic, P. Frossard
AAML · SILM · 29 Aug 2023