BAE: BERT-based Adversarial Examples for Text Classification
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Abstract
Modern text classification models are susceptible to adversarial examples: perturbed versions of the original text that are indiscernible to humans yet misclassified by the model. We present BAE, a powerful black-box attack for generating grammatically correct and semantically coherent adversarial examples. BAE replaces and inserts tokens in the original text by masking a portion of the text and leveraging a language model to generate alternatives for the masked tokens. Compared to prior work, we show that BAE performs a stronger attack on three widely used models across seven text classification datasets.
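The abstract describes two perturbation operations: replacing a token and inserting a new token, in both cases by placing a mask and querying a masked language model for candidates. A minimal sketch of that idea is below; `predict_masked` is a hypothetical callable standing in for a BERT-style masked language model (in practice it would be backed by a fill-mask model), not part of the authors' released code.

```python
# Sketch of BAE-style replace/insert perturbations (illustrative only).
# `predict_masked` is an assumed stand-in for a masked language model:
# it takes a token list containing "[MASK]" and returns top-k candidate tokens.

def bae_replace_candidates(tokens, i, predict_masked, top_k=5):
    """Mask token i and return candidate perturbed sentences (token lists)."""
    masked = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
    candidates = predict_masked(masked, top_k=top_k)
    # One perturbed sentence per candidate, skipping the original token.
    return [tokens[:i] + [c] + tokens[i + 1:]
            for c in candidates if c != tokens[i]]

def bae_insert_candidates(tokens, i, predict_masked, top_k=5):
    """Insert a mask before position i and return candidate sentences."""
    masked = tokens[:i] + ["[MASK]"] + tokens[i:]
    candidates = predict_masked(masked, top_k=top_k)
    return [tokens[:i] + [c] + tokens[i:] for c in candidates]
```

In the full attack, each candidate sentence would then be scored against the victim classifier (black-box queries) and filtered for semantic similarity to the original, keeping only perturbations that flip the prediction.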
