BAE: BERT-based Adversarial Examples for Text Classification
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Abstract
Modern text classification models are susceptible to adversarial examples: perturbed versions of the original text that are indiscernible to humans yet misclassified by the model. We present BAE, a powerful black-box attack for generating grammatically correct and semantically coherent adversarial examples. BAE replaces and inserts tokens in the original text by masking a portion of the text and leveraging a language model to generate alternatives for the masked tokens. Compared to prior work, we show that BAE performs a stronger attack on three widely used models across seven text classification datasets.
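The abstract describes two perturbation operations: replacing a token and inserting a new token, in both cases by placing a mask and querying a masked language model for candidates. A minimal sketch of that idea is below; `predict_masked` is a hypothetical callable standing in for a BERT-style masked language model (in practice it would be backed by a fill-mask model), not part of the authors' released code.

```python
# Sketch of BAE-style replace/insert perturbations (illustrative only).
# `predict_masked` is an assumed stand-in for a masked language model:
# it takes a token list containing "[MASK]" and returns top-k candidate tokens.

def bae_replace_candidates(tokens, i, predict_masked, top_k=5):
    """Mask token i and return candidate perturbed sentences (token lists)."""
    masked = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
    candidates = predict_masked(masked, top_k=top_k)
    # One perturbed sentence per candidate, skipping the original token.
    return [tokens[:i] + [c] + tokens[i + 1:]
            for c in candidates if c != tokens[i]]

def bae_insert_candidates(tokens, i, predict_masked, top_k=5):
    """Insert a mask before position i and return candidate sentences."""
    masked = tokens[:i] + ["[MASK]"] + tokens[i:]
    candidates = predict_masked(masked, top_k=top_k)
    return [tokens[:i] + [c] + tokens[i:] for c in candidates]
```

In the full attack, each candidate sentence would then be scored against the victim classifier (black-box queries) and filtered for semantic similarity to the original, keeping only perturbations that flip the prediction.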
