Deceiving Google's Perspective API Built for Detecting Toxic Comments

27 February 2017

Hossein Hosseini

Radha Poovendran

Papers citing "Deceiving Google's Perspective API Built for Detecting Toxic Comments"

Evolving Hate Speech Online: An Adaptive Framework for Detection and Mitigation. Shiza Ali, Jeremy Blackburn, Gianluca Stringhini. 24 Feb 2025.
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns. Xinyue Shen, Yixin Wu, Y. Qu, Michael Backes, Savvas Zannettou, Yang Zhang. 28 Jan 2025.
Adversarial Hubness in Multi-Modal Retrieval. Tingwei Zhang, Fnu Suya, Rishi Jha, Collin Zhang, Vitaly Shmatikov. 18 Dec 2024.
Weak-to-Strong Generalization beyond Accuracy: a Pilot Study in Safety, Toxicity, and Legal Reasoning. Ruimeng Ye, Yang Xiao, Bo Hui. 16 Oct 2024.
TaeBench: Improving Quality of Toxic Adversarial Examples. Xuan Zhu, Dmitriy Bespalov, Liwen You, Ninad Kulkarni, Yanjun Qi. 08 Oct 2024.
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models. Peiyi Zhang, Yazhou Zhang, Bo Wang, Lu Rong, Jing Qin. 19 Sep 2024.
Beyond Boundaries: A Comprehensive Survey of Transferable Attacks on AI Systems. Guangjing Wang, Ce Zhou, Yuanda Wang, Bocheng Chen, Hanqing Guo, Qiben Yan. 20 Nov 2023.

Papers cited by "Deceiving Google's Perspective API Built for Detecting Toxic Comments"

Ex Machina: Personal Attacks Seen at Scale. Ellery Wulczyn, Nithum Thain, Lucas Dixon. 27 Oct 2016.
Adversarial Perturbations Against Deep Neural Networks for Malware Classification. Kathrin Grosse, Nicolas Papernot, Praveen Manoharan, Michael Backes, Patrick McDaniel. 14 Jun 2016.
Practical Black-Box Attacks against Machine Learning. Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, S. Jha, Z. Berkay Celik, A. Swami. 08 Feb 2016.
The Limitations of Deep Learning in Adversarial Settings. Nicolas Papernot, Patrick McDaniel, S. Jha, Matt Fredrikson, Z. Berkay Celik, A. Swami. 24 Nov 2015.
Deep Learning and Music Adversaries. Corey Kereliuk, Bob L. T. Sturm, J. Larsen. 16 Jul 2015.
Explaining and Harnessing Adversarial Examples. Ian Goodfellow, Jonathon Shlens, Christian Szegedy. 20 Dec 2014.
Intriguing properties of neural networks. Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, D. Erhan, Ian Goodfellow, Rob Fergus. 21 Dec 2013.