Watching the Watchers: A Comparative Fairness Audit of Cloud-based Content Moderation Services

20 June 2024

Papers citing "Watching the Watchers: A Comparative Fairness Audit of Cloud-based Content Moderation Services"

Black-Box Access is Insufficient for Rigorous AI Audits. Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, ..., Michael Gerovitch, David Bau, Max Tegmark, David M. Krueger, Dylan Hadfield-Menell. 25 Jan 2024.
How Hate Speech Varies by Target Identity: A Computational Analysis. Michael Miller Yoder, Lynnette Hui Xian Ng, D. W. Brown, Kathleen M. Carley. 19 Oct 2022.
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection. Thomas Hartvigsen, Saadia Gabriel, Hamid Palangi, Maarten Sap, Dipankar Ray, Ece Kamar. 17 Mar 2022.
Latent Hatred: A Benchmark for Understanding Implicit Hate Speech. Mai Elsherief, Caleb Ziems, D. Muchlinski, Vaishnavi Anupindi, Jordyn Seybolt, M. D. Choudhury, Diyi Yang. 11 Sep 2021.
HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection. Binny Mathew, Punyajoy Saha, Seid Muhie Yimam, Chris Biemann, Pawan Goyal, Animesh Mukherjee. 18 Dec 2020.
The Woman Worked as a Babysitter: On Biases in Language Generation. Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, Nanyun Peng. 03 Sep 2019.
Counterfactual Fairness in Text Classification through Robustness. Sahaj Garg, Vincent Perot, Nicole Limtiaco, Ankur Taly, Ed H. Chi, Alex Beutel. 27 Sep 2018.