
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models

24 September 2020

Yejin Choi

Papers citing "RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models"

50 / 814 papers shown

Title
WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation Alisa Liu Swabha Swayamdipta Noah A. Smith Yejin Choi 215 221 0 16 Jan 2022
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound Rowan Zellers Jiasen Lu Ximing Lu Youngjae Yu Yanpeng Zhao Mohammadreza Salehi Aditya Kusupati Jack Hessel Ali Farhadi Yejin Choi 115 215 0 07 Jan 2022
Analyzing the Limits of Self-Supervision in Handling Bias in Language Lisa Bauer Karthik Gopalakrishnan Spandana Gella Yang Liu Mohit Bansal Dilek Z. Hakkani-Tür ELM 41 1 0 16 Dec 2021
Simple Text Detoxification by Identifying a Linear Toxic Subspace in Language Model Embeddings Andrew Wang Mohit Sudhakar Yangfeng Ji 44 2 0 15 Dec 2021
Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases Shrimai Prabhumoye Rafal Kocielnik Mohammad Shoeybi Anima Anandkumar Bryan Catanzaro 68 21 0 15 Dec 2021
Massive-scale Decoding for Text Generation using Lattices Jiacheng Xu Siddhartha Reddy Jonnalagadda Greg Durrett AI4CE 93 8 0 14 Dec 2021
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts Nan Du Yanping Huang Andrew M. Dai Simon Tong Dmitry Lepikhin ... Kun Zhang Quoc V. Le Yonghui Wu Zhiwen Chen Claire Cui ALM MoE 284 835 0 13 Dec 2021
Extending the WILDS Benchmark for Unsupervised Adaptation Shiori Sagawa Pang Wei Koh Tony Lee Irena Gao Sang Michael Xie ... Kate Saenko Tatsunori Hashimoto Sergey Levine Chelsea Finn Percy Liang OOD 67 102 0 09 Dec 2021
Improving language models by retrieving from trillions of tokens Sebastian Borgeaud A. Mensch Jordan Hoffmann Trevor Cai Eliza Rutherford ... Simon Osindero Karen Simonyan Jack W. Rae Erich Elsen Laurent Sifre KELM RALM 303 1,109 0 08 Dec 2021
A General Language Assistant as a Laboratory for Alignment Amanda Askell Yuntao Bai Anna Chen Dawn Drain Deep Ganguli ... Tom B. Brown Jack Clark Sam McCandlish C. Olah Jared Kaplan ALM 143 791 0 01 Dec 2021
Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs Peter Hase Mona T. Diab Asli Celikyilmaz Xian Li Zornitsa Kozareva Veselin Stoyanov Mohit Bansal Srini Iyer KELM LRM 92 79 0 26 Nov 2021
RedCaps: web-curated image-text data created by the people, for the people Karan Desai Gaurav Kaul Zubin Aysola Justin Johnson 137 169 0 22 Nov 2021
Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey Bonan Min Hayley L Ross Elior Sulem Amir Pouran Ben Veyseh Thien Huu Nguyen Oscar Sainz Eneko Agirre Ilana Heinz Dan Roth LM&MA VLM AI4CE 197 1,101 0 01 Nov 2021
A Systematic Investigation of Commonsense Knowledge in Large Language Models Xiang Lorraine Li A. Kuncoro Jordan Hoffmann Cyprien de Masson d'Autume Phil Blunsom Aida Nematzadeh LRM 101 59 0 31 Oct 2021
PAGnol: An Extra-Large French Generative Model Julien Launay E. L. Tommasone B. Pannier François Boniface A. Chatelain Alessandro Cappelli Iacopo Poli Djamé Seddah AILaw MoE AI4CE 82 8 0 16 Oct 2021
Control Prefixes for Parameter-Efficient Text Generation Jordan Clive Kris Cao Marek Rei 125 32 0 15 Oct 2021
Can Machines Learn Morality? The Delphi Experiment Liwei Jiang Jena D. Hwang Chandra Bhagavatula Ronan Le Bras Jenny T Liang ... Yulia Tsvetkov Oren Etzioni Maarten Sap Regina A. Rini Yejin Choi FaML 218 123 0 14 Oct 2021
Scheduling Optimization Techniques for Neural Network Training Hyungjun Oh Junyeol Lee HyeongJu Kim Jiwon Seo 48 1 0 03 Oct 2021
Expected Validation Performance and Estimation of a Random Variable's Maximum Jesse Dodge Suchin Gururangan Dallas Card Roy Schwartz Noah A. Smith 102 9 0 01 Oct 2021
PPL-MCTS: Constrained Textual Generation Through Discriminator-Guided MCTS Decoding Antoine Chaffin Vincent Claveau Ewa Kijak 78 38 0 28 Sep 2021
Text Detoxification using Large Pre-trained Neural Models David Dale Anton Voronov Daryna Dementieva V. Logacheva Olga Kozlova Nikita Semenov Alexander Panchenko 124 74 0 18 Sep 2021
Challenges in Detoxifying Language Models Johannes Welbl Amelia Glaese J. Uesato Sumanth Dathathri John F. J. Mellor Lisa Anne Hendricks Kirsty Anderson Pushmeet Kohli Ben Coppin Po-Sen Huang LM&MA 313 196 0 15 Sep 2021
Sequence Length is a Domain: Length-based Overfitting in Transformer Models Dusan Varis Ondrej Bojar 77 56 0 15 Sep 2021
Automatically Exposing Problems with Neural Dialog Models Dian Yu Kenji Sagae 110 9 0 14 Sep 2021
Hi, my name is Martha: Using names to measure and mitigate bias in generative dialogue models Eric Michael Smith Adina Williams 124 28 0 07 Sep 2021
Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts Ashutosh Baheti Maarten Sap Alan Ritter Mark O. Riedl 90 91 0 26 Aug 2021
DEMix Layers: Disentangling Domains for Modular Language Modeling Suchin Gururangan Michael Lewis Ari Holtzman Noah A. Smith Luke Zettlemoyer KELM MoE 117 138 0 11 Aug 2021
Mitigating harm in language models with conditional-likelihood filtration Helen Ngo Cooper D. Raterink J. Araújo Ivan Zhang Carol Chen Adrien Morisot Nick Frosst 98 42 0 04 Aug 2021
Controlled Text Generation as Continuous Optimization with Multiple Constraints Sachin Kumar Eric Malmi Aliaksei Severyn Yulia Tsvetkov BDL AI4CE 109 79 0 04 Aug 2021
Goldilocks: Consistent Crowdsourced Scalar Annotations with Relative Uncertainty Quan Ze Chen Daniel S. Weld Amy X. Zhang 66 17 0 04 Aug 2021
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing Pengfei Liu Weizhe Yuan Jinlan Fu Zhengbao Jiang Hiroaki Hayashi Graham Neubig VLM SyDa 389 4,052 0 28 Jul 2021
Interactive Storytelling for Children: A Case-study of Design and Development Considerations for Ethical Conversational AI J. Chubb S. Missaoui S. Concannon Liam Maloney James Alfred Walker 49 35 0 20 Jul 2021
Trustworthy AI: A Computational Perspective Haochen Liu Yiqi Wang Wenqi Fan Xiaorui Liu Yaxin Li Shaili Jain Yunhao Liu Anil K. Jain Jiliang Tang FaML 196 213 0 12 Jul 2021
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling Emily Dinan Gavin Abercrombie A. S. Bergman Shannon L. Spruit Dirk Hovy Y-Lan Boureau Verena Rieser 97 109 0 07 Jul 2021
Towards Understanding and Mitigating Social Biases in Language Models Paul Pu Liang Chiyu Wu Louis-Philippe Morency Ruslan Salakhutdinov 102 399 0 24 Jun 2021
Towards Knowledge-Grounded Counter Narrative Generation for Hate Speech Yi-Ling Chung Serra Sinem Tekiroğlu Marco Guerini 72 67 0 22 Jun 2021
Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets Irene Solaiman Christy Dennison 115 226 0 18 Jun 2021
Assessing Political Prudence of Open-domain Chatbots Yejin Bang Nayeon Lee Etsuko Ishii Andrea Madotto Pascale Fung 70 25 0 11 Jun 2021
Conditional Contrastive Learning for Improving Fairness in Self-Supervised Learning Martin Q. Ma Yao-Hung Hubert Tsai Paul Pu Liang Han Zhao Kun Zhang Ruslan Salakhutdinov Louis-Philippe Morency SSL 87 16 0 05 Jun 2021
A Dataset and Baselines for Multilingual Reply Suggestion Mozhi Zhang Wei Wang Budhaditya Deb Guoqing Zheng Milad Shokouhi Ahmed Hassan Awadallah LRM 56 8 0 03 Jun 2021
Dissecting Generation Modes for Abstractive Summarization Models via Ablation and Attribution Jiacheng Xu Greg Durrett 99 16 0 03 Jun 2021
DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts Alisa Liu Maarten Sap Ximing Lu Swabha Swayamdipta Chandra Bhagavatula Noah A. Smith Yejin Choi MU 143 376 0 07 May 2021
What's in the Box? A Preliminary Analysis of Undesirable Content in the Common Crawl Corpus A. Luccioni J. Viviano 107 119 0 06 May 2021
GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation Kang Min Yoo Dongju Park Jaewook Kang Sang-Woo Lee Woomyeong Park 117 244 0 18 Apr 2021
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus Jesse Dodge Maarten Sap Ana Marasović William Agnew Gabriel Ilharco Dirk Groeneveld Margaret Mitchell Matt Gardner AILaw 128 455 0 18 Apr 2021
Revealing Persona Biases in Dialogue Systems Emily Sheng Josh Arnold Zhou Yu Kai-Wei Chang Nanyun Peng 104 39 0 18 Apr 2021
Detoxifying Language Models Risks Marginalizing Minority Voices Albert Xu Eshaan Pathak Eric Wallace Suchin Gururangan Maarten Sap Dan Klein 77 129 0 13 Apr 2021
Semantic maps and metrics for science using deep transformer encoders Brendan Chambers James A. Evans MedIm 50 0 0 13 Apr 2021
Factual Probing Is [MASK]: Learning vs. Learning to Recall Zexuan Zhong Dan Friedman Danqi Chen 108 413 0 12 Apr 2021
Alignment of Language Agents Zachary Kenton Tom Everitt Laura Weidinger Iason Gabriel Vladimir Mikulik G. Irving 90 166 0 26 Mar 2021