Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1911.03891
Cited By
Social Bias Frames: Reasoning about Social and Power Implications of Language
10 November 2019
Maarten Sap
Saadia Gabriel
Lianhui Qin
Dan Jurafsky
Noah A. Smith
Yejin Choi
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Social Bias Frames: Reasoning about Social and Power Implications of Language"
50 / 100 papers shown
Title
NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries?
Saadia Gabriel
Hamid Palangi
Yejin Choi
AAML
45
1
0
08 Nov 2022
Detecting Unintended Social Bias in Toxic Language Datasets
Nihar Ranjan Sahoo
Himanshu Gupta
P. Bhattacharyya
18
18
0
21 Oct 2022
How Hate Speech Varies by Target Identity: A Computational Analysis
Michael Miller Yoder
Lynnette Hui Xian Ng
D. W. Brown
Kathleen M. Carley
33
20
0
19 Oct 2022
NormSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations On-the-Fly
Yi R. Fung
Tuhin Chakraborty
Hao Guo
Owen Rambow
Smaranda Muresan
Heng Ji
21
39
0
16 Oct 2022
SODAPOP: Open-Ended Discovery of Social Biases in Social Commonsense Reasoning Models
Haozhe An
Zongxia Li
Jieyu Zhao
Rachel Rudinger
30
25
0
13 Oct 2022
Explainable Abuse Detection as Intent Classification and Slot Filling
Agostina Calabrese
Bjorn Ross
Mirella Lapata
48
10
0
06 Oct 2022
When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment
Zhijing Jin
Sydney Levine
Fernando Gonzalez
Ojasv Kamal
Maarten Sap
Mrinmaya Sachan
Rada Mihalcea
J. Tenenbaum
Bernhard Schölkopf
ELM
LRM
34
90
0
04 Oct 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
231
447
0
23 Aug 2022
KOLD: Korean Offensive Language Dataset
Young-kuk Jeong
Juhyun Oh
Jaimeen Ahn
Jongwon Lee
Jihyung Mon
Sungjoon Park
Alice Oh
57
25
0
23 May 2022
Meta AI at Arabic Hate Speech 2022: MultiTask Learning with Self-Correction for Hate Speech Classification
Badr AlKhamissi
Mona T. Diab
54
14
0
16 May 2022
Analyzing Hate Speech Data along Racial, Gender and Intersectional Axes
Antonis Maronikolakis
Philip Baader
Hinrich Schütze
28
9
0
13 May 2022
Towards Answering Open-ended Ethical Quandary Questions
Yejin Bang
Nayeon Lee
Tiezheng Yu
Leila Khalatbari
Yan Xu
...
Romain Barraud
Elham J. Barezi
Andrea Madotto
Hayden Kee
Pascale Fung
ELM
35
6
0
12 May 2022
Aligning to Social Norms and Values in Interactive Narratives
Prithviraj Ammanabrolu
Liwei Jiang
Maarten Sap
Hannaneh Hajishirzi
Yejin Choi
AI4CE
28
47
0
04 May 2022
A Comparison of Approaches for Imbalanced Classification Problems in the Context of Retrieving Relevant Documents for an Analysis
Sandra Wankmüller
33
2
0
03 May 2022
Hidden behind the obvious: misleading keywords and implicitly abusive language on social media
Wenjie Yin
A. Zubiaga
31
27
0
03 May 2022
A Corpus for Understanding and Generating Moral Stories
Jian Guan
Ziqi Liu
Minlie Huang
32
9
0
20 Apr 2022
UMass PCL at SemEval-2022 Task 4: Pre-trained Language Model Ensembles for Detecting Patronizing and Condescending Language
David Koleczek
Alexander Scarlatos
Siddha Makarand Karkare
Preshma Linet Pereira
24
0
0
18 Apr 2022
The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems
Caleb Ziems
Jane A. Yu
Yi-Chia Wang
A. Halevy
Diyi Yang
28
92
0
06 Apr 2022
Probing Pre-Trained Language Models for Cross-Cultural Differences in Values
Arnav Arora
Lucie-Aimée Kaffee
Isabelle Augenstein
VLM
43
124
0
25 Mar 2022
Listening to Affected Communities to Define Extreme Speech: Dataset and Experiments
Antonis Maronikolakis
Axel Wisiorek
Leah Nann
Haris Jabbar
Sahana Udupa
Hinrich Schütze
24
24
0
22 Mar 2022
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
Thomas Hartvigsen
Saadia Gabriel
Hamid Palangi
Maarten Sap
Dipankar Ray
Ece Kamar
33
353
0
17 Mar 2022
Quantifying Gender Biases Towards Politicians on Reddit
Sara Vera Marjanović
Karolina Stañczak
Isabelle Augenstein
11
15
0
22 Dec 2021
Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases
Shrimai Prabhumoye
Rafal Kocielnik
M. Shoeybi
Anima Anandkumar
Bryan Catanzaro
35
20
0
15 Dec 2021
CO-STAR: Conceptualisation of Stereotypes for Analysis and Reasoning
Teyun Kwon
Anandha Gopalan
27
2
0
01 Dec 2021
Few-Shot Self-Rationalization with Natural Language Prompts
Ana Marasović
Iz Beltagy
Doug Downey
Matthew E. Peters
LRM
26
106
0
16 Nov 2021
Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection
Maarten Sap
Swabha Swayamdipta
Laura Vianna
Xuhui Zhou
Yejin Choi
Noah A. Smith
46
268
0
15 Nov 2021
A Word on Machine Ethics: A Response to Jiang et al. (2021)
Zeerak Talat
Hagen Blix
Josef Valvoda
M. I. Ganesh
Ryan Cotterell
Adina Williams
SyDa
FaML
96
38
0
07 Nov 2021
Clean or Annotate: How to Spend a Limited Data Collection Budget
Derek Chen
Zhou Yu
Samuel R. Bowman
37
13
0
15 Oct 2021
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish
Angelica Chen
Nikita Nangia
Vishakh Padmakumar
Jason Phang
Jana Thompson
Phu Mon Htut
Sam Bowman
223
378
0
15 Oct 2021
Detecting Community Sensitive Norm Violations in Online Conversations
Chan Young Park
Julia Mendelsohn
Karthik Radhakrishnan
Kinjal Jain
Tushar Kanakagiri
David Jurgens
Yulia Tsvetkov
38
23
0
09 Oct 2021
Latent Hatred: A Benchmark for Understanding Implicit Hate Speech
Mai Elsherief
Caleb Ziems
D. Muchlinski
Vaishnavi Anupindi
Jordyn Seybolt
M. D. Choudhury
Diyi Yang
106
239
0
11 Sep 2021
Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts
Ashutosh Baheti
Maarten Sap
Alan Ritter
Mark O. Riedl
21
84
0
26 Aug 2021
On Measures of Biases and Harms in NLP
Sunipa Dev
Emily Sheng
Jieyu Zhao
Aubrie Amstutz
Jiao Sun
...
M. Sanseverino
Jiin Kim
Akihiro Nishi
Nanyun Peng
Kai-Wei Chang
31
80
0
07 Aug 2021
On the Diversity and Limits of Human Explanations
Chenhao Tan
19
31
0
22 Jun 2021
A Survey of Race, Racism, and Anti-Racism in NLP
Anjalie Field
Su Lin Blodgett
Zeerak Talat
Yulia Tsvetkov
42
122
0
21 Jun 2021
Understanding and Countering Stereotypes: A Computational Approach to the Stereotype Content Model
Kathleen C. Fraser
I. Nejadgholi
S. Kiritchenko
19
37
0
04 Jun 2021
DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts
Alisa Liu
Maarten Sap
Ximing Lu
Swabha Swayamdipta
Chandra Bhagavatula
Noah A. Smith
Yejin Choi
MU
26
359
0
07 May 2021
Beyond Fair Pay: Ethical Implications of NLP Crowdsourcing
Boaz Shmueli
Jan Fell
Soumya Ray
Lun-Wei Ku
118
86
0
20 Apr 2021
Detoxifying Language Models Risks Marginalizing Minority Voices
Albert Xu
Eshaan Pathak
Eric Wallace
Suchin Gururangan
Maarten Sap
Dan Klein
24
123
0
13 Apr 2021
Towards generalisable hate speech detection: a review on obstacles and solutions
Wenjie Yin
A. Zubiaga
117
164
0
17 Feb 2021
HateCheck: Functional Tests for Hate Speech Detection Models
Paul Röttger
B. Vidgen
Dong Nguyen
Zeerak Talat
Helen Z. Margetts
J. Pierrehumbert
31
260
0
31 Dec 2020
Argument from Old Man's View: Assessing Social Bias in Argumentation
Maximilian Spliethover
Henning Wachsmuth
14
20
0
24 Nov 2020
Towards Ethics by Design in Online Abusive Content Detection
S. Kiritchenko
I. Nejadgholi
21
13
0
28 Oct 2020
PowerTransformer: Unsupervised Controllable Revision for Biased Language Correction
Xinyao Ma
Maarten Sap
Hannah Rashkin
Yejin Choi
38
73
0
26 Oct 2020
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman
Suchin Gururangan
Maarten Sap
Yejin Choi
Noah A. Smith
37
1,132
0
24 Sep 2020
A Computational Approach to Understanding Empathy Expressed in Text-Based Mental Health Support
Ashish Sharma
Adam S. Miner
David C. Atkins
Tim Althoff
AI4MH
25
272
0
17 Sep 2020
OpenFraming: We brought the ML; you bring the data. Interact with your data and discover its frames
Alyssa Smith
D. Tofu
Mona Jalal
Edward Edberg Halim
Yimeng Sun
V. Akavoor
Margrit Betke
Prakash Ishwar
Lei Guo
Derry Wijaya
11
1
0
16 Aug 2020
Multi-Dimensional Gender Bias Classification
Emily Dinan
Angela Fan
Ledell Yu Wu
Jason Weston
Douwe Kiela
Adina Williams
FaML
22
122
0
01 May 2020
Unsupervised Discovery of Implicit Gender Bias
Anjalie Field
Yulia Tsvetkov
11
49
0
17 Apr 2020
A Framework for the Computational Linguistic Analysis of Dehumanization
Julia Mendelsohn
Yulia Tsvetkov
Dan Jurafsky
87
89
0
06 Mar 2020
Previous
1
2