Handling and Presenting Harmful Text in NLP Research (arXiv:2204.14256)

29 April 2022
Hannah Rose Kirk, Abeba Birhane, Bertie Vidgen, Leon Derczynski

Papers citing "Handling and Presenting Harmful Text in NLP Research"

32 papers shown

Improving Hate Speech Classification with Cross-Taxonomy Dataset Integration
Jan Fillies, Adrian Paschke
07 Mar 2025

Mitigating Trauma in Qualitative Research Infrastructure: Roles for Machine Assistance and Trauma-Informed Design
Emily Tseng, Thomas Ristenpart, Nicola Dell
22 Dec 2024

HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter
Manuel Tonneau, Diyi Liu, Niyati Malhotra, Scott A. Hale, Samuel Fraiberger, Victor Orozco-Olvera, Paul Röttger
23 Nov 2024

Insights and Current Gaps in Open-Source LLM Vulnerability Scanners: A Comparative Analysis
Jonathan Brokman, Omer Hofman, Oren Rachmil, Inderjeet Singh, Vikas Pahuja, Rathina Sabapathy Aishvariya Priya, Amit Giloni, Roman Vainshtein, Hisashi Kojima
21 Oct 2024

Generation with Dynamic Vocabulary
Yanting Liu, Tao Ji, Changzhi Sun, Yuanbin Wu, Xiaoling Wang
11 Oct 2024

Re-examining Sexism and Misogyny Classification with Annotator Attitudes
Aiqi Jiang, Nikolas Vitsakis, Tanvi Dinkar, Gavin Abercrombie, Ioannis Konstas
04 Oct 2024

When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails
Manish Nagireddy, Inkit Padhi, Soumya Ghosh, P. Sattigeri
08 Jul 2024

Safeguarding Large Language Models: A Survey
Yi Dong, Ronghui Mu, Yanghao Zhang, Siqi Sun, Tianle Zhang, ..., Yi Qi, Jinwei Hu, Jie Meng, Saddek Bensalem, Xiaowei Huang
OffRL, KELM, AILaw
03 Jun 2024

The Life Cycle of Large Language Models: A Review of Biases in Education
Jinsook Lee, Yann Hicke, Renzhe Yu, Christopher A. Brooks, René F. Kizilcec
AI4Ed
03 Jun 2024

"They are uncultured": Unveiling Covert Harms and Social Threats in LLM
  Generated Conversations
"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations
Preetam Prabhu Srikar Dammu
Hayoung Jung
Anjali Singh
Monojit Choudhury
Tanushree Mitra
39
8
0
08 May 2024
Challenging Negative Gender Stereotypes: A Study on the Effectiveness of Automated Counter-Stereotypes
I. Nejadgholi, Kathleen C. Fraser, Anna Kerkhof, S. Kiritchenko
18 Apr 2024

If there's a Trigger Warning, then where's the Trigger? Investigating Trigger Warnings at the Passage Level
Matti Wiegmann, Jennifer Rakete, Magdalena Wolska, Benno Stein, Martin Potthast
LLMSV
15 Apr 2024

NLP Systems That Can't Tell Use from Mention Censor Counterspeech, but Teaching the Distinction Helps
Kristina Gligorić, Myra Cheng, Lucia Zheng, Esin Durmus, Dan Jurafsky
02 Apr 2024

NLP for Counterspeech against Hate: A Survey and How-To Guide
Helena Bonaldi, Yi-Ling Chung, Gavin Abercrombie, Marco Guerini
AAML
29 Mar 2024

Toxic language detection: a systematic review of Arabic datasets
Imene Bensalem, Paolo Rosso, Hanane Zitouni
12 Dec 2023

DELPHI: Data for Evaluating LLMs' Performance in Handling Controversial Issues
David Q. Sun, Artem Abzaliev, Hadas Kotek, Zidi Xiu, Christopher Klein, Jason D. Williams
27 Oct 2023

Cultural Compass: Predicting Transfer Learning Success in Offensive Language Detection with Cultural Features
Li Zhou, Antonia Karamolegkou, Wenyu Chen, Daniel Hershcovich
10 Oct 2023

DoDo Learning: DOmain-DemOgraphic Transfer in Language Models for Detecting Abuse Targeted at Public Figures
Angus R. Williams, Hannah Rose Kirk, L. Burke, Yi-Ling Chung, Ivan Debono, Pica Johansson, Francesca Stevens, Jonathan Bright, Scott A. Hale
31 Jul 2023

Understanding Counterspeech for Online Harm Mitigation
Yi-Ling Chung, Gavin Abercrombie, Florence E. Enock, Jonathan Bright, Verena Rieser
01 Jul 2023

Evaluating the Effectiveness of Natural Language Inference for Hate Speech Detection in Languages with Limited Labeled Data
Janis Goldzycher, Moritz Preisig, Chantal Amrhein, Gerold Schneider
06 Jun 2023

Seeing Seeds Beyond Weeds: Green Teaming Generative AI for Beneficial Uses
Logan Stapleton, Jordan Taylor, Sarah E Fox, Tongshuang Wu, Haiyi Zhu
30 May 2023

Exploiting Explainability to Design Adversarial Attacks and Evaluate Attack Resilience in Hate-Speech Detection Models
Pranath Reddy Kumbam, Sohaib Uddin Syed, Prashanth Thamminedi, S. Harish, Ian Perera, Bonnie J. Dorr
AAML
29 May 2023

You Are What You Annotate: Towards Better Models through Annotator Representations
Naihao Deng, Xinliang Frederick Zhang, Siyang Liu, Winston Wu, Lu Wang, Rada Mihalcea
24 May 2023

Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models
Alicia Parrish, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Max Bartolo, ..., Addison Howard, William J. Cukierski, D. Sculley, Vijay Janapa Reddi, Lora Aroyo
DiffM
22 May 2023

Assessing Language Model Deployment with Risk Cards
Leon Derczynski, Hannah Rose Kirk, Vidhisha Balachandran, Sachin Kumar, Yulia Tsvetkov, M. Leiser, Saif Mohammad
31 Mar 2023

SemEval-2023 Task 10: Explainable Detection of Online Sexism
Hannah Rose Kirk, Wenjie Yin, Bertie Vidgen, Paul Röttger
07 Mar 2023

Auditing large language models: a three-layered approach
Jakob Mökander, Jonas Schuett, Hannah Rose Kirk, Luciano Floridi
AILaw, MLAU
16 Feb 2023

Evaluating Human-Language Model Interaction
Mina Lee, Megha Srivastava, Amelia Hardy, John Thickstun, Esin Durmus, ..., Hancheng Cao, Tony Lee, Rishi Bommasani, Michael S. Bernstein, Percy Liang
LM&MA, ALM
19 Dec 2022

Trigger Warnings: Bootstrapping a Violence Detector for FanFiction
Magdalena Wolska, Christopher Schröder, Ole Borchardt, Benno Stein, Martin Potthast
09 Sep 2022

DataPerf: Benchmarks for Data-Centric AI Development
Mark Mazumder, Colby R. Banbury, Xiaozhe Yao, Bojan Karlavs, W. G. Rojas, ..., Carole-Jean Wu, Cody Coleman, Andrew Y. Ng, Peter Mattson, Vijay Janapa Reddi
VLM
20 Jul 2022

'Just What do You Think You're Doing, Dave?' A Checklist for Responsible Data Use in NLP
Anna Rogers, Timothy Baldwin, Kobi Leins
14 Sep 2021

Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate
Hannah Rose Kirk, B. Vidgen, Paul Röttger, Tristan Thrush, Scott A. Hale
12 Aug 2021