ResearchTrend.AI
Evaluating GPT-3 Generated Explanations for Hateful Content Moderation

28 May 2023
H. Wang
Ming Shan Hee
Rabiul Awal
K. T. W. Choo
Roy Ka-Wei Lee

Papers citing "Evaluating GPT-3 Generated Explanations for Hateful Content Moderation"

32 / 32 papers shown
Output Constraints as Attack Surface: Exploiting Structured Generation to Bypass LLM Safety Mechanisms
Shuoming Zhang
Jiacheng Zhao
Ruiyuan Xu
Xiaobing Feng
Huimin Cui
AAML
39
1
0
31 Mar 2025
LLM-C3MOD: A Human-LLM Collaborative System for Cross-Cultural Hate Speech Moderation
Junyeong Park
Seogyeong Jeong
Shri Kiran Srinivasan
Yohan Lee
Alice H. Oh
55
0
0
10 Mar 2025
EdgeAIGuard: Agentic LLMs for Minor Protection in Digital Spaces
G. Mujtaba
Sunder Ali Khowaja
K. Dev
40
0
0
28 Feb 2025
Reasoning About Persuasion: Can LLMs Enable Explainable Propaganda Detection?
Maram Hasanain
Md. Arid Hasan
Mohamed Bayan Kmainasi
Elisa Sartori
Ali Ezzat Shahroor
Giovanni Da San Martino
Firoj Alam
40
0
0
23 Feb 2025
MemeIntel: Explainable Detection of Propagandistic and Hateful Memes
Mohamed Bayan Kmainasi
A. Hasnat
Md. Arid Hasan
Ali Ezzat Shahroor
Firoj Alam
VLM
45
0
0
23 Feb 2025
Demystifying Hateful Content: Leveraging Large Multimodal Models for Hateful Meme Detection with Explainable Decisions
Ming Shan Hee
Roy Ka-Wei Lee
VLM
83
0
0
16 Feb 2025
Is LLM an Overconfident Judge? Unveiling the Capabilities of LLMs in Detecting Offensive Language with Annotation Disagreement
Junyu Lu
Kai Ma
Kaichun Wang
Kelaiti Xiao
Roy Ka-Wei Lee
Bo Xu
Liang Yang
Hongfei Lin
51
0
0
10 Feb 2025
Cross-Modal Transfer from Memes to Videos: Addressing Data Scarcity in Hateful Video Detection
Han Wang
Rui Yang Tan
Roy Ka-Wei Lee
39
0
0
28 Jan 2025
BanTH: A Multi-label Hate Speech Detection Dataset for Transliterated Bangla
Fabiha Haider
Fariha Tanjim Shifat
Md Farhan Ishmam
Deeparghya Dutta Barua
Md Sakib Ul Rahman Sourove
Md Fahim
Md Farhad Alam
20
1
0
17 Oct 2024
Exploring Large Language Models for Hate Speech Detection in Rioplatense Spanish
Juan Manuel Pérez
Paula Miguel
Viviana Cotik
11
1
0
16 Oct 2024
End User Authoring of Personalized Content Classifiers: Comparing Example Labeling, Rule Writing, and LLM Prompting
Leijie Wang
Kathryn Yurechko
Pranati Dani
Quan Ze Chen
Amy X. Zhang
50
3
0
05 Sep 2024
Legilimens: Practical and Unified Content Moderation for Large Language Model Services
Jialin Wu
Jiangyi Deng
Shengyuan Pang
Yanjiao Chen
Jiayang Xu
Xinfeng Li
Wenyuan Xu
37
6
0
28 Aug 2024
MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention
Prince Jha
Raghav Jain
Konika Mandal
Aman Chadha
Sriparna Saha
P. Bhattacharyya
23
6
0
08 Jun 2024
Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster
Agostina Calabrese
Leonardo Neves
Neil Shah
Maarten W. Bos
Björn Ross
Mirella Lapata
Francesco Barbieri
FAtt
34
1
0
06 Jun 2024
SGHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Singapore
Ri Chi Ng
Nirmalendu Prakash
Ming Shan Hee
K. T. W. Choo
Roy Ka-Wei Lee
41
4
0
03 May 2024
ChatGPT Rates Natural Language Explanation Quality Like Humans: But on Which Scales?
Fan Huang
Haewoon Kwak
Kunwoo Park
Jisun An
ALM
ELM
AI4MH
40
12
0
26 Mar 2024
HateCOT: An Explanation-Enhanced Dataset for Generalizable Offensive Speech Detection via Large Language Models
H. Nghiem
Hal Daumé
39
1
0
18 Mar 2024
DOSA: A Dataset of Social Artifacts from Different Indian Geographical Subcultures
Agrima Seth
Sanchit Ahuja
Kalika Bali
Sunayana Sitaram
43
8
0
23 Feb 2024
What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection
Shangbin Feng
Herun Wan
Ningnan Wang
Zhaoxuan Tan
Minnan Luo
Yulia Tsvetkov
AAML
DeLMO
25
16
0
01 Feb 2024
Recent Advances in Hate Speech Moderation: Multimodality and the Role of Large Models
Ming Shan Hee
Shivam Sharma
Rui Cao
Palash Nandi
Tanmoy Chakraborty
Roy Ka-Wei Lee
43
14
0
30 Jan 2024
Cross-lingual Offensive Language Detection: A Systematic Review of Datasets, Transfer Approaches and Challenges
Aiqi Jiang
A. Zubiaga
AAML
31
3
0
17 Jan 2024
User Modeling in the Era of Large Language Models: Current Research and Future Directions
Zhaoxuan Tan
Meng Jiang
28
8
0
11 Dec 2023
HateRephrase: Zero- and Few-Shot Reduction of Hate Intensity in Online Posts using Large Language Models
Vibhor Agarwal
Yu Chen
Nishanth R. Sastry
23
6
0
21 Oct 2023
Composite Backdoor Attacks Against Large Language Models
Hai Huang
Zhengyu Zhao
Michael Backes
Yun Shen
Yang Zhang
AAML
29
36
0
11 Oct 2023
Unveiling Gender Bias in Terms of Profession Across LLMs: Analyzing and Addressing Sociological Implications
Vishesh Thakur
27
26
0
18 Jul 2023
Toxicity Detection with Generative Prompt-based Inference
Yau-Shian Wang
Y. Chang
90
35
0
24 May 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
328
4,077
0
24 May 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
382
8,495
0
28 Jan 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
213
1,657
0
15 Oct 2021
Latent Hatred: A Benchmark for Understanding Implicit Hate Speech
Mai Elsherief
Caleb Ziems
D. Muchlinski
Vaishnavi Anupindi
Jordyn Seybolt
M. D. Choudhury
Diyi Yang
103
236
0
11 Sep 2021
Language Models as Knowledge Bases?
Fabio Petroni
Tim Rocktäschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELM
AI4MH
417
2,588
0
03 Sep 2019
e-SNLI: Natural Language Inference with Natural Language Explanations
Oana-Maria Camburu
Tim Rocktäschel
Thomas Lukasiewicz
Phil Blunsom
LRM
260
620
0
04 Dec 2018