EdgeAIGuard: Agentic LLMs for Minor Protection in Digital Spaces

28 February 2025

Sunder Ali Khowaja

Papers citing "EdgeAIGuard: Agentic LLMs for Minor Protection in Digital Spaces"

| Title | Authors | Tags | | | | Date |
|---|---|---|---|---|---|---|
| Legilimens: Practical and Unified Content Moderation for Large Language Model Services | Jialin Wu, Jiangyi Deng, Shengyuan Pang, Yanjiao Chen, Jiayang Xu, Xinfeng Li, Wenyuan Xu | | 110 | 7 | 0 | 28 Aug 2024 |
| HateTinyLLM: Hate Speech Detection Using Tiny Large Language Models | Tanmay Sen, Ansuman Das, Mrinmay Sen | | 63 | 4 | 0 | 26 Apr 2024 |
| Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales | Ayushi Nirmal, Amrita Bhattacharjee, Paras Sheth, Huan Liu | AAML | 80 | 11 | 0 | 19 Mar 2024 |
| DeepSeek LLM: Scaling Open-Source Language Models with Longtermism | DeepSeek-AI: Xiao Bi, Deli Chen, Guanting Chen, ..., Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Qihao Zhu, Yuheng Zou | LRM, ALM | 201 | 379 | 0 | 05 Jan 2024 |
| TurkishBERTweet: Fast and Reliable Large Language Model for Social Media Analysis | Ali Najafi, Onur Varol | VLM | 62 | 13 | 0 | 29 Nov 2023 |
| ChatGPT in the Age of Generative AI and Large Language Models: A Concise Survey | S. Mohamadi, Ghulam Mujtaba, Ngan Le, Gianfranco Doretto, Don Adjeroh | LM&MA, AI4MH | 96 | 21 | 0 | 09 Jul 2023 |
| Evaluating GPT-3 Generated Explanations for Hateful Content Moderation | H. Wang, Ming Shan Hee, Rabiul Awal, K. T. W. Choo, Roy Ka-wei Lee | | 64 | 45 | 0 | 28 May 2023 |
| ChatGPT Needs SPADE (Sustainability, PrivAcy, Digital divide, and Ethics) Evaluation: A Review | Sunder Ali Khowaja, P. Khuwaja, Kapal Dev, Weizheng Wang, Lewis Nkenyereye | | 96 | 83 | 0 | 13 Apr 2023 |
| Detection of Hate Speech using BERT and Hate Speech Word Embedding with Deep Model | Hind S. Alatawi, Areej M. Alhothali, K. Moria | | 68 | 92 | 0 | 02 Nov 2021 |
| A systematic review of Hate Speech automatic detection using Natural Language Processing | Md Saroar Jahan, Mourad Oussalah | | 59 | 22 | 0 | 22 May 2021 |
| Detecting White Supremacist Hate Speech using Domain Specific Word Embedding with Deep Learning and BERT | Hind S. Alatawi, Areej M. Alhothali, K. Moria | | 49 | 89 | 0 | 01 Oct 2020 |
| Multilingual and Multi-Aspect Hate Speech Analysis | N. Ousidhoum, Zizheng Lin, Hongming Zhang, Yangqiu Song, Dit-Yan Yeung | | 95 | 291 | 0 | 29 Aug 2019 |
| Hate Speech Detection from Code-mixed Hindi-English Tweets Using Deep Learning Models | Satyajit Kamble, Aditya Joshi | | 48 | 76 | 0 | 13 Nov 2018 |
| Challenges in Discriminating Profanity from Hate Speech | S. Malmasi, Marcos Zampieri | | 74 | 243 | 0 | 14 Mar 2018 |
| Deep Learning for Hate Speech Detection in Tweets | Pinkesh Badjatiya, Shashank Gupta, Manish Gupta, Vasudeva Varma | | 94 | 1,142 | 0 | 01 Jun 2017 |
| Automated Hate Speech Detection and the Problem of Offensive Language | Thomas Davidson, Dana Warmsley, M. Macy, Ingmar Weber | | 79 | 2,703 | 0 | 11 Mar 2017 |