2409.18708
Evading Toxicity Detection with ASCII-art: A Benchmark of Spatial Attacks on Moderation Systems
27 September 2024
Sergey Berezin
R. Farahbakhsh
Noel Crespi
Papers citing "Evading Toxicity Detection with ASCII-art: A Benchmark of Spatial Attacks on Moderation Systems"
4 / 4 papers shown
Harmful Prompt Laundering: Jailbreaking LLMs with Abductive Styles and Symbolic Encoding
Seongho Joo
Hyukhun Koh
Kyomin Jung
13 Sep 2025
ToxiFrench: Benchmarking and Enhancing Language Models via CoT Fine-Tuning for French Toxicity Detection
Axel Delaval
Shujian Yang
Huaimin Wang
Han Qiu
Jialiang Lu
15 Aug 2025
Text Speaks Louder than Vision: ASCII Art Reveals Textual Biases in Vision-Language Models
Zhaochen Wang
Yujun Cai
Zi Huang
Bryan Hooi
Yiwei Wang
Ming Yang
CoGe
VLM
02 Apr 2025
From Intrinsic Toxicity to Reception-Based Toxicity: A Contextual Framework for Prediction and Evaluation
Sergey Berezin
R. Farahbakhsh
Noel Crespi
20 Mar 2025