arXiv:2503.10728
DarkBench: Benchmarking Dark Patterns in Large Language Models
13 March 2025
Esben Kran
Hieu Minh "Jord" Nguyen
Akash Kundu
Sami Jawhar
Jinsuk Park
Mateusz Maria Jurewicz
Papers citing "DarkBench: Benchmarking Dark Patterns in Large Language Models"
Deceptive Automated Interpretability: Language Models Coordinating to Fool Oversight Systems
Simon Lermen
Mateusz Dziemian
Natalia Pérez-Campanero Antolín
10 Apr 2025