2410.03768
Cited By
Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs
2 October 2024
Yohan Mathew, Ollie Matthews, Robert McCarthy, Joan Velja, Christian Schroeder de Witt, Dylan R. Cope, Nandi Schoots
Papers citing
"Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs"
2 / 2 papers shown
The Steganographic Potentials of Language Models
Artem Karpov, Tinuade Adeleke, Seong Hah Cho, Natalia Perez-Campanero
32
0
0
06 May 2025
MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking
Sebastian Farquhar, Vikrant Varma, David Lindner, David Elson, Caleb Biddulph, Ian Goodfellow, Rohin Shah
82
1
0
22 Jan 2025