arXiv:2507.14805
Subliminal Learning: Language models transmit behavioral traits via hidden signals in data
20 July 2025
Alex Cloud
Minh Le
James Chua
Jan Betley
Anna Sztyber-Betley
Jacob Hilton
Samuel Marks
Owain Evans
Papers citing "Subliminal Learning: Language models transmit behavioral traits via hidden signals in data"
12 / 12 papers shown
Subliminal Corruption: Mechanisms, Thresholds, and Interpretability
Reya Vir
Sarvesh Bhatnagar
52
0
0
22 Oct 2025
Pay Attention to the Triggers: Constructing Backdoors That Survive Distillation
Giovanni De Muri
Mark Vero
Robin Staab
Martin Vechev
115
0
0
21 Oct 2025
Detecting Adversarial Fine-tuning with Auditing Agents
Sarah Egler
John Schulman
Nicholas Carlini
AAML
MLAU
141
0
0
17 Oct 2025
Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time
Daniel Tan
Anders Woodruff
Niels Warncke
Arun Jose
Maxime Riché
David Demitri Africa
Mia Taylor
283
0
0
05 Oct 2025
LLM Chemistry Estimation for Multi-LLM Recommendation
H. Sánchez
Briland Hitaj
80
1
0
04 Oct 2025
Position: Privacy Is Not Just Memorization!
Niloofar Mireshghallah
Tianshi Li
PILM
201
1
0
02 Oct 2025
Exploring System 1 and 2 communication for latent reasoning in LLMs
Julian Coda-Forno
Zhuokai Zhao
Qiang Zhang
Dipesh Tamboli
W. Li
Xiangjun Fan
Lizhu Zhang
Eric Schulz
Hsiao-Ping Tseng
LRM
85
1
1
01 Oct 2025
Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer
Simon Schrodi
Elias Kempf
Fazl Barez
Thomas Brox
FedML
92
0
0
28 Sep 2025
Regulating the Agency of LLM-based Agents
Seán Boddy
Joshua Joseph
ELM
117
0
0
25 Sep 2025
Towards mitigating information leakage when evaluating safety monitors
Gerard Boxo
Aman Neelappa
Shivam Raval
AAML
96
0
0
16 Sep 2025
Probing the Preferences of a Language Model: Integrating Verbal and Behavioral Tests of AI Welfare
Valen Tagliabue
Leonard Dung
81
1
0
09 Sep 2025
School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs
Mia Taylor
James Chua
Jan Betley
Johannes Treutlein
Owain Evans
84
5
0
24 Aug 2025