Subliminal Learning: Language models transmit behavioral traits via hidden signals in data

20 July 2025
Alex Cloud
Minh Le
James Chua
Jan Betley
Anna Sztyber-Betley
Jacob Hilton
Samuel Marks
Owain Evans

Papers citing "Subliminal Learning: Language models transmit behavioral traits via hidden signals in data"

12 / 12 papers shown
Subliminal Corruption: Mechanisms, Thresholds, and Interpretability
Reya Vir
Sarvesh Bhatnagar
60
0
0
22 Oct 2025
Pay Attention to the Triggers: Constructing Backdoors That Survive Distillation
Giovanni De Muri
Mark Vero
Robin Staab
Martin Vechev
115
0
0
21 Oct 2025
Detecting Adversarial Fine-tuning with Auditing Agents
Sarah Egler
John Schulman
Nicholas Carlini
AAML, MLAU
145
0
0
17 Oct 2025
Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time
Daniel Tan
Anders Woodruff
Niels Warncke
Arun Jose
Maxime Riché
David Demitri Africa
Mia Taylor
287
0
0
05 Oct 2025
LLM Chemistry Estimation for Multi-LLM Recommendation
H. Sánchez
Briland Hitaj
84
1
0
04 Oct 2025
Position: Privacy Is Not Just Memorization!
Niloofar Mireshghallah
Tianshi Li
PILM
205
1
0
02 Oct 2025
Exploring System 1 and 2 communication for latent reasoning in LLMs
Julian Coda-Forno
Zhuokai Zhao
Qiang Zhang
Dipesh Tamboli
W. Li
Xiangjun Fan
Lizhu Zhang
Eric Schulz
Hsiao-Ping Tseng
LRM
85
1
1
01 Oct 2025
Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer
Simon Schrodi
Elias Kempf
Fazl Barez
Thomas Brox
FedML
92
0
0
28 Sep 2025
Regulating the Agency of LLM-based Agents
Seán Boddy
Joshua Joseph
ELM
121
0
0
25 Sep 2025
Towards mitigating information leakage when evaluating safety monitors
Gerard Boxo
Aman Neelappa
Shivam Raval
AAML
100
0
0
16 Sep 2025
Probing the Preferences of a Language Model: Integrating Verbal and Behavioral Tests of AI Welfare
Valen Tagliabue
Leonard Dung
81
1
0
09 Sep 2025
School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs
Mia Taylor
James Chua
Jan Betley
Johannes Treutlein
Owain Evans
84
5
0
24 Aug 2025