arXiv: 2507.12428 · v2 (latest)
Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models
Yik Siu Chan, Zheng-Xin Yong, Stephen H. Bach
16 July 2025 · LRM
ArXiv (abs) · PDF · HTML · GitHub (2★)
Papers citing "Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models" (7 papers)
The Impact of Off-Policy Training Data on Probe Generalisation
Nathalie Kirch, Samuel Dower, Adrians Skapars, Ekdeep Singh Lubana, Dmitrii Krasheninnikov
21 Nov 2025

MONICA: Real-Time Monitoring and Calibration of Chain-of-Thought Sycophancy in Large Reasoning Models
Jingyu Hu, Shu Yang, Xilin Gong, H. Wang, Weiru Liu, Di Wang
LRM · 09 Nov 2025

AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research
Tim Beyer, Jonas Dornbusch, Jakob Steimle, Moritz Ladenburger, Leo Schwinn, Stephan Günnemann
AAML · 06 Nov 2025

Why Do Multilingual Reasoning Gaps Emerge in Reasoning Language Models?
Deokhyung Kang, Seonjeong Hwang, Daehui Kim, Hyounghun Kim, Gary Geunbae Lee
LRM · 31 Oct 2025

Self-Jailbreaking: Language Models Can Reason Themselves Out of Safety Alignment After Benign Reasoning Training
Zheng-Xin Yong, Stephen H. Bach
LRM · 23 Oct 2025

Validation of Various Normalization Methods for Brain Tumor Segmentation: Can Federated Learning Overcome This Heterogeneity?
Jan Fiszer, Dominika Ciupek, Maciej Malawski
FedML · 08 Oct 2025

HarmonyGuard: Toward Safety and Utility in Web Agents via Adaptive Policy Enhancement and Dual-Objective Optimization
Yurun Chen, Xavier Hu, Y. Liu, Keting Yin, Juncheng Billy Li, Zhuosheng Zhang, Shengyu Zhang
LLMAG · 06 Aug 2025