Papers citing "An Approach to Technical AGI Safety and Security"

Title | Authors | Tag | Counts | Date
Because we have LLMs, we Can and Should Pursue Agentic Interpretability | Been Kim, John Hewitt, Neel Nanda, Noah Fiedel, Oyvind Tafjord | - | 17 0 0 | 13 Jun 2025
Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations | Li Ji-An, Hua-Dong Xiong, Robert C. Wilson, Marcelo G. Mattar, M. Benna | - | 83 0 0 | 19 May 2025
Scaling Laws For Scalable Oversight | Joshua Engels, David D. Baek, Subhash Kantamneni, Max Tegmark | ELM | 204 1 0 | 25 Apr 2025
Frontier AI's Impact on the Cybersecurity Landscape | Wenbo Guo, Yujin Potter, Tianneng Shi, Zhun Wang, Andy Zhang, Dawn Song | - | 111 2 0 | 07 Apr 2025
How to evaluate control measures for LLM agents? A trajectory from today to superintelligence | Tomek Korbak, Mikita Balesni, Buck Shlegeris, Geoffrey Irving | ELM | 109 1 0 | 07 Apr 2025