2504.01849
Cited By
An Approach to Technical AGI Safety and Security
2 April 2025
Rohin Shah
Alex Irpan
Alexander Matt Turner
Anna Wang
Arthur Conmy
David Lindner
Jonah Brown-Cohen
Lewis Ho
Neel Nanda
Raluca Ada Popa
Rishub Jain
Rory Greig
Samuel Albanie
Scott Emmons
Sebastian Farquhar
Sébastien Krier
Senthooran Rajamanoharan
Sophie Bridgers
Tobi Ijitoye
Tom Everitt
Victoria Krakovna
Vikrant Varma
Vladimir Mikulik
Zachary Kenton
Dave Orr
Shane Legg
Noah D. Goodman
Allan Dafoe
Four Flynn
Anca Dragan
Papers citing "An Approach to Technical AGI Safety and Security"
5 / 5 papers shown
Title
Because we have LLMs, we Can and Should Pursue Agentic Interpretability
Been Kim
John Hewitt
Neel Nanda
Noah Fiedel
Oyvind Tafjord
17
0
0
13 Jun 2025
Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations
Li Ji-An
Hua-Dong Xiong
Robert C. Wilson
Marcelo G. Mattar
M. Benna
83
0
0
19 May 2025
Scaling Laws For Scalable Oversight
Joshua Engels
David D. Baek
Subhash Kantamneni
Max Tegmark
ELM
204
1
0
25 Apr 2025
Frontier AI's Impact on the Cybersecurity Landscape
Wenbo Guo
Yujin Potter
Tianneng Shi
Zhun Wang
Andy Zhang
Dawn Song
111
2
0
07 Apr 2025
How to evaluate control measures for LLM agents? A trajectory from today to superintelligence
Tomek Korbak
Mikita Balesni
Buck Shlegeris
Geoffrey Irving
ELM
109
1
0
07 Apr 2025