Provably safe systems: the only path to controllable AGI

5 September 2023

Papers citing "Provably safe systems: the only path to controllable AGI"


Title
Can Safety Fine-Tuning Be More Principled? Lessons Learned from Cybersecurity; David Williams-King, Linh Le, Adam Oberman, Yoshua Bengio; AAML; 114 0 0; 19 Jan 2025
Adaptive Learn-then-Test: Statistically Valid and Efficient Hyperparameter Selection; Matteo Zecchin, Sangwoo Park, Osvaldo Simeone; LM&MA; 231 4 0; 24 Sep 2024
AI Deception: A Survey of Examples, Risks, and Potential Solutions; Peter S. Park, Simon Goldstein, Aidan O'Gara, Michael Chen, Dan Hendrycks; 75 158 0; 28 Aug 2023
From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought; L. Wong, Gabriel Grand, Alexander K. Lew, Noah D. Goodman, Vikash K. Mansinghka, Jacob Andreas, J. Tenenbaum; LRM AI4CE; 55 107 0; 22 Jun 2023
An Overview of Catastrophic AI Risks; Dan Hendrycks, Mantas Mazeika, Thomas Woodside; SILM; 65 183 0; 21 Jun 2023
Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving; Xueliang Zhao, Wenda Li, Lingpeng Kong; 61 31 0; 25 May 2023
Fundamental Limitations of Alignment in Large Language Models; Yotam Wolf, Noam Wies, Oshri Avnery, Yoav Levine, Amnon Shashua; ALM; 77 147 0; 19 Apr 2023
"Real Attackers Don't Compute Gradients": Bridging the Gap Between Adversarial ML Research and Practice; Giovanni Apruzzese, Hyrum S. Anderson, Savino Dambra, D. Freeman, Fabio Pierazzi, Kevin A. Roundy; AAML; 101 81 0; 29 Dec 2022
Precision Machine Learning; Eric J. Michaud, Ziming Liu, Max Tegmark; 61 34 0; 24 Oct 2022
Autoformalization with Large Language Models; Yuhuai Wu, Albert Q. Jiang, Wenda Li, M. Rabe, Charles Staats, M. Jamnik, Christian Szegedy; AI4CE; 282 177 0; 25 May 2022
HyperTree Proof Search for Neural Theorem Proving; Guillaume Lample, Marie-Anne Lachaux, Thibaut Lavril, Xavier Martinet, Amaury Hayat, Gabriel Ebner, Aurelien Rodriguez, Timothée Lacroix; AIMat; 78 150 0; 23 May 2022
The AGI Containment Problem; James Babcock, János Kramár, Roman V. Yampolskiy; 63 276 0; 02 Apr 2016