System III: Learning with Domain Knowledge for Safety Constraints

23 April 2023

Papers citing "System III: Learning with Domain Knowledge for Safety Constraints"

Title
Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models Michael Lan Philip H. S. Torr Fazl Barez LRM 38 3 0 07 Nov 2023
Neuron to Graph: Interpreting Language Model Neurons at Scale Alex Foote Neel Nanda Esben Kran Ioannis Konstas Shay B. Cohen Fazl Barez MILM 11 24 0 31 May 2023
N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models Alex Foote Neel Nanda Esben Kran Ioannis Konstas Fazl Barez MILM 28 3 0 22 Apr 2023
Fairness in AI and Its Long-Term Implications on Society Ondrej Bohdal Timothy M. Hospedales Philip Torr Fazl Barez 15 4 0 16 Apr 2023
Unsolved Problems in ML Safety Dan Hendrycks Nicholas Carlini John Schulman Jacob Steinhardt 186 276 0 28 Sep 2021
Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic Mingyu Cai Mohammadhosein Hasanbeig Shaoping Xiao Alessandro Abate Z. Kan 80 86 0 24 Feb 2021