arXiv:1606.06565
Concrete Problems in AI Safety
21 June 2016
Dario Amodei
C. Olah
Jacob Steinhardt
Paul Christiano
John Schulman
Dandelion Mané
Papers citing "Concrete Problems in AI Safety"
50 / 479 papers shown
Title
Guiding Pretraining in Reinforcement Learning with Large Language Models
Yuqing Du
Olivia Watkins
Zihan Wang
Cédric Colas
Trevor Darrell
Pieter Abbeel
Abhishek Gupta
Jacob Andreas
LM&Ro
25
175
0
13 Feb 2023
Probabilistic Circuits That Know What They Don't Know
Fabrizio G. Ventola
Steven Braun
Zhongjie Yu
Martin Mundt
Kristian Kersting
UQCV
TPM
32
7
0
13 Feb 2023
Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning
Yunke Wang
Bo Du
Chang Xu
38
8
0
13 Feb 2023
Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning
Elizaveta Tennant
Stephen Hailes
Mirco Musolesi
19
17
0
20 Jan 2023
Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes
Justin Reppert
Ben Rachbach
Charlie George
Luke Stebbing
Ju-Seung Byun
Maggie Appleton
Andreas Stuhlmuller
ReLM
LRM
43
17
0
04 Jan 2023
Don't do it: Safer Reinforcement Learning With Rule-based Guidance
Ekaterina Nikonova
Cheng Xue
Jochen Renz
32
0
0
28 Dec 2022
Methodological reflections for AI alignment research using human feedback
Thilo Hagendorff
Sarah Fabi
24
6
0
22 Dec 2022
Circumventing interpretability: How to defeat mind-readers
Lee D. Sharkey
35
3
0
21 Dec 2022
Target Conditioned Representation Independence (TCRI); From Domain-Invariant to Domain-General Representations
Olawale Salaudeen
Oluwasanmi Koyejo
32
2
0
21 Dec 2022
Discovering Language Model Behaviors with Model-Written Evaluations
Ethan Perez
Sam Ringer
Kamilė Lukošiūtė
Karina Nguyen
Edwin Chen
...
Danny Hernandez
Deep Ganguli
Evan Hubinger
Nicholas Schiefer
Jared Kaplan
ALM
22
367
0
19 Dec 2022
MoDem: Accelerating Visual Model-Based Reinforcement Learning with Demonstrations
Nicklas Hansen
Yixin Lin
H. Su
Xiaolong Wang
Vikash Kumar
Aravind Rajeswaran
OffRL
32
49
0
12 Dec 2022
Targeted Adversarial Attacks on Deep Reinforcement Learning Policies via Model Checking
Dennis Gross
T. D. Simão
N. Jansen
G. Pérez
AAML
46
2
0
10 Dec 2022
Online Shielding for Reinforcement Learning
Bettina Könighofer
Julian Rudolf
Alexander Palmisano
Martin Tappler
Roderick Bloem
OffRL
14
21
0
04 Dec 2022
Melting Pot 2.0
J. Agapiou
A. Vezhnevets
Edgar A. Duénez-Guzmán
Jayd Matyas
Yiran Mao
...
Sukhdeep Singh
Julia Haas
Igor Mordatch
D. Mobbs
Joel Z Leibo
45
32
0
24 Nov 2022
A Brief Overview of AI Governance for Responsible Machine Learning Systems
Navdeep Gill
Abhishek Mathur
Marcos V. Conde
29
5
0
21 Nov 2022
Reward Gaming in Conditional Text Generation
Richard Yuanzhe Pang
Vishakh Padmakumar
Thibault Sellam
Ankur P. Parikh
He He
35
24
0
16 Nov 2022
Rewards Encoding Environment Dynamics Improves Preference-based Reinforcement Learning
Katherine Metcalf
Miguel Sarabia
B. Theobald
OffRL
38
4
0
12 Nov 2022
Calibrated Perception Uncertainty Across Objects and Regions in Bird's-Eye-View
Markus Kängsepp
Meelis Kull
UQCV
15
4
0
08 Nov 2022
Progress and summary of reinforcement learning on energy management of MPS-EV
Jincheng Hu
Yang Lin
Liang Chu
Zhuoran Hou
Jihan Li
Jingjing Jiang
Yuanjian Zhang
23
12
0
08 Nov 2022
Interpreting deep learning output for out-of-distribution detection
Damian J. Matuszewski
I. Sintorn
OODD
32
1
0
07 Nov 2022
Measuring Progress on Scalable Oversight for Large Language Models
Sam Bowman
Jeeyoon Hyun
Ethan Perez
Edwin Chen
Craig Pettit
...
Tristan Hume
Yuntao Bai
Zac Hatfield-Dodds
Benjamin Mann
Jared Kaplan
ALM
ELM
28
123
0
04 Nov 2022
Reward Shaping Using Convolutional Neural Network
Hani Sami
Hadi Otrok
Jamal Bentahar
Azzam Mourad
Ernesto Damiani
29
3
0
30 Oct 2022
Redefining Counterfactual Explanations for Reinforcement Learning: Overview, Challenges and Opportunities
Jasmina Gajcin
Ivana Dusparic
CML
OffRL
35
8
0
21 Oct 2022
Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving
Eli Bronstein
Mark Palatucci
Dominik Notz
Brandyn White
Alex Kuefler
...
Punit Shah
Evan Racah
Benjamin Frenkel
Shimon Whiteson
Drago Anguelov
47
58
0
18 Oct 2022
Learning Control Admissibility Models with Graph Neural Networks for Multi-Agent Navigation
Chenning Yu
Hong-Den Yu
Sicun Gao
42
17
0
17 Oct 2022
Prompting GPT-3 To Be Reliable
Chenglei Si
Zhe Gan
Zhengyuan Yang
Shuohang Wang
Jianfeng Wang
Jordan L. Boyd-Graber
Lijuan Wang
KELM
LRM
60
283
0
17 Oct 2022
Microscopy is All You Need
Sergei V. Kalinin
Rama K Vasudevan
Yongtao Liu
Ayana Ghosh
Kevin M. Roccapriore
M. Ziatdinov
30
0
0
12 Oct 2022
Robust Models are less Over-Confident
Julia Grabinski
Paul Gavrikov
J. Keuper
M. Keuper
AAML
36
24
0
12 Oct 2022
Learning Control Policies for Stochastic Systems with Reach-avoid Guarantees
Dorde Zikelic
Mathias Lechner
T. Henzinger
K. Chatterjee
24
22
0
11 Oct 2022
Artificial virtuous agents in a multiagent tragedy of the commons
Jakob Stenseke
32
6
0
06 Oct 2022
Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals
Rohin Shah
Vikrant Varma
Ramana Kumar
Mary Phuong
Victoria Krakovna
J. Uesato
Zachary Kenton
40
68
0
04 Oct 2022
ROAD-R: The Autonomous Driving Dataset with Logical Requirements
Eleonora Giunchiglia
Mihaela C. Stoian
Salman Khan
Fabio Cuzzolin
Thomas Lukasiewicz
AI4TS
47
31
0
04 Oct 2022
Safe Reinforcement Learning From Pixels Using a Stochastic Latent Representation
Yannick Hogewind
T. D. Simão
Tal Kachman
N. Jansen
21
10
0
02 Oct 2022
GaIA: Graphical Information Gain based Attention Network for Weakly Supervised Point Cloud Semantic Segmentation
Min Seok Lee
Seok Woo Yang
S. W. Han
3DPC
22
21
0
02 Oct 2022
Causal Proxy Models for Concept-Based Model Explanations
Zhengxuan Wu
Karel D'Oosterlinck
Atticus Geiger
Amir Zur
Christopher Potts
MILM
83
35
0
28 Sep 2022
Strong Transferable Adversarial Attacks via Ensembled Asymptotically Normal Distribution Learning
Zhengwei Fang
Rui Wang
Tao Huang
L. Jing
AAML
40
5
0
24 Sep 2022
Extremely Simple Activation Shaping for Out-of-Distribution Detection
Andrija Djurisic
Nebojsa Bozanic
Arjun Ashok
Rosanne Liu
OODD
172
151
0
20 Sep 2022
An information-theoretic perspective on intrinsic motivation in reinforcement learning: a survey
A. Aubret
L. Matignon
S. Hassas
39
35
0
19 Sep 2022
A Unifying Framework for Online Optimization with Long-Term Constraints
Matteo Castiglioni
A. Celli
A. Marchesi
Giulia Romano
N. Gatti
25
34
0
15 Sep 2022
The Alignment Problem from a Deep Learning Perspective
Richard Ngo
Lawrence Chan
Sören Mindermann
68
183
0
30 Aug 2022
SAFE: Sensitivity-Aware Features for Out-of-Distribution Object Detection
Samuel Wilson
Tobias Fischer
Feras Dayoub
Dimity Miller
Niko Sünderhauf
OODD
31
29
0
29 Aug 2022
What Do NLP Researchers Believe? Results of the NLP Community Metasurvey
Julian Michael
Ari Holtzman
Alicia Parrish
Aaron Mueller
Alex Jinpeng Wang
...
Divyam Madaan
Nikita Nangia
Richard Yuanzhe Pang
Jason Phang
Sam Bowman
30
37
0
26 Aug 2022
Calibrated Selective Classification
Adam Fisch
Tommi Jaakkola
Regina Barzilay
31
17
0
25 Aug 2022
Learning Task Automata for Reinforcement Learning using Hidden Markov Models
Alessandro Abate
Y. Almulla
James Fox
David Hyland
Michael Wooldridge
OffRL
30
6
0
25 Aug 2022
Towards Augmented Microscopy with Reinforcement Learning-Enhanced Workflows
Michael Xu
Abinash Kumar
J. Lebeau
18
7
0
04 Aug 2022
Out-of-Distribution Detection with Semantic Mismatch under Masking
Yijun Yang
Ruiyuan Gao
Qiang Xu
OODD
24
27
0
31 Jul 2022
Sample-efficient Safe Learning for Online Nonlinear Control with Control Barrier Functions
Wenhao Luo
Wen Sun
Ashish Kapoor
OffRL
43
9
0
29 Jul 2022
Membership Inference Attacks via Adversarial Examples
Hamid Jalalzai
Elie Kadoche
Rémi Leluc
Vincent Plassier
AAML
FedML
MIACV
45
7
0
27 Jul 2022
Active Exploration for Inverse Reinforcement Learning
David Lindner
Andreas Krause
Giorgia Ramponi
29
24
0
18 Jul 2022
Reinforcement Learning For Survival, A Clinically Motivated Method For Critically Ill Patients
Thesath Nanayakkara
OOD
OffRL
24
0
0
17 Jul 2022