Concrete Problems in AI Safety

21 June 2016
Dario Amodei
C. Olah
Jacob Steinhardt
Paul Christiano
John Schulman
Dandelion Mané

Papers citing "Concrete Problems in AI Safety"

50 / 476 papers shown
Neural Network Model Extraction Attacks in Edge Devices by Hearing Architectural Hints
Xing Hu
Ling Liang
Lei Deng
Shuangchen Li
Xinfeng Xie
Yu Ji
Yufei Ding
Chang Liu
T. Sherwood
Yuan Xie
AAML
MLAU
23
36
0
10 Mar 2019
Deep CNN-based Multi-task Learning for Open-Set Recognition
Poojan Oza
Vishal M. Patel
24
35
0
07 Mar 2019
The Ethics of AI Ethics -- An Evaluation of Guidelines
Thilo Hagendorff
AI4TS
28
1,156
0
28 Feb 2019
Conservative Agency via Attainable Utility Preservation
Alexander Matt Turner
Dylan Hadfield-Menell
Prasad Tadepalli
21
49
0
26 Feb 2019
Embedded Agency
A. Demski
Scott Garrabrant
AIFin
32
34
0
25 Feb 2019
Learning to Generalize from Sparse and Underspecified Rewards
Rishabh Agarwal
Chen Liang
Dale Schuurmans
Mohammad Norouzi
OffRL
54
97
0
19 Feb 2019
Value constrained model-free continuous control
Steven Bohez
A. Abdolmaleki
Michael Neunert
J. Buchli
N. Heess
R. Hadsell
24
62
0
12 Feb 2019
Go-Explore: a New Approach for Hard-Exploration Problems
Adrien Ecoffet
Joost Huizinga
Joel Lehman
Kenneth O. Stanley
Jeff Clune
AI4TS
24
362
0
30 Jan 2019
Uncertainty-Based Out-of-Distribution Detection in Deep Reinforcement Learning
Andreas Sedlmeier
Thomas Gabor
Thomy Phan
Lenz Belzner
Claudia Linnhoff-Popien
UQCV
22
24
0
08 Jan 2019
Risk-Aware Active Inverse Reinforcement Learning
Daniel S. Brown
Yuchen Cui
S. Niekum
27
58
0
08 Jan 2019
A predictive safety filter for learning-based control of constrained nonlinear dynamical systems
K. P. Wabersich
Melanie Zeilinger
AI4CE
26
155
0
13 Dec 2018
AutoGAN: Robust Classifier Against Adversarial Attacks
Blerta Lindqvist
Shridatt Sugrim
R. Izmailov
AAML
29
7
0
08 Dec 2018
Evaluating Bayesian Deep Learning Methods for Semantic Segmentation
Jishnu Mukhoti
Y. Gal
UQCV
BDL
33
219
0
30 Nov 2018
Probabilistic Object Detection: Definition and Evaluation
David Hall
Feras Dayoub
John Skinner
Haoyang Zhang
Dimity Miller
Peter Corke
G. Carneiro
A. Angelova
Niko Sünderhauf
UQCV
38
111
0
27 Nov 2018
Scalable agent alignment via reward modeling: a research direction
Jan Leike
David M. Krueger
Tom Everitt
Miljan Martic
Vishal Maini
Shane Legg
34
397
0
19 Nov 2018
Towards Governing Agent's Efficacy: Action-Conditional β-VAE for Deep Transparent Reinforcement Learning
John Yang
Gyujeong Lee
Minsung Hyun
Simyung Chang
Nojun Kwak
29
3
0
11 Nov 2018
Preparing for the Unexpected: Diversity Improves Planning Resilience in Evolutionary Algorithms
Thomas Gabor
Lenz Belzner
Thomy Phan
Kyrill Schmid
19
14
0
30 Oct 2018
Stability-certified reinforcement learning: A control-theoretic perspective
Ming Jin
Javad Lavaei
31
85
0
26 Oct 2018
Supervising strong learners by amplifying weak experts
Paul Christiano
Buck Shlegeris
Dario Amodei
27
114
0
19 Oct 2018
Semi-supervised Deep Reinforcement Learning in Support of IoT and Smart City Services
M. Mohammadi
Ala I. Al-Fuqaha
Mohsen Guizani
Jun-Seok Oh
OffRL
HAI
22
337
0
09 Oct 2018
Scenic: A Language for Scenario Specification and Scene Generation
Daniel J. Fremont
T. Dreossi
Shromona Ghosh
Xiangyu Yue
Alberto L. Sangiovanni-Vincentelli
S. Seshia
42
246
0
25 Sep 2018
Emergence of Human-comparable Balancing Behaviors by Deep Reinforcement Learning
Chuanyu Yang
Taku Komura
Zhibin Li
24
20
0
06 Sep 2018
Using Machine Learning Safely in Automotive Software: An Assessment and Adaption of Software Process Requirements in ISO 26262
Rick Salay
Krzysztof Czarnecki
25
69
0
05 Aug 2018
Multi-Agent Generative Adversarial Imitation Learning
Jiaming Song
Hongyu Ren
Dorsa Sadigh
Stefano Ermon
GAN
27
216
0
26 Jul 2018
Foundations for Restraining Bolts: Reinforcement Learning with LTLf/LDLf restraining specifications
Giuseppe De Giacomo
Luca Iocchi
Marco Favorito
F. Patrizi
OffRL
20
121
0
17 Jul 2018
A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks
Kimin Lee
Kibok Lee
Honglak Lee
Jinwoo Shin
OODD
23
2,004
0
10 Jul 2018
A Broader View on Bias in Automated Decision-Making: Reflecting on Epistemology and Dynamics
Roel Dobbe
Sarah Dean
T. Gilbert
Nitin Kohli
11
39
0
02 Jul 2018
Leveraging Uncertainty Estimates for Predicting Segmentation Quality
Terrance Devries
Graham W. Taylor
UQCV
25
114
0
02 Jul 2018
Learning to Drive in a Day
Alex Kendall
Jeffrey Hawke
David Janz
Przemyslaw Mazur
Daniele Reda
John M. Allen
Vinh-Dieu Lam
Alex Bewley
Amar Shah
42
643
0
01 Jul 2018
Interpretable to Whom? A Role-based Model for Analyzing Interpretable Machine Learning Systems
Richard J. Tomsett
Dave Braines
Daniel Harborne
Alun D. Preece
Supriyo Chakraborty
FaML
29
164
0
20 Jun 2018
An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning
Dhruv Malik
Malayandi Palaniappan
J. F. Fisac
Dylan Hadfield-Menell
Stuart J. Russell
Anca Dragan
8
31
0
11 Jun 2018
Learning convex bounds for linear quadratic control policy synthesis
Jack Umenberger
Thomas B. Schon
26
12
0
01 Jun 2018
Learning a Prior over Intent via Meta-Inverse Reinforcement Learning
Kelvin Xu
Ellis Ratner
Anca Dragan
Sergey Levine
Chelsea Finn
27
66
0
31 May 2018
To Trust Or Not To Trust A Classifier
Heinrich Jiang
Been Kim
Melody Y. Guan
Maya R. Gupta
UQCV
30
464
0
30 May 2018
Variational Inverse Control with Events: A General Framework for Data-Driven Reward Definition
Justin Fu
Avi Singh
Dibya Ghosh
Larry Yang
Sergey Levine
BDL
14
125
0
29 May 2018
Reinforced Imitation: Sample Efficient Deep Reinforcement Learning for Map-less Navigation by Leveraging Prior Demonstrations
Mark Pfeiffer
Samarth Shukla
M. Turchetta
Cesar Cadena
Andreas Krause
Roland Siegwart
Juan I. Nieto
27
157
0
18 May 2018
AGI Safety Literature Review
Tom Everitt
G. Lea
Marcus Hutter
AI4CE
36
115
0
03 May 2018
FPR -- Fast Path Risk Algorithm to Evaluate Collision Probability
A. Blake
Alejandro Bordallo
Kamen Brestnichki
Majd Hawasly
Svetlin Penkov
S. Ramamoorthy
Alexandre Silva
30
6
0
15 Apr 2018
Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning
Nicolas Papernot
Patrick McDaniel
OOD
AAML
13
503
0
13 Mar 2018
Predictive Uncertainty Estimation via Prior Networks
A. Malinin
Mark Gales
UD
BDL
EDL
UQCV
PER
32
898
0
28 Feb 2018
Learning Confidence for Out-of-Distribution Detection in Neural Networks
Terrance Devries
Graham W. Taylor
OOD
OODD
43
581
0
13 Feb 2018
Learning from Richer Human Guidance: Augmenting Comparison-Based Learning with Feature Queries
Chandrayee Basu
M. Singhal
Anca Dragan
28
57
0
05 Feb 2018
AI Safety Gridworlds
Jan Leike
Miljan Martic
Victoria Krakovna
Pedro A. Ortega
Tom Everitt
Andrew Lefrancq
Laurent Orseau
Shane Legg
23
250
0
27 Nov 2017
Training Confidence-calibrated Classifiers for Detecting Out-of-Distribution Samples
Kimin Lee
Honglak Lee
Kibok Lee
Jinwoo Shin
OODD
70
873
0
26 Nov 2017
Good and safe uses of AI Oracles
Stuart Armstrong
Xavier O'Rorke
30
26
0
15 Nov 2017
Inverse Reward Design
Dylan Hadfield-Menell
S. Milli
Pieter Abbeel
Stuart J. Russell
Anca Dragan
28
391
0
08 Nov 2017
Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
Justin Fu
Katie Z Luo
Sergey Levine
53
739
0
30 Oct 2017
How Should a Robot Assess Risk? Towards an Axiomatic Theory of Risk in Robotics
Anirudha Majumdar
Marco Pavone
13
192
0
30 Oct 2017
Safety-Aware Apprenticeship Learning
Weichao Zhou
Wenchao Li
36
34
0
22 Oct 2017
A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks
Xiao Li
Yao Ma
C. Belta
18
59
0
27 Sep 2017