ResearchTrend.AI
Concrete Problems in AI Safety

21 June 2016
Dario Amodei
C. Olah
Jacob Steinhardt
Paul Christiano
John Schulman
Dandelion Mané

Papers citing "Concrete Problems in AI Safety"

50 / 482 papers shown
Title
LLM-SAP: Large Language Models Situational Awareness Based Planning
Liman Wang
Hanyang Zhong
LLMAG
35
2
0
26 Dec 2023
The Adaptive Arms Race: Redefining Robustness in AI Security
Ilias Tsingenopoulos
Vera Rimmer
Davy Preuveneers
Fabio Pierazzi
Lorenzo Cavallaro
Wouter Joosen
AAML
85
0
0
20 Dec 2023
Toward Responsible AI Use: Considerations for Sustainability Impact Assessment
Eva Thelisson
Grzegorz Mika
Quentin Schneiter
Kirtan Padh
Himanshu Verma
26
0
0
19 Dec 2023
On a Functional Definition of Intelligence
Warisa Sritriratanarak
Paulo Garcia
23
0
0
15 Dec 2023
Deep Learning for Koopman-based Dynamic Movement Primitives
Tyler Han
Carl Glen Henshaw
36
0
0
06 Dec 2023
Exploring the Robustness of Model-Graded Evaluations and Automated Interpretability
Simon Lermen
Ondřej Kvapil
ELM
AAML
23
3
0
26 Nov 2023
CLIP-Motion: Learning Reward Functions for Robotic Actions Using Consecutive Observations
Xuzhe Dang
Stefan Edelkamp
37
4
0
06 Nov 2023
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
Simon Lermen
Charlie Rogers-Smith
Jeffrey Ladish
ALM
31
83
0
31 Oct 2023
A Review of the Evidence for Existential Risk from AI via Misaligned Power-Seeking
Rose Hadshar
26
6
0
27 Oct 2023
Factored Verification: Detecting and Reducing Hallucination in Summaries of Academic Papers
Charlie George
Andreas Stuhlmüller
HILM
28
5
0
16 Oct 2023
IW-GAE: Importance Weighted Group Accuracy Estimation for Improved Calibration and Model Selection in Unsupervised Domain Adaptation
Taejong Joo
Diego Klabjan
43
1
0
16 Oct 2023
SALMON: Self-Alignment with Instructable Reward Models
Zhiqing Sun
Songlin Yang
Hongxin Zhang
Qinhong Zhou
Zhenfang Chen
David D. Cox
Yiming Yang
Chuang Gan
ALM
SyDa
41
35
0
09 Oct 2023
DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models
Albert Garde
Esben Kran
Fazl Barez
19
2
0
03 Oct 2023
Intrinsic Language-Guided Exploration for Complex Long-Horizon Robotic Manipulation Tasks
Wenke Huang
Filippos Christianos
Zhibin Li
42
8
0
28 Sep 2023
Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints
Chaoqi Wang
Yibo Jiang
Yuguang Yang
Han Liu
Yuxin Chen
42
82
0
28 Sep 2023
Learning to Recover for Safe Reinforcement Learning
Haoyu Wang
Xin Yuan
Qinqing Ren
36
0
0
21 Sep 2023
Learning Active Subspaces for Effective and Scalable Uncertainty Quantification in Deep Neural Networks
Sanket R. Jantre
Nathan M. Urban
Xiaoning Qian
Byung-Jun Yoon
BDL
UQCV
34
4
0
06 Sep 2023
On Reducing Undesirable Behavior in Deep Reinforcement Learning Models
Ophir M. Carmel
Guy Katz
40
0
0
06 Sep 2023
Iterative Reward Shaping using Human Feedback for Correcting Reward Misspecification
Jasmina Gajcin
J. McCarthy
Rahul Nair
Radu Marinescu
Elizabeth M. Daly
Ivana Dusparic
25
3
0
30 Aug 2023
RecRec: Algorithmic Recourse for Recommender Systems
Sahil Verma
Ashudeep Singh
Varich Boonsanong
John P. Dickerson
Chirag Shah
35
1
0
28 Aug 2023
The Promise and Peril of Artificial Intelligence -- Violet Teaming Offers a Balanced Path Forward
A. Titus
Adam Russell
38
1
0
28 Aug 2023
Language Reward Modulation for Pretraining Reinforcement Learning
Ademi Adeniji
Amber Xie
Carmelo Sferrazza
Younggyo Seo
Stephen James
Pieter Abbeel
39
26
0
23 Aug 2023
Simple synthetic data reduces sycophancy in large language models
Jerry W. Wei
Da Huang
Yifeng Lu
Denny Zhou
Quoc V. Le
38
69
0
07 Aug 2023
Rating-based Reinforcement Learning
Devin White
Mingkang Wu
Ellen R. Novoseller
Vernon J. Lawhern
Nicholas R. Waytowich
Yongcan Cao
ALM
19
6
0
30 Jul 2023
Designing Fiduciary Artificial Intelligence
Sebastian Benthall
David Shekman
51
4
0
27 Jul 2023
Of Models and Tin Men: A Behavioural Economics Study of Principal-Agent Problems in AI Alignment using Large-Language Models
S. Phelps
Rebecca E. Ranson
LLMAG
34
1
0
20 Jul 2023
Classical Out-of-Distribution Detection Methods Benchmark in Text Classification Tasks
M. Baran
Joanna Baran
Mateusz Wójcik
Maciej Zięba
Adam Gonczarek
OODD
49
4
0
13 Jul 2023
Empirically Validating Conformal Prediction on Modern Vision Architectures Under Distribution Shift and Long-tailed Data
Kevin Kasa
Graham W. Taylor
45
2
0
03 Jul 2023
Beyond AUROC & co. for evaluating out-of-distribution detection performance
Galadrielle Humblot-Renaux
Sergio Escalera
T. Moeslund
OODD
22
4
0
26 Jun 2023
A Cosine Similarity-based Method for Out-of-Distribution Detection
Nguyen Ngoc-Hieu
Nguyen Hung-Quang
The-Anh Ta
Thanh Nguyen-Tang
Khoa D. Doan
Hoang Thanh-Tung
OODD
24
2
0
23 Jun 2023
A Data-Driven Measure of Relative Uncertainty for Misclassification Detection
Eduardo Dadalto Camara Gomes
Marco Romanelli
Georg Pichler
Pablo Piantanida
UQCV
43
5
0
02 Jun 2023
GPT4GEO: How a Language Model Sees the World's Geography
Jonathan Roberts
Timo Lüddecke
Sowmen Das
Kai Han
Samuel Albanie
29
60
0
30 May 2023
Reduced Precision Floating-Point Optimization for Deep Neural Network On-Device Learning on MicroControllers
D. Nadalini
Manuele Rusci
Luca Benini
Francesco Conti
31
15
0
30 May 2023
Safety of autonomous vehicles: A survey on Model-based vs. AI-based approaches
Dimia Iberraken
Lounis Adouane
19
1
0
29 May 2023
Hybrid Energy Based Model in the Feature Space for Out-of-Distribution Detection
Marc Lafon
Elias Ramzi
Clément Rambour
Nicolas Thome
OODD
35
10
0
26 May 2023
LM vs LM: Detecting Factual Errors via Cross Examination
Roi Cohen
May Hamri
Mor Geva
Amir Globerson
HILM
41
120
0
22 May 2023
Explaining black box text modules in natural language with language models
Chandan Singh
Aliyah R. Hsu
Richard Antonello
Shailee Jain
Alexander G. Huth
Bin-Xia Yu
Jianfeng Gao
MILM
36
47
0
17 May 2023
Towards ethical multimodal systems
Alexis Roger
Esma Aïmeur
Irina Rish
40
3
0
26 Apr 2023
Approximate Shielding of Atari Agents for Safe Exploration
Alexander W. Goodall
Francesco Belardinelli
27
2
0
21 Apr 2023
The e-Bike Motor Assembly: Towards Advanced Robotic Manipulation for Flexible Manufacturing
Leonel Rozo
A. Kupcsik
Philipp Schillinger
Meng Guo
R. Krug
...
Patrick Kesper
Sabrina Hoppe
Hanna Ziesche
M. Burger
Kai O. Arras
38
5
0
20 Apr 2023
Fairness in AI and Its Long-Term Implications on Society
Ondrej Bohdal
Timothy M. Hospedales
Philip Torr
Fazl Barez
15
4
0
16 Apr 2023
Uncertainty-Aware Vehicle Energy Efficiency Prediction using an Ensemble of Neural Networks
Jihed Khiari
Cristina Olaverri-Monreal
19
1
0
14 Apr 2023
Learning Personalized Decision Support Policies
Umang Bhatt
Valerie Chen
Katherine M. Collins
Parameswaran Kamalaruban
Emma Kallina
Adrian Weller
Ameet Talwalkar
OffRL
56
10
0
13 Apr 2023
Transfer Knowledge from Head to Tail: Uncertainty Calibration under Long-tailed Distribution
Jiahao Chen
Bingyue Su
32
11
0
13 Apr 2023
Establishing baselines and introducing TernaryMixOE for fine-grained out-of-distribution detection
Noah Fleischmann
Walter D. Bennette
Nathan Inkawhich
OODD
18
0
0
30 Mar 2023
Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck
Jongheon Jeong
Sihyun Yu
Hankook Lee
Jinwoo Shin
AAML
46
0
0
24 Mar 2023
MLTEing Models: Negotiating, Evaluating, and Documenting Model and System Qualities
Katherine R. Maffey
Kyle Dotterrer
Jennifer Niemann
Iain J. Cruickshank
Grace A. Lewis
Christian Kästner
32
4
0
03 Mar 2023
Toward Robust Uncertainty Estimation with Random Activation Functions
Y. Stoyanova
Soroush Ghandi
M. Tavakol
UQCV
26
2
0
28 Feb 2023
Reward Design with Language Models
Minae Kwon
Sang Michael Xie
Kalesha Bullard
Dorsa Sadigh
LM&Ro
44
202
0
27 Feb 2023
Machine Love
Joel Lehman
28
5
0
18 Feb 2023