ResearchTrend.AI
arXiv:2310.17688
Managing extreme AI risks amid rapid progress


26 October 2023

Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Trevor Darrell, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila A. McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip H. S. Torr, Stuart J. Russell, Daniel Kahneman, Jan Brauner, Sören Mindermann

Papers citing "Managing extreme AI risks amid rapid progress"

19 of 19 citing papers shown.

1. Towards Contamination Resistant Benchmarks
   Rahmatullah Musawi, Sheng Lu (13 May 2025)

2. An alignment safety case sketch based on debate
   Marie Davidsen Buhl, Jacob Pfau, Benjamin Hilton, Geoffrey Irving (06 May 2025)

3. JailbreaksOverTime: Detecting Jailbreak Attacks Under Distribution Shift
   Julien Piet, Xiao Huang, Dennis Jacob, Annabella Chow, Maha Alrashed, Geng Zhao, Zhanhao Hu, Chawin Sitawarin, Basel Alomair, David A. Wagner [AAML] (28 Apr 2025)

4. Redefining Superalignment: From Weak-to-Strong Alignment to Human-AI Co-Alignment to Sustainable Symbiotic Society
   Feifei Zhao, Y. Wang, Enmeng Lu, Dongcheng Zhao, Bing Han, ..., Chao Liu, Yaodong Yang, Yi Zeng, Boyuan Chen, Jinyu Fan (24 Apr 2025)

5. The Pitfalls of "Security by Obscurity" And What They Mean for Transparent AI
   Peter Hall, Olivia Mundahl, Sunoo Park (30 Jan 2025)

6. Two Types of AI Existential Risk: Decisive and Accumulative
   Atoosa Kasirzadeh (20 Jan 2025)

7. On the Inherent Robustness of One-Stage Object Detection against Out-of-Distribution Data
   Aitor Martinez-Seras, Javier Del Ser, Alain Andres, Pablo García Bringas [OODD] (07 Nov 2024)

8. On the Role of Attention Heads in Large Language Model Safety
   Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Kun Wang, Yang Liu, Fan Zhang, Yongbin Li (17 Oct 2024)

9. Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
   Shanshan Han (09 Oct 2024)

10. Insuring Uninsurable Risks from AI: The State as Insurer of Last Resort
    Cristian Trout (10 Sep 2024)

11. Safeguarding AI Agents: Developing and Analyzing Safety Architectures
    Ishaan Domkundwar, Mukunda N S, Ishaan Bhola, Riddhik Kochhar [LLMAG] (03 Sep 2024)

12. Responsible AI Question Bank: A Comprehensive Tool for AI Risk Assessment
    Sung Une Lee, Harsha Perera, Yue Liu, Boming Xia, Qinghua Lu, Liming Zhu, Olivier Salvado, Jon Whittle (02 Aug 2024)

13. A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
    Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao (02 Jul 2024)

14. How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
    Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Yongbin Li (09 Jun 2024)

15. Modeling Emotions and Ethics with Large Language Models
    Edward Y. Chang (15 Apr 2024)

16. Safety Cases: How to Justify the Safety of Advanced AI Systems
    Joshua Clymer, Nick Gabrieli, David Krueger, Thomas Larsen (15 Mar 2024)

17. Towards Understanding Sycophancy in Language Models
    Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, ..., Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez (20 Oct 2023)

18. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
    Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou [LM&Ro, LRM, AI4CE, ReLM] (28 Jan 2022)

19. Unsolved Problems in ML Safety
    Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt (28 Sep 2021)