ResearchTrend.AI

International AI Safety Report (arXiv:2501.17805)
29 January 2025
Yoshua Bengio, Sören Mindermann, Daniel Privitera, T. Besiroglu, Rishi Bommasani, Stephen Casper, Yejin Choi, Philip Fox, Ben Garfinkel, Danielle Goldfarb, Hoda Heidari, A. Ho, Sayash Kapoor, Leila Khalatbari, Shayne Longpre, Sam Manning, Vasilios Mavroudis, Mantas Mazeika, Julian Michael, Jessica Newman, Kwan Yee Ng, Chinasa T. Okolo, Deborah Raji, Girish Sastry, Elizabeth Seger, Theodora Skeadas, Tobin South, Emma Strubell, F. Tramèr, Lucia Velasco, Nicole Wheeler, Daron Acemoglu, Olubayo Adekanmbi, David Dalrymple, Thomas G. Dietterich, Edward W. Felten, Pascale Fung, Pierre-Olivier Gourinchas, Fredrik Heintz, Geoffrey Hinton, N. Jennings, Andreas Krause, Susan Leavy, Percy Liang, Teresa Ludermir, Vidushi Marda, Helen Margetts, John McDermid, Jane Munga, Arvind Narayanan, Alondra Nelson, Clara Neppel, Alice Oh, Gopal Ramchurn, Stuart J. Russell, Marietje Schaake, Bernhard Schölkopf, Dawn Song, Alvaro Soto, Lee Tiedrich, Gaël Varoquaux, Andrew Yao, Ya-Qin Zhang, Fahad Albalawi, Marwan Alserkal, Olubunmi Ajala, Guillaume Avrin, Christian Busch, André Carlos Ponce de Leon Ferreira de Carvalho, Bronwyn Fox, Amandeep Singh Gill, Ahmet Halit Hatip, Juha Heikkilä, Gill Jolly, Ziv Katzir, Hiroaki Kitano, Antonio Krüger, Chris Johnson, Saif M. Khan, Kyoung Mu Lee, Dominic Vincent Ligot, Oleksii Molchanovskyi, Andrea Monti, Nusu Mwamanzi, Mona Nemer, Nuria Oliver, José Ramón López Portillo, Balaraman Ravindran, Raquel Pezoa Rivera, Hammam Riza, Crystal Rugege, Ciarán Seoighe, Jerry Sheehan, Haroon Sheikh, Denise Wong, Yi Zeng

Papers citing "International AI Safety Report"

15 of 15 papers shown
Mitigating Deceptive Alignment via Self-Monitoring (24 May 2025)
Jiaming Ji, Wenqi Chen, Kaile Wang, Donghai Hong, Sitong Fang, ..., Jiayi Zhou, Juntao Dai, Sirui Han, Yike Guo, Yaodong Yang
Tracr-Injection: Distilling Algorithms into Pre-trained Language Models (15 May 2025)
Tomás Vergara-Browne, Álvaro Soto
An alignment safety case sketch based on debate (06 May 2025)
Marie Davidsen Buhl, Jacob Pfau, Benjamin Hilton, Geoffrey Irving
Evaluating Explanations: An Explanatory Virtues Framework for Mechanistic Interpretability -- The Strange Science Part I.ii (02 May 2025)
Kola Ayonrinde, Louis Jaburi
A Mathematical Philosophy of Explanations in Mechanistic Interpretability -- The Strange Science Part I.i (01 May 2025)
Kola Ayonrinde, Louis Jaburi
A Framework to Assess the Persuasion Risks Large Language Model Chatbots Pose to Democratic Societies (29 Apr 2025)
Zhongren Chen, Joshua Kalla, Quan Le, Shinpei Nakamura-Sakai, Jasjeet Sekhon, Ruixiao Wang
EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models (21 Apr 2025)
Ziwen Xu, Shuxun Wang, Kewei Xu, Haoming Xu, Mengru Wang, Xinle Deng, Yunzhi Yao, Guozhou Zheng, Ningyu Zhang, Xin Xu
Perceptions of Agentic AI in Organizations: Implications for Responsible AI and ROI (15 Apr 2025)
Lee Ackerman
PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages (06 Apr 2025)
Priyanshu Kumar, Devansh Jain, Akhila Yerukola, Liwei Jiang, Himanshu Beniwal, Thomas Hartvigsen, Maarten Sap
A First-Principles Based Risk Assessment Framework and the IEEE P3396 Standard (31 Mar 2025)
Richard J. Tong, Marina Cortês, Jeanine A. DeFalco, Mark Underwood, Janusz Zalewski
Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry (03 Mar 2025)
Sai Sumedh R. Hindupur, Ekdeep Singh Lubana, Thomas Fel, Demba Ba
À la recherche du sens perdu: your favourite LLM might have more to say than you can understand (28 Feb 2025)
K. O. T. Erziev
Fair Foundation Models for Medical Image Analysis: Challenges and Perspectives (24 Feb 2025)
Dilermando Queiroz, Anderson Carlos, André Anjos, Lilian Berton
Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models? (17 Feb 2025)
Jacob Nielsen, Peter Schneider-Kamp, Lukas Galke
Aligning Generalisation Between Humans and Machines (23 Nov 2024)
Filip Ilievski, Barbara Hammer, F. V. Harmelen, Benjamin Paassen, S. Saralajew, ..., Vered Shwartz, Gabriella Skitalinskaya, Clemens Stachl, Gido M. van de Ven, T. Villmann