Open Problems in Machine Unlearning for AI Safety
arXiv:2501.04952 · 10 January 2025
Fazl Barez, Tingchen Fu, Ameya Prabhu, Stephen Casper, Amartya Sanyal, Adel Bibi, Aidan O'Gara, Robert Kirk, Ben Bucknall, Tim Fist, Luke Ong, Philip Torr, Kwok-Yan Lam, Robert F. Trager, David M. Krueger, Sören Mindermann, José Hernandez-Orallo, Mor Geva, Y. Gal
Topics: MU

Papers citing "Open Problems in Machine Unlearning for AI Safety"

11 of 11 citing papers shown.

1. Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement
   Gabriele Sarti, Vilém Zouhar, Malvina Nissim, Arianna Bisazza (29 May 2025)

2. Precise In-Parameter Concept Erasure in Large Language Models
   Yoav Gur-Arieh, Clara Suslik, Yihuai Hong, Fazl Barez, Mor Geva (28 May 2025). Topics: KELM, MU

3. CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning
   Biao Yi, Tiansheng Huang, Baolei Zhang, Tong Li, Lihai Nie, Zheli Liu, Li Shen (22 May 2025). Topics: MU, AAML

4. On the creation of narrow AI: hierarchy and nonlocality of neural network skills
   Eric J. Michaud, Asher Parker-Sartori, Max Tegmark (21 May 2025)

5. Reward Inside the Model: A Lightweight Hidden-State Reward Model for LLM's Best-of-N sampling
   Jizhou Guo, Zhaomin Wu, Philip S. Yu (18 May 2025)

6. Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
   Yiming Du, Wenyu Huang, Danna Zheng, Zhaowei Wang, Sébastien Montella, Mirella Lapata, Kam-Fai Wong, Jeff Z. Pan (01 May 2025). Topics: KELM, MU

7. ZJUKLAB at SemEval-2025 Task 4: Unlearning via Model Merging
   Haoming Xu, Shuxun Wang, Yanqiu Zhao, Yi Zhong, Ziyan Jiang, Ningyuan Zhao, Shumin Deng, Hong Chen, N. Zhang (27 Mar 2025). Topics: MoMe, MU

8. Do Sparse Autoencoders Generalize? A Case Study of Answerability
   Lovis Heindrich, Philip Torr, Fazl Barez, Veronika Thost (27 Feb 2025)

9. SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning
   Junkai Chen, Zhijie Deng, Kening Zheng, Yibo Yan, Shuliang Liu, PeiJun Wu, Peijie Jiang, Qingbin Liu, Xuming Hu (18 Feb 2025). Topics: MU

10. ReLearn: Unlearning via Learning for Large Language Models
    Haoming Xu, Ningyuan Zhao, Liming Yang, Sendong Zhao, Shumin Deng, Mengru Wang, Bryan Hooi, Nay Oo, Ningyu Zhang, N. Zhang (16 Feb 2025). Topics: MU, KELM, CLL

11. Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities
    Zora Che, Stephen Casper, Robert Kirk, Anirudh Satheesh, Stewart Slocum, ..., Zikui Cai, Bilal Chughtai, Y. Gal, Furong Huang, Dylan Hadfield-Menell (03 Feb 2025). Topics: MU, AAML, ELM