Mitigating Fine-tuning Risks in LLMs via Safety-Aware Probing Optimization (arXiv:2505.16737)
22 May 2025
Chengcan Wu, Zhixin Zhang, Zeming Wei, Yihao Zhang, Meng Sun
AAML
Papers citing "Mitigating Fine-tuning Risks in LLMs via Safety-Aware Probing Optimization" (5 of 5 papers shown)
Qwen Technical Report
Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, ..., Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou, Tianhang Zhu
OSLM
28 Sep 2023
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson
AAML
27 Jul 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, ..., Dacheng Li, Eric Xing, Haotong Zhang, Joseph E. Gonzalez, Ion Stoica
ALM, OSLM, ELM
09 Jun 2023
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, Kristina Toutanova
24 May 2019
Intriguing properties of neural networks
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, D. Erhan, Ian Goodfellow, Rob Fergus
AAML
21 Dec 2013