Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025

Multimodal Large Language Models (MLLMs) have enabled transformative advancements across diverse applications but remain susceptible to safety threats, especially jailbreak attacks that induce harmful outputs. To systematically evaluate and improve their safety, we organized the Adversarial Testing & Large-model Alignment Safety Grand Challenge (ATLAS) 2025. This technical report presents findings from the competition, which involved 86 teams probing MLLM vulnerabilities via adversarial image-text attacks across two phases: white-box and black-box evaluations. The competition results highlight ongoing challenges in securing MLLMs and provide valuable guidance for developing stronger defense mechanisms. The challenge establishes new benchmarks for MLLM safety evaluation and lays the groundwork for advancing safer multimodal AI systems. The code and data for this challenge are openly available at this https URL.