AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness

Zixin Chen
Hongzhan Lin
Kaixin Li
Ziyang Luo
Zhen Ye
Guang Chen
Zhiyong Huang
Jing Ma
Main: 8 pages
21 figures
Bibliography: 3 pages
11 tables
Appendix: 9 pages
Abstract

The proliferation of multimodal memes in the social media era demands that multimodal Large Language Models (mLLMs) effectively understand meme harmfulness. Existing benchmarks for assessing mLLMs on harmful meme understanding rely on accuracy-based, model-agnostic evaluations using static datasets. These benchmarks are limited in their ability to provide up-to-date and thorough assessments, as online memes evolve dynamically. To address this, we propose AdamMeme, a flexible, agent-based evaluation framework that adaptively probes the reasoning capabilities of mLLMs in deciphering meme harmfulness. Through multi-agent collaboration, AdamMeme provides comprehensive evaluations by iteratively updating the meme data with challenging samples, thereby exposing specific limitations in how mLLMs interpret harmfulness. Extensive experiments show that our framework systematically reveals the varying performance of different target mLLMs, offering in-depth, fine-grained analyses of model-specific weaknesses. Our code is available at this https URL.
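The abstract describes an evaluation loop that keeps updating the meme pool with harder samples and records where the target model fails. The sketch below is a minimal, hypothetical illustration of that adaptive idea only: `Sample`, `adaptive_probe`, `target`, and `harden` are names invented here, the multi-agent components are collapsed into two plain callables, and nothing in it reflects the actual AdamMeme code or its scoring.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Sample:
    """Hypothetical meme sample: an id, a harm category, and a difficulty level."""
    sample_id: str
    category: str
    difficulty: int


def adaptive_probe(
    pool: List[Sample],
    target: Callable[[Sample], bool],    # hypothetical: True if the target mLLM reasons correctly
    harden: Callable[[Sample], Sample],  # hypothetical: an agent that rewrites a sample into a harder one
    rounds: int = 3,
) -> Dict[str, float]:
    """Re-test the target each round, swapping solved samples for harder variants.

    Returns the per-category failure rate accumulated over all rounds.
    """
    failures: Dict[str, int] = defaultdict(int)
    attempts: Dict[str, int] = defaultdict(int)
    for _ in range(rounds):
        next_pool: List[Sample] = []
        for s in pool:
            attempts[s.category] += 1
            if target(s):
                # The target handled this sample, so probe it with a harder variant next round.
                next_pool.append(harden(s))
            else:
                # Keep the failed sample: it exposes a weakness worth re-checking.
                failures[s.category] += 1
                next_pool.append(s)
        pool = next_pool
    return {c: failures[c] / attempts[c] for c in attempts}


if __name__ == "__main__":
    # Toy stand-ins: the "model" copes with category "A" up to difficulty 2,
    # but only with difficulty 0 in category "B".
    limits = {"A": 3, "B": 1}

    def toy_target(s: Sample) -> bool:
        return s.difficulty < limits[s.category]

    def toy_harden(s: Sample) -> Sample:
        return Sample(s.sample_id + "+", s.category, s.difficulty + 1)

    seed = [Sample("a1", "A", 0), Sample("b1", "B", 0)]
    print(adaptive_probe(seed, toy_target, toy_harden, rounds=3))
```

In this toy run, category "B" ends with a higher failure rate than "A", which is the kind of model-specific weakness an adaptive loop is meant to surface that a single pass over a fixed dataset could miss.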

@article{chen2025_2507.01702,
  title={ AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness },
  author={ Zixin Chen and Hongzhan Lin and Kaixin Li and Ziyang Luo and Zhen Ye and Guang Chen and Zhiyong Huang and Jing Ma },
  journal={arXiv preprint arXiv:2507.01702},
  year={ 2025 }
}