
Q-Boost: On Visual Quality Assessment Ability of Low-level Multi-Modality Foundation Models

Zicheng Zhang
Haoning Wu
Zhongpeng Ji
Chunyi Li
Erli Zhang
Wei Sun
Xiaohong Liu
Xiongkuo Min
Fengyu Sun
Shangling Jui
Weisi Lin
Guangtao Zhai
Abstract

Recent advancements in Multi-modality Large Language Models (MLLMs) have demonstrated remarkable capabilities in complex high-level vision tasks. However, the exploration of MLLM potential in visual quality assessment, a vital aspect of low-level vision, remains limited. To address this gap, we introduce Q-Boost, a novel strategy designed to enhance low-level MLLMs in image quality assessment (IQA) and video quality assessment (VQA) tasks. It is structured around two pivotal components: 1) Triadic-Tone Integration: Ordinary prompt design simply oscillates between the binary extremes of positive and negative. Q-Boost innovates by incorporating a "middle ground" approach through neutral prompts, allowing for a more balanced and detailed assessment. 2) Multi-Prompt Ensemble: Multiple quality-centric prompts are used to mitigate bias and obtain more accurate evaluations. The experimental results show that low-level MLLMs equipped with the Q-Boost strategy exhibit outstanding zero-shot performance on IQA/VQA tasks.
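The two components can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formula, so the softmax over three tone logits, the tone weights (1.0, 0.5, 0.0), the plain averaging across prompts, and the function names `triadic_score` and `ensemble_score` are all assumptions made for illustration. The logits stand in for whatever scores an MLLM assigns to positive, neutral, and negative answer words under one quality-centric prompt.

```python
import math
from typing import Dict, List

# Assumed tone weights (illustrative only, not taken from the paper):
# positive maps to 1.0, neutral to 0.5, negative to 0.0.
TONE_WEIGHTS = {"positive": 1.0, "neutral": 0.5, "negative": 0.0}


def triadic_score(logits: Dict[str, float]) -> float:
    """Softmax over the three tone logits, then a weighted sum in [0, 1]."""
    # Subtract the maximum logit for numerical stability.
    peak = max(logits[tone] for tone in TONE_WEIGHTS)
    exps = {tone: math.exp(logits[tone] - peak) for tone in TONE_WEIGHTS}
    total = sum(exps.values())
    return sum(TONE_WEIGHTS[tone] * exps[tone] / total for tone in TONE_WEIGHTS)


def ensemble_score(per_prompt_logits: List[Dict[str, float]]) -> float:
    """Average the triadic score over several prompts (assumed: simple mean)."""
    if not per_prompt_logits:
        raise ValueError("at least one prompt is required")
    scores = [triadic_score(logits) for logits in per_prompt_logits]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical logits for one image under two different prompts.
    prompts = [
        {"positive": 2.1, "neutral": 0.4, "negative": -1.3},
        {"positive": 1.5, "neutral": 0.9, "negative": -0.7},
    ]
    print(f"ensemble quality score: {ensemble_score(prompts):.3f}")
```

Under these assumptions, the neutral tone gives the score a midpoint to settle on when the model is undecided, and averaging across prompts dampens the effect of any single prompt's wording.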

