Q-Boost: On Visual Quality Assessment Ability of Low-level Multi-Modality Foundation Models

23 December 2023

Zicheng Zhang

Haoning Wu

Zhongpeng Ji

Chunyi Li

Erli Zhang

Wei Sun

Xiaohong Liu

Xiongkuo Min

Fengyu Sun

Shangling Jui

Weisi Lin

Guangtao Zhai


Abstract

Recent advancements in Multi-modality Large Language Models (MLLMs) have demonstrated remarkable capabilities in complex high-level vision tasks. However, the exploration of MLLMs' potential in visual quality assessment, a vital aspect of low-level vision, remains limited. To address this gap, we introduce Q-Boost, a novel strategy designed to enhance low-level MLLMs on image quality assessment (IQA) and video quality assessment (VQA) tasks. Q-Boost is structured around two pivotal components. 1) Triadic-Tone Integration: ordinary prompt design simply oscillates between the binary extremes of positive and negative. Q-Boost innovates by incorporating a "middle ground" through neutral prompts, allowing for a more balanced and detailed assessment. 2) Multi-Prompt Ensemble: multiple quality-centric prompts are used to mitigate bias and obtain more accurate evaluations. Experimental results show that, when equipped with the Q-Boost strategy, low-level MLLMs exhibit outstanding zero-shot performance on IQA/VQA tasks.
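The two components can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the MLLM exposes the logits of one positive, one neutral, and one negative answer word (for example "good", "average", "poor") for each prompt, and that these tones map to the scores 1, 0.5 and 0. The names `triadic_score`, `ensemble_score` and `TONE_WEIGHTS`, the weights, and the plain averaging across prompts are all illustrative assumptions, not the paper's exact formulation.

```python
import math
from typing import Sequence

# Hypothetical tone weights (an assumption, not taken from the paper):
# positive -> 1.0, neutral -> 0.5, negative -> 0.0.
TONE_WEIGHTS = (1.0, 0.5, 0.0)


def triadic_score(logits: Sequence[float]) -> float:
    """Map (positive, neutral, negative) token logits to a score in [0, 1].

    A softmax over the three logits gives a probability for each tone; the
    score is the probability-weighted average of TONE_WEIGHTS.
    """
    if len(logits) != 3:
        raise ValueError("expected logits for (positive, neutral, negative)")
    m = max(logits)  # subtract the max before exponentiating for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return sum(w * e / total for w, e in zip(TONE_WEIGHTS, exps))


def ensemble_score(per_prompt_logits: Sequence[Sequence[float]]) -> float:
    """Average the triadic scores obtained from several quality-centric prompts."""
    if not per_prompt_logits:
        raise ValueError("need at least one prompt")
    scores = [triadic_score(logits) for logits in per_prompt_logits]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical (positive, neutral, negative) logits for one image
    # under three different prompts.
    example_logits = [
        [2.0, 1.0, -1.0],
        [1.5, 1.5, 0.0],
        [0.5, 2.0, 0.5],
    ]
    print(f"ensemble quality score: {ensemble_score(example_logits):.3f}")
```

With only positive and negative logits, the score would be a single two-way softmax probability. Adding the neutral tone lets probability mass land on a middle value instead of being forced toward either extreme, and averaging over prompts dampens the bias of any single phrasing.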

