2402.11411
Cited By
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
18 February 2024
Yiyang Zhou
Chenhang Cui
Rafael Rafailov
Chelsea Finn
Huaxiu Yao
VLM
MLLM
Github (85★)
Papers citing "Aligning Modalities in Vision Large Language Models via Preference Fine-tuning"
35 / 85 papers shown
Title
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
Ziyu Liu
Yuhang Zang
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Haodong Duan
Zeang Sheng
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
108
12
0
23 Oct 2024
Modality-Fair Preference Optimization for Trustworthy MLLM Alignment
Songtao Jiang
Yan Zhang
Ruizhe Chen
Yeying Jin
Zuozhu Liu
Qinglin He
Yang Feng
Jian Wu
MoE
MLLM
100
12
0
20 Oct 2024
Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment
Chenhang Cui
An Zhang
Yiyang Zhou
Zhaorun Chen
Gelei Deng
Huaxiu Yao
Tat-Seng Chua
212
8
0
18 Oct 2024
CREAM: Consistency Regularized Self-Rewarding Language Models
Zhaoxiang Wang
Weilei He
Zhiyuan Liang
Xuchao Zhang
Chetan Bansal
Ying Wei
Weitong Zhang
Huaxiu Yao
ALM
196
12
0
16 Oct 2024
VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment
Lei Li
Zhihui Xie
Mukai Li
Shunian Chen
Peiyi Wang
L. Chen
Yazheng Yang
Benyou Wang
Dianbo Sui
Qiang Liu
VLM
ALM
102
29
0
12 Oct 2024
Unraveling and Mitigating Safety Alignment Degradation of Vision-Language Models
Qin Liu
Chao Shang
Ling Liu
Nikolaos Pappas
Jie Ma
Neha Anna John
Srikanth Doss Kadarundalagi Raghuram Doss
Lluís Marquez
Miguel Ballesteros
Yassine Benajiba
103
9
0
11 Oct 2024
From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning
Yang Bai
Yang Zhou
Jun Zhou
Rick Siow Mong Goh
Daniel Ting
Yong Liu
VLM
79
1
0
09 Oct 2024
Self-Correction is More than Refinement: A Learning Framework for Visual and Language Reasoning Tasks
Jiayi He
Hehai Lin
Q. Wang
Yi R. Fung
Chenhui Xu
ReLM
LRM
218
7
0
05 Oct 2024
Investigating and Mitigating Object Hallucinations in Pretrained Vision-Language (CLIP) Models
Yufang Liu
Tao Ji
Changzhi Sun
Yuanbin Wu
Aimin Zhou
VLM
MLLM
90
3
0
04 Oct 2024
LLaVA-Critic: Learning to Evaluate Multimodal Models
Tianyi Xiong
Xinze Wang
Dong Guo
Qinghao Ye
Haoqi Fan
Quanquan Gu
Heng Huang
Chunyuan Li
MLLM
VLM
LRM
139
53
0
03 Oct 2024
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation
Fangxun Shu
Yue Liao
Le Zhuo
Chenning Xu
Guanghao Zhang
...
Bolin Li
Zhelun Yu
Si Liu
Hongsheng Li
Hao Jiang
VLM
MoE
72
18
0
28 Aug 2024
RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
Chenglong Wang
Yang Gan
Yifu Huo
Yongyu Mu
Murun Yang
...
Chunliang Zhang
Tongran Liu
Quan Du
Di Yang
Jingbo Zhu
VLM
173
6
0
22 Aug 2024
RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models
Peng Xia
Kangyu Zhu
Haoran Li
Hongtu Zhu
Yun Li
Gang Li
Linjun Zhang
Huaxiu Yao
MedIm
77
41
0
06 Jul 2024
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
Zhaorun Chen
Yichao Du
Zichen Wen
Yiyang Zhou
Chenhang Cui
...
Jiawei Zhou
Zhuokai Zhao
Rafael Rafailov
Chelsea Finn
Huaxiu Yao
EGVM
MLLM
117
35
0
05 Jul 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Pan Zhang
Xiaoyi Dong
Yuhang Zang
Yuhang Cao
Rui Qian
...
Kai Chen
Jifeng Dai
Yu Qiao
Dahua Lin
Jiaqi Wang
144
117
0
03 Jul 2024
STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical
Guohao Sun
Can Qin
Huazhu Fu
Linwei Wang
Zhiqiang Tao
LM&MA
73
5
0
28 Jun 2024
mDPO: Conditional Preference Optimization for Multimodal Large Language Models
Fei Wang
Wenxuan Zhou
James Y. Huang
Nan Xu
Sheng Zhang
Hoifung Poon
Muhao Chen
118
28
0
17 Jun 2024
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
Yongting Zhang
Lu Chen
Guodong Zheng
Yifeng Gao
Rui Zheng
...
Yu Qiao
Xuanjing Huang
Feng Zhao
Tao Gui
Jing Shao
VLM
228
33
0
17 Jun 2024
Direct Preference Optimization for Suppressing Hallucinated Prior Exams in Radiology Report Generation
Oishi Banerjee
Hong-Yu Zhou
Subathra Adithan
Stephen Kwak
Kay Wu
Pranav Rajpurkar
MedIm
103
3
0
10 Jun 2024
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
Peng Xia
Ze Chen
Juanxi Tian
Yangrui Gong
Ruibo Hou
...
Jimeng Sun
Zongyuan Ge
Gang Li
James Zou
Huaxiu Yao
MU
VLM
121
40
0
10 Jun 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Lin Chen
Xilin Wei
Jinsong Li
Xiaoyi Dong
Pan Zhang
...
Li Yuan
Yu Qiao
Dahua Lin
Feng Zhao
Jiaqi Wang
143
183
0
06 Jun 2024
Enhancing Large Vision Language Models with Self-Training on Image Comprehension
Yihe Deng
Pan Lu
Fan Yin
Ziniu Hu
Sheng Shen
James Zou
Kai-Wei Chang
Wei Wang
SyDa
VLM
LRM
100
46
0
30 May 2024
Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs
Mustafa Shukor
Matthieu Cord
141
5
0
26 May 2024
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Xiyao Wang
Jiuhai Chen
Zhaoyang Wang
Yuhang Zhou
Yiyang Zhou
...
Dinesh Manocha
Tom Goldstein
Parminder Bhatia
Furong Huang
Cao Xiao
203
38
0
24 May 2024
Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization
Beitao Chen
Xinyu Lyu
Lianli Gao
Jingkuan Song
Hengtao Shen
MLLM
189
12
0
24 May 2024
Calibrated Self-Rewarding Vision Language Models
Yiyang Zhou
Zhiyuan Fan
Dongjie Cheng
Sihan Yang
Zhaorun Chen
Chenhang Cui
Xiyao Wang
Yun Li
Linjun Zhang
Huaxiu Yao
VLM
143
34
0
23 May 2024
Automated Multi-level Preference for MLLMs
Mengxi Zhang
Wenhao Wu
Yu Lu
Yuxin Song
Kang Rong
...
Jianbo Zhao
Fanglong Liu
Yifan Sun
Haocheng Feng
Jingdong Wang
MLLM
125
15
0
18 May 2024
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM
LRM
283
197
0
29 Apr 2024
Fake Artificial Intelligence Generated Contents (FAIGC): A Survey of Theories, Detection Methods, and Opportunities
Xiaomin Yu
Yezhaohui Wang
Yanfang Chen
Zhen Tao
Dinghao Xi
Shichao Song
Pengnian Qi
Zhiyu Li
133
10
0
25 Apr 2024
MoVA: Adapting Mixture of Vision Experts to Multimodal Context
Zhuofan Zong
Bingqi Ma
Dazhong Shen
Guanglu Song
Hao Shao
Dongzhi Jiang
Hongsheng Li
Yu Liu
MoE
108
51
0
19 Apr 2024
FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
Liqiang Jing
Xinya Du
187
18
0
07 Apr 2024
Are We on the Right Way for Evaluating Large Vision-Language Models?
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Yuhang Zang
...
Haodong Duan
Jiaqi Wang
Yu Qiao
Dahua Lin
Feng Zhao
VLM
139
303
0
29 Mar 2024
HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding
Zhaorun Chen
Zhuokai Zhao
Hongyin Luo
Huaxiu Yao
Bo Li
Jiawei Zhou
MLLM
121
75
0
01 Mar 2024
EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models
Shangyu Xing
Fei Zhao
Zhen Wu
Tuo An
Weihao Chen
Chunhui Li
Jianbing Zhang
Xinyu Dai
MLLM
MU
125
5
0
15 Feb 2024
Multimodal Representation Learning by Alternating Unimodal Adaptation
Xiaohui Zhang
Jaehong Yoon
Mohit Bansal
Huaxiu Yao
118
26
0
17 Nov 2023