2505.20977
Evaluating and Steering Modality Preferences in Multimodal Large Language Model
27 May 2025
Yu Zhang
Jinlong Ma
Yongshuai Hou
Xuefeng Bai
Kehai Chen
Yang Xiang
Jun Yu
Min Zhang
Papers citing
"Evaluating and Steering Modality Preferences in Multimodal Large Language Model"
42 / 42 papers shown
Title
Bridge the Gap: From Weak to Full Supervision for Temporal Action Localization with PseudoFormer
Ziyi Liu
Yang Liu
53
1
0
21 Apr 2025
MoMa-Kitchen: A 100K+ Benchmark for Affordance-Grounded Last-Mile Navigation in Mobile Manipulation
P. Zhang
Xianqiang Gao
Yuhan Wu
Kehui Liu
Dong Wang
Zechuan Wang
Bin Zhao
Yan Ding
Xiaochen Li
LM&Ro
102
4
0
14 Mar 2025
Memory-augmented Query Reconstruction for LLM-based Knowledge Graph Reasoning
Mufan Xu
Gewen Liang
Kehai Chen
Wei Wang
Xun Zhou
M. Yang
Tiejun Zhao
Min Zhang
RALM
123
3
0
07 Mar 2025
Mitigating Hallucinations in Large Vision-Language Models via Summary-Guided Decoding
Kyungmin Min
Minbeom Kim
Kang-il Lee
Dongryeol Lee
Kyomin Jung
MLLM
146
7
0
20 Feb 2025
Qwen2.5-VL Technical Report
S. Bai
Keqin Chen
Xuejing Liu
Jialin Wang
Wenbin Ge
...
Zesen Cheng
Hang Zhang
Zhibo Yang
Haiyang Xu
Junyang Lin
VLM
344
699
0
20 Feb 2025
BANER: Boundary-Aware LLMs for Few-Shot Named Entity Recognition
Quanjiang Guo
Yihong Dong
Ling Tian
Zhao Kang
Yu Zhang
Sijie Wang
113
3
0
03 Dec 2024
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
Junzhe Chen
Tianshu Zhang
Shijie Huang
Yuwei Niu
Linfeng Zhang
Lijie Wen
Xuming Hu
MLLM
VLM
464
6
0
22 Nov 2024
GPT-4o System Card
OpenAI
Aaron Hurst
Adam Lerer
Adam P. Goucher
...
Yuchen He
Yuchen Zhang
Yujia Jin
Yunxing Dai
Yury Malkov
MLLM
212
1,038
0
25 Oct 2024
Improving Instruction-Following in Language Models through Activation Steering
Alessandro Stolfo
Vidhisha Balachandran
Safoora Yousefi
Eric Horvitz
Besmira Nushi
LLMSV
141
28
0
15 Oct 2024
Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality
Guanyu Zhou
Yibo Yan
Xin Zou
Kun Wang
Aiwei Liu
Xuming Hu
70
12
0
07 Oct 2024
Why context matters in VQA and Reasoning: Semantic interventions for VLM input modalities
Kenza Amara
Lukas Klein
Carsten T. Lüth
Paul Jäger
Hendrik Strobelt
Mennatallah El-Assady
72
2
0
02 Oct 2024
Question-guided Knowledge Graph Re-scoring and Injection for Knowledge Graph Question Answering
Yu Zhang
Kehai Chen
Xuefeng Bai
Zhao Kang
Quanjiang Guo
Min Zhang
96
12
0
02 Oct 2024
CogVLM2: Visual Language Models for Image and Video Understanding
Wenyi Hong
Weihan Wang
Ming Ding
Wenmeng Yu
Qingsong Lv
...
Debing Liu
Bin Xu
Juanzi Li
Yuxiao Dong
Jie Tang
VLM
MLLM
107
121
0
29 Aug 2024
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Jiabo Ye
Haiyang Xu
Haowei Liu
Anwen Hu
Ming Yan
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
80
139
0
09 Aug 2024
LLaVA-OneVision: Easy Visual Task Transfer
Bo Li
Yuanhan Zhang
Dong Guo
Renrui Zhang
Feng Li
Hao Zhang
Kaichen Zhang
Yanwei Li
Ziwei Liu
Chunyuan Li
MLLM
SyDa
VLM
124
867
0
06 Aug 2024
Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs
Sheridan Feucht
David Atkinson
Byron C. Wallace
David Bau
97
8
0
28 Jun 2024
VLind-Bench: Measuring Language Priors in Large Vision-Language Models
Kang-il Lee
Minbeom Kim
Seunghyun Yoon
Minsung Kim
Dongryeol Lee
Hyukhun Koh
Kyomin Jung
CoGe
VLM
161
8
0
13 Jun 2024
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Letitia Parcalabescu
Anette Frank
MLLM
CoGe
VLM
130
6
0
29 Apr 2024
Uncovering Safety Risks of Large Language Models through Concept Activation Vector
Zhihao Xu
Ruixuan Huang
Changyu Chen
Shuai Wang
Xiting Wang
LLMSV
70
25
0
18 Apr 2024
IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations
Deqing Fu
Ghazal Khalighinejad
Ollie Liu
Bhuwan Dhingra
Dani Yogatama
Robin Jia
Willie Neiswanger
89
26
0
01 Apr 2024
Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective
Meiqi Chen
Yixin Cao
Yan Zhang
Chaochao Lu
90
16
0
27 Mar 2024
Advancing Parameter Efficiency in Fine-tuning via Representation Editing
Muling Wu
Tianlong Li
Xiaohua Wang
Changze Lv
Zixuan Ling
Jianhao Zhu
Cenyuan Zhang
Xiaoqing Zheng
Xuanjing Huang
64
25
0
23 Feb 2024
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
Hongliang He
Wenlin Yao
Kaixin Ma
Wenhao Yu
Yong Dai
Hongming Zhang
Zhenzhong Lan
Dong Yu
LLMAG
132
151
0
25 Jan 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
260
1,215
0
21 Dec 2023
AI Control: Improving Safety Despite Intentional Subversion
Ryan Greenblatt
Buck Shlegeris
Kshitij Sachan
Fabien Roger
74
54
0
12 Dec 2023
Steering Llama 2 via Contrastive Activation Addition
Nina Rimsky
Nick Gabrieli
Julian Schulz
Meg Tong
Evan Hubinger
Alexander Matt Turner
LLMSV
59
226
0
09 Dec 2023
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Ziyi Lin
Chris Liu
Renrui Zhang
Peng Gao
Longtian Qiu
...
Siyuan Huang
Yichi Zhang
Xuming He
Hongsheng Li
Yu Qiao
MLLM
VLM
87
230
0
13 Nov 2023
In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
Sheng Liu
Haotian Ye
Lei Xing
James Y. Zou
117
117
0
11 Nov 2023
Improved Baselines with Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLM
MLLM
168
2,825
0
05 Oct 2023
InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4
Lai Wei
Zihao Jiang
Weiran Huang
Lichao Sun
VLM
MLLM
91
60
0
23 Aug 2023
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Xu
Enhong Chen
MLLM
LRM
133
607
0
23 Jun 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.5K
14,748
0
15 Mar 2023
Towards Reasoning in Large Language Models: A Survey
Jie Huang
Kevin Chen-Chuan Chang
LM&MA
ELM
LRM
152
644
0
20 Dec 2022
MM-SHAP: A Performance-agnostic Metric for Measuring Multimodal Contributions in Vision and Language Models & Tasks
Letitia Parcalabescu
Anette Frank
65
28
0
15 Dec 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
290
1,299
0
20 Sep 2022
Extracting Latent Steering Vectors from Pretrained Language Models
Nishant Subramani
Nivedita Suresh
Matthew E. Peters
LLMSV
80
101
0
10 May 2022
Multimodal Dialogue Response Generation
Qingfeng Sun
Yujing Wang
Can Xu
Kai Zheng
Yaming Yang
Huang Hu
Fei Xu
Jessica Zhang
Xiubo Geng
Daxin Jiang
97
49
0
16 Oct 2021
Vision Matters When It Should: Sanity Checking Multimodal Machine Translation Models
Jiaoda Li
Duygu Ataman
Rico Sennrich
61
33
0
08 Sep 2021
GLM: General Language Model Pretraining with Autoregressive Blank Infilling
Zhengxiao Du
Yujie Qian
Xiao Liu
Ming Ding
J. Qiu
Zhilin Yang
Jie Tang
BDL
AI4CE
151
1,554
0
18 Mar 2021
Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop
Afra Alishahi
Grzegorz Chrupała
Tal Linzen
NAI
MILM
65
65
0
05 Apr 2019
An Analysis of Visual Question Answering Algorithms
Kushal Kafle
Christopher Kanan
85
234
0
28 Mar 2017
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
434
43,832
0
01 May 2014