ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2304.08485
  4. Cited By
Visual Instruction Tuning

Visual Instruction Tuning

17 April 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
    SyDa
    VLM
    MLLM
ArXivPDFHTML

Papers citing "Visual Instruction Tuning"

50 / 3,278 papers shown
Title
Can Large Vision Language Models Read Maps Like a Human?
Can Large Vision Language Models Read Maps Like a Human?
Shuo Xing
Zezhou Sun
Shuangyu Xie
Kaiyuan Chen
Yanjia Huang
Yuping Wang
Jiachen Li
Dezhen Song
Zhengzhong Tu
72
3
0
18 Mar 2025
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM
Xinyu Fang
Zheyu Chen
Kai Lan
Lixin Ma
Shengyuan Ding
...
Zicheng Zhang
Guofeng Zhang
Haodong Duan
K. Chen
Dahua Lin
MLLM
68
1
0
18 Mar 2025
The Power of Context: How Multimodality Improves Image Super-Resolution
The Power of Context: How Multimodality Improves Image Super-Resolution
Kangfu Mei
Hossein Talebi
Mojtaba Ardakani
Vishal M. Patel
P. Milanfar
M. Delbracio
DiffM
90
1
0
18 Mar 2025
Tracking Meets Large Multimodal Models for Driving Scenario Understanding
Tracking Meets Large Multimodal Models for Driving Scenario Understanding
Ayesha Ishaq
Jean Lahoud
Fahad Shahbaz Khan
Salman Khan
Hisham Cholakkal
Rao Muhammad Anwer
59
0
0
18 Mar 2025
MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation
MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation
Donggon Jang
Yucheol Cho
Suin Lee
Taehyeon Kim
Dae-Shik Kim
VLM
65
1
0
18 Mar 2025
EIAD: Explainable Industrial Anomaly Detection Via Multi-Modal Large Language Models
EIAD: Explainable Industrial Anomaly Detection Via Multi-Modal Large Language Models
Zongyun Zhang
Jiacheng Ruan
Xian Gao
Ting Liu
Yuzhuo Fu
75
2
0
18 Mar 2025
Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models
Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models
Yuxiang Lai
Shitian Zhao
Ming Li
Jike Zhong
Xiaofeng Yang
OffRL
LRM
LM&MA
VLM
81
11
0
18 Mar 2025
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Nvidia
A. Azzolini
Junjie Bai
Prithvijit Chattopadhyay
Huayu Chen
...
Xiaodong Yang
Zhuolin Yang
Jingyang Zhang
Xiaohui Zeng
Zhe Zhang
AI4CE
LM&Ro
LRM
70
5
0
18 Mar 2025
ExDDV: A New Dataset for Explainable Deepfake Detection in Video
ExDDV: A New Dataset for Explainable Deepfake Detection in Video
Vlad Hondru
Eduard Hogea
Darian M. Onchis
Radu Tudor Ionescu
65
1
0
18 Mar 2025
VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms
VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms
Seungwon Lim
Sungwoong Kim
Jihwan Yu
Sungjae Lee
Jiwan Chung
Youngjae Yu
76
1
0
18 Mar 2025
Where do Large Vision-Language Models Look at when Answering Questions?
Where do Large Vision-Language Models Look at when Answering Questions?
X. Xing
Chia-Wen Kuo
Li Fuxin
Yulei Niu
Fan Chen
Ming Li
Ying Wu
Longyin Wen
Sijie Zhu
LRM
62
0
0
18 Mar 2025
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
Xinyu Tian
Shu Zou
Zhaoyuan Yang
Jing Zhang
65
0
0
18 Mar 2025
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
Wei Song
Yansen Wang
Zijia Song
Yadong Li
Haoze Sun
Xin Wu
Zenan Zhou
Jianhua Xu
Jiaqi Wang
Kaicheng Yu
60
3
0
18 Mar 2025
Impossible Videos
Impossible Videos
Zechen Bai
Hai Ci
Mike Zheng Shou
EGVM
VGen
77
0
0
18 Mar 2025
LangDA: Building Context-Awareness via Language for Domain Adaptive Semantic Segmentation
LangDA: Building Context-Awareness via Language for Domain Adaptive Semantic Segmentation
Chang Liu
Bavesh Balaji
Saad Hossain
C Thomas
Kwei-Herng Lai
Raviteja Vemulapalli
Alexander Wong
Sirisha Rambhatla
51
0
0
17 Mar 2025
Concept-as-Tree: Synthetic Data is All You Need for VLM Personalization
Concept-as-Tree: Synthetic Data is All You Need for VLM Personalization
Ruichuan An
Kai Zeng
Ming Lu
Sihan Yang
Renrui Zhang
Huitong Ji
Qizhe Zhang
Yihao Luo
Hao Liang
Wentao Zhang
73
0
0
17 Mar 2025
Unified Autoregressive Visual Generation and Understanding with Continuous Tokens
Unified Autoregressive Visual Generation and Understanding with Continuous Tokens
Lijie Fan
Luming Tang
Siyang Qin
Tianhong Li
Xuan S. Yang
...
Tao Zhu
Michael Rubinstein
Michalis Raptis
Deqing Sun
Radu Soricut
60
5
0
17 Mar 2025
Quantum EigenGame for excited state calculation
Quantum EigenGame for excited state calculation
David Quiroga
Jason Han
Anastasios Kyrillidis
53
1
0
17 Mar 2025
Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
Hao Yin
Guangzong Si
Zilei Wang
61
0
0
17 Mar 2025
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
Xinyu Ma
Ziyang Ding
Zhicong Luo
Chong Chen
Zonghao Guo
Derek F. Wong
Xiaoyi Feng
Maosong Sun
VLM
LRM
76
2
0
17 Mar 2025
Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference
Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference
Cheng Yuan
Ziqiang Liu
Jiashu Lv
Jiawei Shao
Yufei Jiang
Jingyang Zhang
Xuelong Li
52
1
0
17 Mar 2025
Grounded Chain-of-Thought for Multimodal Large Language Models
Grounded Chain-of-Thought for Multimodal Large Language Models
Qiong Wu
Xiangcong Yang
Yiyi Zhou
Chenxin Fang
Baiyang Song
Xiaoshuai Sun
Rongrong Ji
LRM
95
1
0
17 Mar 2025
Federated Continual Instruction Tuning
Federated Continual Instruction Tuning
Haiyang Guo
Fanhu Zeng
Fei Zhu
Wenzhuo Liu
Da-Han Wang
Jian Xu
Xu-Yao Zhang
Cheng-Lin Liu
CLL
FedML
70
1
0
17 Mar 2025
Towards Scalable Foundation Model for Multi-modal and Hyperspectral Geospatial Data
Towards Scalable Foundation Model for Multi-modal and Hyperspectral Geospatial Data
Haozhe Si
Yuxuan Wan
Minh Do
Deepak Vasisht
Han Zhao
Hendrik Hamann
53
0
0
17 Mar 2025
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
Erik Daxberger
Nina Wenzel
David Griffiths
Haiming Gang
Justin Lazarow
...
Kai Kang
Marcin Eichner
Yue Yang
Afshin Dehghan
Peter Grasch
79
3
0
17 Mar 2025
ViSpeak: Visual Instruction Feedback in Streaming Videos
ViSpeak: Visual Instruction Feedback in Streaming Videos
Shenghao Fu
Q. Yang
Yuan-Ming Li
Yi-Xing Peng
Kun-Yu Lin
Xihan Wei
Jian-Fang Hu
Xiaohua Xie
Wei-Shi Zheng
VLM
69
1
0
17 Mar 2025
ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large language Models
ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large language Models
Hao Yin
Guangzong Si
Zilei Wang
208
0
0
17 Mar 2025
PASTA: Part-Aware Sketch-to-3D Shape Generation with Text-Aligned Prior
PASTA: Part-Aware Sketch-to-3D Shape Generation with Text-Aligned Prior
Seanie Lee
Hwanhee Jung
Byoungsoo Koh
Qixing Huang
Sangho Yoon
Sangpil Kim
54
0
0
17 Mar 2025
UniHOPE: A Unified Approach for Hand-Only and Hand-Object Pose Estimation
UniHOPE: A Unified Approach for Hand-Only and Hand-Object Pose Estimation
Yinqiao Wang
Hao Xu
Pheng Ann Heng
Chi-Wing Fu
3DH
58
0
0
17 Mar 2025
Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
Junming Liu
Siyuan Meng
Yanting Gao
Song Mao
Pinlong Cai
Guohang Yan
Yirong Chen
Zilin Bian
Botian Shi
Ding Wang
57
1
0
17 Mar 2025
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Jingyi Zhang
Jiaxing Huang
Huanjin Yao
Shunyu Liu
Xikun Zhang
Shijian Lu
Dacheng Tao
LRM
65
25
0
17 Mar 2025
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
Mingyang Song
Xiaoye Qu
Jiawei Zhou
Yu Cheng
VLM
70
1
0
17 Mar 2025
MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling
MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling
Yingyue Li
Bencheng Liao
Wenyu Liu
Xinggang Wang
Mamba
66
0
0
17 Mar 2025
Web Artifact Attacks Disrupt Vision Language Models
Web Artifact Attacks Disrupt Vision Language Models
Maan Qraitem
Piotr Teterwak
Kate Saenko
Bryan A. Plummer
AAML
85
0
0
17 Mar 2025
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
Yong-Jin Liu
Kevin Qinghong Lin
C. Chen
Mike Zheng Shou
LM&Ro
LRM
174
0
0
17 Mar 2025
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
Tao Wang
Changxu Cheng
Lingfeng Wang
Senda Chen
Wuyue Zhao
VLM
74
0
0
17 Mar 2025
Historic Scripts to Modern Vision: A Novel Dataset and A VLM Framework for Transliteration of Modi Script to Devanagari
Historic Scripts to Modern Vision: A Novel Dataset and A VLM Framework for Transliteration of Modi Script to Devanagari
Harshal Kausadikar
Tanvi Kale
Onkar Susladkar
Sparsh Mittal
60
0
0
17 Mar 2025
One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation
One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation
Daniil Selikhanovych
David Li
Aleksei Leonov
Nikita Gushchin
Sergei Kushneriuk
Alexander N. Filippov
Evgeny Burnaev
Iaroslav Koshelev
Alexander Korotin
DiffM
68
0
0
17 Mar 2025
BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries
BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries
Tianle Li
Yongming Rao
Winston Hu
Yu Cheng
MLLM
70
0
0
16 Mar 2025
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills
Haoqi Yuan
Yu Bai
Yuhui Fu
Bohan Zhou
Yicheng Feng
Xinrun Xu
Yi Zhan
Börje F. Karlsson
Zongqing Lu
LM&Ro
90
0
0
16 Mar 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Yansen Wang
Shengqiong Wu
Yuyao Zhang
William Yang Wang
Ziwei Liu
Jiebo Luo
Hao Fei
LRM
95
11
0
16 Mar 2025
Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?
Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?
Tianyuan Qu
Longxiang Tang
Bohao Peng
Senqiao Yang
Bei Yu
Jiaya Jia
VLM
266
0
0
16 Mar 2025
SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders
SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders
Qing Li
Jiahui Geng
Derui Zhu
Fengyu Cai
Chenyang Lyu
Fakhri Karray
MU
60
0
0
16 Mar 2025
VideoMAP: Toward Scalable Mamba-based Video Autoregressive Pretraining
VideoMAP: Toward Scalable Mamba-based Video Autoregressive Pretraining
Yunze Liu
Peiran Wu
C. Liang
Junxiao Shen
Limin Wang
Li Yi
Mamba
59
0
0
16 Mar 2025
Seeing Sarcasm Through Different Eyes: Analyzing Multimodal Sarcasm Perception in Large Vision-Language Models
Seeing Sarcasm Through Different Eyes: Analyzing Multimodal Sarcasm Perception in Large Vision-Language Models
Junjie Chen
X. Liu
Subin Huang
Linfeng Zhang
Hang Yu
65
0
0
15 Mar 2025
TLAC: Two-stage LMM Augmented CLIP for Zero-Shot Classification
TLAC: Two-stage LMM Augmented CLIP for Zero-Shot Classification
Ans Munir
Faisal Z. Qureshi
M. H. Khan
Mohsen Ali
VLM
70
0
0
15 Mar 2025
Unified Modeling Language Code Generation from Diagram Images Using Multimodal Large Language Models
Unified Modeling Language Code Generation from Diagram Images Using Multimodal Large Language Models
Averi Bates
Ryan Vavricka
Shane Carleton
Ruosi Shao
Chongle Pan
61
0
0
15 Mar 2025
Hyperbolic Safety-Aware Vision-Language Models
Hyperbolic Safety-Aware Vision-Language Models
Tobia Poppi
Tejaswi Kasarla
Pascal Mettes
Lorenzo Baraldi
Rita Cucchiara
VLM
MU
68
0
0
15 Mar 2025
FastVID: Dynamic Density Pruning for Fast Video Large Language Models
Leqi Shen
Guoqiang Gong
Tao He
Yifeng Zhang
Pengzhang Liu
Sicheng Zhao
Guiguang Ding
VLM
77
0
0
14 Mar 2025
Similarity-Aware Token Pruning: Your VLM but Faster
Ahmadreza Jeddi
Negin Baghbanzadeh
Elham Dolatabadi
Babak Taati
3DV
VLM
61
1
0
14 Mar 2025
Previous
123...101112...646566
Next