ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2304.08485
  4. Cited By
Visual Instruction Tuning

Visual Instruction Tuning

17 April 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
    SyDa
    VLM
    MLLM
ArXivPDFHTML

Papers citing "Visual Instruction Tuning"

50 / 3,253 papers shown
Title
VLC Fusion: Vision-Language Conditioned Sensor Fusion for Robust Object Detection
VLC Fusion: Vision-Language Conditioned Sensor Fusion for Robust Object Detection
Aditya Taparia
Noel Ngu
Mario Leiva
Joshua Shay Kricheli
John Corcoran
Nathaniel D. Bastian
Gerardo Simari
Paulo Shakarian
Ransalu Senanayake
ObjD
7
0
0
19 May 2025
FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks
FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks
Zihua Wang
Ruibo Li
Haozhe Du
Joey Tianyi Zhou
Yu Zhang
Xu Yang
MLLM
17
0
0
19 May 2025
Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
Yunseok Jang
Yeda Song
Sungryull Sohn
Lajanugen Logeswaran
Tiange Luo
Dong-Ki Kim
Kyunghoon Bae
Honglak Lee
VGen
4
0
0
19 May 2025
BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation
BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation
Haiquan Wen
Yiwei He
Zhenglin Huang
Tianxiao Li
Zihan YU
Xingru Huang
Lu Qi
Baoyuan Wu
Xuelong Li
Guangliang Cheng
VGen
9
0
0
19 May 2025
AdaToken-3D: Dynamic Spatial Gating for Efficient 3D Large Multimodal-Models Reasoning
AdaToken-3D: Dynamic Spatial Gating for Efficient 3D Large Multimodal-Models Reasoning
Kai Zhang
Xingyu Chen
Xiaofeng Zhang
12
0
0
19 May 2025
Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding
Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding
Thong Nguyen
Zhiyuan Hu
Xu Lin
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
VLM
14
0
0
19 May 2025
BadNAVer: Exploring Jailbreak Attacks On Vision-and-Language Navigation
BadNAVer: Exploring Jailbreak Attacks On Vision-and-Language Navigation
Wenqi Lyu
Zerui Li
Yanyuan Qiao
Qi Wu
AAML
7
0
0
18 May 2025
NeuroGen: Neural Network Parameter Generation via Large Language Models
NeuroGen: Neural Network Parameter Generation via Large Language Models
Jiaqi Wang
Yusen Zhang
Xi Li
7
0
0
18 May 2025
Harnessing the Universal Geometry of Embeddings
Harnessing the Universal Geometry of Embeddings
Rishi Jha
Collin Zhang
Vitaly Shmatikov
John X. Morris
7
0
0
18 May 2025
CompBench: Benchmarking Complex Instruction-guided Image Editing
CompBench: Benchmarking Complex Instruction-guided Image Editing
Bohan Jia
Wenxuan Huang
Yuntian Tang
Junbo Qiao
Jincheng Liao
...
Lin Chen
Fei Zhao
Zihan Wang
Yuan Xie
Shaohui Lin
CoGe
12
0
0
18 May 2025
Mitigating Hallucinations via Inter-Layer Consistency Aggregation in Large Vision-Language Models
Mitigating Hallucinations via Inter-Layer Consistency Aggregation in Large Vision-Language Models
Kai Tang
Jinhao You
Xiuqi Ge
Hanze Li
Yichen Guo
Xiande Huang
MLLM
4
0
0
18 May 2025
PRETI: Patient-Aware Retinal Foundation Model via Metadata-Guided Representation Learning
PRETI: Patient-Aware Retinal Foundation Model via Metadata-Guided Representation Learning
Yeonkyung Lee
Woojung Han
Youngjun Jun
Hyeonmin Kim
Jungkyung Cho
Seong Jae Hwang
MedIm
11
0
0
18 May 2025
Spatial-LLaVA: Enhancing Large Language Models with Spatial Referring Expressions for Visual Understanding
Spatial-LLaVA: Enhancing Large Language Models with Spatial Referring Expressions for Visual Understanding
Xuefei Sun
Doncey Albin
Cecilia Mauceri
Dusty Woods
Christoffer Heckman
LRM
2
0
0
18 May 2025
STAR: Stage-Wise Attention-Guided Token Reduction for Efficient Large Vision-Language Models Inference
STAR: Stage-Wise Attention-Guided Token Reduction for Efficient Large Vision-Language Models Inference
Yichen Guo
Hanze Li
Zonghao Zhang
Jinhao You
Kai Tang
Xiande Huang
VLM
9
0
0
18 May 2025
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Yang Liu
Ming Ma
Xiaomin Yu
Pengxiang Ding
Han Zhao
Mingyang Sun
Siteng Huang
Donglin Wang
LRM
25
0
0
18 May 2025
Are vision language models robust to uncertain inputs?
Are vision language models robust to uncertain inputs?
Xi Wang
Eric Nalisnick
AAML
VLM
8
0
0
17 May 2025
GLOVER++: Unleashing the Potential of Affordance Learning from Human Behaviors for Robotic Manipulation
GLOVER++: Unleashing the Potential of Affordance Learning from Human Behaviors for Robotic Manipulation
Teli Ma
Jia Zheng
Zifan Wang
Ziyao Gao
Jiaming Zhou
Junwei Liang
4
0
0
17 May 2025
UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings
UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings
Jiajun Qin
Yuan Pu
Zhuolun He
Seunggeun Kim
David Z. Pan
Bei Yu
12
0
0
17 May 2025
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
Xuannan Liu
Zekun Li
Zheqi He
Peipei Li
Shuhan Xia
Xing Cui
Huaibo Huang
Xi Yang
Ran He
EGVM
AAML
28
0
0
17 May 2025
FIGhost: Fluorescent Ink-based Stealthy and Flexible Backdoor Attacks on Physical Traffic Sign Recognition
FIGhost: Fluorescent Ink-based Stealthy and Flexible Backdoor Attacks on Physical Traffic Sign Recognition
Shuai Yuan
Guowen Xu
Hongwei Li
Rui Zhang
Xinyuan Qian
Wenbo Jiang
Hangcheng Cao
Qingchuan Zhao
AAML
21
0
0
17 May 2025
PRS-Med: Position Reasoning Segmentation with Vision-Language Model in Medical Imaging
PRS-Med: Position Reasoning Segmentation with Vision-Language Model in Medical Imaging
Quoc-Huy Trinh
Minh-Van Nguyen
Jung Peng
Ulas Bagci
Debesh Jha
12
0
0
17 May 2025
Top-Down Compression: Revisit Efficient Vision Token Projection for Visual Instruction Tuning
Top-Down Compression: Revisit Efficient Vision Token Projection for Visual Instruction Tuning
Bonan li
Zicheng Zhang
Songhua Liu
Weihao Yu
Xinchao Wang
VLM
9
0
0
17 May 2025
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning
Yuqi Liu
Tianyuan Qu
Zhisheng Zhong
Bohao Peng
Shu Liu
Bei Yu
Jiaya Jia
VLM
LRM
25
0
0
17 May 2025
LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation
LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation
Jiarui Wang
Huiyu Duan
Ziheng Jia
Yu Zhao
Woo Yi Yang
...
Z. Chen
Juntong Wang
Yuke Xing
Guangtao Zhai
Xiongkuo Min
VGen
17
0
0
17 May 2025
iSegMan: Interactive Segment-and-Manipulate 3D Gaussians
iSegMan: Interactive Segment-and-Manipulate 3D Gaussians
Yian Zhao
Wanshi Xu
Ruochong Zheng
Pengchong Qiao
Chang Liu
Jie Chen
3DGS
7
0
0
17 May 2025
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner
Wenchuan Zhang
Penghao Zhang
Jingru Guo
Tao Cheng
Jie Chen
Shuwan Zhang
Zhang Zhang
Yuhao Yi
Hong Bu
AI4TS
LRM
19
0
0
16 May 2025
Search-TTA: A Multimodal Test-Time Adaptation Framework for Visual Search in the Wild
Search-TTA: A Multimodal Test-Time Adaptation Framework for Visual Search in the Wild
Derek Ming Siang Tan
Shailesh
Boyang Liu
Alok Raj
Qi Xuan Ang
...
Tanishq Duhan
Jimmy Chiun
Yuhong Cao
Florian Shkurti
Guillaume Sartoretti
22
0
0
16 May 2025
Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans
Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans
Yansheng Qiu
Li Xiao
Zhaopan Xu
Pengfei Zhou
Zheng Wang
Kaipeng Zhang
ELM
LRM
17
0
0
16 May 2025
Memory-Efficient Orthogonal Fine-Tuning with Principal Subspace Adaptation
Memory-Efficient Orthogonal Fine-Tuning with Principal Subspace Adaptation
Fei Wu
Jia Hu
Geyong Min
Shiqiang Wang
22
0
0
16 May 2025
A Light and Smart Wearable Platform with Multimodal Foundation Model for Enhanced Spatial Reasoning in People with Blindness and Low Vision
A Light and Smart Wearable Platform with Multimodal Foundation Model for Enhanced Spatial Reasoning in People with Blindness and Low Vision
Alexey Magay
Dhurba Tripathi
Yu Hao
Yi Fang
17
0
0
16 May 2025
Extracting Explainable Dates From Medical Images By Reverse-Engineering UNIX Timestamps
Extracting Explainable Dates From Medical Images By Reverse-Engineering UNIX Timestamps
Lee Harris
James Bentham
Philippe De Wilde
MedIm
17
0
0
16 May 2025
Unifying Segment Anything in Microscopy with Multimodal Large Language Model
Unifying Segment Anything in Microscopy with Multimodal Large Language Model
Manyu Li
Ruian He
Zixian Zhang
Weimin Tan
Bo Yan
VLM
12
0
0
16 May 2025
Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation
Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation
Zihan Wang
Seungjun Lee
Gim Hee Lee
VGen
12
0
0
16 May 2025
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
Yong-Jin Liu
Shengfang Zhai
Mingzhe Du
Yulin Chen
Tri Cao
...
Xuzhao Li
Kun Wang
Junfeng Fang
Jiaheng Zhang
Bryan Hooi
OffRL
LRM
16
0
0
16 May 2025
Unveiling the Potential of Vision-Language-Action Models with Open-Ended Multimodal Instructions
Unveiling the Potential of Vision-Language-Action Models with Open-Ended Multimodal Instructions
Wei Zhao
Gongsheng Li
Zhefei Gong
Pengxiang Ding
Han Zhao
Donglin Wang
LM&Ro
22
0
0
16 May 2025
Does Feasibility Matter? Understanding the Impact of Feasibility on Synthetic Training Data
Does Feasibility Matter? Understanding the Impact of Feasibility on Synthetic Training Data
Yiwen Liu
Jessica Bader
Jae Myung Kim
DiffM
18
0
0
15 May 2025
Sage Deer: A Super-Aligned Driving Generalist Is Your Copilot
Sage Deer: A Super-Aligned Driving Generalist Is Your Copilot
Hao Lu
Jiaqi Tang
Jiyao Wang
Yaojie Lu
Xu Cao
...
Bin Huang
Dengbo He
Shuiguang Deng
Hao Chen
Ying-Cong Chen
35
0
0
15 May 2025
MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models
MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models
Mugilan Ganesan
Shri Kiran Srinivasan
Ankur Aggarwal
Nish Sinnadurai
Sean Lie
Vithursan Thangarasa
VLM
27
0
0
15 May 2025
Multi-Token Prediction Needs Registers
Multi-Token Prediction Needs Registers
Anastasios Gerontopoulos
Spyros Gidaris
N. Komodakis
24
0
0
15 May 2025
Task-Core Memory Management and Consolidation for Long-term Continual Learning
Task-Core Memory Management and Consolidation for Long-term Continual Learning
Tianyu Huai
Jie Zhou
Yuxuan Cai
Qin Chen
Wen Wu
Xingjiao Wu
Xipeng Qiu
Liang He
CLL
33
0
0
15 May 2025
Cross-Image Contrastive Decoding: Precise, Lossless Suppression of Language Priors in Large Vision-Language Models
Cross-Image Contrastive Decoding: Precise, Lossless Suppression of Language Priors in Large Vision-Language Models
Jianfei Zhao
Feng Zhang
Xingchen Sun
Chong Feng
MLLM
28
0
0
15 May 2025
ChronoSteer: Bridging Large Language Model and Time Series Foundation Model via Synthetic Data
ChronoSteer: Bridging Large Language Model and Time Series Foundation Model via Synthetic Data
Chengsen Wang
Qi Qi
Zhongwen Rao
Lujia Pan
Jingyu Wang
Jianxin Liao
AI4TS
24
0
0
15 May 2025
Zero-shot Quantization: A Comprehensive Survey
Zero-shot Quantization: A Comprehensive Survey
Minjun Kim
Jaehyeon Choi
Jongkeun Lee
Wonjin Cho
U. Kang
MQ
23
0
0
14 May 2025
ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation
ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation
Enyu Zhao
Vedant Raval
Hejia Zhang
Jiageng Mao
Zeyu Shangguan
Stefanos Nikolaidis
Yishuo Wang
Daniel Seita
LM&Ro
CoGe
48
0
0
14 May 2025
Bias and Generalizability of Foundation Models across Datasets in Breast Mammography
Bias and Generalizability of Foundation Models across Datasets in Breast Mammography
Elodie Germani
Selin Türk Ilayda
Zeineddine Fatima
Mourad Charbel
Shadi Albarqouni
AI4CE
27
0
0
14 May 2025
Air-Ground Collaboration for Language-Specified Missions in Unknown Environments
Air-Ground Collaboration for Language-Specified Missions in Unknown Environments
Fernando Cladera
Zachary Ravichandran
Jason Hughes
Varun Murali
Carlos Nieto-Granda
M. Hsieh
George J. Pappas
Camillo J Taylor
Vijay Kumar
39
1
0
14 May 2025
Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware
Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware
Justin Yu
Letian Fu
Huang Huang
Karim El-Refai
Rares Andrei Ambrus
Richard Cheng
Muhammad Zubair Irshad
Ken Goldberg
26
0
0
14 May 2025
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
Zhaochen Su
Linjie Li
Mingyang Song
Yunzhuo Hao
Zhengyuan Yang
...
Guanjie Chen
Jiawei Gu
Juntao Li
Xiaoye Qu
Yu Cheng
OffRL
LRM
31
0
0
13 May 2025
CLTP: Contrastive Language-Tactile Pre-training for 3D Contact Geometry Understanding
CLTP: Contrastive Language-Tactile Pre-training for 3D Contact Geometry Understanding
Wenxuan Ma
Xiaoge Cao
Yujie Zhang
Chaofan Zhang
Shaobo Yang
Peng Hao
Bin Fang
Yinghao Cai
Shaowei Cui
Shuo Wang
36
0
0
13 May 2025
ORACLE-Grasp: Zero-Shot Task-Oriented Robotic Grasping using Large Multimodal Models
ORACLE-Grasp: Zero-Shot Task-Oriented Robotic Grasping using Large Multimodal Models
Avihai Giuili
Rotem Atari
A. Sintov
VLM
32
0
0
13 May 2025
1234...646566
Next