ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2304.08485
  4. Cited By
Visual Instruction Tuning

Visual Instruction Tuning

17 April 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
    SyDa
    VLM
    MLLM
ArXivPDFHTML

Papers citing "Visual Instruction Tuning"

50 / 3,278 papers shown
Title
Contrastive Visual Data Augmentation
Contrastive Visual Data Augmentation
Yu Zhou
B. Li
Mohan Tang
Xiaomeng Jin
Te-Lin Wu
Kuan-Hao Huang
Heng Ji
Kai-Wei Chang
Nanyun Peng
64
0
0
24 Feb 2025
MMAD: A Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection
MMAD: A Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection
Xi Jiang
Jian Li
Hanqiu Deng
Yue Liu
Bin-Bin Gao
Yifeng Zhou
Jialin Li
Chengjie Wang
Feng Zheng
63
2
0
24 Feb 2025
VaViM and VaVAM: Autonomous Driving through Video Generative Modeling
VaViM and VaVAM: Autonomous Driving through Video Generative Modeling
Florent Bartoccioni
Elias Ramzi
Victor Besnier
Shashanka Venkataramanan
Tuan-Hung Vu
...
Mickael Chen
Éloi Zablocki
Andrei Bursuc
Eduardo Valle
Matthieu Cord
VGen
88
1
0
24 Feb 2025
Disentangling Visual Transformers: Patch-level Interpretability for Image Classification
Disentangling Visual Transformers: Patch-level Interpretability for Image Classification
Guillaume Jeanneret
Loïc Simon
F. Jurie
ViT
66
0
0
24 Feb 2025
Introducing Visual Perception Token into Multimodal Large Language Model
Introducing Visual Perception Token into Multimodal Large Language Model
Runpeng Yu
Xinyin Ma
Xinchao Wang
MLLM
LRM
89
0
0
24 Feb 2025
Game State and Spatio-temporal Action Detection in Soccer using Graph Neural Networks and 3D Convolutional Networks
Game State and Spatio-temporal Action Detection in Soccer using Graph Neural Networks and 3D Convolutional Networks
Jeremie Ochin
Guillaume Devineau
Bogdan Stanciulescu
Sotiris Manitsaris
3DPC
82
1
0
24 Feb 2025
TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba
TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba
Xiuwei Chen
Sihao Lin
Xiao Dong
Zhenpeng Chen
Meng Cao
Jiawei Han
Hang Xu
Xiaodan Liang
Mamba
68
0
0
24 Feb 2025
Visual Reasoning Evaluation of Grok, Deepseek Janus, Gemini, Qwen, Mistral, and ChatGPT
Visual Reasoning Evaluation of Grok, Deepseek Janus, Gemini, Qwen, Mistral, and ChatGPT
Nidhal Jegham
Marwan Abdelatti
Abdeltawab Hendawi
VLM
LRM
60
1
0
23 Feb 2025
Tracking the Copyright of Large Vision-Language Models through Parameter Learning Adversarial Images
Tracking the Copyright of Large Vision-Language Models through Parameter Learning Adversarial Images
Yubo Wang
Jianting Tang
Chaohu Liu
Linli Xu
AAML
71
1
0
23 Feb 2025
MQADet: A Plug-and-Play Paradigm for Enhancing Open-Vocabulary Object Detection via Multimodal Question Answering
MQADet: A Plug-and-Play Paradigm for Enhancing Open-Vocabulary Object Detection via Multimodal Question Answering
Caixiong Li
Xiongwei Zhao
Jinhang Zhang
Xing Zhang
Qihao Sun
Zhou Wu
ObjD
MLLM
VLM
61
0
0
23 Feb 2025
Model-agnostic Coreset Selection via LLM-based Concept Bottlenecks
Akshay Mehra
Trisha Mittal
Subhadra Gopalakrishnan
Joshua Kimball
50
0
0
23 Feb 2025
Audio-FLAN: A Preliminary Release
Audio-FLAN: A Preliminary Release
Liumeng Xue
Ziya Zhou
J. Pan
Zhiyu Li
Shuai Fan
...
Haohe Liu
Emmanouil Benetos
Ge Zhang
Yike Guo
Wei Xue
MLLM
AuLLM
CLIP
VLM
57
1
0
23 Feb 2025
SAE-V: Interpreting Multimodal Models for Enhanced Alignment
SAE-V: Interpreting Multimodal Models for Enhanced Alignment
Hantao Lou
Changye Li
Yalan Qin
Yaodong Yang
53
1
0
22 Feb 2025
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
Weitai Kang
Haifeng Huang
Yuzhang Shang
Mubarak Shah
Yan Yan
56
7
0
21 Feb 2025
ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval
ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval
Guanqi Zhan
Yuanpei Liu
Kai Han
Weidi Xie
Andrew Zisserman
VLM
276
0
0
21 Feb 2025
Chitrarth: Bridging Vision and Language for a Billion People
Chitrarth: Bridging Vision and Language for a Billion People
Shaharukh Khan
Ayush Tarun
Abhinav Ravi
Ali Faraz
Akshat Patidar
Praveen Kumar Pokala
Anagha Bhangare
Raja Kolla
Chandra Khatri
Shubham Agarwal
VLM
133
1
0
21 Feb 2025
LOVA3: Learning to Visual Question Answering, Asking and Assessment
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao
Pan Zhou
Difei Gao
Zechen Bai
Mike Zheng Shou
82
8
0
21 Feb 2025
PoGDiff: Product-of-Gaussians Diffusion Models for Imbalanced Text-to-Image Generation
PoGDiff: Product-of-Gaussians Diffusion Models for Imbalanced Text-to-Image Generation
Ziyan Wang
Sizhe Wei
Xiaoming Huo
Hao Wang
DiffM
107
0
0
20 Feb 2025
From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education
From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education
Yi-Fan Zhang
Hang Li
D. Song
Lichao Sun
Tianlong Xu
Qingsong Wen
LLMAG
LRM
98
2
0
20 Feb 2025
Contrastive Localized Language-Image Pre-Training
Contrastive Localized Language-Image Pre-Training
Hong-You Chen
Zhengfeng Lai
Hao Zhang
Xuben Wang
Marcin Eichner
Keen You
Meng Cao
Bowen Zhang
Yue Yang
Zhe Gan
CLIP
VLM
68
7
0
20 Feb 2025
A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models
A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models
Mengyang Sun
Yihao Wang
Tao Feng
Dan Zhang
Yifan Zhu
J. Tang
MoE
48
0
0
20 Feb 2025
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback
Henry Hengyuan Zhao
Wenqi Pei
Yifei Tao
Haiyang Mei
Mike Zheng Shou
56
0
0
20 Feb 2025
Object-centric Binding in Contrastive Language-Image Pretraining
Object-centric Binding in Contrastive Language-Image Pretraining
Rim Assouel
Pietro Astolfi
Florian Bordes
M. Drozdzal
Adriana Romero Soriano
OCL
VLM
CoGe
108
0
0
19 Feb 2025
InsightVision: A Comprehensive, Multi-Level Chinese-based Benchmark for Evaluating Implicit Visual Semantics in Large Vision Language Models
InsightVision: A Comprehensive, Multi-Level Chinese-based Benchmark for Evaluating Implicit Visual Semantics in Large Vision Language Models
Xiaofei Yin
Y. Hong
Ya Guo
Yi Tu
Weiqiang Wang
Gongshen Liu
Huijia Zhu
VLM
67
0
0
19 Feb 2025
Megrez-Omni Technical Report
Boxun Li
Yadong Li
Zhiyu Li
Congyi Liu
Weilin Liu
...
Dong Zhou
Yueqing Zhuang
Shengen Yan
Guohao Dai
Yansen Wang
51
0
0
19 Feb 2025
What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness
What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness
Zhihang Liu
Chen-Wei Xie
Bin Wen
Feiwu Yu
Jixuan Chen
...
Pandeng Li
Yun Zheng
Hongtao Xie
Yun Zheng
Hongtao Xie
VLM
CoGe
105
0
0
19 Feb 2025
Black Sheep in the Herd: Playing with Spuriously Correlated Attributes for Vision-Language Recognition
Xinyu Tian
Shu Zou
Zhaoyuan Yang
Mengqi He
Jing Zhang
VLM
53
0
0
19 Feb 2025
Magma: A Foundation Model for Multimodal AI Agents
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang
Reuben Tan
Qianhui Wu
Ruijie Zheng
Baolin Peng
...
Seonghyeon Ye
Joel Jang
Yuquan Deng
Lars Liden
Jianfeng Gao
VLM
AI4TS
124
9
0
18 Feb 2025
Understanding and Rectifying Safety Perception Distortion in VLMs
Understanding and Rectifying Safety Perception Distortion in VLMs
Xiaohan Zou
Jian Kang
George Kesidis
Lu Lin
281
1
0
18 Feb 2025
NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation
NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation
Zhiyuan Liu
Yanchen Luo
Han Huang
Enzhi Zhang
Sihang Li
Sihang Li
Yaorui Shi
Xuben Wang
Kenji Kawaguchi
Tat-Seng Chua
102
3
0
18 Feb 2025
MatterChat: A Multi-Modal LLM for Material Science
MatterChat: A Multi-Modal LLM for Material Science
Yingheng Tang
Wenbin Xu
Jie Cao
Jianzhu Ma
Weilu Gao
Steve Farrell
Benjamin Erichson
Michael W. Mahoney
Andy Nonaka
113
3
0
18 Feb 2025
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
Zekun Qi
Wenyao Zhang
Yufei Ding
Runpei Dong
Xinqiang Yu
...
Xin Jin
Kaisheng Ma
Zhizheng Zhang
He Wang
Li Yi
LM&Ro
133
4
0
18 Feb 2025
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
Jun Zhao
Ming Wang
Miao Zhang
Yuzhang Shang
Xuebo Liu
Yaowei Wang
Min Zhang
Liqiang Nie
MQ
73
1
0
18 Feb 2025
Pre-training Auto-regressive Robotic Models with 4D Representations
Pre-training Auto-regressive Robotic Models with 4D Representations
Dantong Niu
Yuvan Sharma
Haoru Xue
Giscard Biamby
Junyi Zhang
Ziteng Ji
Trevor Darrell
Roei Herzig
83
1
0
18 Feb 2025
Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning
Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning
Yuqi Pang
Bowen Yang
Haoqin Tu
Yun Cao
Zeyu Zhang
LRM
MLLM
66
0
0
17 Feb 2025
RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation
RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation
Kun Wu
Chengkai Hou
Jiaming Liu
Zhengping Che
Xiaozhu Ju
...
Zhenyu Wang
Pengju An
Siyuan Qian
Shanghang Zhang
Jian Tang
LM&Ro
124
14
0
17 Feb 2025
GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs
GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs
Yi Fang
Bowen Jin
Jiacheng Shen
Sirui Ding
Qiaoyu Tan
Jiawei Han
61
1
0
17 Feb 2025
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
L. Yang
Xinchen Zhang
Ye Tian
Chenming Shang
Minghao Xu
Wentao Zhang
Tengjiao Wang
102
1
0
17 Feb 2025
Unhackable Temporal Rewarding for Scalable Video MLLMs
Unhackable Temporal Rewarding for Scalable Video MLLMs
En Yu
Kangheng Lin
Liang Zhao
Yana Wei
Zining Zhu
...
Jianjian Sun
Zheng Ge
Xinsong Zhang
Jingyu Wang
Wenbing Tao
69
4
0
17 Feb 2025
TinyEmo: Scaling down Emotional Reasoning via Metric Projection
TinyEmo: Scaling down Emotional Reasoning via Metric Projection
Cristian Gutierrez
LRM
71
0
0
17 Feb 2025
Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering
Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering
Zeqing Wang
Wentao Wan
Qiqing Lao
Runmeng Chen
Minjie Lang
Keze Wang
Liang Lin
Liang Lin
LRM
109
3
0
17 Feb 2025
JExplore: Design Space Exploration Tool for Nvidia Jetson Boards
Basar Kutukcu
Sinan Xie
Sabur Baidya
Sujit Dey
47
0
0
16 Feb 2025
Multi-Faceted Multimodal Monosemanticity
Multi-Faceted Multimodal Monosemanticity
Hanqi Yan
Xiangxiang Cui
Lu Yin
Paul Pu Liang
Yulan He
Yifei Wang
48
0
0
16 Feb 2025
Distraction is All You Need for Multimodal Large Language Model Jailbreaking
Distraction is All You Need for Multimodal Large Language Model Jailbreaking
Zuopeng Yang
Jiluan Fan
Anli Yan
Erdun Gao
Xin Lin
Tao Li
Kanghua mo
Changyu Dong
AAML
84
1
0
15 Feb 2025
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence
Granite Vision Team
Leonid Karlinsky
Assaf Arbelle
Abraham Daniels
A. Nassar
...
Sriram Raghavan
Tanveer Syeda-Mahmood
Peter W. J. Staar
Tal Drory
Rogerio Feris
VLM
AI4TS
121
0
0
14 Feb 2025
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
Dongzhi Jiang
Renrui Zhang
Ziyu Guo
Yanwei Li
Yu Qi
...
Shen Yan
Bo Zhang
Chaoyou Fu
Peng Gao
Hongsheng Li
MLLM
LRM
98
23
0
13 Feb 2025
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
Mo Yu
Lemao Liu
J. Wu
Tsz Ting Chung
Shunchi Zhang
JiangNan Li
Dit-Yan Yeung
Jie Zhou
90
1
0
13 Feb 2025
From large language models to multimodal AI: A scoping review on the potential of generative AI in medicine
From large language models to multimodal AI: A scoping review on the potential of generative AI in medicine
Lukas Buess
Matthias Keicher
Nassir Navab
Andreas Maier
Soroosh Tayebi Arasteh
LM&MA
140
0
0
13 Feb 2025
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis
Angelos Zavras
Dimitrios Michail
Xiao Xiang Zhu
Begüm Demir
Ioannis Papoutsis
VLM
94
0
0
13 Feb 2025
Diverse Transformer Decoding for Offline Reinforcement Learning Using Financial Algorithmic Approaches
Diverse Transformer Decoding for Offline Reinforcement Learning Using Financial Algorithmic Approaches
D. Elbaz
Oren Salzman
OffRL
37
0
0
13 Feb 2025
Previous
123...151617...646566
Next