Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2304.08485
Cited By
Visual Instruction Tuning
17 April 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Visual Instruction Tuning"
50 / 3,327 papers shown
Title
EVQAScore: A Fine-grained Metric for Video Question Answering Data Quality Evaluation
Hao Liang
Zirong Chen
Wentao Zhang
Wentao Zhang
49
1
0
11 Nov 2024
Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models
Xiaojun Wu
Junxi Liu
Huanyi Su
Zhouchi Lin
Yiyan Qi
...
Fuwei Wang
Saizhuo Wang
Fengrui Hua
Jia Li
Jian Guo
59
0
0
09 Nov 2024
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework
Yew Ken Chia
Liying Cheng
Hou Pong Chan
Chaoqun Liu
Maojia Song
Sharifah Mahani Aljunied
Soujanya Poria
Lidong Bing
RALM
VLM
55
5
0
09 Nov 2024
Aligned Vector Quantization for Edge-Cloud Collabrative Vision-Language Models
Xiao Liu
Lijun Zhang
Deepak Ganesan
Hui Guan
VLM
43
0
0
08 Nov 2024
Towards Multi-Modal Mastery: A 4.5B Parameter Truly Multi-Modal Small Language Model
Ben Koska
Mojmír Horváth
MoE
47
1
0
08 Nov 2024
Autoregressive Models in Vision: A Survey
Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
...
Hao Fei
Guillermo Sapiro
Jiebo Luo
Ping Luo
Ngai Wong
VGen
74
9
0
08 Nov 2024
RT-Grasp: Reasoning Tuning Robotic Grasping via Multi-modal Large Language Model
Jinxuan Xu
Shiyu Jin
Yutian Lei
Yuqian Zhang
Liangjun Zhang
LRM
48
7
0
07 Nov 2024
Analyzing The Language of Visual Tokens
David M. Chan
Rodolfo Corona
J. S. Park
Cheol Jun Cho
Yutong Bai
Trevor Darrell
31
2
0
07 Nov 2024
HourVideo: 1-Hour Video-Language Understanding
Keshigeyan Chandrasegaran
Agrim Gupta
Lea M. Hadzic
Taran Kota
Jimming He
Cristobal Eyzaguirre
Zane Durante
Manling Li
Jiajun Wu
L. Fei-Fei
VLM
61
35
0
07 Nov 2024
Explainable Search and Discovery of Visual Cultural Heritage Collections with Multimodal Large Language Models
T. Arnold
L. Tilton
68
0
0
07 Nov 2024
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities
Zhaofeng Wu
Xinyan Velocity Yu
Dani Yogatama
Jiasen Lu
Yoon Kim
AIFin
54
16
0
07 Nov 2024
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Shehan Munasinghe
Hanan Gani
Wenqi Zhu
Jiale Cao
Eric P. Xing
Fahad Shahbaz Khan
Salman Khan
MLLM
VGen
VLM
57
6
0
07 Nov 2024
CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM
Jingwei Xu
Chenyu Wang
Zibo Zhao
Wen Liu
Yi-An Ma
Shenghua Gao
60
14
0
07 Nov 2024
Exploring Hierarchical Molecular Graph Representation in Multimodal LLMs
Chengxin Hu
Hao Li
Yihe Yuan
Jing Li
Ivor Tsang
53
0
0
07 Nov 2024
ReEdit: Multimodal Exemplar-Based Image Editing with Diffusion Models
Ashutosh Srivastava
Tarun Ram Menta
Abhinav Java
Avadhoot Jadhav
Silky Singh
Surgan Jandial
Balaji Krishnamurthy
DiffM
45
1
0
06 Nov 2024
Large Generative Model-assisted Talking-face Semantic Communication System
Feibo Jiang
Siwei Tu
Li Dong
Cunhua Pan
Jiangzhou Wang
Xiaohu You
41
2
0
06 Nov 2024
VLA-3D: A Dataset for 3D Semantic Scene Understanding and Navigation
Haochen Zhang
Nader Zantout
Pujith Kachana
Zongyuan Wu
Ji Zhang
Wenshan Wang
3DV
LM&Ro
54
6
0
05 Nov 2024
MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning
Ziliang Gan
Yu Lu
D. Zhang
Haohan Li
Che Liu
...
Haipang Wu
Chaoyou Fu
Z. Xu
Rongjunchen Zhang
Yong Dai
71
7
0
05 Nov 2024
Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey
Ao Fu
Yi Zhou
Tao Zhou
Yue Yang
Bojun Gao
Qun Li
Guobin Wu
Ling Shao
VGen
64
2
0
05 Nov 2024
Membership Inference Attacks against Large Vision-Language Models
Zhan Li
Yongtao Wu
Yihang Chen
F. Tonin
Elias Abad Rocamora
Volkan Cevher
54
4
0
05 Nov 2024
Label Critic: Design Data Before Models
P. R. Bassi
Qilong Wu
Wenxuan Li
S. Decherchi
Andrea Cavalli
Alan Yuille
Zongwei Zhou
53
0
0
05 Nov 2024
DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark
Haodong Li
Haicheng Qu
Xiaofeng Zhang
53
1
0
05 Nov 2024
Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios
Yunkai Dang
Mengxi Gao
Yibo Yan
Xin Zou
Yanggan Gu
Aiwei Liu
Xuming Hu
53
6
0
05 Nov 2024
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset
Yingzi Ma
Jiongxiao Wang
Fei Wang
Siyuan Ma
Jiazhao Li
...
B. Li
Yejin Choi
Mengzhao Chen
Chaowei Xiao
Chaowei Xiao
MU
63
6
0
05 Nov 2024
FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models
Zhanwei Zhang
Shizhao Sun
Wenxiao Wang
D. Cai
Jiang Bian
AI4CE
41
2
0
05 Nov 2024
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
Yangning Li
Hai-Tao Zheng
Xinyu Wang
Yong Jiang
Zhen Zhang
...
Hui Wang
Hai-Tao Zheng
Pengjun Xie
Philip S. Yu
Fei Huang
73
18
0
05 Nov 2024
TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives
Maitreya Patel
Abhiram Kusumba
Sheng Cheng
Changhoon Kim
Tejas Gokhale
Chitta Baral
Yezhou Yang
CoGe
CLIP
82
8
0
04 Nov 2024
INQUIRE: A Natural World Text-to-Image Retrieval Benchmark
Edward Vendrow
Omiros Pantazis
Alexander Shepard
Gabriel J. Brostow
Kate E. Jones
Oisin Mac Aodha
Sara Beery
Grant Van Horn
VLM
58
4
0
04 Nov 2024
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
Yang Yue
Yulin Wang
Bingyi Kang
Yizeng Han
Shenzhi Wang
Shiji Song
Jiashi Feng
Gao Huang
VLM
57
16
0
04 Nov 2024
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance
Ruyang Liu
Haoran Tang
Haibo Liu
Yixiao Ge
Ying Shan
Chen Li
Jiankun Yang
VLM
58
6
0
04 Nov 2024
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
Sheng-Chieh Lin
Chankyu Lee
Mohammad Shoeybi
Jimmy J. Lin
Bryan Catanzaro
Ming-Yu Liu
89
13
0
04 Nov 2024
One VLM to Keep it Learning: Generation and Balancing for Data-free Continual Visual Question Answering
Deepayan Das
Davide Talon
Massimiliano Mancini
Yiming Wang
Elisa Ricci
65
0
0
04 Nov 2024
UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models
Sejoon Oh
Yiqiao Jin
Megha Sharma
Donghyun Kim
Eric Ma
Gaurav Verma
Srijan Kumar
71
6
0
03 Nov 2024
Visual Fourier Prompt Tuning
Runjia Zeng
Cheng Han
Qifan Wang
Chunshu Wu
Tong Geng
Lifu Huang
Ying Nian Wu
Dongfang Liu
VPVLM
VLM
70
6
0
02 Nov 2024
Can Multimodal Large Language Model Think Analogically?
Diandian Guo
Cong Cao
Fangfang Yuan
Dakui Wang
Wei Ma
Yanbing Liu
Jianhui Fu
LRM
56
0
0
02 Nov 2024
HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy
Shuqing Luo
Jie Peng
Pingzhi Li
Tianlong Chen
MoE
41
0
0
02 Nov 2024
SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding
Jian Chen
Ruiyi Zhang
Yufan Zhou
Tong Yu
Franck Dernoncourt
J. Gu
Ryan Rossi
Changyou Chen
Tong Sun
44
0
0
02 Nov 2024
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
Nam V. Nguyen
Thong T. Doan
Luong Tran
Van Nguyen
Quang Pham
MoE
101
1
0
01 Nov 2024
Right this way: Can VLMs Guide Us to See More to Answer Questions?
Li Liu
Diji Yang
Sijia Zhong
Kalyana Suma Sree Tholeti
Lei Ding
Yi Zhang
Leilani H. Gilpin
69
2
0
01 Nov 2024
Replace-then-Perturb: Targeted Adversarial Attacks With Visual Reasoning for Vision-Language Models
Jonggyu Jang
Hyeonsu Lyu
Jungyeon Koh
H. Yang
VLM
AAML
50
0
0
01 Nov 2024
SeafloorAI: A Large-scale Vision-Language Dataset for Seafloor Geological Survey
Kien X. Nguyen
Fengchun Qiao
Arthur Trembanis
Xi Peng
45
0
0
31 Oct 2024
NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs
Nursena Köprücü
Destiny Okpekpe
Antonio Orvieto
Mamba
59
1
0
31 Oct 2024
Biotic Browser: Applying StreamingLLM as a Persistent Web Browsing Co-Pilot
Kevin Dunnell
Andrew Stoddard
LM&Ro
25
0
0
31 Oct 2024
EZ-HOI: VLM Adaptation via Guided Prompt Learning for Zero-Shot HOI Detection
Qinqian Lei
Bo Wang
Robby T. Tan
VLM
53
5
0
31 Oct 2024
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach
Mathilde Caron
Alireza Fathi
Cordelia Schmid
Ahmet Iscen
47
1
0
31 Oct 2024
On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection
Xiufeng Song
Xiao Guo
Junxuan Zhang
Qirui Li
Lei Bai
Xiaoming Liu
Guangtao Zhai
Xiaohong Liu
DiffM
VGen
81
9
0
31 Oct 2024
LLaMo: Large Language Model-based Molecular Graph Assistant
Jinyoung Park
Minseong Bae
Dohwan Ko
Hyunwoo J. Kim
51
2
0
31 Oct 2024
TurtleBench: A Visual Programming Benchmark in Turtle Geometry
Sina Rismanchian
Yasaman Razeghi
Sameer Singh
Shayan Doroudi
58
1
0
31 Oct 2024
Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map
Xinyuan Chang
Maixuan Xue
Xinran Liu
Zheng Pan
Xing Wei
81
2
0
31 Oct 2024
MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts
Jie Zhu
Yukang Chen
Mingyu Ding
Ping Luo
Leye Wang
Jingdong Wang
DiffM
47
4
0
30 Oct 2024
Previous
1
2
3
...
23
24
25
...
65
66
67
Next