ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2310.03744
  4. Cited By
Improved Baselines with Visual Instruction Tuning

Improved Baselines with Visual Instruction Tuning

5 October 2023
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
    VLM
    MLLM
ArXivPDFHTML

Papers citing "Improved Baselines with Visual Instruction Tuning"

50 / 483 papers shown
Title
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language
  Models
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
Mingze Xu
Mingfei Gao
Zhe Gan
Hong-You Chen
Zhengfeng Lai
Haiming Gang
Kai Kang
Afshin Dehghan
59
49
0
22 Jul 2024
Stretching Each Dollar: Diffusion Training from Scratch on a
  Micro-Budget
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag
Xianghao Kong
Jingtao Li
Michael Spranger
Lingjuan Lyu
DiffM
47
9
0
22 Jul 2024
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal
  Reasoning
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
Zhecan Wang
Garrett Bingham
Adams Wei Yu
Quoc V. Le
Thang Luong
Golnaz Ghiasi
MLLM
LRM
39
9
0
22 Jul 2024
ViLLa: Video Reasoning Segmentation with Large Language Model
ViLLa: Video Reasoning Segmentation with Large Language Model
Rongkun Zheng
Lu Qi
Xi Chen
Yi Wang
Kun Wang
Yu Qiao
Hengshuang Zhao
VOS
LRM
72
2
0
18 Jul 2024
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
Yufan Shen
Chuwei Luo
Zhaoqing Zhu
Yang Chen
Qi Zheng
Zhi Yu
Jiajun Bu
Cong Yao
42
2
0
17 Jul 2024
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Ofir Abramovich
Niv Nayman
Sharon Fogel
I. Lavi
Ron Litman
Shahar Tsiper
Royee Tichauer
Srikar Appalaraju
Shai Mazor
R. Manmatha
VLM
33
3
0
17 Jul 2024
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Kaichen Zhang
Bo Li
Peiyuan Zhang
Fanyi Pu
Joshua Adrian Cahyono
...
Shuai Liu
Yuanhan Zhang
Jingkang Yang
Chunyuan Li
Ziwei Liu
97
74
0
17 Jul 2024
Reflective Instruction Tuning: Mitigating Hallucinations in Large
  Vision-Language Models
Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
Jinrui Zhang
Teng Wang
Haigang Zhang
Ping Lu
Feng Zheng
MLLM
LRM
VLM
31
3
0
16 Jul 2024
VGBench: Evaluating Large Language Models on Vector Graphics
  Understanding and Generation
VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation
Bocheng Zou
Mu Cai
Jianrui Zhang
Yong Jae Lee
35
2
1
15 Jul 2024
GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via
  VLM
GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via VLM
Keshav Bimbraw
Ye Wang
Jing Liu
T. Koike-Akino
VLM
MedIm
LM&MA
37
1
0
15 Jul 2024
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
Shraman Pramanick
Rama Chellappa
Subhashini Venugopalan
50
13
0
12 Jul 2024
GeNet: A Multimodal LLM-Based Co-Pilot for Network Topology and
  Configuration
GeNet: A Multimodal LLM-Based Co-Pilot for Network Topology and Configuration
Beni Ifland
Elad Duani
Rubin Krief
Miro Ohana
Aviram Zilberman
...
Ortal Lavi
Hikichi Kenji
A. Shabtai
Yuval Elovici
Rami Puzis
33
3
0
11 Jul 2024
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
Mengzhao Chen
Wenqi Shao
Peng Xu
Jiahao Wang
Peng Gao
Kaipeng Zhang
Yu Qiao
MQ
43
24
0
10 Jul 2024
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for
  Interleaved Image-Text Generation
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation
Ethan Chern
Jiadi Su
Yan Ma
Pengfei Liu
MLLM
29
28
0
08 Jul 2024
Hybrid Primal Sketch: Combining Analogy, Qualitative Representations,
  and Computer Vision for Scene Understanding
Hybrid Primal Sketch: Combining Analogy, Qualitative Representations, and Computer Vision for Scene Understanding
Kenneth D. Forbus
Kezhen Chen
Wangcheng Xu
Madeline Usher
43
0
0
05 Jul 2024
A Systematic Survey and Critical Review on Evaluating Large Language
  Models: Challenges, Limitations, and Recommendations
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Md Tahmid Rahman Laskar
Sawsan Alqahtani
M Saiful Bari
Mizanur Rahman
Mohammad Abdullah Matin Khan
...
Chee Wei Tan
Md. Rizwan Parvez
Enamul Hoque
Shafiq R. Joty
Jimmy Huang
ELM
ALM
29
28
0
04 Jul 2024
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
Yusu Qian
Hanrong Ye
J. Fauconnier
Peter Grasch
Yinfei Yang
Zhe Gan
108
13
0
01 Jul 2024
Tarsier: Recipes for Training and Evaluating Large Video Description
  Models
Tarsier: Recipes for Training and Evaluating Large Video Description Models
Jiawei Wang
Liping Yuan
Yuchen Zhang
44
52
0
30 Jun 2024
GenderBias-\emph{VL}: Benchmarking Gender Bias in Vision Language Models
  via Counterfactual Probing
GenderBias-\emph{VL}: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing
Yisong Xiao
Aishan Liu
QianJia Cheng
Zhenfei Yin
Siyuan Liang
Jiapeng Li
Jing Shao
Xianglong Liu
Dacheng Tao
43
4
0
30 Jun 2024
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
Jinsheng Huang
Liang Chen
Taian Guo
Fu Zeng
Yusheng Zhao
...
Wei Ju
Luchen Liu
Tianyu Liu
Baobao Chang
Ming Zhang
43
5
0
29 Jun 2024
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
Xiang Li
Cristina Mata
J. Park
Kumara Kahatapitiya
Yoo Sung Jang
...
Kanchana Ranasinghe
R. Burgert
Mu Cai
Yong Jae Lee
Michael S. Ryoo
LM&Ro
72
25
0
28 Jun 2024
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
Yuxuan Zhang
Tianheng Cheng
Lianghui Zhu
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
VLM
58
25
0
28 Jun 2024
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
Yicheng Chen
Xiangtai Li
Yining Li
Yanhong Zeng
Jianzong Wu
Xiangyu Zhao
Kai Chen
VLM
DiffM
56
3
0
28 Jun 2024
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
Longrong Yang
Dong Shen
Chaoxiang Cai
Fan Yang
Size Li
Di Zhang
Xi Li
MoE
56
2
0
28 Jun 2024
Curriculum Learning with Quality-Driven Data Selection
Curriculum Learning with Quality-Driven Data Selection
Biao Wu
Fang Meng
Ling-Hao Chen
32
2
0
27 Jun 2024
ColPali: Efficient Document Retrieval with Vision Language Models
ColPali: Efficient Document Retrieval with Vision Language Models
Manuel Faysse
Hugues Sibille
Tony Wu
Bilel Omrani
Gautier Viaud
C´eline Hudelot
Pierre Colombo
VLM
67
21
0
27 Jun 2024
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
Xiangyu Zhao
Xiangtai Li
Haodong Duan
Haian Huang
Yining Li
Kai Chen
Hua Yang
VLM
MLLM
42
10
0
25 Jun 2024
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Yuang Peng
Yuxin Cui
Haomiao Tang
Zekun Qi
Runpei Dong
Jing Bai
Chunrui Han
Zheng Ge
Xiangyu Zhang
Shu-Tao Xia
EGVM
75
31
0
24 Jun 2024
VoCo-LLaMA: Towards Vision Compression with Large Language Models
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Xubing Ye
Yukang Gan
Xiaoke Huang
Yixiao Ge
Yansong Tang
MLLM
VLM
40
22
0
18 Jun 2024
On Efficient Language and Vision Assistants for Visually-Situated
  Natural Language Understanding: What Matters in Reading and Reasoning
On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning
Geewook Kim
Minjoon Seo
VLM
41
2
0
17 Jun 2024
Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report
Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report
Franz Louis Cesista
VGen
52
6
0
17 Jun 2024
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
Yongting Zhang
Lu Chen
Guodong Zheng
Yifeng Gao
Rui Zheng
...
Yu Qiao
Xuanjing Huang
Feng Zhao
Tao Gui
Jing Shao
VLM
85
24
0
17 Jun 2024
Concept-skill Transferability-based Data Selection for Large
  Vision-Language Models
Concept-skill Transferability-based Data Selection for Large Vision-Language Models
Jaewoo Lee
Boyang Li
Sung Ju Hwang
VLM
43
8
0
16 Jun 2024
Reminding Multimodal Large Language Models of Object-aware Knowledge
  with Retrieved Tags
Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags
Daiqing Qi
Handong Zhao
Zijun Wei
Sheng Li
46
2
0
16 Jun 2024
SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for
  Remote Sensing Vision-Language Understanding
SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding
Junwei Luo
Zhen Pang
Yongjun Zhang
Tingzhu Wang
Linlin Wang
...
Jiangwei Lao
Jian Wang
Jingdong Chen
Yihua Tan
Yansheng Li
48
21
0
14 Jun 2024
First Multi-Dimensional Evaluation of Flowchart Comprehension for
  Multimodal Large Language Models
First Multi-Dimensional Evaluation of Flowchart Comprehension for Multimodal Large Language Models
Enming Zhang
Ruobing Yao
Huanyong Liu
Junhui Yu
Jiale Wang
ELM
LRM
55
0
0
14 Jun 2024
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
Holy Lovenia
Rahmad Mahendra
Salsabil Maulana Akbar
Lester James Validad Miranda
Jennifer Santoso
...
Genta Indra Winata
Ruochen Zhang
Fajri Koto
Zheng-Xin Yong
Samuel Cahyawijaya
92
9
0
14 Jun 2024
VLind-Bench: Measuring Language Priors in Large Vision-Language Models
VLind-Bench: Measuring Language Priors in Large Vision-Language Models
Kang-il Lee
Minbeom Kim
Seunghyun Yoon
Minsung Kim
Dongryeol Lee
Hyukhun Koh
Kyomin Jung
CoGe
VLM
89
5
0
13 Jun 2024
MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs
MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs
Xuannan Liu
Zekun Li
Peipei Li
Shuhan Xia
Xing Cui
Linzhi Huang
Huaibo Huang
Weihong Deng
Zhaofeng He
41
13
0
13 Jun 2024
What If We Recaption Billions of Web Images with LLaMA-3?
What If We Recaption Billions of Web Images with LLaMA-3?
Xianhang Li
Haoqin Tu
Mude Hui
Zeyu Wang
Bingchen Zhao
...
Jieru Mei
Qing Liu
Huangjie Zheng
Yuyin Zhou
Cihang Xie
VLM
MLLM
44
35
0
12 Jun 2024
The Impact of Initialization on LoRA Finetuning Dynamics
The Impact of Initialization on LoRA Finetuning Dynamics
Soufiane Hayou
Nikhil Ghosh
Bin Yu
AI4CE
36
11
0
12 Jun 2024
Multimodal Table Understanding
Multimodal Table Understanding
Mingyu Zheng
Xinwei Feng
Q. Si
Qiaoqiao She
Zheng-Shen Lin
Wenbin Jiang
Weiping Wang
LMTD
VLM
45
14
0
12 Jun 2024
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal
  Large Language Models
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models
Tianle Gu
Zeyang Zhou
Kexin Huang
Dandan Liang
Yixu Wang
...
Keqing Wang
Yujiu Yang
Yan Teng
Yu Qiao
Yingchun Wang
ELM
47
13
0
11 Jun 2024
FaceGPT: Self-supervised Learning to Chat about 3D Human Faces
FaceGPT: Self-supervised Learning to Chat about 3D Human Faces
Haoran Wang
Mohit Mendiratta
Christian Theobalt
Adam Kortylewski
3DH
CVBM
31
3
0
11 Jun 2024
RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent
RS-Agent: Automating Remote Sensing Tasks through Intelligent Agent
Wenjia Xu
Zijian Yu
Yixu Wang
Jiuniu Wang
Yuanben Zhang
Guangzuo Li
Mugen Peng
LLMAG
48
7
0
11 Jun 2024
F-LMM: Grounding Frozen Large Multimodal Models
F-LMM: Grounding Frozen Large Multimodal Models
Size Wu
Sheng Jin
Wenwei Zhang
Lumin Xu
Wentao Liu
Wei Li
Chen Change Loy
MLLM
80
12
0
09 Jun 2024
M3GIA: A Cognition Inspired Multilingual and Multimodal General
  Intelligence Ability Benchmark
M3GIA: A Cognition Inspired Multilingual and Multimodal General Intelligence Ability Benchmark
Wei Song
Yadong Li
Jianhua Xu
Guowei Wu
Lingfeng Ming
...
Weihua Luo
Houyi Li
Yi Du
Fangda Guo
Kaicheng Yu
ELM
LRM
39
7
0
08 Jun 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu
Hao Fei
Xiangtai Li
Jiayi Ji
Hanwang Zhang
Tat-Seng Chua
Shuicheng Yan
MLLM
63
32
0
07 Jun 2024
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and
  Effective for LMMs
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
Lingchen Meng
Jianwei Yang
Rui Tian
Xiyang Dai
Zuxuan Wu
Jianfeng Gao
Yu-Gang Jiang
VLM
24
9
0
06 Jun 2024
Understanding Information Storage and Transfer in Multi-modal Large
  Language Models
Understanding Information Storage and Transfer in Multi-modal Large Language Models
Samyadeep Basu
Martin Grayson
C. Morrison
Besmira Nushi
S. Feizi
Daniela Massiceti
25
10
0
06 Jun 2024
Previous
123...106789
Next