Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2306.05425
Cited By
MIMIC-IT: Multi-Modal In-Context Instruction Tuning
8 June 2023
Bo-wen Li
Yuanhan Zhang
Liangyu Chen
Jinghao Wang
Fanyi Pu
Jingkang Yang
C. Li
Ziwei Liu
MLLM
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"MIMIC-IT: Multi-Modal In-Context Instruction Tuning"
49 / 49 papers shown
Title
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang
Hao Zhang
VLM
70
0
0
03 May 2025
Enhancing Multimodal In-Context Learning for Image Classification through Coreset Optimization
Huiyi Chen
Jiawei Peng
Kaihua Tang
Xin Geng
Xu Yang
32
0
0
19 Apr 2025
Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models
Yuxiang Lin
Jingdong Sun
Zhi-Qi Cheng
Jue Wang
Haomin Liang
Zebang Cheng
Yifei Dong
Jun-Yan He
Xiaojiang Peng
Xian-Sheng Hua
52
0
0
10 Apr 2025
MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Jiayi Ji
Jie Lou
Debing Zhang
Rongrong Ji
95
0
0
26 Mar 2025
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment
Yucheng Suo
Fan Ma
Linchao Zhu
T. Wang
Fengyun Rao
Yi Yang
LRM
77
0
0
26 Mar 2025
Vision-Language In-Context Learning Driven Few-Shot Visual Inspection Model
Shiryu Ueno
Yoshikazu Hayashi
Shunsuke Nakatsuka
Yusei Yamada
Hiroaki Aizawa
K. Kato
MLLM
VLM
105
0
0
13 Feb 2025
LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
Tzu-Tao Chang
Shivaram Venkataraman
VLM
229
0
0
04 Feb 2025
MULTI: Multimodal Understanding Leaderboard with Text and Images
Zichen Zhu
Yang Xu
Lu Chen
Jingkai Yang
Yichuan Ma
...
Yingzi Ma
Situo Zhang
Zihan Zhao
Liangtai Sun
Kai Yu
VLM
54
5
0
08 Jan 2025
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
Renqiu Xia
Mingxing Li
Hancheng Ye
Wenjie Wu
Hongbin Zhou
...
Zeang Sheng
Botian Shi
Tao Chen
Junchi Yan
Bo Zhang
91
8
0
16 Dec 2024
AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?
Shouwei Ruan
Hanqin Liu
Yao Huang
Xiaoqi Wang
Caixin Kang
Hang Su
Yinpeng Dong
Xingxing Wei
VGen
100
0
0
04 Dec 2024
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
Baiqi Li
Zhiqiu Lin
Wenxuan Peng
Jean de Dieu Nyandwi
Daniel Jiang
Zixian Ma
Simran Khanuja
Ranjay Krishna
Graham Neubig
Deva Ramanan
AAML
CoGe
VLM
71
22
0
18 Oct 2024
From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning
Yang Bai
Yang Zhou
Jun Zhou
Rick Siow Mong Goh
Daniel Ting
Yong Liu
VLM
52
0
0
09 Oct 2024
Scaling Large Motion Models with Million-Level Human Motions
Ye Wang
Sipeng Zheng
Bin Cao
Qianshan Wei
Qin Jin
Qin Jin
Zongqing Lu
VGen
42
0
0
04 Oct 2024
Generating Visual Stories with Grounded and Coreferent Characters
Danyang Liu
Mirella Lapata
Frank Keller
23
2
0
20 Sep 2024
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Min Shi
Fuxiao Liu
Shihao Wang
Shijia Liao
Subhashree Radhakrishnan
...
Andrew Tao
Andrew Tao
Zhiding Yu
Guilin Liu
Guilin Liu
MLLM
38
54
0
28 Aug 2024
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Fushuo Huo
Wenchao Xu
Zhong Zhang
Yining Qi
Zhicheng Chen
Peilin Zhao
VLM
MLLM
68
19
0
04 Aug 2024
GenderBias-\emph{VL}: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing
Yisong Xiao
Aishan Liu
QianJia Cheng
Zhenfei Yin
Siyuan Liang
Jiapeng Li
Jing Shao
Xianglong Liu
Dacheng Tao
53
4
0
30 Jun 2024
First Multi-Dimensional Evaluation of Flowchart Comprehension for Multimodal Large Language Models
Enming Zhang
Ruobing Yao
Huanyong Liu
Junhui Yu
Jiale Wang
ELM
LRM
55
0
0
14 Jun 2024
Grounding Multimodal Large Language Models in Actions
Andrew Szot
Bogdan Mazoure
Harsh Agrawal
Devon Hjelm
Z. Kira
Alexander Toshev
LM&Ro
40
10
0
12 Jun 2024
F-LMM: Grounding Frozen Large Multimodal Models
Size Wu
Sheng Jin
Wenwei Zhang
Lumin Xu
Wentao Liu
Wei Li
Chen Change Loy
MLLM
80
12
0
09 Jun 2024
Simplifying Multimodality: Unimodal Approach to Multimodal Challenges in Radiology with General-Domain Large Language Model
Seonhee Cho
Choonghan Kim
Jiho Lee
Chetan Chilkunda
Sujin Choi
Joo Heung Yoon
53
0
0
29 Apr 2024
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Badri N. Patro
Vijay Srinivas Agneeswaran
Mamba
46
38
0
24 Apr 2024
MMInA: Benchmarking Multihop Multimodal Internet Agents
Ziniu Zhang
Shulin Tian
Liangyu Chen
Ziwei Liu
LLMAG
LM&Ro
35
13
0
15 Apr 2024
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery
Guan-Feng Wang
Long Bai
Wan Jun Nah
Jie Wang
Zhaoxi Zhang
Zhen Chen
Jinlin Wu
Mobarakol Islam
Hongbin Liu
Hongliang Ren
46
14
0
22 Mar 2024
UniCode: Learning a Unified Codebook for Multimodal Large Language Models
Sipeng Zheng
Bohan Zhou
Yicheng Feng
Ye Wang
Zongqing Lu
VLM
MLLM
46
7
0
14 Mar 2024
Embodied Understanding of Driving Scenarios
Yunsong Zhou
Linyan Huang
Qingwen Bu
Jia Zeng
Tianyu Li
Hang Qiu
Hongzi Zhu
Minyi Guo
Yu Qiao
Hongyang Li
LM&Ro
62
31
0
07 Mar 2024
VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models
Jiawei Liang
Siyuan Liang
Man Luo
Aishan Liu
Dongchen Han
Ee-Chien Chang
Xiaochun Cao
42
38
0
21 Feb 2024
CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models
Fuwen Luo
Chi Chen
Zihao Wan
Zhaolu Kang
Qidong Yan
...
Xiaoyue Mi
Peng Li
Ning Ma
Maosong Sun
Yang Liu
43
5
0
21 Feb 2024
Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion
Ziyue Wang
Chi Chen
Yiqi Zhu
Fuwen Luo
Peng Li
Ming Yan
Ji Zhang
Fei Huang
Maosong Sun
Yang Liu
46
5
0
19 Feb 2024
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
Chen Chen
Ruizhe Li
Yuchen Hu
Sabato Marco Siniscalchi
Pin-Yu Chen
Ensiong Chng
Chao-Han Huck Yang
36
20
0
08 Feb 2024
In-context Learning with Retrieved Demonstrations for Language Models: A Survey
an Luo
Xin Xu
Yue Liu
Panupong Pasupat
Mehran Kazemi
RALM
34
55
0
21 Jan 2024
COCO is "ALL'' You Need for Visual Instruction Fine-tuning
Xiaotian Han
Yiqi Wang
Bohan Zhai
Quanzeng You
Hongxia Yang
VLM
MLLM
33
2
0
17 Jan 2024
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts
Mingsheng Li
Xin Chen
C. Zhang
Sijin Chen
Erik Cambria
Fukun Yin
Gang Yu
Tao Chen
36
24
0
17 Dec 2023
Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos
M. S. Seyfioglu
Wisdom O. Ikezogwo
Fatemeh Ghezloo
Ranjay Krishna
Linda G. Shapiro
32
37
0
07 Dec 2023
Describing Differences in Image Sets with Natural Language
Lisa Dunlap
Yuhui Zhang
Xiaohan Wang
Ruiqi Zhong
Trevor Darrell
Jacob Steinhardt
Joseph E. Gonzalez
Serena Yeung-Levy
CoGe
VLM
32
30
0
05 Dec 2023
Retrieval-augmented Multi-modal Chain-of-Thoughts Reasoning for Large Language Models
Bingshuai Liu
Chenyang Lyu
Zijun Min
Zhanyu Wang
Jinsong Su
Longyue Wang
LRM
39
7
0
04 Dec 2023
Dolphins: Multimodal Language Model for Driving
Yingzi Ma
Yulong Cao
Jiachen Sun
Marco Pavone
Chaowei Xiao
MLLM
38
51
0
01 Dec 2023
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning
Yifan Du
Hangyu Guo
Kun Zhou
Wayne Xin Zhao
Jinpeng Wang
Chuyuan Wang
Mingchen Cai
Ruihua Song
Ji-Rong Wen
VLM
MLLM
LRM
75
22
0
02 Nov 2023
Improving Automatic VQA Evaluation Using Large Language Models
Oscar Manas
Benno Krojer
Aishwarya Agrawal
32
21
0
04 Oct 2023
Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning
Mustafa Shukor
Alexandre Ramé
Corentin Dancette
Matthieu Cord
LRM
MLLM
46
20
0
01 Oct 2023
Species196: A One-Million Semi-supervised Dataset for Fine-grained Species Recognition
W. He
Kai Han
Ying Nie
Chengcheng Wang
Yunhe Wang
VLM
48
6
0
25 Sep 2023
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
Yupan Huang
Zaiqiao Meng
Fangyu Liu
Yixuan Su
Nigel Collier
Yutong Lu
MLLM
41
22
0
31 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
Fahad Shahbaz Khan
VLM
38
119
0
25 Jul 2023
Visual Instruction Tuning with Polite Flamingo
Delong Chen
Jianfeng Liu
Wenliang Dai
Baoyuan Wang
MLLM
34
42
0
03 Jul 2023
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
171
588
0
06 Apr 2023
Learning by Distilling Context
Charles Burton Snell
Dan Klein
Ruiqi Zhong
ReLM
LRM
174
44
0
30 Sep 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
275
1,026
0
13 Oct 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
299
1,084
0
17 Feb 2021
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
119
276
0
24 Jan 2020
1