ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2212.10773
  4. Cited By
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction
  Tuning

MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning

21 December 2022
Zhiyang Xu
Ying Shen
Lifu Huang
    MLLM
ArXivPDFHTML

Papers citing "MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning"

42 / 92 papers shown
Title
DreamLLM: Synergistic Multimodal Comprehension and Creation
DreamLLM: Synergistic Multimodal Comprehension and Creation
Runpei Dong
Chunrui Han
Yuang Peng
Zekun Qi
Zheng Ge
...
Hao-Ran Wei
Xiangwen Kong
Xiangyu Zhang
Kaisheng Ma
Li Yi
MLLM
34
170
0
20 Sep 2023
MMICL: Empowering Vision-language Model with Multi-Modal In-Context
  Learning
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning
Haozhe Zhao
Zefan Cai
Shuzheng Si
Xiaojian Ma
Kaikai An
Liang Chen
Zixuan Liu
Sheng Wang
Wenjuan Han
Baobao Chang
MLLM
VLM
28
133
0
14 Sep 2023
TextBind: Multi-turn Interleaved Multimodal Instruction-following in the
  Wild
TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild
Huayang Li
Siheng Li
Deng Cai
Longyue Wang
Lemao Liu
Taro Watanabe
Yujiu Yang
Shuming Shi
MLLM
52
17
0
14 Sep 2023
Sparkles: Unlocking Chats Across Multiple Images for Multimodal
  Instruction-Following Models
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
Yupan Huang
Zaiqiao Meng
Fangyu Liu
Yixuan Su
Nigel Collier
Yutong Lu
MLLM
35
22
0
31 Aug 2023
Evaluating the Robustness to Instructions of Large Language Models
Yuansheng Ni
Sichao Jiang
Xinyu Wu
Hui Shen
Yuli Zhou
ALM
22
2
0
28 Aug 2023
Position-Enhanced Visual Instruction Tuning for Multimodal Large
  Language Models
Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
Chi Chen
Ruoyu Qin
Fuwen Luo
Xiaoyue Mi
Peng Li
Maosong Sun
Yang Liu
MLLM
VLM
14
45
0
25 Aug 2023
Instruction Tuning for Large Language Models: A Survey
Instruction Tuning for Large Language Models: A Survey
Shengyu Zhang
Linfeng Dong
Xiaoya Li
Sen Zhang
Xiaofei Sun
...
Jiwei Li
Runyi Hu
Tianwei Zhang
Fei Wu
Guoyin Wang
LM&MA
24
534
0
21 Aug 2023
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual
  Questions
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
Wenbo Hu
Y. Xu
Y. Li
W. Li
Z. Chen
Z. Tu
MLLM
VLM
28
121
0
19 Aug 2023
OctoPack: Instruction Tuning Code Large Language Models
OctoPack: Instruction Tuning Code Large Language Models
Niklas Muennighoff
Qian Liu
A. Zebaze
Qinkai Zheng
Binyuan Hui
Terry Yue Zhuo
Swayam Singh
Xiangru Tang
Leandro von Werra
Shayne Longpre
VLM
ALM
65
117
0
14 Aug 2023
VisIT-Bench: A Benchmark for Vision-Language Instruction Following
  Inspired by Real-World Use
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
Yonatan Bitton
Hritik Bansal
Jack Hessel
Rulin Shao
Wanrong Zhu
Anas Awadalla
Josh Gardner
Rohan Taori
L. Schimdt
VLM
31
77
0
12 Aug 2023
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
Mustafa Shukor
Corentin Dancette
Alexandre Ramé
Matthieu Cord
MoMe
MLLM
61
42
0
30 Jul 2023
Robust Visual Question Answering: Datasets, Methods, and Future
  Challenges
Robust Visual Question Answering: Datasets, Methods, and Future Challenges
Jie Ma
Pinghui Wang
Dechen Kong
Zewei Wang
Jun Liu
Hongbin Pei
Junzhou Zhao
OOD
26
18
0
21 Jul 2023
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
Yang Zhao
Zhijie Lin
Daquan Zhou
Zilong Huang
Jiashi Feng
Bingyi Kang
MLLM
33
106
0
17 Jul 2023
DecompEval: Evaluating Generated Texts as Unsupervised Decomposed
  Question Answering
DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering
Pei Ke
Fei Huang
Fei Mi
Yasheng Wang
Qun Liu
Xiaoyan Zhu
Minlie Huang
ReLM
ELM
34
10
0
13 Jul 2023
A Comprehensive Overview of Large Language Models
A Comprehensive Overview of Large Language Models
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Ajmal Saeed Mian
OffRL
64
523
0
12 Jul 2023
Large Multimodal Models: Notes on CVPR 2023 Tutorial
Large Multimodal Models: Notes on CVPR 2023 Tutorial
Chunyuan Li
MLLM
VLM
14
20
0
26 Jun 2023
A Survey on Multimodal Large Language Models
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Bill Xu
Enhong Chen
MLLM
LRM
54
555
0
23 Jun 2023
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language
  Models
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu
Peixian Chen
Yunhang Shen
Yulei Qin
Mengdan Zhang
...
Xiawu Zheng
Ke Li
Xing Sun
Zhenyu Qiu
Rongrong Ji
ELM
MLLM
42
759
0
23 Jun 2023
Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and
  Text Integration
Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration
Chenyang Lyu
Minghao Wu
Longyue Wang
Xinting Huang
Bingshuai Liu
Zefeng Du
Shuming Shi
Zhaopeng Tu
MLLM
AuLLM
29
160
0
15 Jun 2023
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset,
  Framework, and Benchmark
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
Zhen-fei Yin
Jiong Wang
Jianjian Cao
Zhelun Shi
Dingning Liu
...
Lei Bai
Xiaoshui Huang
Zhiyong Wang
Jing Shao
Wanli Ouyang
MLLM
24
152
0
11 Jun 2023
MIMIC-IT: Multi-Modal In-Context Instruction Tuning
MIMIC-IT: Multi-Modal In-Context Instruction Tuning
Bo-wen Li
Yuanhan Zhang
Liangyu Chen
Jinghao Wang
Fanyi Pu
Jingkang Yang
C. Li
Ziwei Liu
MLLM
VLM
34
224
0
08 Jun 2023
M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual
  Instruction Tuning
M3^33IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning
Lei Li
Yuwei Yin
Shicheng Li
Liang Chen
Peiyi Wang
...
Yazheng Yang
Jingjing Xu
Xu Sun
Lingpeng Kong
Qi Liu
MLLM
VLM
27
115
0
07 Jun 2023
Layout and Task Aware Instruction Prompt for Zero-shot Document Image
  Question Answering
Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering
Wenjin Wang
Yunhao Li
Yixin Ou
Yin Zhang
VLM
21
24
0
01 Jun 2023
ChatBridge: Bridging Modalities with Large Language Model as a Language
  Catalyst
ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst
Zijia Zhao
Longteng Guo
Tongtian Yue
Si-Qing Chen
Shuai Shao
Xinxin Zhu
Zehuan Yuan
Jing Liu
MLLM
32
52
0
25 May 2023
AMELI: Enhancing Multimodal Entity Linking with Fine-Grained Attributes
AMELI: Enhancing Multimodal Entity Linking with Fine-Grained Attributes
Barry Menglong Yao
Yu Chen
Qifan Wang
Sijia Wang
Minqian Liu
Zhiyang Xu
Licheng Yu
Lifu Huang
11
7
0
24 May 2023
Instruction Tuned Models are Quick Learners
Instruction Tuned Models are Quick Learners
Himanshu Gupta
Saurabh Arjun Sawant
Swaroop Mishra
Mutsumi Nakamura
Arindam Mitra
Santosh Mashetty
Chitta Baral
26
26
0
17 May 2023
InstructBLIP: Towards General-purpose Vision-Language Models with
  Instruction Tuning
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLM
VLM
17
1,903
0
11 May 2023
MultiModal-GPT: A Vision and Language Model for Dialogue with Humans
MultiModal-GPT: A Vision and Language Model for Dialogue with Humans
T. Gong
Chengqi Lyu
Shilong Zhang
Yudong Wang
Miao Zheng
Qianmengke Zhao
Kuikun Liu
Wenwei Zhang
Ping Luo
Kai-xiang Chen
MLLM
34
252
0
08 May 2023
Otter: A Multi-Modal Model with In-Context Instruction Tuning
Otter: A Multi-Modal Model with In-Context Instruction Tuning
Bo-wen Li
Yuanhan Zhang
Liangyu Chen
Jinghao Wang
Jingkang Yang
Ziwei Liu
MLLM
37
504
0
05 May 2023
Understand the Dynamic World: An End-to-End Knowledge Informed Framework
  for Open Domain Entity State Tracking
Understand the Dynamic World: An End-to-End Knowledge Informed Framework for Open Domain Entity State Tracking
Mingchen Li
Lifu Huang
43
9
0
26 Apr 2023
Improving Diffusion Models for Scene Text Editing with Dual Encoders
Improving Diffusion Models for Scene Text Editing with Dual Encoders
Jiabao Ji
Guanhua Zhang
Zhaowen Wang
Bairu Hou
Zhifei Zhang
Brian L. Price
Shiyu Chang
DiffM
32
29
0
12 Apr 2023
Unified Text Structuralization with Instruction-tuned Language Models
Unified Text Structuralization with Instruction-tuned Language Models
Xuanfan Ni
Piji Li
Huayang Li
36
13
0
27 Mar 2023
Large Language Model Instruction Following: A Survey of Progresses and
  Challenges
Large Language Model Instruction Following: A Survey of Progresses and Challenges
Renze Lou
Kai Zhang
Wenpeng Yin
ALM
LRM
29
20
0
18 Mar 2023
Can Pre-trained Vision and Language Models Answer Visual
  Information-Seeking Questions?
Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions?
Yang Chen
Hexiang Hu
Yi Luan
Haitian Sun
Soravit Changpinyo
Alan Ritter
Ming-Wei Chang
39
80
0
23 Feb 2023
The Flan Collection: Designing Data and Methods for Effective
  Instruction Tuning
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
Shayne Longpre
Le Hou
Tu Vu
Albert Webson
Hyung Won Chung
...
Denny Zhou
Quoc V. Le
Barret Zoph
Jason W. Wei
Adam Roberts
ALM
29
623
0
31 Jan 2023
A Survey on In-context Learning
A Survey on In-context Learning
Qingxiu Dong
Lei Li
Damai Dai
Ce Zheng
Jingyuan Ma
...
Zhiyong Wu
Baobao Chang
Xu Sun
Lei Li
Zhifang Sui
ReLM
AIMat
20
461
0
31 Dec 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,915
0
04 Mar 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
213
1,656
0
15 Oct 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize
  Long-Tail Visual Concepts
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
275
1,082
0
17 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Mohit Bansal
MLLM
256
525
0
04 Feb 2021
Assessing Image Quality Issues for Real-World Problems
Assessing Image Quality Issues for Real-World Problems
Tai-Yin Chiu
Yinan Zhao
Danna Gurari
49
54
0
27 Mar 2020
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in
  Natural Images
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
Andreas Veit
Tomas Matera
Lukás Neumann
Jirí Matas
Serge J. Belongie
188
515
0
26 Jan 2016
Previous
12