Papers
Communities
Organizations
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.02992
Cited By
v1
v2
v3 (latest)
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
4 October 2023
Xichen Pan
Li Dong
Shaohan Huang
Zhiliang Peng
Wenhu Chen
Furu Wei
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Kosmos-G: Generating Images in Context with Multimodal Large Language Models"
7 / 57 papers shown
Title
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM
MLLM
102
189
0
28 Dec 2023
Generative Multimodal Models are In-Context Learners
Quan-Sen Sun
Yufeng Cui
Xiaosong Zhang
Fan Zhang
Qiying Yu
...
Yueze Wang
Yongming Rao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
LRM
178
313
0
20 Dec 2023
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
Jinguo Zhu
Xiaohan Ding
Yixiao Ge
Yuying Ge
Sijie Zhao
Hengshuang Zhao
Xiaohua Wang
Ying Shan
ViT
VLM
97
40
0
14 Dec 2023
Customization Assistant for Text-to-image Generation
Yufan Zhou
Ruiyi Zhang
Jiuxiang Gu
Tongfei Sun
DiffM
101
13
0
05 Dec 2023
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
Zineng Tang
Ziyi Yang
Mahmoud Khademi
Yang Liu
Chenguang Zhu
Mohit Bansal
LRM
MLLM
AuLLM
144
56
0
30 Nov 2023
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Wei-Ge Chen
Irina Spiridonova
Jianwei Yang
Jianfeng Gao
Chun-yue Li
MLLM
VLM
102
38
0
01 Nov 2023
TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild
Huayang Li
Siheng Li
Deng Cai
Longyue Wang
Lemao Liu
Taro Watanabe
Yujiu Yang
Shuming Shi
MLLM
154
18
0
14 Sep 2023
Previous
1
2