ResearchTrend.AI
  • Papers
  • Communities
  • Organizations
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2310.02992
  4. Cited By
Kosmos-G: Generating Images in Context with Multimodal Large Language
  Models
v1v2v3 (latest)

Kosmos-G: Generating Images in Context with Multimodal Large Language Models

4 October 2023
Xichen Pan
Li Dong
Shaohan Huang
Zhiliang Peng
Wenhu Chen
Furu Wei
    VLM
ArXiv (abs)PDFHTML

Papers citing "Kosmos-G: Generating Images in Context with Multimodal Large Language Models"

7 / 57 papers shown
Title
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision,
  Language, Audio, and Action
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLMMLLM
102
189
0
28 Dec 2023
Generative Multimodal Models are In-Context Learners
Generative Multimodal Models are In-Context Learners
Quan-Sen Sun
Yufeng Cui
Xiaosong Zhang
Fan Zhang
Qiying Yu
...
Yueze Wang
Yongming Rao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLMLRM
178
313
0
20 Dec 2023
VL-GPT: A Generative Pre-trained Transformer for Vision and Language
  Understanding and Generation
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
Jinguo Zhu
Xiaohan Ding
Yixiao Ge
Yuying Ge
Sijie Zhao
Hengshuang Zhao
Xiaohua Wang
Ying Shan
ViTVLM
97
40
0
14 Dec 2023
Customization Assistant for Text-to-image Generation
Customization Assistant for Text-to-image Generation
Yufan Zhou
Ruiyi Zhang
Jiuxiang Gu
Tongfei Sun
DiffM
101
13
0
05 Dec 2023
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
Zineng Tang
Ziyi Yang
Mahmoud Khademi
Yang Liu
Chenguang Zhu
Mohit Bansal
LRMMLLMAuLLM
144
56
0
30 Nov 2023
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation,
  Generation and Editing
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Wei-Ge Chen
Irina Spiridonova
Jianwei Yang
Jianfeng Gao
Chun-yue Li
MLLMVLM
102
38
0
01 Nov 2023
TextBind: Multi-turn Interleaved Multimodal Instruction-following in the
  Wild
TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild
Huayang Li
Siheng Li
Deng Cai
Longyue Wang
Lemao Liu
Taro Watanabe
Yujiu Yang
Shuming Shi
MLLM
154
18
0
14 Sep 2023
Previous
12