ResearchTrend.AI
Kosmos-G: Generating Images in Context with Multimodal Large Language Models

4 October 2023
Xichen Pan
Li Dong
Shaohan Huang
Zhiliang Peng
Wenhu Chen
Furu Wei
    VLM

Papers citing "Kosmos-G: Generating Images in Context with Multimodal Large Language Models"

50 / 57 papers shown
FreeCus: Free Lunch Subject-driven Customization in Diffusion Transformers
Yanbing Zhang
Zhe Wang
Qin Zhou
Mengping Yang
DiffM
9
0
0
21 Jul 2025
Negative-Guided Subject Fidelity Optimization for Zero-Shot Subject-Driven Generation
Chaehun Shin
Jooyoung Choi
Johan Barthelemy
Jungbeom Lee
Sungroh Yoon
DiffM
93
0
0
04 Jun 2025
Create Anything Anywhere: Layout-Controllable Personalized Diffusion Model for Multiple Subjects
Wei Li
Hebei Li
Yansong Peng
Siying Wu
Yueyi Zhang
Xiaoyan Sun
DiffM
130
0
0
27 May 2025
Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM
Peng Liu
Xiaoming Ren
Fengkai Liu
Qingsong Xie
Quanlong Zheng
Yanhao Zhang
Haonan Lu
Yujiu Yang
EGVM, VGen
95
0
0
26 May 2025
Regularized Personalization of Text-to-Image Diffusion Models without Distributional Drift
Gihoon Kim
Hyungjin Park
Taesup Kim
DiffM, VLM
215
0
0
26 May 2025
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
Bingda Tang
Boyang Zheng
Xichen Pan
Sayak Paul
Saining Xie
90
0
0
15 May 2025
Behind Maya: Building a Multilingual Vision Language Model
Nahid Alam
Karthik Reddy Kanjula
Surya Guthikonda
Timothy Chung
Bala Krishna S Vegesna
...
Isha Chaturvedi
Genta Indra Winata
Ashvanth.S
Snehanshu Mukherjee
Alham Fikri Aji
MLLM, VLM
95
1
0
13 May 2025
STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives
Bo Wang
Haoyang Huang
Zhiying Lu
Fengyuan Liu
Guoqing Ma
Jianlong Yuan
Y. Zhang
Nan Duan
Daxin Jiang
VGen
121
1
0
13 May 2025
Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA
Karthik Reddy Kanjula
Surya Guthikonda
Nahid Alam
Shayekh Bin Islam
102
0
0
09 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
375
5
0
05 May 2025
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
Ming Li
Xin Gu
Fan Chen
X. Xing
Longyin Wen
Chong Chen
Sijie Zhu
DiffM
277
2
0
05 May 2025
A Survey of Foundation Model-Powered Recommender Systems: From Feature-Based, Generative to Agentic Paradigms
Chengkai Huang
Hongtao Huang
Tong Yu
Kaige Xie
Junda Wu
Shuai Zhang
Julian McAuley
Dietmar Jannach
Lina Yao
LRM, AI4CE
94
1
0
23 Apr 2025
Personalized Text-to-Image Generation with Auto-Regressive Models
Kaiyue Sun
Xian Liu
Yao Teng
Xihui Liu
96
2
0
17 Apr 2025
Flux Already Knows -- Activating Subject-Driven Image Generation without Training
Hao Kang
Stathi Fotiadis
Liming Jiang
Qing Yan
Yumin Jia
Zichuan Liu
Min Jin Chong
Xin Lu
99
3
0
12 Apr 2025
Transfer between Modalities with MetaQueries
Xichen Pan
Satya Narayan Shukla
Aashu Singh
Zhuokai Zhao
Shlok Kumar Mishra
...
Jiuhai Chen
Kunpeng Li
F. Xu
Ji Hou
Saining Xie
DiffM
117
29
0
08 Apr 2025
InstructVEdit: A Holistic Approach for Instructional Video Editing
Chi Zhang
C. Feng
Feng Yan
Qiming Zhang
Mingjin Zhang
Yujie Zhong
Jing Zhang
Lin Ma
DiffM, VGen
100
1
0
22 Mar 2025
TF-TI2I: Training-Free Text-and-Image-to-Image Generation via Multi-Modal Implicit-Context Learning in Text-to-Image Models
Teng-Fang Hsiao
Bo-Kai Ruan
Yi-Lun Wu
Tzu-Ling Lin
Hong-Han Shuai
VLM
149
1
0
19 Mar 2025
Piece it Together: Part-Based Concepting with IP-Priors
Elad Richardson
Kfir Goldberg
Yuval Alaluf
Daniel Cohen-Or
DiffM
112
0
0
13 Mar 2025
Towards More Accurate Personalized Image Generation: Addressing Overfitting and Evaluation Bias
Mingxiao Li
Tingyu Qu
Tinne Tuytelaars
Marie-Francine Moens
EGVM
81
1
0
09 Mar 2025
X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation
Jian Ma
Qirong Peng
Xu Guo
Chen Chen
H. Lu
Zhenyu Yang
VLM
198
1
0
08 Mar 2025
DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models
Zhendong Wang
Jianmin Bao
Shuyang Gu
Dong Chen
Wengang Zhou
Haoyang Li
DiffM
93
4
0
03 Mar 2025
WeGen: A Unified Model for Interactive Multimodal Generation as We Chat
Zhipeng Huang
Shaobin Zhuang
Canmiao Fu
Binxin Yang
Ying Zhang
Chong Sun
Zhizheng Zhang
Yali Wang
Chen Li
Zheng-Jun Zha
DiffM
134
4
0
03 Mar 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
196
17
0
06 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM, VLM, LRM
378
69
0
03 Jan 2025
RealCustom++: Representing Images as Real-Word for Real-Time Customization
Zhendong Mao
Mengqi Huang
Fei Ding
Mingcong Liu
Qian He
Xiaojun Chang
DiffM
184
11
0
03 Jan 2025
DreamOmni: Unified Image Generation and Editing
Bin Xia
Yuechen Zhang
Jingyao Li
Chengyao Wang
Yitong Wang
Xinglong Wu
Bei Yu
Jiaya Jia
SyDa, MLLM
137
5
0
22 Dec 2024
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
Chaehun Shin
Jooyoung Choi
Heeseung Kim
Sungroh Yoon
DiffM
196
17
0
23 Nov 2024
Novel Object Synthesis via Adaptive Text-Image Harmony
Zeren Xiong
Zedong Zhang
Zikun Chen
Shuo Chen
Xianrui Li
Gan Sun
Jian Yang
Jun Li
DiffM
108
7
0
28 Oct 2024
MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations
Liang Xu
Shaoyang Hua
Zili Lin
Yifan Liu
Feipeng Ma
Yichao Yan
Xin Jin
Xiaokang Yang
Wenjun Zeng
VGen
111
8
0
17 Oct 2024
FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction
Runze He
Kai Ma
Linjiang Huang
Shaofei Huang
Jialin Gao
Xiaoming Wei
Jiao Dai
Jizhong Han
Si Liu
DiffM
90
11
0
26 Sep 2024
Multi-modal Generative AI: Multi-modal LLMs, Diffusions and the Unification
X. Wang
Yuwei Zhou
Bin Huang
Hong Chen
Wenwu Zhu
DiffM
177
1
0
23 Sep 2024
OmniGen: Unified Image Generation
Shitao Xiao
Yueze Wang
Yueze Wang
Huaying Yuan
Xingrun Xing
Ruiran Yan
Shuting Wang
Tiejun Huang
Zheng Liu
DiffM, VLM, SyDa
161
108
0
17 Sep 2024
GroundingBooth: Grounding Text-to-Image Customization
Zhexiao Xiong
Wei Xiong
Jing Shi
He Zhang
Yizhi Song
Nathan Jacobs
DiffM
176
9
0
13 Sep 2024
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions
Lingwei Meng
Shujie Hu
Jiawen Kang
Zhaoqing Li
Yuejiao Wang
Wenxuan Wu
Xixin Wu
Xunying Liu
Helen Meng
AuLLM
178
6
0
13 Sep 2024
CustomContrast: A Multilevel Contrastive Perspective For Subject-Driven Text-to-Image Customization
Nan Chen
Mengqi Huang
Zhuowei Chen
Yang Zheng
Lei Zhang
Zhendong Mao
DiffM
173
7
0
09 Sep 2024
Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling
Zilyu Ye
Yu Lei
Ruotian Peng
Jinjin Cao
Zhiyang Chen
...
Mingyuan Zhou
Xiaoqian Shen
Mohamed Elhoseiny
Nan Zhuang
Guo-Jun Qi
VGen, VLM
93
1
0
07 Aug 2024
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
Haozhe Zhao
Xiaojian Ma
Liang Chen
Shuzheng Si
Rujie Wu
Kaikai An
Peiyu Yu
Minjia Zhang
Qing Li
Baobao Chang
118
83
0
07 Jul 2024
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data
William Berman
A. Peysakhovich
93
5
0
26 Jun 2024
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Yuang Peng
Yuxin Cui
Haomiao Tang
Zekun Qi
Runpei Dong
Jing Bai
Chunrui Han
Zheng Ge
Xiangyu Zhang
Shu-Tao Xia
EGVM
211
48
0
24 Jun 2024
Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation
Yufan Zhou
Ruiyi Zhang
Kaizhi Zheng
Nanxuan Zhao
Jiuxiang Gu
Zichao Wang
Xin Eric Wang
Tong Sun
DiffM
71
2
0
13 Jun 2024
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts
Yucheng Han
Rui Wang
Chi Zhang
Juntao Hu
Pei Cheng
Bin-Bin Fu
Hanwang Zhang
123
8
0
13 Jun 2024
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
X. Wang
Siming Fu
Qihan Huang
Wanggui He
Hao Jiang
DiffM
160
61
0
11 Jun 2024
RefDrop: Controllable Consistency in Image or Video Generation via Reference Feature Guidance
JiaoJiao Fan
Haotian Xue
Qinsheng Zhang
Yongxin Chen
87
2
0
27 May 2024
A Survey on Personalized Content Synthesis with Diffusion Models
Xu-Lu Zhang
Xiao Wei
Wengyu Zhang
Jinlin Wu
Jiaxin Wu
Zhen Lei
Zhaoxiang Zhang
Zhen Lei
Qing Li
EGVM
268
22
0
09 May 2024
Controllable Generation with Text-to-Image Diffusion Models: A Survey
Pu Cao
Feng Zhou
Qing-Huang Song
Lu Yang
141
42
0
07 Mar 2024
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
Zekun Qi
Runpei Dong
Shaochen Zhang
Haoran Geng
Chunrui Han
Zheng Ge
Li Yi
Kaisheng Ma
220
75
0
27 Feb 2024
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Models
Shyam Marjit
Harshit Singh
Nityanand Mathur
Sayak Paul
Chia-Mu Yu
Pin-Yu Chen
DiffM
98
8
0
27 Feb 2024
λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
Maitreya Patel
Sangmin Jung
Chitta Baral
Yezhou Yang
VLM
99
38
0
07 Feb 2024
CreativeSynth: Cross-Art-Attention for Artistic Image Synthesis with Multimodal Diffusion
Nisha Huang
Weiming Dong
Yuxin Zhang
Fan Tang
Ronghui Li
Chongyang Ma
Xiu Li
Tong-Yee Lee
Changsheng Xu
DiffM
98
7
0
25 Jan 2024
Instruct-Imagen: Image Generation with Multi-modal Instruction
Hexiang Hu
Kelvin C. K. Chan
Yu-Chuan Su
Wenhu Chen
Yandong Li
...
Xue Ben
Boqing Gong
William W. Cohen
Ming-Wei Chang
Xuhui Jia
MLLM
154
53
0
03 Jan 2024