ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2303.10056
  4. Cited By
GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation

GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation

17 March 2023
Can Qin
Ning Yu
Chen Xing
Shu Zhen Zhang
Zeyuan Chen
Stefano Ermon
Yun Fu
Caiming Xiong
Ran Xu
    DiffM
ArXivPDFHTML

Papers citing "GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation"

25 / 25 papers shown
Title
MACS: Multi-source Audio-to-image Generation with Contextual Significance and Semantic Alignment
Hao Zhou
Xiaobao Guo
Yuzhe Zhu
A. Kong
DiffM
49
1
0
13 Mar 2025
X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation
X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation
Jian Ma
Qirong Peng
Xu Guo
Chen Chen
H. Lu
Zhenyu Yang
VLM
64
1
0
08 Mar 2025
Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal
  Latent Alignment
Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment
Kim Sung-Bin
Arda Senocak
Hyunwoo Ha
Tae-Hyun Oh
DiffM
78
0
0
09 Dec 2024
Artificial Intelligence for Biomedical Video Generation
Artificial Intelligence for Biomedical Video Generation
Linyuan Li
Jianing Qiu
Anujit Saha
Lin Li
Poyuan Li
Mengxian He
Ziyu Guo
Wu Yuan
VGen
60
1
0
12 Nov 2024
Language-Guided Joint Audio-Visual Editing via One-Shot Adaptation
Language-Guided Joint Audio-Visual Editing via One-Shot Adaptation
Susan Liang
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
DiffM
34
6
0
09 Oct 2024
ControlAR: Controllable Image Generation with Autoregressive Models
ControlAR: Controllable Image Generation with Autoregressive Models
Zongming Li
Tianheng Cheng
Shoufa Chen
Peize Sun
Haocheng Shen
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
DiffM
132
14
0
03 Oct 2024
Label-free Neural Semantic Image Synthesis
Label-free Neural Semantic Image Synthesis
Jiayi Wang
Kevin Laube
Yumeng Li
J. H. Metzen
Shin-I Cheng
Julio Borges
Anna Khoreva
DiffM
31
0
0
01 Jul 2024
ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
Jiannan Huang
Jun Hao Liew
Hanshu Yan
Yuyang Yin
Yao Zhao
Yunchao Wei
Yunchao Wei
DiffM
90
6
0
27 May 2024
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation
Shiqi Yang
Zhi-Wei Zhong
Mengjie Zhao
Shusuke Takahashi
Masato Ishii
Takashi Shibuya
Yuki Mitsufuji
43
2
0
23 May 2024
SonicDiffusion: Audio-Driven Image Generation and Editing with
  Pretrained Diffusion Models
SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models
Burak Can Biner
Farrin Marouf Sofian
Umur Berkay Karakacs
Duygu Ceylan
Erkut Erdem
Aykut Erdem
21
7
0
01 May 2024
Controllable Generation with Text-to-Image Diffusion Models: A Survey
Controllable Generation with Text-to-Image Diffusion Models: A Survey
Pu Cao
Feng Zhou
Qing-Huang Song
Lu Yang
69
35
0
07 Mar 2024
PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation
  in non-English Text-to-Image Generation
PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation
Jiancang Ma
Chen Chen
Qingsong Xie
H. Lu
DiffM
VLM
25
3
0
28 Nov 2023
Language-driven Scene Synthesis using Multi-conditional Diffusion Model
Language-driven Scene Synthesis using Multi-conditional Diffusion Model
An Vuong
M. Vu
T. Nguyen
Baoru Huang
Dzung Nguyen
T. Vo
Anh Nguyen
DiffM
39
9
0
24 Oct 2023
Kosmos-G: Generating Images in Context with Multimodal Large Language
  Models
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
Xichen Pan
Li Dong
Shaohan Huang
Zhiliang Peng
Wenhu Chen
Furu Wei
VLM
6
62
0
04 Oct 2023
A Survey of Diffusion Based Image Generation Models: Issues and Their
  Solutions
A Survey of Diffusion Based Image Generation Models: Issues and Their Solutions
Tianyi Zhang
Zheng Wang
Jin Huang
M. M. Tasnim
Wei Shi
VLM
11
21
0
25 Aug 2023
Align, Adapt and Inject: Sound-guided Unified Image Generation
Align, Adapt and Inject: Sound-guided Unified Image Generation
Yue Yang
Kaipeng Zhang
Yuying Ge
Wenqi Shao
Zeyue Xue
Yu Qiao
Ping Luo
DiffM
16
5
0
20 Jun 2023
Generating Images with Multimodal Language Models
Generating Images with Multimodal Language Models
Jing Yu Koh
Daniel Fried
Ruslan Salakhutdinov
MLLM
28
241
0
26 May 2023
MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal
  Image Generation
MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation
Marco Bellagente
Manuel Brack
H. Teufel
Felix Friedrich
Bjorn Deiseroth
...
Koen Oostermeijer
Andres Felipe Cruz Salinas
P. Schramowski
Kristian Kersting
Samuel Weinbach
36
15
0
24 May 2023
Diffusion Models in Vision: A Survey
Diffusion Models in Vision: A Survey
Florinel-Alin Croitoru
Vlad Hondru
Radu Tudor Ionescu
M. Shah
DiffM
VLM
MedIm
194
1,140
0
10 Sep 2022
On the Principles of Parsimony and Self-Consistency for the Emergence of
  Intelligence
On the Principles of Parsimony and Self-Consistency for the Emergence of Intelligence
Y. Ma
Doris Y. Tsao
H. Shum
64
75
0
11 Jul 2022
Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset
Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset
Ashish V. Thapliyal
Jordi Pont-Tuset
Xi Chen
Radu Soricut
VGen
73
72
0
25 May 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
390
4,125
0
28 Jan 2022
MLP-Mixer: An all-MLP Architecture for Vision
MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
250
2,603
0
04 May 2021
Zero-Shot Text-to-Image Generation
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
255
4,774
0
24 Feb 2021
Domain-Adversarial Training of Neural Networks
Domain-Adversarial Training of Neural Networks
Yaroslav Ganin
E. Ustinova
Hana Ajakan
Pascal Germain
Hugo Larochelle
François Laviolette
M. Marchand
Victor Lempitsky
GAN
OOD
165
9,327
0
28 May 2015
1