2305.15296
MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation
24 May 2023
Marco Bellagente
Manuel Brack
H. Teufel
Felix Friedrich
Bjorn Deiseroth
C. Eichenberg
Andrew M. Dai
R. Baldock
Souradeep Nanda
Koen Oostermeijer
Andres Felipe Cruz Salinas
P. Schramowski
Kristian Kersting
Samuel Weinbach
Papers citing
"MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation"
47 / 47 papers shown
The Impact of Positional Encoding on Length Generalization in Transformers
Amirhossein Kazemnejad
Inkit Padhi
Karthikeyan N. Ramamurthy
Payel Das
Siva Reddy
73
203
0
31 May 2023
GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation
Can Qin
Ning Yu
Chen Xing
Shu Zhang
Zeyuan Chen
Stefano Ermon
Yun Fu
Caiming Xiong
Ran Xu
DiffM
99
21
0
17 Mar 2023
Universal Guidance for Diffusion Models
Arpit Bansal
Hong-Min Chu
Avi Schwarzschild
Soumyadip Sengupta
Micah Goldblum
Jonas Geiping
Tom Goldstein
VLM
87
273
0
14 Feb 2023
Adding Conditional Control to Text-to-Image Diffusion Models
Lvmin Zhang
Anyi Rao
Maneesh Agrawala
AI4CE
180
4,168
1
10 Feb 2023
Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness
Felix Friedrich
Manuel Brack
Lukas Struppek
Dominik Hintersdorf
P. Schramowski
Sasha Luccioni
Kristian Kersting
83
126
0
07 Feb 2023
Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models
Hila Chefer
Yuval Alaluf
Yael Vinker
Lior Wolf
Daniel Cohen-Or
DiffM
111
514
0
31 Jan 2023
AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation
Bjorn Deiseroth
Mayukh Deb
Samuel Weinbach
Manuel Brack
P. Schramowski
Kristian Kersting
41
25
0
19 Jan 2023
Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis
Weixi Feng
Xuehai He
Tsu-Jui Fu
Varun Jampani
Arjun Reddy Akula
P. Narayana
Sugato Basu
Xin Eric Wang
William Yang Wang
CoGe
118
317
0
09 Dec 2022
Sketch-Guided Text-to-Image Diffusion Models
A. Voynov
Kfir Aberman
Daniel Cohen-Or
DiffM
85
209
0
24 Nov 2022
Retrieval-Augmented Multimodal Language Modeling
Michihiro Yasunaga
Armen Aghajanyan
Weijia Shi
Rich James
J. Leskovec
Percy Liang
M. Lewis
Luke Zettlemoyer
Wen-tau Yih
RALM
61
108
0
22 Nov 2022
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model
Xingqian Xu
Zhangyang Wang
Eric Zhang
Kai Wang
Humphrey Shi
DiffM
100
198
0
15 Nov 2022
AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities
Zhongzhi Chen
Guangyi Liu
Bo Zhang
Fulong Ye
Qinghong Yang
Ledell Yu Wu
VLM
81
90
0
12 Nov 2022
Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models
P. Schramowski
Manuel Brack
Bjorn Deiseroth
Kristian Kersting
128
308
0
09 Nov 2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
397
2,392
0
09 Nov 2022
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
Yogesh Balaji
Seungjun Nah
Xun Huang
Arash Vahdat
Jiaming Song
...
Timo Aila
S. Laine
Bryan Catanzaro
Tero Karras
Ming-Yu Liu
VLM
MoE
177
828
0
02 Nov 2022
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM
MLLM
CLIP
200
3,493
0
16 Oct 2022
ILLUME: Rationalizing Vision-Language Models through Human Interactions
Manuel Brack
P. Schramowski
Bjorn Deiseroth
Kristian Kersting
VLM
MLLM
52
3
0
17 Aug 2022
Prompt-to-Prompt Image Editing with Cross Attention Control
Amir Hertz
Ron Mokady
J. Tenenbaum
Kfir Aberman
Yael Pritch
Daniel Cohen-Or
DiffM
203
1,789
0
02 Aug 2022
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
Rinon Gal
Yuval Alaluf
Yuval Atzmon
Or Patashnik
Amit H. Bermano
Gal Chechik
Daniel Cohen-Or
164
1,894
0
02 Aug 2022
Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models
Robin Rombach
A. Blattmann
Bjorn Ommer
DiffM
69
71
0
26 Jul 2022
Classifier-Free Diffusion Guidance
Jonathan Ho
Tim Salimans
FaML
196
3,963
0
26 Jul 2022
Blended Latent Diffusion
Omri Avrahami
Ohad Fried
Dani Lischinski
DiffM
117
388
0
06 Jun 2022
Compositional Visual Generation with Composable Diffusion Models
Nan Liu
Shuang Li
Yilun Du
Antonio Torralba
J. Tenenbaum
DiffM
CoGe
174
525
0
03 Jun 2022
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Chitwan Saharia
William Chan
Saurabh Saxena
Lala Li
Jay Whang
...
Raphael Gontijo-Lopes
Tim Salimans
Jonathan Ho
David J Fleet
Mohammad Norouzi
VLM
461
6,067
0
23 May 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLM
DiffM
413
6,908
0
13 Apr 2022
SGPT: GPT Sentence Embeddings for Semantic Search
Niklas Muennighoff
RALM
155
187
0
17 Feb 2022
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
474
15,734
0
20 Dec 2021
Few-shot Learning with Multilingual Language Models
Xi Lin
Todor Mihaylov
Mikel Artetxe
Tianlu Wang
Shuohui Chen
...
Luke Zettlemoyer
Zornitsa Kozareva
Mona T. Diab
Ves Stoyanov
Xian Li
BDL
ELM
LRM
117
308
0
20 Dec 2021
More Control for Free! Image Synthesis with Semantic Diffusion Guidance
Xihui Liu
Dong Huk Park
S. Azadi
Gong Zhang
Arman Chopikyan
Yuxiao Hu
Humphrey Shi
Anna Rohrbach
Trevor Darrell
DiffM
95
256
0
10 Dec 2021
MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning
C. Eichenberg
Sid Black
Samuel Weinbach
Letitia Parcalabescu
Anette Frank
MLLM
VLM
58
100
0
09 Dec 2021
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
243
1,441
0
03 Nov 2021
BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
Elad Ben-Zaken
Shauli Ravfogel
Yoav Goldberg
171
1,240
0
18 Jun 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
284
2,521
0
20 Apr 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
CLIP
148
1,582
0
18 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
967
29,810
0
26 Feb 2021
Improved Denoising Diffusion Probabilistic Models
Alex Nichol
Prafulla Dhariwal
DiffM
352
3,715
0
18 Feb 2021
mT5: A massively multilingual pre-trained text-to-text transformer
Linting Xue
Noah Constant
Adam Roberts
Mihir Kale
Rami Al-Rfou
Aditya Siddhant
Aditya Barua
Colin Raffel
142
2,559
0
22 Oct 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
856
42,332
0
28 May 2020
Unsupervised Cross-lingual Representation Learning at Scale
Alexis Conneau
Kartikay Khandelwal
Naman Goyal
Vishrav Chaudhary
Guillaume Wenzek
Francisco Guzmán
Edouard Grave
Myle Ott
Luke Zettlemoyer
Veselin Stoyanov
228
6,585
0
05 Nov 2019
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers
Iryna Gurevych
1.3K
12,301
0
27 Aug 2019
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
Kenneth Marino
Mohammad Rastegari
Ali Farhadi
Roozbeh Mottaghi
117
1,090
0
31 May 2019
Parameter-Efficient Transfer Learning for NLP
N. Houlsby
A. Giurgiu
Stanislaw Jastrzebski
Bruna Morrone
Quentin de Laroussilhe
Andrea Gesmundo
Mona Attariyan
Sylvain Gelly
221
4,514
0
02 Feb 2019
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
Adina Williams
Nikita Nangia
Samuel R. Bowman
524
4,494
0
18 Apr 2017
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
347
3,270
0
02 Dec 2016
A large annotated corpus for learning natural language inference
Samuel R. Bowman
Gabor Angeli
Christopher Potts
Christopher D. Manning
324
4,293
0
21 Aug 2015
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Tsung-Yi Lin
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollár
C. L. Zitnick
218
2,493
0
01 Apr 2015
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
424
43,814
0
01 May 2014