Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2111.02114
Cited By
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
3 November 2021
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
Re-assign community
ArXiv
PDF
HTML
Papers citing
"LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs"
50 / 1,097 papers shown
Title
Attribute-Centric Compositional Text-to-Image Generation
Yuren Cong
Martin Renqiang Min
Erran L. Li
Bodo Rosenhahn
M. Yang
68
11
0
04 Jan 2023
Muse: Text-To-Image Generation via Masked Generative Transformers
Huiwen Chang
Han Zhang
Jarred Barber
AJ Maschinot
José Lezama
...
Kevin Patrick Murphy
William T. Freeman
Michael Rubinstein
Yuanzhen Li
Dilip Krishnan
DiffM
197
521
0
02 Jan 2023
Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models
Jiale Xu
Xintao Wang
Weihao Cheng
Yan-Pei Cao
Ying Shan
Xiaohu Qie
Shenghua Gao
188
161
0
28 Dec 2022
Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning
Woohyun Kang
Jonghwan Mun
Sungjun Lee
Byungseok Roh
VLM
11
18
0
27 Dec 2022
Do DALL-E and Flamingo Understand Each Other?
Hang Li
Jindong Gu
Rajat Koner
Sahand Sharifzadeh
Volker Tresp
MLLM
21
12
0
23 Dec 2022
Contrastive Language-Vision AI Models Pretrained on Web-Scraped Multimodal Data Exhibit Sexual Objectification Bias
Robert Wolfe
Yiwei Yang
Billy Howe
Aylin Caliskan
DiffM
13
51
0
21 Dec 2022
Character-Aware Models Improve Visual Text Rendering
Rosanne Liu
Daniel H Garrette
Chitwan Saharia
William Chan
Adam Roberts
Sharan Narang
Irina Blok
R. Mical
Mohammad Norouzi
Noah Constant
VLM
31
71
0
20 Dec 2022
NusaCrowd: Open Source Initiative for Indonesian NLP Resources
Samuel Cahyawijaya
Holy Lovenia
Alham Fikri Aji
Genta Indra Winata
Bryan Wilie
...
Timothy Baldwin
Sebastian Ruder
Herry Sujaini
S. Sakti
Ayu Purwarianti
39
48
0
19 Dec 2022
Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models
Qiucheng Wu
Yujian Liu
Handong Zhao
Ajinkya Kale
T. Bui
Tong Yu
Zhe-nan Lin
Yang Zhang
Shiyu Chang
DiffM
CoGe
30
97
0
16 Dec 2022
Objaverse: A Universe of Annotated 3D Objects
Matt Deitke
Dustin Schwenk
Jordi Salvador
Luca Weihs
Oscar Michel
Eli VanderBilt
Ludwig Schmidt
Kiana Ehsani
Aniruddha Kembhavi
Ali Farhadi
29
884
0
15 Dec 2022
CLIPPO: Image-and-Language Understanding from Pixels Only
Michael Tschannen
Basil Mustafa
N. Houlsby
CLIP
VLM
32
47
0
15 Dec 2022
The Infinite Index: Information Retrieval on Generative Text-To-Image Models
Niklas Deckers
Maik Frobe
Johannes Kiesel
G. Pandolfo
Christopher Schröder
Benno Stein
Martin Potthast
DiffM
42
16
0
14 Dec 2022
Reproducible scaling laws for contrastive language-image learning
Mehdi Cherti
Romain Beaumont
Ross Wightman
Mitchell Wortsman
Gabriel Ilharco
Cade Gordon
Christoph Schuhmann
Ludwig Schmidt
J. Jitsev
VLM
CLIP
59
739
0
14 Dec 2022
CREPE: Can Vision-Language Foundation Models Reason Compositionally?
Zixian Ma
Jerry Hong
Mustafa Omer Gul
Mona Gandhi
Irena Gao
Ranjay Krishna
CoGe
29
125
0
13 Dec 2022
Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion
Tengfei Wang
Bo Zhang
Ting Zhang
Shuyang Gu
Jianmin Bao
...
Jingjing Shen
Dong Chen
Fang Wen
Qifeng Chen
B. Guo
35
279
0
12 Dec 2022
Measuring Data
Margaret Mitchell
A. Luccioni
Nathan Lambert
Marissa Gerchick
Angelina McMillan-Major
Ezinwanne Ozoani
Nazneen Rajani
Tristan Thrush
Yacine Jernite
Douwe Kiela
27
16
0
09 Dec 2022
A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others
Zhiheng Li
Ivan Evtimov
Albert Gordo
C. Hazirbas
Tal Hassner
Cristian Canton Ferrer
Chenliang Xu
Mark Ibrahim
34
71
0
09 Dec 2022
SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model
Shaoan Xie
Zhifei Zhang
Zhe-nan Lin
Tobias Hinz
Kun Zhang
DiffM
21
231
0
09 Dec 2022
Spurious Features Everywhere -- Large-Scale Detection of Harmful Spurious Features in ImageNet
Yannic Neuhaus
Maximilian Augustin
Valentyn Boreiko
Matthias Hein
AAML
36
30
0
09 Dec 2022
Multi-Concept Customization of Text-to-Image Diffusion
Nupur Kumari
Bin Zhang
Richard Y. Zhang
Eli Shechtman
Jun-Yan Zhu
61
823
0
08 Dec 2022
Vision and Structured-Language Pretraining for Cross-Modal Food Retrieval
Mustafa Shukor
Nicolas Thome
Matthieu Cord
CLIP
CoGe
34
8
0
08 Dec 2022
SINE: SINgle Image Editing with Text-to-Image Diffusion Models
Zhixing Zhang
Ligong Han
Arna Ghosh
Dimitris N. Metaxas
Jian Ren
DiffM
51
154
0
08 Dec 2022
NeRDi: Single-View NeRF Synthesis with Language-Guided Diffusion as General Image Priors
Congyue Deng
C. Jiang
C. Qi
Xinchen Yan
Yin Zhou
Leonidas J. Guibas
Drago Anguelov
DiffM
29
161
0
06 Dec 2022
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Yi Wang
Kunchang Li
Yizhuo Li
Yinan He
Bingkun Huang
...
Junting Pan
Jiashuo Yu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
57
309
0
06 Dec 2022
M-VADER: A Model for Diffusion with Multimodal Context
Samuel Weinbach
Marco Bellagente
C. Eichenberg
Andrew M. Dai
R. Baldock
Souradeep Nanda
Bjorn Deiseroth
Koen Oostermeijer
H. Teufel
Andres Felipe Cruz Salinas
DiffM
35
11
0
06 Dec 2022
Controllable Image Captioning via Prompting
Ning Wang
Jiahao Xie
Jihao Wu
Mingbo Jia
Linlin Li
22
23
0
04 Dec 2022
ObjectStitch: Generative Object Compositing
Yi-Zhe Song
Zhifei Zhang
Zhe-nan Lin
Scott D. Cohen
Brian L. Price
Jianming Zhang
Seunggeun Kim
Daniel G. Aliaga
DiffM
20
31
0
02 Dec 2022
Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs
Junbum Cha
Jonghwan Mun
Byungseok Roh
VLM
26
87
0
01 Dec 2022
Finetune like you pretrain: Improved finetuning of zero-shot vision models
Sachin Goyal
Ananya Kumar
Sankalp Garg
Zico Kolter
Aditi Raghunathan
CLIP
VLM
50
136
0
01 Dec 2022
Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles
Shuquan Ye
Yujia Xie
Dongdong Chen
Yichong Xu
Lu Yuan
Chenguang Zhu
Jing Liao
VLM
24
11
0
29 Nov 2022
The Myth of Culturally Agnostic AI Models
E. Cetinic
DiffM
VLM
17
10
0
28 Nov 2022
Diffusion Probabilistic Model Made Slim
Xingyi Yang
Daquan Zhou
Jiashi Feng
Xinchao Wang
DiffM
27
103
0
27 Nov 2022
SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation
Huaishao Luo
Junwei Bao
Youzheng Wu
Xiaodong He
Tianrui Li
VLM
32
144
0
27 Nov 2022
Expanding Small-Scale Datasets with Guided Imagination
Yifan Zhang
Daquan Zhou
Bryan Hooi
Kaixin Wang
Jiashi Feng
44
45
0
25 Nov 2022
Shifted Diffusion for Text-to-image Generation
Yufan Zhou
Bingchen Liu
Yizhe Zhu
Xiao Yang
Changyou Chen
Jinhui Xu
DiffM
24
40
0
24 Nov 2022
Open-vocabulary Attribute Detection
M. A. Bravo
Sudhanshu Mittal
Simon Ging
Thomas Brox
VLM
ObjD
19
30
0
23 Nov 2022
Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation
Tsu-jui Fu
Licheng Yu
Ning Zhang
Cheng-Yang Fu
Jong-Chyi Su
William Yang Wang
Sean Bell
VGen
56
37
0
23 Nov 2022
Retrieval-Augmented Multimodal Language Modeling
Michihiro Yasunaga
Armen Aghajanyan
Weijia Shi
Rich James
J. Leskovec
Percy Liang
M. Lewis
Luke Zettlemoyer
Wen-tau Yih
RALM
16
95
0
22 Nov 2022
On the Transferability of Visual Features in Generalized Zero-Shot Learning
Paola Cascante-Bonilla
Leonid Karlinsky
James Smith
Yanjun Qi
Vicente Ordonez
33
2
0
22 Nov 2022
Teaching Structured Vision&Language Concepts to Vision&Language Models
Sivan Doveh
Assaf Arbelle
Sivan Harary
Yikang Shen
Roei Herzig
...
Donghyun Kim
Raja Giryes
Rogerio Feris
S. Ullman
Leonid Karlinsky
VLM
CoGe
53
70
0
21 Nov 2022
Video Background Music Generation: Dataset, Method and Evaluation
Le Zhuo
Zhaokai Wang
Baisen Wang
Yue Liao
Chenxi Bao
Stanley Peng
Miao Lu
Xiaobo Li
Fei Fang
Si Liu
VGen
23
27
0
21 Nov 2022
Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task
Shangda Wu
Maosong Sun
17
20
0
21 Nov 2022
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
Weijie Su
Xizhou Zhu
Chenxin Tao
Lewei Lu
Bin Li
Gao Huang
Yu Qiao
Xiaogang Wang
Jie Zhou
Jifeng Dai
42
41
0
17 Nov 2022
ConStruct-VL: Data-Free Continual Structured VL Concepts Learning
James Smith
Paola Cascante-Bonilla
Assaf Arbelle
Donghyun Kim
Yikang Shen
David D. Cox
Diyi Yang
Z. Kira
Rogerio Feris
Leonid Karlinsky
VLM
44
20
0
17 Nov 2022
A Creative Industry Image Generation Dataset Based on Captions
Xiang Yuejia
Lv Chuanhao
Liu Qingdazhu
Yang Xiaocui
Liu Bo
Ju Meizhi
3DV
17
2
0
16 Nov 2022
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model
Xingqian Xu
Zhangyang Wang
Eric Zhang
Kai Wang
Humphrey Shi
DiffM
35
183
0
15 Nov 2022
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
87
675
0
14 Nov 2022
Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding
Zijiao Chen
Jiaxin Qing
Tiange Xiang
Wan Lin Yue
J. Zhou
DiffM
MedIm
27
146
0
13 Nov 2022
Large-Scale Bidirectional Training for Zero-Shot Image Captioning
Taehoon Kim
Mark A Marsden
Pyunghwan Ahn
Sangyun Kim
Sihaeng Lee
Alessandra Sala
S. Kim
VLM
29
4
0
13 Nov 2022
AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities
Zhongzhi Chen
Guangyi Liu
Bo-Wen Zhang
Fulong Ye
Qinghong Yang
Ledell Yu Wu
VLM
37
79
0
12 Nov 2022
Previous
1
2
3
...
19
20
21
22
Next