2111.02114
Cited By
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
3 November 2021
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
Papers citing
"LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs"
50 / 1,102 papers shown
CLIPAG: Towards Generator-Free Text-to-Image Generation
Roy Ganz
Michael Elad
VLM
41
7
0
29 Jun 2023
Prompt Ensemble Self-training for Open-Vocabulary Domain Adaptation
Jiaxing Huang
Jingyi Zhang
Han Qiu
Sheng Jin
Shijian Lu
VPVLM
VLM
29
0
0
29 Jun 2023
Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity
Zhenlin Xu
Yi Zhu
Tiffany Deng
Abhay Mittal
Yanbei Chen
Manchen Wang
Paolo Favaro
Joseph Tighe
Davide Modolo
VLM
CoGe
36
8
0
28 Jun 2023
What Makes ImageNet Look Unlike LAION
Ali Shirali
Moritz Hardt
19
9
0
27 Jun 2023
CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a $10,000 Budget; An Extra $4,000 Unlocks 81.8% Accuracy
Xianhang Li
Zeyu Wang
Cihang Xie
CLIP
VLM
56
19
0
27 Jun 2023
Are aligned neural networks adversarially aligned?
Nicholas Carlini
Milad Nasr
Christopher A. Choquette-Choo
Matthew Jagielski
Irena Gao
...
Pang Wei Koh
Daphne Ippolito
Katherine Lee
Florian Tramèr
Ludwig Schmidt
AAML
32
225
0
26 Jun 2023
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Xu
Enhong Chen
MLLM
LRM
62
562
0
23 Jun 2023
DISCO-10M: A Large-Scale Music Dataset
Luca A. Lanzendörfer
Florian Grötschla
Emil Funke
Roger Wattenhofer
30
12
0
23 Jun 2023
TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter
Binjie Zhang
Yixiao Ge
Xuyuan Xu
Ying Shan
Mike Zheng Shou
52
8
0
22 Jun 2023
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
Hugo Laurençon
Lucile Saulnier
Léo Tronchon
Stas Bekman
Amanpreet Singh
...
Siddharth Karamcheti
Alexander M. Rush
Douwe Kiela
Matthieu Cord
Victor Sanh
25
231
0
21 Jun 2023
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
Zilun Zhang
Tiancheng Zhao
Yulong Guo
Jianwei Yin
DiffM
VLM
32
56
0
20 Jun 2023
A Universal Semantic-Geometric Representation for Robotic Manipulation
Tong Zhang
Yingdong Hu
Hanchen Cui
Hang Zhao
Yang Gao
90
18
0
18 Jun 2023
Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks
Hongcheng Gao
Hao Zhang
Yinpeng Dong
Zhijie Deng
AAML
54
21
0
16 Jun 2023
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
Peng Xu
Wenqi Shao
Kaipeng Zhang
Peng Gao
Shuo Liu
Meng Lei
Fanqing Meng
Siyuan Huang
Yu Qiao
Ping Luo
ELM
MLLM
41
159
0
15 Jun 2023
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
Sihan Chen
Xingjian He
Handong Li
Xiaojie Jin
Jiashi Feng
Jiaheng Liu
VLM
CLIP
32
8
0
15 Jun 2023
2nd Place Winning Solution for the CVPR2023 Visual Anomaly and Novelty Detection Challenge: Multimodal Prompting for Data-centric Anomaly Detection
Yunkang Cao
Xiaohao Xu
Chen Sun
Y. Cheng
Liang Gao
Weiming Shen
47
1
0
15 Jun 2023
ViP: A Differentially Private Foundation Model for Computer Vision
Yaodong Yu
Maziar Sanjabi
Yi Ma
Kamalika Chaudhuri
Chuan Guo
29
12
0
15 Jun 2023
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding
Le Zhang
Rabiul Awal
Aishwarya Agrawal
CoGe
VLM
41
10
0
15 Jun 2023
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations
Gregor Geigle
Radu Timofte
Goran Glavaš
VLM
MLLM
41
5
0
14 Jun 2023
Training-free Diffusion Model Adaptation for Variable-Sized Text-to-Image Synthesis
Zhiyu Jin
Xuli Shen
Bin Li
Xiangyang Xue
39
36
0
14 Jun 2023
Extending CLIP's Image-Text Alignment to Referring Image Segmentation
Seoyeon Kim
Minguk Kang
Dongwon Kim
Jaesik Park
Suha Kwak
VLM
35
10
0
14 Jun 2023
MOFI: Learning Image Representations from Noisy Entity Annotated Images
Wentao Wu
Aleksei Timofeev
Chen Chen
Bowen Zhang
Kun Duan
...
Yantao Zheng
Jonathon Shlens
Xianzhi Du
Zhe Gan
Yinfei Yang
VLM
26
7
0
13 Jun 2023
Image Captioners Are Scalable Vision Learners Too
Michael Tschannen
Manoj Kumar
Andreas Steiner
Xiaohua Zhai
N. Houlsby
Lucas Beyer
VLM
CLIP
32
54
0
13 Jun 2023
Scalable 3D Captioning with Pretrained Models
Tiange Luo
C. Rockwell
Honglak Lee
Justin Johnson
32
153
0
12 Jun 2023
Retrieval-Enhanced Contrastive Vision-Text Models
Ahmet Iscen
Mathilde Caron
Alireza Fathi
Cordelia Schmid
CLIP
VLM
31
26
0
12 Jun 2023
VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models
Sheng-Yen Chou
Pin-Yu Chen
Tsung-Yi Ho
DiffM
15
53
0
12 Jun 2023
Sticker820K: Empowering Interactive Retrieval with Stickers
Sijie Zhao
Yixiao Ge
Zhongang Qi
Lin Song
Xiaohan Ding
Zehua Xie
Ying Shan
34
6
0
12 Jun 2023
Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model
Yida Chen
Fernanda Viégas
Martin Wattenberg
DiffM
14
22
0
09 Jun 2023
MIMIC-IT: Multi-Modal In-Context Instruction Tuning
Bo Li
Yuanhan Zhang
Liangyu Chen
Jinghao Wang
Fanyi Pu
Jingkang Yang
C. Li
Ziwei Liu
MLLM
VLM
45
224
0
08 Jun 2023
Towards Understanding the Interplay of Generative Artificial Intelligence and the Internet
Gonzalo Martínez
Lauren Watson
Pedro Reviriego
José Alberto Hernández
Marc Juárez
Rik Sarkar
22
53
0
08 Jun 2023
Improving neural network representations using human similarity judgments
Lukas Muttenthaler
Lorenz Linhardt
Jonas Dippel
Robert A. Vandermeulen
Katherine L. Hermann
Andrew Kyle Lampinen
Simon Kornblith
53
31
0
07 Jun 2023
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
Alexandre Ramé
Guillaume Couairon
Mustafa Shukor
Corentin Dancette
Jean-Baptiste Gaya
Laure Soulier
Matthieu Cord
MoMe
35
136
0
07 Jun 2023
M³IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning
Lei Li
Yuwei Yin
Shicheng Li
Liang Chen
Peiyi Wang
...
Yazheng Yang
Jingjing Xu
Xu Sun
Lingpeng Kong
Qi Liu
MLLM
VLM
27
115
0
07 Jun 2023
MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
Jielin Qiu
Jiacheng Zhu
William Jongwon Han
Aditesh Kumar
Karthik Mittal
...
Linjie Li
Jianfeng Wang
Ding Zhao
Bo Li
Lijuan Wang
VGen
26
5
0
07 Jun 2023
HeadSculpt: Crafting 3D Head Avatars with Text
Xiaoping Han
Yukang Cao
Kai Han
Xiatian Zhu
Jiankang Deng
Yi-Zhe Song
Tao Xiang
Kwan-Yee K. Wong
DiffM
27
46
0
05 Jun 2023
User-friendly Image Editing with Minimal Text Input: Leveraging Captioning and Injection Techniques
Sunwoo Kim
Wooseok Jang
Hyunsung Kim
Junho Kim
Yunjey Choi
Seung Wook Kim
Gayeong Lee
DiffM
47
6
0
05 Jun 2023
VideoComposer: Compositional Video Synthesis with Motion Controllability
Xiang Wang
Hangjie Yuan
Shiwei Zhang
Dayou Chen
Jiuniu Wang
Yingya Zhang
Yujun Shen
Deli Zhao
Jingren Zhou
VGen
DiffM
38
319
0
03 Jun 2023
Revisiting the Role of Language Priors in Vision-Language Models
Zhiqiu Lin
Xinyue Chen
Deepak Pathak
Pengchuan Zhang
Deva Ramanan
VLM
36
22
0
02 Jun 2023
Multilingual Conceptual Coverage in Text-to-Image Models
Michael Stephen Saxon
William Yang Wang
EGVM
49
8
0
02 Jun 2023
CLIPGraphs: Multimodal Graph Networks to Infer Object-Room Affinities
A. Agrawal
Raghav Arora
Ahana Datta
Snehasis Banerjee
Brojeshwar Bhowmick
Krishna Murthy Jatavallabhula
Mohan Sridharan
Madhava Krishna
30
2
0
02 Jun 2023
DeepfakeArt Challenge: A Benchmark Dataset for Generative AI Art Forgery and Data Poisoning Detection
Hossein Aboutalebi
Daniel Mao
Rongqi Fan
Carol Xu
Chris He
Alexander Wong
AAML
28
8
0
02 Jun 2023
StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
Yonglong Tian
Lijie Fan
Phillip Isola
Huiwen Chang
Dilip Krishnan
VLM
DiffM
45
142
0
01 Jun 2023
Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance
Jinbo Xing
Menghan Xia
Yuxin Liu
Yuechen Zhang
Yong Zhang
...
Haoxin Chen
Xiaodong Cun
Xintao Wang
Ying Shan
T. Wong
VGen
DiffM
47
63
0
01 Jun 2023
Vocabulary-free Image Classification
Alessandro Conti
Enrico Fini
Massimiliano Mancini
Paolo Rota
Yiming Wang
Elisa Ricci
VLM
44
23
0
01 Jun 2023
Learning Disentangled Prompts for Compositional Image Synthesis
Kihyuk Sohn
Albert Eaton Shaw
Yuan Hao
Han Zhang
Luisa F. Polanía
Huiwen Chang
Lu Jiang
Irfan Essa
VLM
24
6
0
01 Jun 2023
Improving CLIP Training with Language Rewrites
Lijie Fan
Dilip Krishnan
Phillip Isola
Dina Katabi
Yonglong Tian
BDL
VLM
CLIP
38
157
0
31 May 2023
Too Large; Data Reduction for Vision-Language Pre-Training
Alex Jinpeng Wang
Kevin Qinghong Lin
David Junhao Zhang
Stan Weixian Lei
Mike Zheng Shou
VLM
33
24
0
31 May 2023
Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models
Sivan Doveh
Assaf Arbelle
Sivan Harary
Roei Herzig
Donghyun Kim
...
Yikang Shen
Raja Giryes
Rogerio Feris
S. Ullman
Leonid Karlinsky
VLM
CoGe
62
53
0
31 May 2023
Multi-modal Queried Object Detection in the Wild
Yifan Xu
Mengdan Zhang
Chaoyou Fu
Peixian Chen
Xiaoshan Yang
Ke Li
Changsheng Xu
ObjD
VLM
40
30
0
30 May 2023
Scalable Performance Analysis for Vision-Language Models
Santiago Castro
Oana Ignat
Rada Mihalcea
VLM
40
1
0
30 May 2023