arXiv:2111.02114
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
3 November 2021
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
Papers citing "LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs"
50 / 1,102 papers shown
Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes
Zehan Wang
Haifeng Huang
Yang Zhao
Ziang Zhang
Zhou Zhao
19
62
0
17 Aug 2023
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
Kaicheng Yang
Jiankang Deng
Xiang An
Jiawei Li
Ziyong Feng
Jia Guo
Jing Yang
Tongliang Liu
VLM
CLIP
48
46
0
16 Aug 2023
Gradient-Based Post-Training Quantization: Challenging the Status Quo
Edouard Yvinec
Arnaud Dapogny
Kévin Bailly
MQ
47
0
0
15 Aug 2023
UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity
Weijian Mai
Zhijun Zhang
DiffM
24
32
0
14 Aug 2023
CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation
Hongguang Zhu
Yunchao Wei
Xiaodan Liang
Chunjie Zhang
Yao-Min Zhao
VLM
32
28
0
14 Aug 2023
ModelScope Text-to-Video Technical Report
Jiuniu Wang
Hangjie Yuan
Dayou Chen
Yingya Zhang
Xiang Wang
Shiwei Zhang
VGen
DiffM
38
392
0
12 Aug 2023
Distributionally Robust Classification on a Data Budget
Ben Feuer
Ameya Joshi
Minh Pham
Chinmay Hegde
OOD
39
2
0
07 Aug 2023
SciGraphQA: A Large-Scale Synthetic Multi-Turn Question-Answering Dataset for Scientific Graphs
Sheng Li
Nima Tajbakhsh
MLLM
21
48
0
07 Aug 2023
Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models
Zheng Ma
Mianzhi Pan
Wenhan Wu
Ka Leong Cheng
Jianbing Zhang
Shujian Huang
Jiajun Chen
VLM
CoGe
31
3
0
06 Aug 2023
Improving Generalization of Image Captioning with Unsupervised Prompt Learning
Hongchen Wei
Zhenzhong Chen
VLM
45
3
0
05 Aug 2023
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
Weihao Yu
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
Zicheng Liu
Xinchao Wang
Lijuan Wang
MLLM
60
622
0
04 Aug 2023
A Parameter-efficient Multi-subject Model for Predicting fMRI Activity
Connor Lane
Gregory Kiar
22
2
0
04 Aug 2023
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
Weiyun Wang
Min Shi
Qingyun Li
Wen Wang
Zhenhang Huang
...
Zhiguo Cao
Yushi Chen
Tong Lu
Jifeng Dai
Yu Qiao
LRM
MLLM
53
85
0
03 Aug 2023
PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts
Bang An
Sicheng Zhu
Michael-Andrei Panaitescu-Liess
Chaithanya Kumar Mummadi
Furong Huang
VLM
38
7
0
02 Aug 2023
From Sparse to Soft Mixtures of Experts
J. Puigcerver
C. Riquelme
Basil Mustafa
N. Houlsby
MoE
121
114
0
02 Aug 2023
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
Mustafa Shukor
Corentin Dancette
Alexandre Ramé
Matthieu Cord
MoMe
MLLM
61
43
0
30 Jul 2023
VPP: Efficient Conditional 3D Generation via Voxel-Point Progressive Representation
Zekun Qi
Muzhou Yu
Runpei Dong
Kaisheng Ma
3DPC
26
11
0
28 Jul 2023
RSGPT: A Remote Sensing Vision Language Model and Benchmark
Yuan Hu
Jianlong Yuan
Congcong Wen
Xiaonan Lu
Xiang Li
VLM
31
104
0
28 Jul 2023
A Survey on Generative Modeling with Limited Data, Few Shots, and Zero Shot
Milad Abdollahzadeh
Touba Malekzadeh
Christopher T. H. Teo
Keshigeyan Chandrasegaran
Guimeng Liu
Ngai-man Cheung
VLM
MedIm
54
21
0
26 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
43
119
0
25 Jul 2023
The Visual Language of Fabrics
Valentin Deschaintre
Julia Guerrero-Viu
Diego F. F. Gutierrez
T. Boubekeur
B. Masiá
3DV
40
9
0
25 Jul 2023
CLIP-KD: An Empirical Study of CLIP Model Distillation
Chuanguang Yang
Zhulin An
Libo Huang
Junyu Bi
Xinqiang Yu
Hansheng Yang
Boyu Diao
Yongjun Xu
VLM
37
27
0
24 Jul 2023
On the Connection between Pre-training Data Diversity and Fine-tuning Robustness
Vivek Ramanujan
Thao Nguyen
Sewoong Oh
Ludwig Schmidt
Ali Farhadi
OOD
14
22
0
24 Jul 2023
ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting
Zongsheng Yue
Jianyi Wang
Chen Change Loy
DiffM
47
213
0
23 Jul 2023
Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning
Jiancang Ma
Junhao Liang
Chen Chen
H. Lu
31
138
0
21 Jul 2023
Tuning Pre-trained Model via Moment Probing
Mingze Gao
Qilong Wang
Zhenyi Lin
Pengfei Zhu
Qinghua Hu
Jingbo Zhou
27
7
0
21 Jul 2023
General Image-to-Image Translation with One-Shot Image Guidance
Bin Cheng
Zuhao Liu
Yunbo Peng
Yue-Hsun Lin
ViT
DiffM
29
37
0
20 Jul 2023
Identifying Interpretable Subspaces in Image Representations
Neha Kalibhat
S. Bhardwaj
Bayan Bruss
Hamed Firooz
Maziar Sanjabi
S. Feizi
FAtt
47
26
0
20 Jul 2023
Generative Prompt Model for Weakly Supervised Object Localization
Yuzhong Zhao
QiXiang Ye
Weijia Wu
Chunhua Shen
Fang Wan
WSOL
VLM
DiffM
37
28
0
19 Jul 2023
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
Chaoyang Zhu
Long Chen
ObjD
VLM
36
33
0
18 Jul 2023
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
Yang Zhao
Zhijie Lin
Daquan Zhou
Zilong Huang
Jiashi Feng
Bingyi Kang
MLLM
44
108
0
17 Jul 2023
Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models
Moab Arar
Rinon Gal
Yuval Atzmon
Gal Chechik
Daniel Cohen-Or
Ariel Shamir
Amit H. Bermano
DiffM
51
76
0
13 Jul 2023
MMBench: Is Your Multi-modal Model an All-around Player?
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Bo Li
Songyang Zhang
...
Jiaqi Wang
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin
29
934
0
12 Jul 2023
Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback
Jaskirat Singh
Liang Zheng
42
18
0
10 Jul 2023
A Demand-Driven Perspective on Generative Audio AI
Sangshin Oh
Minsung Kang
Hyeongi Moon
Keunwoo Choi
Ben Sangbae Chon
33
3
0
10 Jul 2023
Measuring the Success of Diffusion Models at Imitating Human Artists
Stephen Casper
Z. Guo
Shreya Mogulothu
Zachary Marinov
C. Deshpande
Rui-Jie Yew
Zheng Dai
Dylan Hadfield-Menell
19
10
0
08 Jul 2023
Fooling Contrastive Language-Image Pre-trained Models with CLIPMasterPrints
Matthias Anton Freiberger
Peter Kun
Christian Igel
A. Løvlie
S. Risi
VLM
AAML
58
2
0
07 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
VLM
MLLM
92
225
0
07 Jul 2023
Vision Language Transformers: A Survey
Clayton Fields
C. Kennington
VLM
31
5
0
06 Jul 2023
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability
Xuanlin Li
Yunhao Fang
Minghua Liu
Z. Ling
Zhuowen Tu
Haoran Su
VLM
31
25
0
06 Jul 2023
T-MARS: Improving Visual Representations by Circumventing Text Feature Learning
Pratyush Maini
Sachin Goyal
Zachary Chase Lipton
J. Zico Kolter
Aditi Raghunathan
VLM
45
33
0
06 Jul 2023
Several categories of Large Language Models (LLMs): A Short Survey
Saurabh Pahune
Manoj Chandrasekharan
AILaw
25
14
0
05 Jul 2023
Collaborative Score Distillation for Consistent Visual Synthesis
Subin Kim
Kyungmin Lee
June Suk Choi
Jongheon Jeong
Kihyuk Sohn
Jinwoo Shin
DiffM
37
21
0
04 Jul 2023
JourneyDB: A Benchmark for Generative Image Understanding
Keqiang Sun
Junting Pan
Yuying Ge
Hao Li
Haodong Duan
...
Yi Wang
Jifeng Dai
Yu Qiao
Limin Wang
Hongsheng Li
62
104
0
03 Jul 2023
Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models
Litu Rout
Negin Raoof
Giannis Daras
Constantine Caramanis
A. Dimakis
Sanjay Shakkottai
DiffM
49
93
0
02 Jul 2023
ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models
Uddeshya Upadhyay
Shyamgopal Karthik
Massimiliano Mancini
Zeynep Akata
MLLM
VLM
28
4
0
01 Jul 2023
DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation
Zhuowei Chen
Shancheng Fang
Wei Liu
Qian He
Mengqi Huang
Yongdong Zhang
Zhendong Mao
DiffM
31
24
0
01 Jul 2023
Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors
Guocheng Qian
Jinjie Mai
Abdullah Hamdi
Jian Ren
Aliaksandr Siarohin
...
Hsin-Ying Lee
Ivan Skorokhodov
Peter Wonka
Sergey Tulyakov
Guohao Li
DiffM
41
355
0
30 Jun 2023
DisCo: Disentangled Control for Realistic Human Dance Generation
Tan Wang
Linjie Li
Kevin Qinghong Lin
Yuanhao Zhai
Chung-Ching Lin
Zhengyuan Yang
Hanwang Zhang
Zicheng Liu
Lijuan Wang
VGen
37
75
0
30 Jun 2023
Generate Anything Anywhere in Any Scene
Yuheng Li
Haotian Liu
Yangming Wen
Yong Jae Lee
DiffM
67
12
0
29 Jun 2023