2111.02114
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
3 November 2021
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
Papers citing
"LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs"
50 / 1,102 papers shown
Title
Integrating Language-Derived Appearance Elements with Visual Cues in Pedestrian Detection
Sungjune Park
Hyunjun Kim
Y. Ro
47
11
0
02 Nov 2023
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning
Yifan Du
Hangyu Guo
Kun Zhou
Wayne Xin Zhao
Jinpeng Wang
Chuyuan Wang
Mingchen Cai
Ruihua Song
Ji-Rong Wen
VLM
MLLM
LRM
78
22
0
02 Nov 2023
On Manipulating Scene Text in the Wild with Diffusion Models
Joshua Santoso
Christian Simon
Williem Pao
DiffM
48
6
0
01 Nov 2023
Re-Scoring Using Image-Language Similarity for Few-Shot Object Detection
Min Jae Jung
S. Han
Joohee Kim
30
13
0
01 Nov 2023
MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval
Youbo Lei
Feifei He
Chen Chen
Yingbin Mo
Sijia Li
Defeng Xie
H. Lu
VLM
59
0
0
30 Oct 2023
Generated Distributions Are All You Need for Membership Inference Attacks Against Generative Models
Minxing Zhang
Ning Yu
Rui Wen
Michael Backes
Yang Zhang
DiffM
18
18
0
30 Oct 2023
Foundational Models in Medical Imaging: A Comprehensive Survey and Future Vision
Bobby Azad
Reza Azad
Sania Eskandari
Afshin Bozorgpour
A. Kazerouni
I. Rekik
Dorit Merhof
VLM
MedIm
101
60
0
28 Oct 2023
PERF: Panoramic Neural Radiance Field from a Single Panorama
Guangcong Wang
Peng Wang
Zhaoxi Chen
Wenping Wang
Chen Change Loy
Ziwei Liu
MDE
25
31
0
25 Oct 2023
Machine Learning Approaches for Fine-Grained Symptom Estimation in Schizophrenia: A Comprehensive Review
Niki Maria Foteinopoulou
Ioannis Patras
19
0
0
25 Oct 2023
EmoCLIP: A Vision-Language Method for Zero-Shot Video Facial Expression Recognition
Niki Maria Foteinopoulou
Ioannis Patras
VLM
29
16
0
25 Oct 2023
Open-NeRF: Towards Open Vocabulary NeRF Decomposition
Hao Zhang
Fang Li
Narendra Ahuja
35
12
0
25 Oct 2023
Knowledge Editing for Large Language Models: A Survey
Song Wang
Yaochen Zhu
Haochen Liu
Zaiyi Zheng
Chen Chen
Wenlin Yao
KELM
81
138
0
24 Oct 2023
Online Detection of AI-Generated Images
David C. Epstein
Ishan Jain
Oliver Wang
Richard Y. Zhang
37
53
0
23 Oct 2023
Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models
Shawn Shan
Wenxin Ding
Josephine Passananti
Stanley Wu
Haitao Zheng
Ben Y. Zhao
SILM
DiffM
36
45
0
20 Oct 2023
MarineGPT: Unlocking Secrets of Ocean to the Public
Ziqiang Zheng
Jipeng Zhang
Tuan-Anh Vu
Shizhe Diao
Yue Him Wong Tim
Sai-Kit Yeung
53
12
0
20 Oct 2023
SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation
Chongyu Fan
Jiancheng Liu
Yihua Zhang
Eric Wong
Dennis Wei
Sijia Liu
MU
32
128
0
19 Oct 2023
Evaluating the Fairness of Discriminative Foundation Models in Computer Vision
Junaid Ali
Matthäus Kleindessner
F. Wenzel
Kailash Budhathoki
V. Cevher
Chris Russell
VLM
70
10
0
18 Oct 2023
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models
Yanyang Guo
Fangkai Jiao
Zhiqi Shen
Liqiang Nie
Mohan S. Kankanhalli
MLLM
33
5
0
17 Oct 2023
A Survey on Video Diffusion Models
Zhen Xing
Qijun Feng
Haoran Chen
Qi Dai
Hang-Rui Hu
Hang Xu
Zuxuan Wu
Yu-Gang Jiang
EGVM
VGen
62
119
0
16 Oct 2023
TOSS: High-quality Text-guided Novel View Synthesis from a Single Image
Yukai Shi
Jianan Wang
He Cao
Boshi Tang
Xianbiao Qi
Tianyu Yang
Yukun Huang
Shilong Liu
Lei Zhang
H. Shum
DiffM
32
20
0
16 Oct 2023
Prompting Scientific Names for Zero-Shot Species Recognition
Shubham Parashar
Zhiqiu Lin
Yanan Li
Shu Kong
VLM
23
12
0
15 Oct 2023
Does CLIP's Generalization Performance Mainly Stem from High Train-Test Similarity?
Prasanna Mayilvahanan
Thaddäus Wiedemer
E. Rusak
Matthias Bethge
Wieland Brendel
OODD
45
22
0
14 Oct 2023
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
168
448
0
14 Oct 2023
Vision-by-Language for Training-Free Compositional Image Retrieval
Shyamgopal Karthik
Karsten Roth
Massimiliano Mancini
Zeynep Akata
CoGe
30
53
0
13 Oct 2023
Hypernymy Understanding Evaluation of Text-to-Image Models via WordNet Hierarchy
Anton Baryshnikov
Max Ryabinin
VLM
34
2
0
13 Oct 2023
PaLI-3 Vision Language Models: Smaller, Faster, Stronger
Xi Chen
Xiao Wang
Lucas Beyer
Alexander Kolesnikov
Jialin Wu
...
Keran Rong
Tianli Yu
Daniel Keysers
Xiao-Qi Zhai
Radu Soricut
MLLM
VLM
41
94
0
13 Oct 2023
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
Sreyan Ghosh
Ashish Seth
Sonal Kumar
Utkarsh Tyagi
Chandra Kiran Reddy Evuru
S. Ramaneswaran
S. Sakshi
Oriol Nieto
R. Duraiswami
Dinesh Manocha
AuLLM
VLM
CoGe
48
23
0
12 Oct 2023
Defending Our Privacy With Backdoors
Dominik Hintersdorf
Lukas Struppek
Daniel Neider
Kristian Kersting
SILM
AAML
31
2
0
12 Oct 2023
Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification
Sravanti Addepalli
Ashish Ramayee Asokan
Lakshay Sharma
R. V. Babu
VLM
26
15
0
12 Oct 2023
Interpretable Diffusion via Information Decomposition
Xianghao Kong
Ollie Liu
Han Li
Dani Yogatama
Greg Ver Steeg
29
21
0
12 Oct 2023
TabLib: A Dataset of 627M Tables with Context
Gus Eggert
Kevin Huo
Mike Biven
Justin Waugh
LMTD
34
11
0
11 Oct 2023
VeCLIP: Improving CLIP Training via Visual-enriched Captions
Zhengfeng Lai
Haotian Zhang
Bowen Zhang
Wentao Wu
Haoping Bai
...
Zhe Gan
Jiulong Shan
Chen-Nee Chuah
Yinfei Yang
Meng Cao
CLIP
VLM
37
28
0
11 Oct 2023
IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training
Che Liu
Sibo Cheng
Miaojing Shi
Anand Shah
Wenjia Bai
Rossella Arcucci
26
26
0
11 Oct 2023
State of the Art on Diffusion Models for Visual Computing
Ryan Po
Wang Yifan
Vladislav Golyanik
Kfir Aberman
Jonathan T. Barron
...
Matthias Nießner
Bjorn Ommer
Christian Theobalt
Peter Wonka
Gordon Wetzstein
38
103
0
11 Oct 2023
Latent Diffusion Counterfactual Explanations
Karim Farid
Simon Schrodi
Max Argus
Thomas Brox
DiffM
48
13
0
10 Oct 2023
On the Evaluation and Refinement of Vision-Language Instruction Tuning Datasets
Ning Liao
Shaofeng Zhang
Renqiu Xia
Min Cao
Yu Qiao
Junchi Yan
MLLM
34
0
0
10 Oct 2023
JointNet: Extending Text-to-Image Diffusion for Dense Distribution Modeling
Jingyang Zhang
Shiwei Li
Yuanxun Lu
Tian Fang
David McKinnon
Yanghai Tsin
Long Quan
Yao Yao
27
10
0
10 Oct 2023
WinSyn: A High Resolution Testbed for Synthetic Data
Tom Kelly
John C. Femiani
Peter Wonka
31
2
0
09 Oct 2023
Learning Interactive Real-World Simulators
Mengjiao Yang
Yilun Du
Kamyar Ghasemipour
Jonathan Tompson
Leslie Kaelbling
Dale Schuurmans
Pieter Abbeel
LM&Ro
PINN
30
183
0
09 Oct 2023
Implicit Concept Removal of Diffusion Models
Zhili Liu
Kai Chen
Yifan Zhang
Jianhua Han
Lanqing Hong
Hang Xu
Zhenguo Li
Dit-Yan Yeung
James T. Kwok
28
13
0
09 Oct 2023
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data
Zuxuan Wu
Zejia Weng
Wujian Peng
Xitong Yang
Ang Li
Larry S. Davis
Yu-Gang Jiang
CLIP
VLM
41
21
0
08 Oct 2023
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale
Nina Shvetsova
Anna Kukleva
Xudong Hong
Christian Rupprecht
Bernt Schiele
Hilde Kuehne
47
25
0
07 Oct 2023
On the Performance of Multimodal Language Models
Utsav Garg
Erhan Bas
MLLM
27
0
0
04 Oct 2023
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
Xichen Pan
Li Dong
Shaohan Huang
Zhiliang Peng
Wenhu Chen
Furu Wei
VLM
8
62
0
04 Oct 2023
Delving into CLIP latent space for Video Anomaly Recognition
Luca Zanella
Benedetta Liberatori
Willi Menapace
Fabio Poiesi
Yiming Wang
Elisa Ricci
31
23
0
04 Oct 2023
ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks
Zejun Li
Ye Wang
Mengfei Du
Qingwen Liu
Binhao Wu
...
Zhihao Fan
Jie Fu
Jingjing Chen
Xuanjing Huang
Zhongyu Wei
35
13
0
04 Oct 2023
Sieve: Multimodal Dataset Pruning Using Image Captioning Models
Anas Mahmoud
Mostafa Elhoushi
Amro Abbas
Yu Yang
Newsha Ardalani
Hugh Leather
Ari S. Morcos
VLM
CLIP
40
20
0
03 Oct 2023
HallE-Control: Controlling Object Hallucination in Large Multimodal Models
Bohan Zhai
Shijia Yang
Chenfeng Xu
Sheng Shen
Kurt Keutzer
Chunyuan Li
Manling Li
MLLM
31
12
0
03 Oct 2023
CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
Kangfu Mei
M. Delbracio
Hossein Talebi
Zhengzhong Tu
Vishal M. Patel
P. Milanfar
VLM
DiffM
64
11
0
02 Oct 2023
Trained Latent Space Navigation to Prevent Lack of Photorealism in Generated Images on Style-based Models
Takumi Harada
Kazuyuki Aihara
Hiroyuki Sakai
32
0
0
02 Oct 2023