2111.02114
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
3 November 2021
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
Papers citing
"LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs"
50 / 1,097 papers shown
Title
A Survey on Evaluation of Multimodal Large Language Models
Jiaxing Huang
Jingyi Zhang
LM&MA
ELM
LRM
50
20
0
28 Aug 2024
RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models
Junyao Ge
Yang Zheng
Kaitai Guo
Jimin Liang
41
1
0
27 Aug 2024
Social perception of faces in a vision-language model
C. I. Hausladen
Manuel Knott
Colin F. Camerer
Pietro Perona
CVBM
VLM
45
2
0
26 Aug 2024
LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models
Qihang Ge
Wei Sun
Yu Zhang
Yunhao Li
Zhongpeng Ji
Fengyu Sun
Shangling Jui
Xiongkuo Min
Guangtao Zhai
54
4
0
26 Aug 2024
Evaluating Attribute Comprehension in Large Vision-Language Models
Haiwen Zhang
Zixi Yang
Yuanzhi Liu
Xinran Wang
Zheqi He
Kongming Liang
Zhanyu Ma
ELM
37
0
0
25 Aug 2024
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities
Tao Wu
Yong Zhang
Xintao Wang
Xianpan Zhou
Guangcong Zheng
Zhongang Qi
Ying Shan
Xi Li
VGen
DiffM
24
26
0
23 Aug 2024
ParGo: Bridging Vision-Language with Partial and Global Views
An-Lan Wang
Bin Shan
Wei Shi
Kun-Yu Lin
Xiang Fei
Guozhi Tang
Lei Liao
Jingqun Tang
Can Huang
Wei-Shi Zheng
MLLM
VLM
90
14
0
23 Aug 2024
Building and better understanding vision-language models: insights and future directions
Hugo Laurençon
Andrés Marafioti
Victor Sanh
Léo Tronchon
VLM
42
61
0
22 Aug 2024
Diffusion-Based Visual Art Creation: A Survey and New Perspectives
Bingyuan Wang
Qifeng Chen
Zeyu Wang
54
7
0
22 Aug 2024
CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination
Kaicheng Yang
Tiancheng Gu
Xiang An
Haiqiang Jiang
Xiangzi Dai
Ziyong Feng
Weidong Cai
Jiankang Deng
VLM
54
7
0
18 Aug 2024
Masked Image Modeling: A Survey
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
N. Sebe
72
6
0
13 Aug 2024
ARPA: A Novel Hybrid Model for Advancing Visual Word Disambiguation Using Large Language Models and Transformers
Aristi Papastavrou
Maria Lymperaiou
Giorgos Stamou
AI4CE
40
1
0
12 Aug 2024
UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization
Junjie He
Yifeng Geng
Liefeng Bo
DiffM
54
20
0
12 Aug 2024
PRECISe : Prototype-Reservation for Explainable Classification under Imbalanced and Scarce-Data Settings
Vaibhav Ganatra
Drishti Goel
42
0
0
11 Aug 2024
Civiverse: A Dataset for Analyzing User Engagement with Open-Source Text-to-Image Models
Maria-Teresa De Rosa Palmini
Laura Wagner
Eva Cetinic
41
2
0
10 Aug 2024
Instruction Tuning-free Visual Token Complement for Multimodal LLMs
Dongsheng Wang
Jiequan Cui
Miaoge Li
Wang Lin
Bo Chen
Hanwang Zhang
MLLM
34
3
0
09 Aug 2024
Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling
Zilyu Ye
Yu Lei
Ruotian Peng
Jinjin Cao
Zhiyang Chen
...
Mingyuan Zhou
Xiaoqian Shen
Mohamed Elhoseiny
Nan Zhuang
Guo-Jun Qi
VGen
VLM
40
1
0
07 Aug 2024
Attacks and Defenses for Generative Diffusion Models: A Comprehensive Survey
V. T. Truong
Luan Ba Dang
Long Bao Le
DiffM
MedIm
56
16
0
06 Aug 2024
Dataset Scale and Societal Consistency Mediate Facial Impression Bias in Vision-Language AI
Robert Wolfe
Aayushi Dangol
Alexis Hiniker
Bill Howe
36
2
0
04 Aug 2024
Are Bigger Encoders Always Better in Vision Large Models?
Bozhou Li
Hao Liang
Zimo Meng
Wentao Zhang
VLM
40
3
0
01 Aug 2024
GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models
Ali Abdollahi
Mahdi Ghaznavi
Mohammad Reza Karimi Nejad
Arash Mari Oriyad
Reza Abbasi
Ali Salesi
Melika Behjati
M. Rohban
M. Baghshah
CoGe
34
1
0
30 Jul 2024
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
Xiaowei Chi
Yatian Wang
Aosong Cheng
Pengjun Fang
Zeyue Tian
...
Wenhan Luo
Qifeng Chen
Shanghang Zhang
Qi-fei Liu
Yi-Ting Guo
75
7
0
30 Jul 2024
Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
Lorenzo Baraldi
Federico Cocchi
Marcella Cornia
Lorenzo Baraldi
Alessandro Nicolosi
Rita Cucchiara
38
8
0
29 Jul 2024
Multi-label Cluster Discrimination for Visual Representation Learning
Xiang An
Kaicheng Yang
Xiangzi Dai
Ziyong Feng
Jiankang Deng
VLM
45
6
0
24 Jul 2024
DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors
Zizheng Yan
Jiapeng Zhou
Fanpeng Meng
Yushuang Wu
Lingteng Qiu
Zisheng Ye
Shuguang Cui
Guanying Chen
Xiaoguang Han
DiffM
34
4
0
23 Jul 2024
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
Yiwei Ma
Zhibin Wang
Xiaoshuai Sun
Weihuang Lin
Qiang-feng Zhou
Jiayi Ji
Rongrong Ji
MLLM
VLM
57
1
0
23 Jul 2024
AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection
Yunkang Cao
Jiangning Zhang
Luca Frittoli
Yuqi Cheng
Nong Sang
Giacomo Boracchi
VLM
56
29
0
22 Jul 2024
Iterative Ensemble Training with Anti-Gradient Control for Mitigating Memorization in Diffusion Models
Xiao Liu
Xiaoliu Guan
Yu Wu
Jiaxu Miao
42
7
0
22 Jul 2024
Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts
Yanting Yang
Minghao Chen
Qibo Qiu
Jiahao Wu
Wenxiao Wang
Binbin Lin
Ziyu Guan
Xiaofei He
LM&Ro
45
2
0
20 Jul 2024
LogoSticker: Inserting Logos into Diffusion Models for Customized Generation
Mingkang Zhu
Xi Chen
Zhongdao Wang
Hengshuang Zhao
Jiaya Jia
DiffM
42
2
0
18 Jul 2024
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
Kirolos Ataallah
Xiaoqian Shen
Eslam Abdelrahman
Essam Sleiman
Mingchen Zhuge
Jian Ding
Deyao Zhu
Jürgen Schmidhuber
Mohamed Elhoseiny
VLM
30
17
0
17 Jul 2024
Mask-Free Neuron Concept Annotation for Interpreting Neural Networks in Medical Domain
Hyeon Bae Kim
Yong Hyun Ahn
Seong Tae Kim
47
1
0
16 Jul 2024
IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation
Yuanhao Zhai
K. Lin
Linjie Li
Chung-Ching Lin
Jianfeng Wang
Zhengyuan Yang
David Doermann
Junsong Yuan
Zicheng Liu
Lijuan Wang
DiffM
VGen
29
3
0
15 Jul 2024
Open Vocabulary Multi-Label Video Classification
Rohit Gupta
Mamshad Nayeem Rizve
Jayakrishnan Unnikrishnan
Ashish Tawari
Son Tran
Mubarak Shah
Benjamin Z. Yao
Trishul Chilimbi
VLM
67
1
0
12 Jul 2024
LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models
Yabin Zhang
Wenjie Zhu
Chenhang He
Lei Zhang
VLM
53
7
0
12 Jul 2024
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification
Wenshuo Peng
Kaipeng Zhang
Yue Yang
Hao Zhang
Ping Luo
VLM
29
2
0
11 Jul 2024
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Zhen Qin
Daoyuan Chen
Wenhao Zhang
Liuyi Yao
Yilun Huang
Bolin Ding
Yaliang Li
Shuiguang Deng
60
5
0
11 Jul 2024
15M Multimodal Facial Image-Text Dataset
Dawei Dai
Yutang Li
Yingge Liu
Mingming Jia
Zhang YuanHui
Guoyin Wang
VLM
31
7
0
11 Jul 2024
LEMoN: Label Error Detection using Multimodal Neighbors
Haoran Zhang
Aparna Balagopalan
Nassim Oufattole
Hyewon Jeong
Yan Wu
Jiacheng Zhu
Marzyeh Ghassemi
48
0
0
10 Jul 2024
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
Feng Li
Renrui Zhang
Hao Zhang
Yuanhan Zhang
Bo Li
Wei Li
Zejun Ma
Chunyuan Li
MLLM
VLM
52
196
0
10 Jul 2024
MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis
Wanggui He
Siming Fu
Mushui Liu
Xierui Wang
Wenyi Xiao
...
Zhelun Yu
Haoyuan Li
Ziwei Huang
Leilei Gan
Hao Jiang
DiffM
24
23
0
10 Jul 2024
A Single Transformer for Scalable Vision-Language Modeling
Yangyi Chen
Xingyao Wang
Hao Peng
Heng Ji
LRM
42
15
0
08 Jul 2024
MagMax: Leveraging Model Merging for Seamless Continual Learning
Daniel Marczak
Bartłomiej Twardowski
Tomasz Trzciński
Sebastian Cygert
MoMe
CLL
53
18
0
08 Jul 2024
JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation
Yu Zeng
Vishal M. Patel
Haochen Wang
Xun Huang
Ting-Chun Wang
Xuan Li
Yogesh Balaji
DiffM
29
18
0
08 Jul 2024
OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding
Tiancheng Zhao
Qianqian Zhang
Kyusong Lee
Peng Liu
Lu Zhang
Chunxin Fang
Jiajia Liao
Kelei Jiang
Yibo Ma
Ruochen Xu
MLLM
VLM
54
5
0
06 Jul 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Pan Zhang
Xiaoyi Dong
Yuhang Zang
Yuhang Cao
Rui Qian
...
Kai Chen
Jifeng Dai
Yu Qiao
Dahua Lin
Jiaqi Wang
45
100
0
03 Jul 2024
Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation
Marco Mistretta
Alberto Baldrati
Marco Bertini
Andrew D. Bagdanov
VPVLM
VLM
35
6
0
03 Jul 2024
Text-Aware Diffusion for Policy Learning
Calvin Luo
Mandy He
Zilai Zeng
Chen Sun
35
4
0
02 Jul 2024
GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models
Jian Ma
Yonglin Deng
Chen Chen
H. Lu
Zhenyu Yang
VLM
DiffM
97
6
0
02 Jul 2024
CLIP the Divergence: Language-guided Unsupervised Domain Adaptation
Jinjing Zhu
Yucheng Chen
Lin Wang
VLM
54
3
0
01 Jul 2024