arXiv:2111.02114
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
3 November 2021
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
Papers citing "LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs"
50 / 1,097 papers shown
UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines
Chen Tang
Xinzhu Ma
Encheng Su
Xiufeng Song
Xiaohong Liu
Wei-Hong Li
Lei Bai
Wanli Ouyang
Xiangyu Yue
3DGS
AI4TS
72
0
0
26 Mar 2025
RGB-Th-Bench: A Dense benchmark for Visual-Thermal Understanding of Vision Language Models
Mehdi Moshtaghi
Siavash H. Khajavi
Joni Pajarinen
VLM
54
0
0
25 Mar 2025
EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models
Yufei Cai
Hu Han
Yuxiang Wei
Shiguang Shan
Xilin Chen
DiffM
VGen
65
0
0
25 Mar 2025
Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook
Xu Zheng
Ziqiao Weng
Yuanhuiyi Lyu
Lutao Jiang
Haiwei Xue
Bin Ren
Danda Pani Paudel
N. Sebe
Luc Van Gool
Xuming Hu
3DV
42
5
0
23 Mar 2025
HyperLoRA: Parameter-Efficient Adaptive Generation for Portrait Synthesis
Mengtian Li
Jinshu Chen
Wanquan Feng
Bingchuan Li
Fei Dai
Mingcong Liu
Qian He
3DH
52
0
0
21 Mar 2025
DermDiff: Generative Diffusion Model for Mitigating Racial Biases in Dermatology Diagnosis
Nusrat Munia
Abdullah-Al-Zubaer Imran
MedIm
47
1
0
21 Mar 2025
AnimatePainter: A Self-Supervised Rendering Framework for Reconstructing Painting Process
J. Hu
Shuyong Gao
Qianyu Guo
Yan Wang
Qishan Wang
Yuang Feng
Wenqiang Zhang
DiffM
VGen
47
0
0
21 Mar 2025
Visual Persona: Foundation Model for Full-Body Human Customization
Jisu Nam
Soowon Son
Zhan Xu
Jing Shi
Difan Liu
Feng Liu
Aashish Misraa
Seungryong Kim
Yang Zhou
DiffM
51
0
0
19 Mar 2025
Tracking Meets Large Multimodal Models for Driving Scenario Understanding
Ayesha Ishaq
Jean Lahoud
Fahad Shahbaz Khan
Salman Khan
Hisham Cholakkal
Rao Muhammad Anwer
59
0
0
18 Mar 2025
AUTV: Creating Underwater Video Datasets with Pixel-wise Annotations
Quang-Trung Truong
Wong Yuk Kwan
Duc Thanh Nguyen
Binh-Son Hua
Sai-Kit Yeung
VGen
53
0
0
17 Mar 2025
TextInVision: Text and Prompt Complexity Driven Visual Text Generation Benchmark
Forouzan Fallah
Maitreya Patel
Agneet Chatterjee
Vlad I. Morariu
Chitta Baral
Yezhou Yang
CoGe
63
0
0
17 Mar 2025
Web Artifact Attacks Disrupt Vision Language Models
Maan Qraitem
Piotr Teterwak
Kate Saenko
Bryan A. Plummer
AAML
80
0
0
17 Mar 2025
PoseSyn: Synthesizing Diverse 3D Pose Data from In-the-Wild 2D Data
ChangHee Yang
H. Song
Seokhun Choi
Seungwoo Lee
Jaechul Kim
Hoseok Do
48
0
0
17 Mar 2025
Hyperbolic Safety-Aware Vision-Language Models
Tobia Poppi
Tejaswi Kasarla
Pascal Mettes
Lorenzo Baraldi
Rita Cucchiara
VLM
MU
61
0
0
15 Mar 2025
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
Weiming Ren
Wentao Ma
Huan Yang
Cong Wei
Ge Zhang
Wenhu Chen
Mamba
59
3
0
14 Mar 2025
PARIC: Probabilistic Attention Regularization for Language Guided Image Classification from Pre-trained Vision Language Models
Mayank Nautiyal
Stela Arranz Gheorghe
Kristiana Stefa
Li Ju
Ida-Maria Sintorn
Prashant Singh
VLM
56
0
0
14 Mar 2025
DreamInsert: Zero-Shot Image-to-Video Object Insertion from A Single Image
Qi Zhao
Zhan Ma
Pan Zhou
VGen
75
0
0
13 Mar 2025
A Hierarchical Semantic Distillation Framework for Open-Vocabulary Object Detection
Shenghao Fu
Junkai Yan
Q. Yang
Xihan Wei
Xiaohua Xie
Wei-Shi Zheng
ObjD
VLM
48
0
0
13 Mar 2025
Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models
Sangwon Jang
June Suk Choi
Jaehyeong Jo
Kimin Lee
Sung Ju Hwang
DiffM
WIGM
84
1
0
12 Mar 2025
Filter Like You Test: Data-Driven Data Filtering for CLIP Pretraining
Mikey Shechter
Yair Carmon
CLIP
47
0
0
11 Mar 2025
Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks
Junying Wang
Hongyuan Zhang
Yuan Yuan
AAML
PICV
80
0
0
11 Mar 2025
Modular Customization of Diffusion Models via Blockwise-Parameterized Low-Rank Adaptation
Mingkang Zhu
Xi Chen
Zihan Wang
Bei Yu
Hengshuang Zhao
Jiaya Jia
MoMe
55
0
0
11 Mar 2025
Controlling Latent Diffusion Using Latent CLIP
Jason Becker
Chris Wendler
Peter Baylies
Robert West
Christian Wressnegger
DiffM
VLM
68
0
0
11 Mar 2025
ComicsPAP: understanding comic strips by picking the correct panel
Emanuele Vivoli
Artemis LLabres
Mohamed Ali Soubgui
Marco Bertini
Ernest Valveny Llobet
Dimosthenis Karatzas
65
0
0
11 Mar 2025
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
Xin Wen
Bingchen Zhao
Yilun Chen
Jiangmiao Pang
Xiaojuan Qi
LM&Ro
46
0
0
10 Mar 2025
SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models
Ouxiang Li
Yuan Wang
Xinting Hu
Houcheng Jiang
Tao Liang
Y. Hao
Guojun Ma
Fuli Feng
DiffM
49
1
0
10 Mar 2025
Just Functioning as a Hook for Two-Stage Referring Multi-Object Tracking
Weize Li
Yunhao Du
Qixiang Yin
Zhaohui Hou
Zhicheng Zhao
Daqi Liu
64
0
0
10 Mar 2025
Consistent Image Layout Editing with Diffusion Models
Tao Xia
Yudi Zhang
Ting Liu
Lei Zhang
DiffM
66
1
0
09 Mar 2025
DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability
Xirui Hu
Jiahao Wang
Hao Chen
Weizhan Zhang
Benqi Wang
Yangfu Li
Haishun Nan
DiffM
67
0
0
09 Mar 2025
Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices
Junyan Lin
Haoran Chen
Yue Fan
Yingqi Fan
Xin Jin
Hui Su
Jinlan Fu
Xiaoyu Shen
68
0
0
08 Mar 2025
FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion
Ziyi Yang
Fanqi Wan
Longguang Zhong
Canbin Huang
Guosheng Liang
Xiaojun Quan
MoMe
95
0
0
06 Mar 2025
Deepfake-Eval-2024: A Multi-Modal In-the-Wild Benchmark of Deepfakes Circulated in 2024
Nuria Alina Chandra
Ryan Murtfeldt
Lin Qiu
Arnab Karmakar
Hannah Lee
...
Sejin Paik
Changyeon Lee
Jongwook Choi
Aerin Kim
O. Etzioni
64
5
0
04 Mar 2025
A Token-level Text Image Foundation Model for Document Understanding
Tongkun Guan
Zining Wang
Pei Fu
Zhengtao Guo
Wei-Ming Shen
...
Chen Duan
Hao Sun
Qianyi Jiang
Junfeng Luo
Xiaokang Yang
VLM
45
1
0
04 Mar 2025
CacheQuant: Comprehensively Accelerated Diffusion Models
Xuewen Liu
Zhikai Li
Qingyi Gu
DiffM
40
0
0
03 Mar 2025
Language-Assisted Feature Transformation for Anomaly Detection
EungGu Yun
Heonjin Ha
Yeongwoo Nam
Bryan Dongik Lee
68
0
0
03 Mar 2025
SolidMark: Evaluating Image Memorization in Generative Models
Nicky Kriplani
Minh Pham
Gowthami Somepalli
Chinmay Hegde
Niv Cohen
VLM
45
1
0
01 Mar 2025
Analyzing CLIP's Performance Limitations in Multi-Object Scenarios: A Controlled High-Resolution Study
Reza Abbasi
Ali Nazari
Aminreza Sefid
Mohammadali Banayeeanzade
M. Rohban
M. Baghshah
VLM
64
1
0
27 Feb 2025
Vision-Encoders (Already) Know What They See: Mitigating Object Hallucination via Simple Fine-Grained CLIPScore
Hongseok Oh
Wonseok Hwang
VLM
41
0
0
27 Feb 2025
Interpreting CLIP with Hierarchical Sparse Autoencoders
Vladimir Zaigrajew
Hubert Baniecki
P. Biecek
56
0
0
27 Feb 2025
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation
Reza Abbasi
Ali Nazari
Aminreza Sefid
Mohammadali Banayeeanzade
M. Rohban
M. Baghshah
VLM
89
1
0
27 Feb 2025
SCA3D: Enhancing Cross-modal 3D Retrieval via 3D Shape and Caption Paired Data Augmentation
Junlong Ren
Hao Wu
Hui Xiong
Haoran Wang
68
0
0
26 Feb 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
Xiangyu Zhao
Shengyuan Ding
Zicheng Zhang
Haian Huang
Maosong Cao
...
Wenhai Wang
Guangtao Zhai
Haodong Duan
Hua Yang
Kai Chen
126
7
0
25 Feb 2025
Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts
Zhenghao Liu
Xingsheng Zhu
Tianshuo Zhou
Xinyi Zhang
Xiaoyuan Yi
Yukun Yan
Yu Gu
Ge Yu
Maosong Sun
RALM
VLM
43
1
0
24 Feb 2025
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang
Reuben Tan
Qianhui Wu
Ruijie Zheng
Baolin Peng
...
Seonghyeon Ye
Joel Jang
Yuquan Deng
Lars Liden
Jianfeng Gao
VLM
AI4TS
122
9
0
18 Feb 2025
Pre-training Auto-regressive Robotic Models with 4D Representations
Dantong Niu
Yuvan Sharma
Haoru Xue
Giscard Biamby
Junyi Zhang
Ziteng Ji
Trevor Darrell
Roei Herzig
78
1
0
18 Feb 2025
Portable Reward Tuning: Towards Reusable Fine-Tuning across Different Pretrained Models
Daiki Chijiwa
Taku Hasegawa
Kyosuke Nishida
Kuniko Saito
Susumu Takeuchi
47
0
0
18 Feb 2025
Do we Really Need Visual Instructions? Towards Visual Instruction-Free Fine-tuning for Large Vision-Language Models
Zikang Liu
K. Zhou
Wayne Xin Zhao
Dawei Gao
Yaliang Li
Zhicheng Dou
MLLM
VLM
LRM
94
0
0
17 Feb 2025
Diffusion Models Through a Global Lens: Are They Culturally Inclusive?
Zahra Bayramli
Ayhan Suleymanzade
Na Min An
Huzama Ahmad
Eunsu Kim
Junyeong Park
James Thorne
Alice H. Oh
91
0
0
13 Feb 2025
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis
Angelos Zavras
Dimitrios Michail
Xiao Xiang Zhu
Begüm Demir
Ioannis Papoutsis
VLM
86
0
0
13 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
105
4
0
12 Feb 2025