Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2111.02114
Cited By
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
3 November 2021
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
Re-assign community
ArXiv
PDF
HTML
Papers citing
"LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs"
50 / 1,097 papers shown
Title
AdaFlow: Efficient Long Video Editing via Adaptive Attention Slimming And Keyframe Selection
Shuheng Zhang
Yong-Jin Liu
Hongbo Zhou
Jun Peng
Yiyi Zhou
Xiaoshuai Sun
Rongrong Ji
VGen
43
0
0
08 Feb 2025
The Cake that is Intelligence and Who Gets to Bake it: An AI Analogy and its Implications for Participation
Martin Mundt
Anaelia Ovalle
Felix Friedrich
A Pranav
Subarnaduti Paul
Manuel Brack
Kristian Kersting
William Agnew
314
0
0
05 Feb 2025
Leveraging Stable Diffusion for Monocular Depth Estimation via Image Semantic Encoding
Jingming Xia
Guanqun Cao
Guang Ma
Yiben Luo
Qinzhao Li
John Oyekan
MDE
59
0
0
01 Feb 2025
Fine Tuning without Catastrophic Forgetting via Selective Low Rank Adaptation
Reza Akbarian Bafghi
Carden Bagwell
Avinash Ravichandran
Ashish Shrivastava
M. Raissi
48
0
0
28 Jan 2025
Rethinking the Bias of Foundation Model under Long-tailed Distribution
Jiahao Chen
Bin Qin
Jiangmeng Li
Hao Chen
Bing-Huang Su
85
0
0
27 Jan 2025
LDR-Net: A Novel Framework for AI-generated Image Detection via Localized Discrepancy Representation
Jiaxin Chen
Miao Hu
Dengyong Zhang
Yun Song
Xin Liao
29
0
0
23 Jan 2025
A Comprehensive Social Bias Audit of Contrastive Vision Language Models
Zahraa Al Sahili
Ioannis Patras
Matthew Purver
VLM
72
1
0
22 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
105
18
0
17 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
88
12
0
06 Jan 2025
Refining Skewed Perceptions in Vision-Language Models through Visual Representations
Haocheng Dai
Sarang Joshi
VLM
74
0
0
03 Jan 2025
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Wenqi Zhang
Hang Zhang
Xin Li
Jiashuo Sun
Yongliang Shen
Weiming Lu
Deli Zhao
Yueting Zhuang
Lidong Bing
VLM
43
2
0
01 Jan 2025
ErgoChat: a Visual Query System for the Ergonomic Risk Assessment of Construction Workers
Chao Fan
Qipei Mei
Xiaonan Wang
Xinming Li
33
3
0
31 Dec 2024
Demystifying CLIP Data
Hu Xu
Saining Xie
Xiaoqing Ellen Tan
Po-Yao (Bernie) Huang
Russell Howes
Vasu Sharma
Shang-Wen Li
Gargi Ghosh
Luke Zettlemoyer
Christoph Feichtenhofer
VLM
CLIP
53
109
0
31 Dec 2024
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Hao Fei
Shengqiong Wu
Hao Zhang
Tat-Seng Chua
Shuicheng Yan
64
39
0
31 Dec 2024
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
55
4
0
31 Dec 2024
CharGen: High Accurate Character-Level Visual Text Generation Model with MultiModal Encoder
Lichen Ma
Tiezhu Yue
Pei Fu
Yujie Zhong
Kai Zhou
Xiaoming Wei
Jie Hu
DiffM
78
2
0
23 Dec 2024
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
Cijo Jose
Théo Moutakanni
Dahyun Kang
Federico Baldassarre
Timothée Darcet
...
Maxime Oquab
Oriane Siméoni
Huy V. Vo
Patrick Labatut
Piotr Bojanowski
CLIP
VLM
100
6
0
20 Dec 2024
Generating floorplans for various building functionalities via latent diffusion model
Mohamed R. Ibrahim
J. Musil
Irene Gallou
DiffM
AI4CE
65
0
0
09 Dec 2024
Chimera: Improving Generalist Model with Domain-Specific Experts
Tianshuo Peng
M. Li
Hongbin Zhou
Renqiu Xia
Renrui Zhang
...
Aojun Zhou
Botian Shi
Tao Chen
Bo Zhang
Xiangyu Yue
90
4
0
08 Dec 2024
FLAIR: VLM with Fine-grained Language-informed Image Representations
Rui Xiao
Sanghwan Kim
Mariana-Iuliana Georgescu
Zeynep Akata
Stephan Alaniz
VLM
CLIP
79
2
0
04 Dec 2024
Composed Image Retrieval for Training-Free Domain Conversion
Nikos Efthymiadis
Bill Psomas
Zakaria Laskar
Konstantinos Karantzalos
Yannis Avrithis
Ondřej Chum
Giorgos Tolias
76
0
0
04 Dec 2024
DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation
Q. He
Jinlong Peng
P. Xu
Boyuan Jiang
Xiaobin Hu
...
Yong-Jin Liu
Yishuo Wang
Chengjie Wang
Xiaomeng Li
Jun Zhang
DiffM
122
1
0
04 Dec 2024
XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation
Xianrui Li
Kai Qiu
Hongyu Chen
Jason Kuen
Jiuxiang Gu
Jiadong Wang
Zhe-nan Lin
Bhiksha Raj
VLM
125
3
0
02 Dec 2024
TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition
Xingsong Ye
Yongkun Du
Yunbo Tao
Z. Chen
DiffM
116
0
0
02 Dec 2024
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim
Rui Xiao
Mariana-Iuliana Georgescu
Stephan Alaniz
Zeynep Akata
VLM
85
2
0
02 Dec 2024
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training
Haicheng Wang
Chen Ju
Weixiong Lin
Shuai Xiao
Mengting Chen
...
Mingshuai Yao
Jinsong Lan
Ying Chen
Qingwen Liu
Yanfeng Wang
VLM
CLIP
78
4
0
30 Nov 2024
SPAgent: Adaptive Task Decomposition and Model Selection for General Video Generation and Editing
Rong-Cheng Tu
Wenhao Sun
Zhao Jin
Jingyi Liao
Jiaxing Huang
Dacheng Tao
VGen
DiffM
97
3
0
28 Nov 2024
TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability
Shimin Chen
Xiaohan Lan
Yitian Yuan
Zequn Jie
Lin Ma
VLM
MLLM
83
13
0
27 Nov 2024
DoubleCCA: Improving Foundation Model Group Robustness with Random Sentence Embeddings
Hong Liu
Yitong Lu
78
0
0
25 Nov 2024
LocRef-Diffusion:Tuning-Free Layout and Appearance-Guided Generation
Fan Deng
Yaguang Wu
Xinyang Yu
Xiangjun Huang
Jian Yang
Guangyu Yan
Qiang Xu
DiffM
94
0
0
22 Nov 2024
AnyText2: Visual Text Generation and Editing With Customizable Attributes
Yuxiang Tuo
Yifeng Geng
Liefeng Bo
VLM
93
6
0
22 Nov 2024
GalaxyEdit: Large-Scale Image Editing Dataset with Enhanced Diffusion Adapter
Aniruddha Bala
Rohan Jaiswal
Loay Rashid
Siddharth Roheda
77
0
0
21 Nov 2024
DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving
Xianda Guo
Ruijun Zhang
Yiqun Duan
Yuhang He
Chenming Zhang
Shuai Liu
Long Chen
LRM
91
11
0
20 Nov 2024
Artificial Intelligence in Pediatric Echocardiography: Exploring Challenges, Opportunities, and Clinical Applications with Explainable AI and Federated Learning
M. Y. Jabarulla
T. Uden
Thomas Jack
P. Beerbaum
S. Oeltze-jafra
37
1
0
15 Nov 2024
Structured Pattern Expansion with Diffusion Models
Marzia Riso
Giuseppe Vecchio
Fabio Pellacini
DiffM
50
0
0
12 Nov 2024
Community Forensics: Using Thousands of Generators to Train Fake Image Detectors
Jeongsoo Park
Andrew Owens
34
3
0
06 Nov 2024
No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages
Youssef Mohamed
Runjia Li
Ibrahim Said Ahmad
Kilichbek Haydarov
Philip Torr
Kenneth Ward Church
Mohamed Elhoseiny
VLM
38
7
0
06 Nov 2024
Classification Done Right for Vision-Language Pre-Training
Zilong Huang
Qinghao Ye
Bingyi Kang
Jiashi Feng
Haoqi Fan
CLIP
VLM
50
2
0
05 Nov 2024
HumanVLM: Foundation for Human-Scene Vision-Language Model
Dawei Dai
Xu Long
Li Yutang
Zhang YuanHui
Shuyin Xia
VLM
MLLM
39
1
0
05 Nov 2024
Membership Inference Attacks against Large Vision-Language Models
Zhan Li
Yongtao Wu
Yihang Chen
F. Tonin
Elias Abad Rocamora
V. Cevher
44
4
0
05 Nov 2024
ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy
Kian Kenyon-Dean
Zitong Jerry Wang
John Urbanik
Konstantin Donhauser
Jason Hartford
...
Safiye Celik
Marta Fay
Juan Sebastian Rodriguez Vera
I. Haque
Oren Z. Kraus
MedIm
39
4
0
04 Nov 2024
Identifying Implicit Social Biases in Vision-Language Models
Kimia Hamidieh
Haoran Zhang
Walter Gerych
Thomas Hartvigsen
Marzyeh Ghassemi
VLM
36
11
0
01 Nov 2024
SeafloorAI: A Large-scale Vision-Language Dataset for Seafloor Geological Survey
Kien X. Nguyen
Fengchun Qiao
Arthur Trembanis
Xi Peng
26
0
0
31 Oct 2024
Public Domain 12M: A Highly Aesthetic Image-Text Dataset with Novel Governance Mechanisms
Jordan Meyer
Nick Padgett
Cullen Miller
Laura Exline
31
4
0
30 Oct 2024
PACA: Perspective-Aware Cross-Attention Representation for Zero-Shot Scene Rearrangement
Shutong Jin
Ruiyu Wang
Kuangyi Chen
Florian T. Pokorny
29
0
0
29 Oct 2024
Dreaming Out Loud: A Self-Synthesis Approach For Training Vision-Language Models With Developmentally Plausible Data
Badr AlKhamissi
Yingtian Tang
Abdülkadir Gökce
Johannes Mehrer
Martin Schrimpf
VLM
49
0
0
29 Oct 2024
Face-MLLM: A Large Face Perception Model
Haomiao Sun
Mingjie He
Tianheng Lian
Hu Han
Shiguang Shan
VLM
CVBM
LRM
25
5
0
28 Oct 2024
Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining
Han Huang
Yuqi Huo
Zijia Zhao
Haoyu Lu
Shu Wu
Bin Wang
Qiang Liu
Weipeng Chen
Liang Wang
VLM
30
1
0
21 Oct 2024
BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities
Shaozhe Hao
Xuantong Liu
Xianbiao Qi
Shihao Zhao
Bojia Zi
Rong Xiao
Kai Han
Kwan-Yee K. Wong
45
3
0
18 Oct 2024
Harnessing Webpage UIs for Text-Rich Visual Understanding
Junpeng Liu
Tianyue Ou
Yifan Song
Yuxiao Qu
Wai Lam
Chenyan Xiong
Wenhu Chen
Graham Neubig
Xiang Yue
82
6
0
17 Oct 2024
Previous
1
2
3
4
5
6
...
20
21
22
Next