2111.02114
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
3 November 2021
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
Papers citing
"LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs"
50 / 1,100 papers shown
Title
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
130
109
0
08 Feb 2024
Theoretical and Empirical Analysis of Adaptive Entry Point Selection for Graph-based Approximate Nearest Neighbor Search
Yutaro Oguri
Yusuke Matsui
36
0
0
07 Feb 2024
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Quan-Sen Sun
Jinsheng Wang
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Xinlong Wang
VLM
CLIP
MLLM
100
42
0
06 Feb 2024
Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing
Yan Shu
Weichao Zeng
Zhenhang Li
Fangmin Zhao
Yu Zhou
37
3
0
05 Feb 2024
On Catastrophic Inheritance of Large Foundation Models
Hao Chen
Bhiksha Raj
Xing Xie
Jindong Wang
AI4CE
58
12
0
02 Feb 2024
EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain
Wei Zhang
Miaoxin Cai
Tong Zhang
Zhuang Yin
Xuerui Mao
42
90
0
30 Jan 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Conghui He
Xingcheng Zhang
Yu Qiao
Dahua Lin
Jiaqi Wang
VLM
MLLM
89
245
0
29 Jan 2024
StableIdentity: Inserting Anybody into Anywhere at First Sight
Qinghe Wang
Xu Jia
Xiaomin Li
Taiqing Li
Liqian Ma
Yunzhi Zhuge
Huchuan Lu
48
20
0
29 Jan 2024
Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization
Yuhang Zang
Hanlin Goh
Josh Susskind
Chen Huang
VLM
42
11
0
29 Jan 2024
FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models
Feihong He
Gang Li
Mengyuan Zhang
Leilei Yan
Hui Xiong
Fanzhang Li
Li Shen
DiffM
40
15
0
28 Jan 2024
Multi-Trigger Backdoor Attacks: More Triggers, More Threats
Yige Li
Xingjun Ma
Jiabo He
Hanxun Huang
Yu-Gang Jiang
AAML
38
5
0
27 Jan 2024
Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support
Xiaojun Wu
Di Zhang
Ruyi Gan
Junyu Lu
Ziwei Wu
Renliang Sun
Jiaxing Zhang
Pingjian Zhang
Yan Song
VLM
34
6
0
26 Jan 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL
LRM
56
182
0
24 Jan 2024
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
Siwei Wu
Yizhi Li
Kang Zhu
Ge Zhang
Yiming Liang
...
Wenhu Chen
Wenhao Huang
Noura Al Moubayed
Jie Fu
Chenghua Lin
46
12
0
24 Jan 2024
Small Language Model Meets with Reinforced Vision Vocabulary
Haoran Wei
Lingyu Kong
Jinyue Chen
Liang Zhao
Zheng Ge
En Yu
Jian-Yuan Sun
Chunrui Han
Xiangyu Zhang
VLM
57
40
0
23 Jan 2024
The Neglected Tails in Vision-Language Models
Shubham Parashar
Zhiqiu Lin
Tian Liu
Xiangjue Dong
Yanan Li
Deva Ramanan
James Caverlee
Shu Kong
VLM
42
33
0
23 Jan 2024
Benchmarking Large Multimodal Models against Common Corruptions
Jiawei Zhang
Tianyu Pang
Chao Du
Yi Ren
Bo-wen Li
Min Lin
MLLM
35
14
0
22 Jan 2024
The Conversation is the Command: Interacting with Real-World Autonomous Robot Through Natural Language
Linus Nwankwo
Elmar Rueckert
LM&Ro
26
7
0
22 Jan 2024
Detecting Multimedia Generated by Large AI Models: A Survey
Li Lin
Neeraj Gupta
Yue Zhang
Hainan Ren
Chun-Hao Liu
Feng Ding
Xin Eric Wang
Xin Li
Luisa Verdoliva
Shu Hu
88
58
0
22 Jan 2024
Text-to-Image Cross-Modal Generation: A Systematic Review
Maciej Żelaszczyk
Jacek Mańdziuk
35
3
0
21 Jan 2024
STICKERCONV: Generating Multimodal Empathetic Responses from Scratch
Yiqun Zhang
Fanheng Kong
Peidong Wang
Shuang Sun
Lingshuai Wang
Shi Feng
Daling Wang
Yifei Zhang
Kaisong Song
36
10
0
20 Jan 2024
Understanding Video Transformers via Universal Concept Discovery
M. Kowal
Achal Dave
Rares Ambrus
Adrien Gaidon
Konstantinos G. Derpanis
P. Tokmakov
ViT
39
8
0
19 Jan 2024
One Step Learning, One Step Review
Xiaolong Huang
Qiankun Li
Xueran Li
Xuesong Gao
39
1
0
19 Jan 2024
Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation
Kohei Uehara
Nabarun Goswami
Hanqin Wang
Toshiaki Baba
Kohtaro Tanaka
...
Takagi Naoya
Ryo Umagami
Yingyi Wen
Tanachai Anakewat
Tatsuya Harada
LRM
28
2
0
18 Jan 2024
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
Zhao Wang
Aoxue Li
Lingting Zhu
Yong Guo
Qi Dou
Zhenguo Li
VGen
DiffM
35
41
0
18 Jan 2024
Veagle: Advancements in Multimodal Representation Learning
Rajat Chawla
Arkajit Datta
Tushar Verma
Adarsh Jha
Anmol Gautam
Ayush Vatsal
Sukrit Chaterjee
NS Mukunda
Ishaan Bhola
VLM
16
4
0
18 Jan 2024
CLIP Model for Images to Textual Prompts Based on Top-k Neighbors
Xin Zhang
Xin Zhang
Yeming Cai
Tianzhi Jia
VLM
28
0
0
18 Jan 2024
Vlogger: Make Your Dream A Vlog
Shaobin Zhuang
Kunchang Li
Xinyuan Chen
Yaohui Wang
Ziwei Liu
Yu Qiao
Yali Wang
VGen
DiffM
43
35
0
17 Jan 2024
UniVG: Towards UNIfied-modal Video Generation
Ludan Ruan
Lei Tian
Chuanwei Huang
Xu Zhang
Xinyan Xiao
VGen
DiffM
34
3
0
17 Jan 2024
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
Jonghyun Lee
Hansam Cho
Youngjoon Yoo
Seoung Bum Kim
Yonghyun Jeong
DiffM
23
7
0
17 Jan 2024
Fixed Point Diffusion Models
Xingjian Bai
Luke Melas-Kyriazi
18
3
0
16 Jan 2024
SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation
Zhixuan Liu
Peter Schaldenbrand
Beverley-Claire Okogwu
Wenxuan Peng
Youngsik Yun
Andrew Hundt
Jihie Kim
Jean Oh
39
16
0
16 Jan 2024
UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding
Bowen Shi
Peisen Zhao
Zichen Wang
Yuhang Zhang
Yaoming Wang
...
Wenrui Dai
Junni Zou
Hongkai Xiong
Qi Tian
Xiaopeng Zhang
VLM
40
7
0
12 Jan 2024
PALP: Prompt Aligned Personalization of Text-to-Image Models
Moab Arar
Andrey Voynov
Amir Hertz
Omri Avrahami
Shlomi Fruchter
Yael Pritch
Daniel Cohen-Or
Ariel Shamir
DiffM
29
21
0
11 Jan 2024
Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Yiqi Wang
Wentao Chen
Xiaotian Han
Xudong Lin
Haiteng Zhao
Yongfei Liu
Bohan Zhai
Jianbo Yuan
Quanzeng You
Hongxia Yang
LRM
47
71
0
10 Jan 2024
SnapCap: Efficient Snapshot Compressive Video Captioning
Jianqiao Sun
Yudi Su
Hao Zhang
Ziheng Cheng
Zequn Zeng
Zhengjue Wang
Bo Chen
Xin Yuan
37
1
0
10 Jan 2024
Revisiting Adversarial Training at Scale
Zeyu Wang
Xianhang Li
Hongru Zhu
Cihang Xie
41
15
0
09 Jan 2024
Learning to Prompt Segment Anything Models
Jiaxing Huang
Kai Jiang
Jingyi Zhang
Han Qiu
Lewei Lu
Shijian Lu
Eric Xing
VLM
LRM
51
7
0
09 Jan 2024
Effective pruning of web-scale datasets based on complexity of concept clusters
Amro Abbas
E. Rusak
Kushal Tirumala
Wieland Brendel
Kamalika Chaudhuri
Ari S. Morcos
VLM
CLIP
34
22
0
09 Jan 2024
Representative Feature Extraction During Diffusion Process for Sketch Extraction with One Example
Kwan Yun
Youngseo Kim
Kwanggyoon Seo
Chang Wook Seo
Junyong Noh
DiffM
30
2
0
09 Jan 2024
Multimodal Data Curation via Object Detection and Filter Ensembles
Tzu-Heng Huang
Changho Shin
Sui Jiet Tay
Dyah Adila
Frederic Sala
34
5
0
05 Jan 2024
VASE: Object-Centric Appearance and Shape Manipulation of Real Videos
E. Peruzzo
Vidit Goel
Dejia Xu
Xingqian Xu
Yi Ding
Zhangyang Wang
Humphrey Shi
N. Sebe
LM&Ro
VGen
DiffM
66
9
0
04 Jan 2024
Improving Diffusion-Based Image Synthesis with Context Prediction
Ling Yang
Jingwei Liu
Shenda Hong
Zhilong Zhang
Zhilin Huang
Zheming Cai
Wentao Zhang
Bin Cui
DiffM
51
34
0
04 Jan 2024
GPS-SSL: Guided Positive Sampling to Inject Prior Into Self-Supervised Learning
Aarash Feizi
Randall Balestriero
Adriana Romero Soriano
Reihaneh Rabbany
26
2
0
03 Jan 2024
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Alex Jinpeng Wang
Linjie Li
K. Lin
Jianfeng Wang
Kevin Lin
Zhengyuan Yang
Lijuan Wang
Mike Zheng Shou
VLM
VGen
35
12
0
01 Jan 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Ping Luo
Jiebo Luo
Chenliang Xu
VLM
67
84
0
29 Dec 2023
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM
MLLM
40
147
0
28 Dec 2023
MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices
Xiangxiang Chu
Limeng Qiao
Xinyang Lin
Shuang Xu
Yang Yang
...
Fei Wei
Xinyu Zhang
Bo Zhang
Xiaolin Wei
Chunhua Shen
MLLM
44
35
0
28 Dec 2023
Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey
Jiaxing Huang
Jingyi Zhang
Kai Jiang
Han Qiu
Shijian Lu
44
22
0
27 Dec 2023
LeanVec: Searching vectors faster by making them fit
Mariano Tepper
Ishwar Bhati
Cecilia Aguerrebere
Mark Hildebrand
Ted Willke
VLM
OODD
29
1
0
26 Dec 2023