Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2102.08981
Cited By
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
17 February 2021
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts"
50 / 850 papers shown
Title
Composition and Deformance: Measuring Imageability with a Text-to-Image Model
Si Wu
David A. Smith
EGVM
CoGe
20
2
0
05 Jun 2023
StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
Yonglong Tian
Lijie Fan
Phillip Isola
Huiwen Chang
Dilip Krishnan
VLM
DiffM
45
142
0
01 Jun 2023
Vocabulary-free Image Classification
Alessandro Conti
Enrico Fini
Massimiliano Mancini
Paolo Rota
Yiming Wang
Elisa Ricci
VLM
42
23
0
01 Jun 2023
Exploring Open-Vocabulary Semantic Segmentation without Human Labels
Jun Chen
Deyao Zhu
Guocheng Qian
Guohao Li
Zhicheng Yan
Chenchen Zhu
Fanyi Xiao
Mohamed Elhoseiny
Sean Culatana
VLM
46
11
0
01 Jun 2023
Improving CLIP Training with Language Rewrites
Lijie Fan
Dilip Krishnan
Phillip Isola
Dina Katabi
Yonglong Tian
BDL
VLM
CLIP
38
157
0
31 May 2023
Too Large; Data Reduction for Vision-Language Pre-Training
Alex Jinpeng Wang
Kevin Qinghong Lin
David Junhao Zhang
Stan Weixian Lei
Mike Zheng Shou
VLM
33
24
0
31 May 2023
Joint Adaptive Representations for Image-Language Learning
A. Piergiovanni
A. Angelova
VLM
34
0
0
31 May 2023
LMCap: Few-shot Multilingual Image Captioning by Retrieval Augmented Language Model Prompting
R. Ramos
Bruno Martins
Desmond Elliott
VLM
18
16
0
31 May 2023
Improved Probabilistic Image-Text Representations
Sanghyuk Chun
VLM
39
27
0
29 May 2023
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Sihan Chen
Handong Li
Qunbo Wang
Zijia Zhao
Ming-Ting Sun
Xinxin Zhu
Qingbin Liu
42
97
0
29 May 2023
Image Captioning with Multi-Context Synthetic Data
Feipeng Ma
Y. Zhou
Fengyun Rao
Yueyi Zhang
Xiaoyan Sun
DiffM
42
7
0
29 May 2023
Conditional Score Guidance for Text-Driven Image-to-Image Translation
Hyunsoo Lee
Minsoo Kang
Bohyung Han
DiffM
24
14
0
29 May 2023
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
Noam Rotstein
David Bensaid
Shaked Brody
Roy Ganz
Ron Kimmel
VLM
26
27
0
28 May 2023
ConaCLIP: Exploring Distillation of Fully-Connected Knowledge Interaction Graph for Lightweight Text-Image Retrieval
Jiapeng Wang
Chengyu Wang
Xiaodan Wang
Jun Huang
Lianwen Jin
VLM
37
4
0
28 May 2023
Three Towers: Flexible Contrastive Learning with Pretrained Image Models
Jannik Kossen
Mark Collier
Basil Mustafa
Tianlin Li
Xiaohua Zhai
Lucas Beyer
Andreas Steiner
Jesse Berent
Rodolphe Jenatton
Efi Kokiopoulou
VLM
45
12
0
26 May 2023
ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst
Zijia Zhao
Longteng Guo
Tongtian Yue
Si-Qing Chen
Shuai Shao
Xinxin Zhu
Zehuan Yuan
Jing Liu
MLLM
40
53
0
25 May 2023
PathAsst: A Generative Foundation AI Assistant Towards Artificial General Intelligence of Pathology
Yuxuan Sun
Chenglu Zhu
S. Zheng
Kai Zhang
Xiaoxuan Yu
Zhongyi Shui
Yunlong Zhang
Honglin Li
Lin Yang
LM&MA
MedIm
16
44
0
24 May 2023
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing
Dongxu Li
Junnan Li
Steven C. H. Hoi
42
303
0
24 May 2023
Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining
Emanuele Bugliarello
Aida Nematzadeh
Lisa Anne Hendricks
SSL
32
5
0
23 May 2023
Can Language Models Understand Physical Concepts?
Lei Li
Jingjing Xu
Qingxiu Dong
Ce Zheng
Qi Liu
Lingpeng Kong
Xu Sun
ALM
33
18
0
23 May 2023
Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models
Y. Qu
Xinyue Shen
Xinlei He
Michael Backes
Savvas Zannettou
Yang Zhang
21
106
0
23 May 2023
Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality
Harman Singh
Pengchuan Zhang
Qifan Wang
Mengjiao MJ Wang
Wenhan Xiong
Jingfei Du
Yu Chen
CoGe
VLM
29
24
0
23 May 2023
Preconditioned Visual Language Inference with Weak Supervision
Ehsan Qasemi
Amani Maina-Kilaas
Devadutta Dash
Khalid Alsaggaf
Muhao Chen
38
0
0
22 May 2023
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending
Xingjian He
Sihan Chen
Fan Ma
Zhicheng Huang
Xiaojie Jin
Zikang Liu
Dongmei Fu
Yi Yang
Qingbin Liu
Jiashi Feng
VLM
CLIP
23
17
0
22 May 2023
DreamWaltz: Make a Scene with Complex 3D Animatable Avatars
Yukun Huang
Jianan Wang
Ailing Zeng
He Cao
Xianbiao Qi
Yukai Shi
Zhengjun Zha
Lei Zhang
34
70
0
21 May 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Zikang Liu
Sihan Chen
Longteng Guo
Handong Li
Xingjian He
Qingbin Liu
20
1
0
19 May 2023
Rethinking Multimodal Content Moderation from an Asymmetric Angle with Mixed-modality
Jialing Yuan
Ye Yu
Gaurav Mittal
Matthew Hall
Sandra Sajeev
Mei Chen
27
9
0
17 May 2023
IMAD: IMage-Augmented multi-modal Dialogue
Viktor Moskvoretskii
Anton Frolov
Denis Kuznetsov
32
4
0
17 May 2023
CLIP-GCD: Simple Language Guided Generalized Category Discovery
Rabah Ouldnoughi
Chia-Wen Kuo
Z. Kira
VLM
37
14
0
17 May 2023
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
Xiaoman Zhang
Chaoyi Wu
Ziheng Zhao
Weixiong Lin
Ya Zhang
Yanfeng Wang
Weidi Xie
LM&MA
48
157
0
17 May 2023
Improved baselines for vision-language pre-training
Enrico Fini
Pietro Astolfi
Adriana Romero Soriano
Jakob Verbeek
M. Drozdzal
SSL
CLIP
VLM
55
22
0
15 May 2023
On the Hidden Mystery of OCR in Large Multimodal Models
Yuliang Liu
Zhang Li
Mingxin Huang
Chunyuan Li
Dezhi Peng
Mingyu Liu
Lianwen Jin
Xiang Bai
VLM
MLLM
34
55
0
13 May 2023
Measuring Progress in Fine-grained Vision-and-Language Understanding
Emanuele Bugliarello
Laurent Sartran
Aishwarya Agrawal
Lisa Anne Hendricks
Aida Nematzadeh
VLM
36
22
0
12 May 2023
An Inverse Scaling Law for CLIP Training
Xianhang Li
Zeyu Wang
Cihang Xie
VLM
CLIP
48
55
0
11 May 2023
Continual Vision-Language Representation Learning with Off-Diagonal Information
Zixuan Ni
Longhui Wei
Siliang Tang
Yueting Zhuang
Qi Tian
VLM
CLL
33
25
0
11 May 2023
VideoChat: Chat-Centric Video Understanding
Kunchang Li
Yinan He
Yi Wang
Yizhuo Li
Wen Wang
Ping Luo
Yali Wang
Limin Wang
Yu Qiao
MLLM
69
534
0
10 May 2023
Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception
Hassan Akbari
Dan Kondratyuk
Huayu Chen
Rachel Hornung
Haoran Wang
Hartwig Adam
VLM
MoE
35
11
0
10 May 2023
Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness
Liangliang Cao
Bowen Zhang
Chen Chen
Yinfei Yang
Xianzhi Du
Wen‐Cheng Zhang
Zhiyun Lu
Yantao Zheng
CLIP
VLM
32
15
0
08 May 2023
Otter: A Multi-Modal Model with In-Context Instruction Tuning
Bo-wen Li
Yuanhan Zhang
Liangyu Chen
Jinghao Wang
Jingkang Yang
Ziwei Liu
MLLM
48
505
0
05 May 2023
LMEye: An Interactive Perception Network for Large Language Models
Yunxin Li
Baotian Hu
Xinyu Chen
Lin Ma
Yong-mei Xu
Hao Fei
MLLM
VLM
33
24
0
05 May 2023
Making the Most of What You Have: Adapting Pre-trained Visual Language Models in the Low-data Regime
Chuhan Zhang
Antoine Miech
Jiajun Shen
Jean-Baptiste Alayrac
Pauline Luc
VLM
VPVLM
47
2
0
03 May 2023
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
Peng Gao
Jiaming Han
Renrui Zhang
Ziyi Lin
Shijie Geng
...
Pan Lu
Conghui He
Xiangyu Yue
Hongsheng Li
Yu Qiao
MLLM
47
560
0
28 Apr 2023
DataComp: In search of the next generation of multimodal datasets
S. Gadre
Gabriel Ilharco
Alex Fang
J. Hayase
Georgios Smyrnis
...
A. Dimakis
J. Jitsev
Y. Carmon
Vaishaal Shankar
Ludwig Schmidt
VLM
33
415
0
27 Apr 2023
Retrieval-based Knowledge Augmented Vision Language Pre-training
Jiahua Rao
Zifei Shan
Long Liu
Yao Zhou
Yuedong Yang
VLM
88
14
0
27 Apr 2023
Hypernymization of named entity-rich captions for grounding-based multi-modal pretraining
Giacomo Nebbia
Adriana Kovashka
25
0
0
25 Apr 2023
SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models
Jonathan Roberts
Kai Han
Samuel Albanie
VLM
45
13
0
23 Apr 2023
OmniLabel: A Challenging Benchmark for Language-Based Object Detection
S. Schulter
G. VijayKumarB.
Yumin Suh
Konstantinos M. Dafnis
Zhixing Zhang
Shiyu Zhao
Dimitris N. Metaxas
ObjD
35
12
0
22 Apr 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLM
MLLM
75
1,922
0
20 Apr 2023
Hyperbolic Image-Text Representations
Karan Desai
Maximilian Nickel
Tanmay Rajpurohit
Justin Johnson
Ramakrishna Vedantam
VLM
47
57
0
18 Apr 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
160
4,325
0
17 Apr 2023
Previous
1
2
3
...
11
12
13
...
15
16
17
Next