2102.08981
Cited By
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
17 February 2021
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
Papers citing
"Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts"
50 / 850 papers shown
Title
DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training
Yihao Chen
Xianbiao Qi
Jianan Wang
Lei Zhang
23
16
0
17 Apr 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Sihan Chen
Xingjian He
Longteng Guo
Xinxin Zhu
Weining Wang
Jinhui Tang
VLM
34
104
0
17 Apr 2023
OPI at SemEval 2023 Task 1: Image-Text Embeddings and Multimodal Information Retrieval for Visual Word Sense Disambiguation
Slawomir Dadas
24
5
0
14 Apr 2023
MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation
Jie Guo
Qimeng Wang
Yan Gao
Xiaolong Jiang
Xu Tang
Yao Hu
Baochang Zhang
VLM
37
11
0
14 Apr 2023
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text
Wanrong Zhu
Jack Hessel
Anas Awadalla
S. Gadre
Jesse Dodge
Alex Fang
Youngjae Yu
Ludwig Schmidt
William Yang Wang
Yejin Choi
VLM
37
166
0
14 Apr 2023
Automated Cardiovascular Record Retrieval by Multimodal Learning between Electrocardiogram and Clinical Report
Jielin Qiu
Jiacheng Zhu
Shiqi Liu
William Jongwon Han
Jingqi Zhang
Chaojing Duan
Michael A. Rosenberg
Emerson Liu
Douglas Weber
Ding Zhao
22
0
0
13 Apr 2023
MoMo: A shared encoder Model for text, image and multi-Modal representations
Rakesh Chada
Zhao-Heng Zheng
P. Natarajan
ViT
21
4
0
11 Apr 2023
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
Lewei Yao
Jianhua Han
Xiaodan Liang
Danqian Xu
Wei Zhang
Zhenguo Li
Hang Xu
VLM
ObjD
CLIP
56
74
0
10 Apr 2023
Probing Conceptual Understanding of Large Visual-Language Models
Madeline Chantry Schiappa
Raiyaan Abdullah
Shehreen Azad
Jared Claypoole
Michael Cogswell
Ajay Divakaran
Yogesh S Rawat
58
14
0
07 Apr 2023
What's in a Name? Beyond Class Indices for Image Recognition
Kai Han
Yandong Li
S. Vaze
Jie Li
Xuhui Jia
VLM
32
7
0
05 Apr 2023
Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data
Vladislav Lialin
Stephen Rawls
David M. Chan
Shalini Ghosh
Anna Rumshisky
Wael Hamza
VLM
AI4TS
28
6
0
04 Apr 2023
Associating Spatially-Consistent Grouping with Text-supervised Semantic Segmentation
Yabo Zhang
Zihao Wang
Jun Hao Liew
Jingjia Huang
Manyu Zhu
Jiashi Feng
W. Zuo
VLM
27
4
0
03 Apr 2023
Vision-Language Models for Vision Tasks: A Survey
Jingyi Zhang
Jiaxing Huang
Sheng Jin
Shijian Lu
VLM
54
496
0
03 Apr 2023
DIME-FM: DIstilling Multimodal and Efficient Foundation Models
Ximeng Sun
Pengchuan Zhang
Peizhao Zhang
Hardik Shah
Kate Saenko
Xide Xia
VLM
30
20
0
31 Mar 2023
Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models
Eric Zhang
Kai Wang
Xingqian Xu
Zhangyang Wang
Humphrey Shi
DiffM
51
175
0
30 Mar 2023
SoftCLIP: Softer Cross-modal Alignment Makes CLIP Stronger
Yuting Gao
Jinfeng Liu
Zi-Han Xu
Tong Wu
Wen Liu
Jie Yang
Keren Li
Xingen Sun
CLIP
VLM
25
42
0
30 Mar 2023
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
Xinhao Mei
Chutong Meng
Haohe Liu
Qiuqiang Kong
Tom Ko
Chengqi Zhao
Mark D. Plumbley
Yuexian Zou
Wenwu Wang
55
196
0
30 Mar 2023
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
Renrui Zhang
Jiaming Han
Chris Liu
Peng Gao
Aojun Zhou
Xiangfei Hu
Shilin Yan
Pan Lu
Hongsheng Li
Yu Qiao
MLLM
74
747
0
28 Mar 2023
Variational Distribution Learning for Unsupervised Text-to-Image Generation
Minsoo Kang
Doyup Lee
Jiseob Kim
Saehoon Kim
Bohyung Han
DRL
OOD
35
3
0
28 Mar 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
57
156
0
28 Mar 2023
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP
VLM
44
967
0
27 Mar 2023
Prompt-Guided Zero-Shot Anomaly Action Recognition using Pretrained Deep Skeleton Features
Fumiaki Sato
Ryo Hachiuma
Taiki Sekii
50
22
0
27 Mar 2023
Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens
Yuxiao Chen
Jianbo Yuan
Yu Tian
Shijie Geng
Xinyu Li
Ding Zhou
Dimitris N. Metaxas
Hongxia Yang
14
34
0
27 Mar 2023
Accelerating Vision-Language Pretraining with Free Language Modeling
Teng Wang
Yixiao Ge
Feng Zheng
Ran Cheng
Ying Shan
Xiaohu Qie
Ping Luo
VLM
MLLM
93
9
0
24 Mar 2023
Three ways to improve feature alignment for open vocabulary detection
Relja Arandjelović
A. Andonian
A. Mensch
Olivier J. Hénaff
Jean-Baptiste Alayrac
Andrew Zisserman
VLM
ObjD
48
19
0
23 Mar 2023
Open-Vocabulary Object Detection using Pseudo Caption Labels
Han-Cheol Cho
Won Young Jhoo
Woohyun Kang
Byungseok Roh
VLM
ObjD
32
20
0
23 Mar 2023
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models
Dohwan Ko
Joon-Young Choi
Hyeong Kyu Choi
Kyoung-Woon On
Byungseok Roh
Hyunwoo J. Kim
57
19
0
23 Mar 2023
MAGVLT: Masked Generative Vision-and-Language Transformer
Sungwoong Kim
DaeJin Jo
Donghoon Lee
Jongmin Kim
VLM
47
12
0
21 Mar 2023
Large AI Models in Health Informatics: Applications, Challenges, and the Future
Jianing Qiu
Lin Li
Jiankai Sun
Jiachuan Peng
Peilun Shi
...
Bo Xiao
Wu Yuan
Ningli Wang
Dong Xu
Benny Lo
AI4MH
LM&MA
42
128
0
21 Mar 2023
EVA-02: A Visual Representation for Neon Genesis
Yuxin Fang
Quan-Sen Sun
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
ViT
CLIP
42
263
0
20 Mar 2023
VEIL: Vetting Extracted Image Labels from In-the-Wild Captions for Weakly-Supervised Object Detection
Arushi Rai
Adriana Kovashka
29
0
0
16 Mar 2023
Enabling Calibration In The Zero-Shot Inference of Large Vision-Language Models
Will LeVine
Benjamin Pikus
P. Raj
Fernando Amat Gil
VLM
UQCV
23
11
0
11 Mar 2023
Tag2Text: Guiding Vision-Language Model via Image Tagging
Xinyu Huang
Youcai Zhang
Jinyu Ma
Weiwei Tian
Rui Feng
Yuejie Zhang
Yaqian Li
Yandong Guo
Lei Zhang
CLIP
MLLM
VLM
3DV
69
74
0
10 Mar 2023
Weakly-Supervised HOI Detection from Interaction Labels Only and Language/Vision-Language Priors
Mesut Erhan Unal
Adriana Kovashka
VLM
24
5
0
09 Mar 2023
Refined Vision-Language Modeling for Fine-grained Multi-modal Pre-training
Lisai Zhang
Qingcai Chen
Zhijian Chen
Yunpeng Han
Zhonghua Li
Bo Zhao
VLM
33
1
0
09 Mar 2023
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training
Wei Li
Linchao Zhu
Longyin Wen
Yi Yang
VLM
50
86
0
06 Mar 2023
HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention
Shijie Geng
Jianbo Yuan
Yu Tian
Yuxiao Chen
Yongfeng Zhang
CLIP
VLM
49
44
0
06 Mar 2023
Prismer: A Vision-Language Model with Multi-Task Experts
Shikun Liu
Linxi Fan
Edward Johns
Zhiding Yu
Chaowei Xiao
Anima Anandkumar
VLM
MLLM
49
21
0
04 Mar 2023
The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning
Zhenmei Shi
Jiefeng Chen
Kunyang Li
Jayaram Raghuram
Xi Wu
Yingyu Liang
S. Jha
SSL
30
19
0
28 Feb 2023
Language Is Not All You Need: Aligning Perception with Language Models
Shaohan Huang
Li Dong
Wenhui Wang
Y. Hao
Saksham Singhal
...
Johan Bjorck
Vishrav Chaudhary
Subhojit Som
Xia Song
Furu Wei
VLM
LRM
MLLM
32
536
0
27 Feb 2023
The Role of Pre-training Data in Transfer Learning
R. Entezari
Mitchell Wortsman
O. Saukh
M. Shariatnia
Hanie Sedghi
Ludwig Schmidt
46
21
0
27 Feb 2023
Learning Visual Representations via Language-Guided Sampling
Mohamed El Banani
Karan Desai
Justin Johnson
SSL
VLM
26
28
0
23 Feb 2023
Entity-Level Text-Guided Image Manipulation
Yikai Wang
Jianan Wang
Guansong Lu
Hang Xu
Zhenguo Li
Wei Zhang
Yanwei Fu
VGen
34
3
0
22 Feb 2023
Poisoning Web-Scale Training Datasets is Practical
Nicholas Carlini
Matthew Jagielski
Christopher A. Choquette-Choo
Daniel Paleka
Will Pearce
Hyrum S. Anderson
Andreas Terzis
Kurt Thomas
Florian Tramèr
SILM
33
182
0
20 Feb 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
48
205
0
20 Feb 2023
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
Zhuolin Yang
Ming-Yu Liu
Zihan Liu
V. Korthikanti
Weili Nie
...
Yuke Zhu
M. Shoeybi
Bryan Catanzaro
Chaowei Xiao
Anima Anandkumar
VLM
RALM
34
39
0
09 Feb 2023
Glaze: Protecting Artists from Style Mimicry by Text-to-Image Models
Shawn Shan
Jenna Cryan
Emily Wenger
Haitao Zheng
Rana Hanocka
Ben Y. Zhao
WIGM
17
177
0
08 Feb 2023
SimCon Loss with Multiple Views for Text Supervised Semantic Segmentation
Yash J. Patel
Yusheng Xie
Yi Zhu
Srikar Appalaraju
R. Manmatha
40
4
0
07 Feb 2023
LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Retrieval
Ziyang Luo
Pu Zhao
Can Xu
Xiubo Geng
Tao Shen
Chongyang Tao
Jing Ma
Qingwen Lin
Daxin Jiang
VLM
CLIP
29
3
0
06 Feb 2023
Semantic-Guided Generative Image Augmentation Method with Diffusion Models for Image Classification
Bohan Li
Xiao Xu
Xinghao Wang
Yutai Hou
Yunlong Feng
Feng Wang
Xuanliang Zhang
Qingfu Zhu
Wanxiang Che
DiffM
VLM
36
10
0
04 Feb 2023