Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.08784
Cited By
LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval
16 March 2021
Siqi Sun
Yen-Chun Chen
Linjie Li
Shuohang Wang
Yuwei Fang
Jingjing Liu
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval"
18 / 18 papers shown
Title
Towards Fast and Accurate Image-Text Retrieval with Self-Supervised Fine-Grained Alignment
Jiamin Zhuang
Jing Yu
Yang Ding
Xiangyang Qu
Yue Hu
32
9
0
27 Aug 2023
Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval
Ding Jiang
Mang Ye
35
140
0
22 Mar 2023
LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Retrieval
Ziyang Luo
Pu Zhao
Can Xu
Xiubo Geng
Tao Shen
Chongyang Tao
Jing Ma
Qingwen Lin
Daxin Jiang
VLM
CLIP
19
3
0
06 Feb 2023
HADA: A Graph-based Amalgamation Framework in Image-text Retrieval
Manh-Duy Nguyen
Binh T. Nguyen
C. Gurrin
VLM
28
4
0
11 Jan 2023
ContextCLIP: Contextual Alignment of Image-Text pairs on CLIP visual representations
Chanda Grover
Indra Deep Mastan
Debayan Gupta
VLM
CLIP
24
4
0
14 Nov 2022
ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training
Bin Shan
Weichong Yin
Yu Sun
Hao Tian
Hua-Hong Wu
Haifeng Wang
VLM
27
19
0
30 Sep 2022
GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement
Zhi-Qi Cheng
Qianwen Dai
Siyao Li
Teruko Mitamura
Alexander G. Hauptmann
16
34
0
18 Aug 2022
HiVLP: Hierarchical Vision-Language Pre-Training for Fast Image-Text Retrieval
Feilong Chen
Xiuyi Chen
Jiaxin Shi
Duzhen Zhang
Jianlong Chang
Qi Tian
VLM
CLIP
34
6
0
24 May 2022
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval
Haoyu Lu
Nanyi Fei
Yuqi Huo
Yizhao Gao
Zhiwu Lu
Jiaxin Wen
CLIP
VLM
27
54
0
15 Apr 2022
Vision-and-Language Pretrained Models: A Survey
Siqu Long
Feiqi Cao
S. Han
Haiqing Yang
VLM
33
63
0
15 Apr 2022
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
Jiaxi Gu
Xiaojun Meng
Guansong Lu
Lu Hou
Minzhe Niu
...
Runhu Huang
Wei Zhang
Xingda Jiang
Chunjing Xu
Hang Xu
VLM
43
88
0
14 Feb 2022
ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues
Hengcan Shi
Munawar Hayat
Yicheng Wu
Jianfei Cai
VLM
30
60
0
18 Jan 2022
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Luu Anh Tuan
Lijuan Wang
Zicheng Liu
VLM
51
216
0
24 Nov 2021
UFO: A UniFied TransfOrmer for Vision-Language Representation Learning
Jianfeng Wang
Xiaowei Hu
Zhe Gan
Zhengyuan Yang
Xiyang Dai
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
29
57
0
19 Nov 2021
Text-Based Person Search with Limited Data
Xiaoping Han
Sen He
Li Zhang
Tao Xiang
18
88
0
20 Oct 2021
AliMe MKG: A Multi-modal Knowledge Graph for Live-streaming E-commerce
Guohai Xu
Hehong Chen
Feng-Lin Li
Fu Sun
Yunzhou Shi
Zhixiong Zeng
Wei Zhou
Zhongzhou Zhao
Ji Zhang
19
16
0
13 Sep 2021
Understanding Chinese Video and Language via Contrastive Multimodal Pre-Training
Chenyi Lei
Shixian Luo
Yong-jin Liu
Wanggui He
Jiamang Wang
Guoxin Wang
Haihong Tang
C. Miao
Houqiang Li
30
41
0
19 Apr 2021
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
252
927
0
24 Sep 2019
1