1511.06078
Learning Deep Structure-Preserving Image-Text Embeddings
19 November 2015
Liwei Wang
Yin Li
Svetlana Lazebnik
Papers citing
"Learning Deep Structure-Preserving Image-Text Embeddings"
50 / 222 papers shown
Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval
K. Ueki
45
4
0
16 May 2021
VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching
Chenchi Zhang
Wenbo Ma
Jun Xiao
Hanwang Zhang
Jian Shao
Yueting Zhuang
Long Chen
71
4
0
12 May 2021
Cross-Modal Generative Augmentation for Visual Question Answering
Zixu Wang
Yishu Miao
Lucia Specia
70
11
0
11 May 2021
T-EMDE: Sketching-based global similarity for cross-modal retrieval
Barbara Rychalska
Mikolaj Wieczorek
Jacek Dąbrowski
54
0
0
10 May 2021
Multimodal Contrastive Training for Visual Representation Learning
Xin Yuan
Zhe Lin
Jason Kuen
Jianming Zhang
Yilin Wang
Michael Maire
Ajinkya Kale
Baldo Faieta
SSL
73
155
0
26 Apr 2021
Understanding Synonymous Referring Expressions via Contrastive Features
Yi-Wen Chen
Yi-Hsuan Tsai
Ming-Hsuan Yang
ObjD
74
4
0
20 Apr 2021
Detector-Free Weakly Supervised Grounding by Separation
Assaf Arbelle
Sivan Doveh
Amit Alfassy
J. Shtok
Guy Lev
...
Kate Saenko
S. Ullman
Raja Giryes
Rogerio Feris
Leonid Karlinsky
92
24
0
20 Apr 2021
TransVG: End-to-End Visual Grounding with Transformers
Jiajun Deng
Zhengyuan Yang
Tianlang Chen
Wen-gang Zhou
Houqiang Li
ViT
92
347
0
17 Apr 2021
Integrating Information Theory and Adversarial Learning for Cross-modal Retrieval
Wei Chen
Yu Liu
E. Bakker
M. Lew
GAN
41
27
0
11 Apr 2021
Multimodal Entity Linking for Tweets
Omar Adjali
Romaric Besançon
Olivier Ferret
Hervé Le Borgne
Brigitte Grau
71
49
0
07 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
183
1,193
0
01 Apr 2021
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
Antoine Miech
Jean-Baptiste Alayrac
Ivan Laptev
Josef Sivic
Andrew Zisserman
ViT
95
139
0
30 Mar 2021
Memory Enhanced Embedding Learning for Cross-Modal Video-Text Retrieval
Rui Zhao
Kecheng Zheng
Zhengjun Zha
Hongtao Xie
Jiebo Luo
38
3
0
29 Mar 2021
Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images
Haolin Liu
Anran Lin
Xiaoguang Han
Lei Yang
Yizhou Yu
Shuguang Cui
84
40
0
14 Mar 2021
Cross-modal Image Retrieval with Deep Mutual Information Maximization
Chunbin Gu
Jiajun Bu
Xixi Zhou
Chengwei Yao
Dongfang Ma
Zhi Yu
Xifeng Yan
52
16
0
10 Mar 2021
Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning
Mingjie Sun
Jimin Xiao
Eng Gee Lim
ObjD
79
35
0
09 Mar 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
148
117
0
31 Jan 2021
Probabilistic Embeddings for Cross-Modal Retrieval
Sanghyuk Chun
Seong Joon Oh
Rafael Sampaio de Rezende
Yannis Kalantidis
Diane Larlus
UQCV
505
210
0
13 Jan 2021
Similarity Reasoning and Filtration for Image-Text Matching
Haiwen Diao
Ying Zhang
Lingyun Ma
Huchuan Lu
307
345
0
05 Jan 2021
One-shot Representational Learning for Joint Biometric and Device Authentication
Sudipta Banerjee
Arun Ross
CVBM
11
3
0
02 Jan 2021
StacMR: Scene-Text Aware Cross-Modal Retrieval
Andrés Mafla
Rafael Sampaio de Rezende
Lluís Gómez
Diane Larlus
Dimosthenis Karatzas
3DV
102
14
0
08 Dec 2020
Factorizing Perception and Policy for Interactive Instruction Following
Kunal Pratap Singh
Suvaansh Bhambri
Byeonghwi Kim
Roozbeh Mottaghi
Jonghyun Choi
LM&Ro
LRM
132
36
0
06 Dec 2020
Watch and Learn: Mapping Language and Noisy Real-world Videos with Self-supervision
Yujie Zhong
Linhai Xie
Sen Wang
Lucia Specia
Yishu Miao
SSL
24
0
0
19 Nov 2020
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Simon Ging
Mohammadreza Zolfaghari
Hamed Pirsiavash
Thomas Brox
ViT
CLIP
77
174
0
01 Nov 2020
Multimodal Metric Learning for Tag-based Music Retrieval
Minz Won
Sergio Oramas
Oriol Nieto
F. Gouyon
Xavier Serra
141
45
0
30 Oct 2020
Learning Dual Semantic Relations with Graph Attention for Image-Text Matching
Keyu Wen
Xiaodong Gu
Qingrong Cheng
76
97
0
22 Oct 2020
Cosine meets Softmax: A tough-to-beat baseline for visual grounding
N. Rufus
U. R. Nair
K. M. Krishna
Vineet Gandhi
125
13
0
13 Sep 2020
RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization
Niluthpol Chowdhury Mithun
Karan Sikka
Han-Pang Chiu
S. Samarasekera
Rakesh Kumar
64
16
0
12 Sep 2020
DeVLBert: Learning Deconfounded Visio-Linguistic Representations
Shengyu Zhang
Tan Jiang
Tan Wang
Kun Kuang
Zhou Zhao
Jianke Zhu
Jin Yu
Hongxia Yang
Leilei Gan
OOD
81
87
0
16 Aug 2020
Retrieval Guided Unsupervised Multi-domain Image-to-Image Translation
Raul Gomez
Yahui Liu
Marco De Nadai
Dimosthenis Karatzas
Bruno Lepri
N. Sebe
26
9
0
11 Aug 2020
Fine-grained Iterative Attention Network for Temporal Language Localization in Videos
Xiaoye Qu
Peng Tang
Zhikang Zhou
Yu Cheng
Jianfeng Dong
Pan Zhou
84
92
0
06 Aug 2020
Learning Visual Representations with Caption Annotations
Mert Bulent Sariyildiz
J. Perez
Diane Larlus
VLM
SSL
113
162
0
04 Aug 2020
PhraseCut: Language-based Image Segmentation in the Wild
Chenyun Wu
Zhe Lin
Scott D. Cohen
Trung Bui
Subhransu Maji
VLM
69
115
0
03 Aug 2020
Symbiotic Adversarial Learning for Attribute-based Person Search
Yu-Tong Cao
Jingya Wang
Dacheng Tao
GAN
38
27
0
19 Jul 2020
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval
Christopher Thomas
Adriana Kovashka
128
41
0
16 Jul 2020
VPN: Learning Video-Pose Embedding for Activities of Daily Living
Srijan Das
Saurav Sharma
Rui Dai
Francois Bremond
Monique Thonnat
ViT
102
127
0
06 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
178
375
0
29 Jun 2020
Learning Colour Representations of Search Queries
Paridhi Maheshwari
Manoj Ghuhan Arivazhagan
Vishwa Vinay
OOD
24
4
0
17 Jun 2020
Cross-modal Speaker Verification and Recognition: A Multilingual Perspective
M. S. Saeed
Shah Nawaz
Pietro Morerio
Arif Mahmood
I. Gallo
Muhammad Haroon Yousaf
Alessio Del Bue
CVBM
84
26
0
28 Apr 2020
MCEN: Bridging Cross-Modal Gap between Cooking Recipes and Dish Images with Latent Variable Model
Han Fu
R. Wu
Chenghao Liu
Jianling Sun
68
50
0
02 Apr 2020
More Grounded Image Captioning by Distilling Image-Text Matching Model
Yuanen Zhou
Meng Wang
Daqing Liu
Zhenzhen Hu
Hanwang Zhang
76
126
0
01 Apr 2020
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
Gen Luo
Yiyi Zhou
Xiaoshuai Sun
Liujuan Cao
Chenglin Wu
Cheng Deng
Rongrong Ji
ObjD
274
297
0
19 Mar 2020
IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval
Hui Chen
Guiguang Ding
Xudong Liu
Zijia Lin
Ji Liu
Jungong Han
78
326
0
08 Mar 2020
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning
Elad Amrani
Rami Ben-Ari
Daniel Rotman
A. Bronstein
117
126
0
06 Mar 2020
Deep Multimodal Image-Text Embeddings for Automatic Cross-Media Retrieval
Hadi Abdi Khojasteh
Ebrahim Ansari
Parvin Razzaghi
Akbar Karimi
VLM
43
4
0
23 Feb 2020
Incorporating Visual Semantics into Sentence Representations within a Grounded Space
Patrick Bordes
Éloi Zablocki
Laure Soulier
Benjamin Piwowarski
Patrick Gallinari
45
26
0
07 Feb 2020
Urban2Vec: Incorporating Street View Imagery and POIs for Multi-Modal Urban Neighborhood Embedding
Zhecheng Wang
Haoyuan Li
Ram Rajagopal
87
82
0
29 Jan 2020
Personalized Activity Recognition with Deep Triplet Embeddings
D. Burns
C. Whyne
HAI
93
25
0
15 Jan 2020
ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
Dave Zhenyu Chen
Angel X. Chang
Matthias Nießner
3DPC
106
379
0
18 Dec 2019
Fully-Convolutional Intensive Feature Flow Neural Network for Text Recognition
Zhao Zhang
Zemin Tang
Zheng Zhang
Yang Wang
Jie Qin
Meng Wang
75
7
0
13 Dec 2019