ResearchTrend.AI
Learning Deep Structure-Preserving Image-Text Embeddings

19 November 2015
Liwei Wang
Yin Li
Svetlana Lazebnik

Papers citing "Learning Deep Structure-Preserving Image-Text Embeddings"

50 / 222 papers shown
Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval
K. Ueki
45
4
0
16 May 2021
VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching
Chenchi Zhang
Wenbo Ma
Jun Xiao
Hanwang Zhang
Jian Shao
Yueting Zhuang
Long Chen
71
4
0
12 May 2021
Cross-Modal Generative Augmentation for Visual Question Answering
Zixu Wang
Yishu Miao
Lucia Specia
70
11
0
11 May 2021
T-EMDE: Sketching-based global similarity for cross-modal retrieval
Barbara Rychalska
Mikolaj Wieczorek
Jacek Dąbrowski
54
0
0
10 May 2021
Multimodal Contrastive Training for Visual Representation Learning
Xin Yuan
Zhe Lin
Jason Kuen
Jianming Zhang
Yilin Wang
Michael Maire
Ajinkya Kale
Baldo Faieta
SSL
73
155
0
26 Apr 2021
Understanding Synonymous Referring Expressions via Contrastive Features
Yi-Wen Chen
Yi-Hsuan Tsai
Ming-Hsuan Yang
ObjD
74
4
0
20 Apr 2021
Detector-Free Weakly Supervised Grounding by Separation
Assaf Arbelle
Sivan Doveh
Amit Alfassy
J. Shtok
Guy Lev
...
Kate Saenko
S. Ullman
Raja Giryes
Rogerio Feris
Leonid Karlinsky
92
24
0
20 Apr 2021
TransVG: End-to-End Visual Grounding with Transformers
Jiajun Deng
Zhengyuan Yang
Tianlang Chen
Wen-gang Zhou
Houqiang Li
ViT
92
347
0
17 Apr 2021
Integrating Information Theory and Adversarial Learning for Cross-modal Retrieval
Wei Chen
Yu Liu
E. Bakker
M. Lew
GAN
41
27
0
11 Apr 2021
Multimodal Entity Linking for Tweets
Omar Adjali
Romaric Besançon
Olivier Ferret
Hervé Le Borgne
Brigitte Grau
71
49
0
07 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
183
1,193
0
01 Apr 2021
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
Antoine Miech
Jean-Baptiste Alayrac
Ivan Laptev
Josef Sivic
Andrew Zisserman
ViT
95
139
0
30 Mar 2021
Memory Enhanced Embedding Learning for Cross-Modal Video-Text Retrieval
Rui Zhao
Kecheng Zheng
Zhengjun Zha
Hongtao Xie
Jiebo Luo
38
3
0
29 Mar 2021
Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images
Haolin Liu
Anran Lin
Xiaoguang Han
Lei Yang
Yizhou Yu
Shuguang Cui
84
40
0
14 Mar 2021
Cross-modal Image Retrieval with Deep Mutual Information Maximization
Chunbin Gu
Jiajun Bu
Xixi Zhou
Chengwei Yao
Dongfang Ma
Zhi Yu
Xifeng Yan
52
16
0
10 Mar 2021
Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning
Mingjie Sun
Jimin Xiao
Eng Gee Lim
ObjD
79
35
0
09 Mar 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
148
117
0
31 Jan 2021
Probabilistic Embeddings for Cross-Modal Retrieval
Sanghyuk Chun
Seong Joon Oh
Rafael Sampaio de Rezende
Yannis Kalantidis
Diane Larlus
UQCV
505
210
0
13 Jan 2021
Similarity Reasoning and Filtration for Image-Text Matching
Haiwen Diao
Ying Zhang
Lingyun Ma
Huchuan Lu
307
345
0
05 Jan 2021
One-shot Representational Learning for Joint Biometric and Device Authentication
Sudipta Banerjee
Arun Ross
CVBM
11
3
0
02 Jan 2021
StacMR: Scene-Text Aware Cross-Modal Retrieval
Andrés Mafla
Rafael Sampaio de Rezende
Lluís Gómez
Diane Larlus
Dimosthenis Karatzas
3DV
102
14
0
08 Dec 2020
Factorizing Perception and Policy for Interactive Instruction Following
Kunal Pratap Singh
Suvaansh Bhambri
Byeonghwi Kim
Roozbeh Mottaghi
Jonghyun Choi
LM&Ro
LRM
132
36
0
06 Dec 2020
Watch and Learn: Mapping Language and Noisy Real-world Videos with Self-supervision
Yujie Zhong
Linhai Xie
Sen Wang
Lucia Specia
Yishu Miao
SSL
24
0
0
19 Nov 2020
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Simon Ging
Mohammadreza Zolfaghari
Hamed Pirsiavash
Thomas Brox
ViT
CLIP
77
174
0
01 Nov 2020
Multimodal Metric Learning for Tag-based Music Retrieval
Minz Won
Sergio Oramas
Oriol Nieto
F. Gouyon
Xavier Serra
141
45
0
30 Oct 2020
Learning Dual Semantic Relations with Graph Attention for Image-Text Matching
Keyu Wen
Xiaodong Gu
Qingrong Cheng
76
97
0
22 Oct 2020
Cosine meets Softmax: A tough-to-beat baseline for visual grounding
N. Rufus
U. R. Nair
K. M. Krishna
Vineet Gandhi
125
13
0
13 Sep 2020
RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization
Niluthpol Chowdhury Mithun
Karan Sikka
Han-Pang Chiu
S. Samarasekera
Rakesh Kumar
64
16
0
12 Sep 2020
DeVLBert: Learning Deconfounded Visio-Linguistic Representations
Shengyu Zhang
Tan Jiang
Tan Wang
Kun Kuang
Zhou Zhao
Jianke Zhu
Jin Yu
Hongxia Yang
Leilei Gan
OOD
81
87
0
16 Aug 2020
Retrieval Guided Unsupervised Multi-domain Image-to-Image Translation
Raul Gomez
Yahui Liu
Marco De Nadai
Dimosthenis Karatzas
Bruno Lepri
N. Sebe
26
9
0
11 Aug 2020
Fine-grained Iterative Attention Network for Temporal Language Localization in Videos
Xiaoye Qu
Peng Tang
Zhikang Zhou
Yu Cheng
Jianfeng Dong
Pan Zhou
84
92
0
06 Aug 2020
Learning Visual Representations with Caption Annotations
Mert Bulent Sariyildiz
J. Perez
Diane Larlus
VLM
SSL
113
162
0
04 Aug 2020
PhraseCut: Language-based Image Segmentation in the Wild
Chenyun Wu
Zhe Lin
Scott D. Cohen
Trung Bui
Subhransu Maji
VLM
69
115
0
03 Aug 2020
Symbiotic Adversarial Learning for Attribute-based Person Search
Yu-Tong Cao
Jingya Wang
Dacheng Tao
GAN
38
27
0
19 Jul 2020
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval
Christopher Thomas
Adriana Kovashka
128
41
0
16 Jul 2020
VPN: Learning Video-Pose Embedding for Activities of Daily Living
Srijan Das
Saurav Sharma
Rui Dai
Francois Bremond
Monique Thonnat
ViT
102
127
0
06 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
178
375
0
29 Jun 2020
Learning Colour Representations of Search Queries
Paridhi Maheshwari
Manoj Ghuhan Arivazhagan
Vishwa Vinay
OOD
24
4
0
17 Jun 2020
Cross-modal Speaker Verification and Recognition: A Multilingual Perspective
M. S. Saeed
Shah Nawaz
Pietro Morerio
Arif Mahmood
I. Gallo
Muhammad Haroon Yousaf
Alessio Del Bue
CVBM
84
26
0
28 Apr 2020
MCEN: Bridging Cross-Modal Gap between Cooking Recipes and Dish Images with Latent Variable Model
Han Fu
R. Wu
Chenghao Liu
Jianling Sun
68
50
0
02 Apr 2020
More Grounded Image Captioning by Distilling Image-Text Matching Model
Yuanen Zhou
Meng Wang
Daqing Liu
Zhenzhen Hu
Hanwang Zhang
76
126
0
01 Apr 2020
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
Gen Luo
Yiyi Zhou
Xiaoshuai Sun
Liujuan Cao
Chenglin Wu
Cheng Deng
Rongrong Ji
ObjD
274
297
0
19 Mar 2020
IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval
Hui Chen
Guiguang Ding
Xudong Liu
Zijia Lin
Ji Liu
Jungong Han
78
326
0
08 Mar 2020
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning
Elad Amrani
Rami Ben-Ari
Daniel Rotman
A. Bronstein
117
126
0
06 Mar 2020
Deep Multimodal Image-Text Embeddings for Automatic Cross-Media Retrieval
Hadi Abdi Khojasteh
Ebrahim Ansari
Parvin Razzaghi
Akbar Karimi
VLM
43
4
0
23 Feb 2020
Incorporating Visual Semantics into Sentence Representations within a Grounded Space
Patrick Bordes
Éloi Zablocki
Laure Soulier
Benjamin Piwowarski
Patrick Gallinari
45
26
0
07 Feb 2020
Urban2Vec: Incorporating Street View Imagery and POIs for Multi-Modal Urban Neighborhood Embedding
Zhecheng Wang
Haoyuan Li
Ram Rajagopal
87
82
0
29 Jan 2020
Personalized Activity Recognition with Deep Triplet Embeddings
D. Burns
C. Whyne
HAI
93
25
0
15 Jan 2020
ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
Dave Zhenyu Chen
Angel X. Chang
Matthias Nießner
3DPC
106
379
0
18 Dec 2019
Fully-Convolutional Intensive Feature Flow Neural Network for Text Recognition
Zhao Zhang
Zemin Tang
Zheng Zhang
Yang Wang
Jie Qin
Meng Wang
75
7
0
13 Dec 2019