1511.06078
Cited By
v1
v2 (latest)
Learning Deep Structure-Preserving Image-Text Embeddings
19 November 2015
Liwei Wang
Yin Li
Svetlana Lazebnik
Papers citing
"Learning Deep Structure-Preserving Image-Text Embeddings"
50 / 222 papers shown
Title
Cross and Learn: Cross-Modal Self-Supervision
Nawid Sayed
Biagio Brattoli
Björn Ommer
SSL
94
78
0
09 Nov 2018
Learning from Large-scale Noisy Web Data with Ubiquitous Reweighting for Image Classification
Jia Li
Yafei Song
Jian-Qi Zhu
Lele Cheng
Ying Su
Lin Ye
P. Yuan
Shumin Han
NoLa
64
23
0
02 Nov 2018
Engaging Image Captioning Via Personality
Kurt Shuster
Samuel Humeau
Hexiang Hu
Antoine Bordes
Jason Weston
87
152
0
25 Oct 2018
Cross-Modal and Hierarchical Modeling of Video and Text
Bowen Zhang
Hexiang Hu
Fei Sha
BDL
AI4TS
75
191
0
16 Oct 2018
Learning Inward Scaled Hypersphere Embedding: Exploring Projections in Higher Dimensions
Muhammad Kamran Janjua
Shah Nawaz
Alessandro Calefati
I. Gallo
23
0
0
16 Oct 2018
Image and Encoded Text Fusion for Multi-Modal Classification
I. Gallo
Alessandro Calefati
Shah Nawaz
Muhammad Kamran Janjua
23
37
0
03 Oct 2018
Exploration on Grounded Word Embedding: Matching Words and Images with Image-Enhanced Skip-Gram Model
Ruixuan Luo
24
0
0
08 Sep 2018
Visual Coreference Resolution in Visual Dialog using Neural Module Networks
Satwik Kottur
José M. F. Moura
Devi Parikh
Dhruv Batra
Marcus Rohrbach
74
165
0
06 Sep 2018
TVQA: Localized, Compositional Video Question Answering
Jie Lei
Licheng Yu
Mohit Bansal
Tamara L. Berg
116
643
0
05 Sep 2018
Learning to Describe Differences Between Pairs of Similar Images
Harsh Jhamtani
Taylor Berg-Kirkpatrick
80
155
0
31 Aug 2018
Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval
Niluthpol Chowdhury Mithun
Yikang Shen
Evangelos E. Papalexakis
Amit K. Roy-Chowdhury
67
77
0
23 Aug 2018
Learning to Learn from Web Data through Deep Semantic Embeddings
Raul Gomez
Lluís Gómez
J. Gibert
Dimosthenis Karatzas
114
22
0
20 Aug 2018
Learning Class Prototypes via Structure Alignment for Zero-Shot Recognition
Huajie Jiang
Ruiping Wang
Shiguang Shan
Xilin Chen
VLM
61
118
0
24 Jul 2018
Learning Multimodal Representations for Unseen Activities
A. Piergiovanni
Michael S. Ryoo
SSL
39
4
0
21 Jun 2018
Bilinear Attention Networks
Jin-Hwa Kim
Jaehyun Jun
Byoung-Tak Zhang
AIMat
104
879
0
21 May 2018
Learnable PINs: Cross-Modal Embeddings for Person Identity
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
SSL
138
141
0
02 May 2018
Dialog-based Interactive Image Retrieval
Xiaoxiao Guo
Hui Wu
Yu Cheng
Steven J. Rennie
Gerald Tesauro
Rogerio Feris
139
208
0
01 May 2018
Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch
S. Dey
Anjan Dutta
Suman K. Ghosh
Ernest Valveny
Josep Lladós
Umapada Pal
116
24
0
28 Apr 2018
Large-Scale Visual Relationship Understanding
Ji Zhang
Yannis Kalantidis
Marcus Rohrbach
Manohar Paluri
Ahmed Elgammal
Mohamed Elhoseiny
67
169
0
27 Apr 2018
Cross-Modal Retrieval with Implicit Concept Association
Yale Song
M. Soleymani
47
4
0
12 Apr 2018
Imagine This! Scripts to Compositions to Videos
Tanmay Gupta
Dustin Schwenk
Ali Farhadi
Derek Hoiem
Aniruddha Kembhavi
CoGe
VGen
148
91
0
10 Apr 2018
Learning a Text-Video Embedding from Incomplete and Heterogeneous Data
Antoine Miech
Ivan Laptev
Josef Sivic
80
235
0
07 Apr 2018
Finding beans in burgers: Deep semantic-visual embedding with localization
Martin Engilberge
Louis Chevallier
P. Pérez
Matthieu Cord
81
96
0
05 Apr 2018
Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts
Raymond A. Yeh
Jinjun Xiong
Wen-mei W. Hwu
Minh Do
Alex Schwing
51
57
0
29 Mar 2018
Unsupervised Textual Grounding: Linking Words to Image Concepts
Raymond A. Yeh
Minh Do
Alex Schwing
54
40
0
29 Mar 2018
Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings
Kevin Chen
Chris Choy
Manolis Savva
Angel X. Chang
Thomas Funkhouser
Silvio Savarese
3DV
67
253
0
22 Mar 2018
Stacked Cross Attention for Image-Text Matching
Kuang-Huei Lee
Xi Chen
G. Hua
Houdong Hu
Xiaodong He
116
1,163
0
21 Mar 2018
Learning Unsupervised Visual Grounding Through Semantic Self-Supervision
Syed Ashar Javed
Shreyas Saxena
Vineet Gandhi
SSL
67
25
0
17 Mar 2018
Discriminability objective for training descriptive captions
Ruotian Luo
Brian L. Price
Scott D. Cohen
Gregory Shakhnarovich
115
203
0
12 Mar 2018
MAttNet: Modular Attention Network for Referring Expression Comprehension
Licheng Yu
Zhe Lin
Xiaohui Shen
Jimei Yang
Xin Lu
Mohit Bansal
Tamara L. Berg
ObjD
125
833
0
24 Jan 2018
Cross-modal Embeddings for Video and Audio Retrieval
Dídac Surís
A. Duarte
Amaia Salvador
Jordi Torres
Xavier Giró-i-Nieto
SSL
71
69
0
07 Jan 2018
Object Referring in Videos with Language and Human Gaze
A. Vasudevan
Dengxin Dai
Luc Van Gool
VOS
104
76
0
04 Jan 2018
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD
VOS
118
530
0
18 Dec 2017
Learning Semantic Concepts and Order for Image and Sentence Matching
Yan Huang
Qi Wu
Liang Wang
VLM
85
305
0
06 Dec 2017
Conditional Image-Text Embedding Networks
Bryan A. Plummer
Paige Kordas
M. Kiapour
Shuai Zheng
Robinson Piramuthu
Svetlana Lazebnik
85
118
0
22 Nov 2017
Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space
Liwei Wang
Alex Schwing
Svetlana Lazebnik
CoGe
111
175
0
19 Nov 2017
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models
Jiuxiang Gu
Jianfei Cai
Shafiq Joty
Li Niu
G. Wang
VLM
108
361
0
17 Nov 2017
Deep Matching Autoencoders
Tanmoy Mukherjee
M. Yamada
Timothy M. Hospedales
66
17
0
16 Nov 2017
Dual-Path Convolutional Image-Text Embeddings with Instance Loss
Zhedong Zheng
Liang Zheng
Michael Garratt
Yi Yang
Mingliang Xu
Yi-Dong Shen
176
479
0
15 Nov 2017
Content-based Representations of audio using Siamese neural networks
Pranay Manocha
Rohan Badlani
Anurag Kumar
Ankit Parag Shah
Benjamin Elizalde
Bhiksha Raj
72
33
0
30 Oct 2017
Predicting Visual Features from Text for Image and Video Caption Retrieval
Jianfeng Dong
Xirong Li
Cees G. M. Snoek
70
226
0
05 Sep 2017
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
Chuang Gan
Yandong Li
Haoxiang Li
Chen Sun
Boqing Gong
101
127
0
15 Aug 2017
Deep Binaries: Encoding Semantic-Rich Cues for Efficient Textual-Visual Cross Retrieval
Yuming Shen
Li Liu
Ling Shao
Jingkuan Song
65
49
0
08 Aug 2017
Identity-Aware Textual-Visual Matching with Latent Co-attention
Shuang Li
Tong Xiao
Hongsheng Li
Wei Yang
Xiaogang Wang
103
229
0
07 Aug 2017
Automatic Spatially-aware Fashion Concept Discovery
Xintong Han
Zuxuan Wu
Phoenix X. Huang
Xiao Zhang
Menglong Zhu
Yuan Li
Yang Zhao
L. Davis
85
272
0
03 Aug 2017
SPEECH-COCO: 600k Visually Grounded Spoken Captions Aligned to MSCOCO Data Set
William N. Havard
Laurent Besacier
O. Rosec
83
28
0
26 Jul 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
136
2,953
0
26 May 2017
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures
Fanyi Xiao
Leonid Sigal
Yong Jae Lee
82
139
0
03 May 2017
Spatio-temporal Person Retrieval via Natural Language Queries
Masataka Yamaguchi
Kuniaki Saito
Yoshitaka Ushiku
Tatsuya Harada
94
58
0
26 Apr 2017
Deep Extreme Multi-label Learning
Wenjie Zhang
Junchi Yan
Xiangfeng Wang
H. Zha
56
123
0
12 Apr 2017