arXiv:1511.06078 (v1; v2 latest)
Learning Deep Structure-Preserving Image-Text Embeddings
19 November 2015
Liwei Wang
Yin Li
Svetlana Lazebnik
Papers citing "Learning Deep Structure-Preserving Image-Text Embeddings"
Showing 50 of 222 citing papers
End-to-End Learning of Visual Representations from Uncurated Instructional Videos
Antoine Miech
Jean-Baptiste Alayrac
Lucas Smaira
Ivan Laptev
Josef Sivic
Andrew Zisserman
VGen
SSL
154
713
0
13 Dec 2019
A Real-time Global Inference Network for One-stage Referring Expression Comprehension
Yiyi Zhou
Rongrong Ji
Gen Luo
Xiaoshuai Sun
Jinsong Su
Xinghao Ding
Chia-Wen Lin
Q. Tian
ObjD
78
63
0
07 Dec 2019
Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences
Shizhe Chen
Bei Liu
Jianlong Fu
Ruihua Song
Qin Jin
Pingping Lin
Xiaoyu Qi
Chunting Wang
Jin Zhou
DiffM
69
33
0
24 Nov 2019
Learning Cross-modal Context Graph for Visual Grounding
Yongfei Liu
Bo Wan
Xiao-Dan Zhu
Xuming He
91
91
0
20 Nov 2019
Ladder Loss for Coherent Visual-Semantic Embedding
Mo Zhou
Zhenxing Niu
Le Wang
Zhanning Gao
Qilin Zhang
G. Hua
86
40
0
18 Nov 2019
HUSE: Hierarchical Universal Semantic Embeddings
P. Narayana
Aniket Pednekar
A. Krishnamoorthy
Kazoo Sone
Sugato Basu
42
10
0
14 Nov 2019
Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis
Zhongkai Sun
P. Sarma
W. Sethares
Yingyu Liang
89
328
0
13 Nov 2019
Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries
Fuwen Tan
Paola Cascante-Bonilla
Xiaoxiao Guo
Hui Wu
Song Feng
Vicente Ordonez
60
30
0
10 Nov 2019
A Graph-Based Framework to Bridge Movies and Synopses
Yu Xiong
Chengyi Zhang
Lingfeng Guo
Hang Zhou
Bolei Zhou
Dahua Lin
79
63
0
24 Oct 2019
Target-Oriented Deformation of Visual-Semantic Embedding Space
Takashi Matsubara
55
7
0
15 Oct 2019
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval
Sijin Wang
Ruiping Wang
Ziwei Yao
Shiguang Shan
Xilin Chen
3DV
88
213
0
11 Oct 2019
Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations
Po-Yao (Bernie) Huang
Xiaojun Chang
Alexander G. Hauptmann
138
25
0
30 Sep 2019
Cross-Modal Subspace Learning with Scheduled Adaptive Margin Constraints
David Semedo
João Magalhães
40
11
0
30 Sep 2019
UNITER: UNiversal Image-TExt Representation Learning
Yen-Chun Chen
Linjie Li
Licheng Yu
Ahmed El Kholy
Faisal Ahmed
Zhe Gan
Yu Cheng
Jingjing Liu
VLM
OT
129
448
0
25 Sep 2019
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators
Kuang-Huei Lee
Hamid Palangi
Xi Chen
Houdong Hu
Jianfeng Gao
VLM
67
37
0
22 Sep 2019
Video Skimming: Taxonomy and Comprehensive Survey
Vivekraj V. K.
Debashis Sen
Balasubramanian Raman
66
10
0
21 Sep 2019
Dynamic Graph Attention for Referring Expression Comprehension
Sibei Yang
Guanbin Li
Yizhou Yu
OCL
86
222
0
18 Sep 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Zihao Wang
Xihui Liu
Hongsheng Li
Lu Sheng
Junjie Yan
Xiaogang Wang
Jing Shao
VLM
91
308
0
12 Sep 2019
Picture What you Read
I. Gallo
Shah Nawaz
Alessandro Calefati
Riccardo La Grassa
Nicola Landro
DiffM
64
0
0
09 Sep 2019
Adversarial Representation Learning for Text-to-Image Matching
N. Sarafianos
Xiang Xu
I. Kakadiaris
GAN
117
188
0
28 Aug 2019
A Fast and Accurate One-Stage Approach to Visual Grounding
Zhengyuan Yang
Boqing Gong
Liwei Wang
Wenbing Huang
Dong Yu
Jiebo Luo
ObjD
60
366
0
18 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL
VLM
MLLM
216
907
0
16 Aug 2019
Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking
Tan Wang
Xing Xu
Yang Yang
Alan Hanjalic
Heng Tao Shen
Jingkuan Song
56
149
0
12 Aug 2019
Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings
Michael Wray
Diane Larlus
G. Csurka
Dima Damen
117
154
0
09 Aug 2019
Semi Supervised Phrase Localization in a Bidirectional Caption-Image Retrieval Framework
Deepan Das
Noor Mohammed Ghouse
Shashank Verma
Yin Li
23
0
0
08 Aug 2019
Task-Driven Common Representation Learning via Bridge Neural Network
Yao Xu
Xueshuang Xiang
Meiyu Huang
SSL
41
5
0
26 Jun 2019
ParNet: Position-aware Aggregated Relation Network for Image-Text matching
Yaxian Xia
Lun Huang
Wenmin Wang
Xiao-Yong Wei
Jie Chen
121
1
0
17 Jun 2019
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval
Yale Song
M. Soleymani
72
247
0
11 Jun 2019
Joint Visual Grounding with Language Scene Graphs
Daqing Liu
Hanwang Zhang
Zhengjun Zha
Meng Wang
Qianru Sun
69
6
0
09 Jun 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
130
1,211
0
07 Jun 2019
Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video
Zhenfang Chen
Lin Ma
Wenhan Luo
Kwan-Yee K. Wong
95
103
0
06 Jun 2019
Learning to Compose and Reason with Language Tree Structures for Visual Grounding
Richang Hong
Daqing Liu
Xiaoyu Mo
Xiangnan He
Hanwang Zhang
ReLM
LRM
98
165
0
05 Jun 2019
Saliency-Guided Attention Network for Image-Sentence Matching
Zhong Ji
Haoran Wang
Jiawei Han
Yanwei Pang
69
89
0
20 Apr 2019
Referring to Objects in Videos using Spatio-Temporal Identifying Descriptions
Peratham Wiriyathammabhum
Abhinav Shrivastava
Vlad I. Morariu
L. Davis
60
5
0
08 Apr 2019
Weakly Supervised Video Moment Retrieval From Text Queries
Niluthpol Chowdhury Mithun
S. Paul
Amit K. Roy-Chowdhury
135
194
0
05 Apr 2019
Aiding Intra-Text Representations with Visual Context for Multimodal Named Entity Recognition
Omer Arshad
I. Gallo
Shah Nawaz
Alessandro Calefati
44
43
0
02 Apr 2019
Neural Sequential Phrase Grounding (SeqGROUND)
Pelin Dogan
Leonid Sigal
Markus Gross
ObjD
76
52
0
18 Mar 2019
Graphical Contrastive Losses for Scene Graph Parsing
Ji Zhang
Kevin J. Shih
Ahmed Elgammal
Andrew Tao
Bryan Catanzaro
99
233
0
07 Mar 2019
Geometric Matrix Completion with Deep Conditional Random Fields
Duc Minh Nguyen
A. Calderbank
Nikos Deligiannis
67
7
0
29 Jan 2019
Learning Shared Semantic Space with Correlation Alignment for Cross-modal Event Retrieval
Zhenguo Yang
Zehang Lin
Peipei Kang
Jianming Lv
Qing Li
Wenyin Liu
3DPC
91
26
0
14 Jan 2019
Action2Vec: A Crossmodal Embedding Approach to Action Learning
Meera Hahn
Andrew Silva
James M. Rehg
78
58
0
02 Jan 2019
Scene Graph Reasoning with Prior Visual Relationship for Visual Question Answering
Zhuoqian Yang
Zengchang Qin
Jing Yu
Yue Hu
GNN
80
16
0
23 Dec 2018
Sequential Attention GAN for Interactive Image Editing
Yu Cheng
Zhe Gan
Yitong Li
Jingjing Liu
Jianfeng Gao
77
98
0
20 Dec 2018
Composing Text and Image for Image Retrieval - An Empirical Odyssey
Nam S. Vo
Lu Jiang
Chen Sun
Kevin Patrick Murphy
Li Li
Li Fei-Fei
James Hays
CoGe
71
370
0
18 Dec 2018
Detecting unseen visual relations using analogies
Julia Peyre
Ivan Laptev
Cordelia Schmid
Josef Sivic
58
18
0
13 Dec 2018
Adversarial Learning of Semantic Relevance in Text to Image Synthesis
Miriam Cha
Youngjune Gwon
H. T. Kung
GAN
75
54
0
12 Dec 2018
Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks
Peng Wang
Qi Wu
Jiewei Cao
Chunhua Shen
Lianli Gao
Anton Van Den Hengel
ObjD
95
256
0
12 Dec 2018
Domain-Aware SE Network for Sketch-based Image Retrieval with Multiplicative Euclidean Margin Softmax
Peng Lu
Gao Huang
Hangyu Lin
Wenming Yang
G. Guo
Yanwei Fu
58
16
0
11 Dec 2018
PIRC Net : Using Proposal Indexing, Relationships and Context for Phrase Grounding
Rama Kovvuri
Ram Nevatia
ObjD
77
17
0
07 Dec 2018
Multi-task Learning of Hierarchical Vision-Language Representation
Duy-Kien Nguyen
Takayuki Okatani
105
52
0
03 Dec 2018