Learning Deep Structure-Preserving Image-Text Embeddings

19 November 2015
Liwei Wang
Yin Li
Svetlana Lazebnik
arXiv: 1511.06078

Papers citing "Learning Deep Structure-Preserving Image-Text Embeddings"

50 / 222 papers shown
End-to-End Learning of Visual Representations from Uncurated Instructional Videos
Antoine Miech
Jean-Baptiste Alayrac
Lucas Smaira
Ivan Laptev
Josef Sivic
Andrew Zisserman
VGen, SSL
154
713
0
13 Dec 2019
A Real-time Global Inference Network for One-stage Referring Expression Comprehension
Yiyi Zhou
Rongrong Ji
Gen Luo
Xiaoshuai Sun
Jinsong Su
Xinghao Ding
Chia-Wen Lin
Q. Tian
ObjD
78
63
0
07 Dec 2019
Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences
Shizhe Chen
Bei Liu
Jianlong Fu
Ruihua Song
Qin Jin
Pingping Lin
Xiaoyu Qi
Chunting Wang
Jin Zhou
DiffM
69
33
0
24 Nov 2019
Learning Cross-modal Context Graph for Visual Grounding
Yongfei Liu
Bo Wan
Xiao-Dan Zhu
Xuming He
91
91
0
20 Nov 2019
Ladder Loss for Coherent Visual-Semantic Embedding
Mo Zhou
Zhenxing Niu
Le Wang
Zhanning Gao
Qilin Zhang
G. Hua
86
40
0
18 Nov 2019
HUSE: Hierarchical Universal Semantic Embeddings
P. Narayana
Aniket Pednekar
A. Krishnamoorthy
Kazoo Sone
Sugato Basu
42
10
0
14 Nov 2019
Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis
Zhongkai Sun
P. Sarma
W. Sethares
Yingyu Liang
89
328
0
13 Nov 2019
Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries
Fuwen Tan
Paola Cascante-Bonilla
Xiaoxiao Guo
Hui Wu
Song Feng
Vicente Ordonez
60
30
0
10 Nov 2019
A Graph-Based Framework to Bridge Movies and Synopses
Yu Xiong
Chengyi Zhang
Lingfeng Guo
Hang Zhou
Bolei Zhou
Dahua Lin
79
63
0
24 Oct 2019
Target-Oriented Deformation of Visual-Semantic Embedding Space
Takashi Matsubara
55
7
0
15 Oct 2019
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval
Sijin Wang
Ruiping Wang
Ziwei Yao
Shiguang Shan
Xilin Chen
3DV
88
213
0
11 Oct 2019
Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations
Po-Yao (Bernie) Huang
Xiaojun Chang
Alexander G. Hauptmann
138
25
0
30 Sep 2019
Cross-Modal Subspace Learning with Scheduled Adaptive Margin Constraints
David Semedo
João Magalhães
40
11
0
30 Sep 2019
UNITER: UNiversal Image-TExt Representation Learning
Yen-Chun Chen
Linjie Li
Licheng Yu
Ahmed El Kholy
Faisal Ahmed
Zhe Gan
Yu Cheng
Jingjing Liu
VLM, OT
129
448
0
25 Sep 2019
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators
Kuang-Huei Lee
Hamid Palangi
Xi Chen
Houdong Hu
Jianfeng Gao
VLM
67
37
0
22 Sep 2019
Video Skimming: Taxonomy and Comprehensive Survey
Vivekraj V. K.
Debashis Sen
Balasubramanian Raman
66
10
0
21 Sep 2019
Dynamic Graph Attention for Referring Expression Comprehension
Sibei Yang
Guanbin Li
Yizhou Yu
OCL
86
222
0
18 Sep 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Zihao Wang
Xihui Liu
Hongsheng Li
Lu Sheng
Junjie Yan
Xiaogang Wang
Jing Shao
VLM
91
308
0
12 Sep 2019
Picture What you Read
I. Gallo
Shah Nawaz
Alessandro Calefati
Riccardo La Grassa
Nicola Landro
DiffM
64
0
0
09 Sep 2019
Adversarial Representation Learning for Text-to-Image Matching
N. Sarafianos
Xiang Xu
I. Kakadiaris
GAN
117
188
0
28 Aug 2019
A Fast and Accurate One-Stage Approach to Visual Grounding
Zhengyuan Yang
Boqing Gong
Liwei Wang
Wenbing Huang
Dong Yu
Jiebo Luo
ObjD
60
366
0
18 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL, VLM, MLLM
216
907
0
16 Aug 2019
Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking
Tan Wang
Xing Xu
Yang Yang
Alan Hanjalic
Heng Tao Shen
Jingkuan Song
56
149
0
12 Aug 2019
Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings
Michael Wray
Diane Larlus
G. Csurka
Dima Damen
117
154
0
09 Aug 2019
Semi Supervised Phrase Localization in a Bidirectional Caption-Image Retrieval Framework
Deepan Das
Noor Mohammed Ghouse
Shashank Verma
Yin Li
23
0
0
08 Aug 2019
Task-Driven Common Representation Learning via Bridge Neural Network
Yao Xu
Xueshuang Xiang
Meiyu Huang
SSL
41
5
0
26 Jun 2019
ParNet: Position-aware Aggregated Relation Network for Image-Text matching
Yaxian Xia
Lun Huang
Wenmin Wang
Xiao-Yong Wei
Jie Chen
121
1
0
17 Jun 2019
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval
Yale Song
M. Soleymani
72
247
0
11 Jun 2019
Joint Visual Grounding with Language Scene Graphs
Daqing Liu
Hanwang Zhang
Zhengjun Zha
Meng Wang
Qianru Sun
69
6
0
09 Jun 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
130
1,211
0
07 Jun 2019
Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video
Zhenfang Chen
Lin Ma
Wenhan Luo
Kwan-Yee K. Wong
95
103
0
06 Jun 2019
Learning to Compose and Reason with Language Tree Structures for Visual Grounding
Richang Hong
Daqing Liu
Xiaoyu Mo
Xiangnan He
Hanwang Zhang
ReLM, LRM
98
165
0
05 Jun 2019
Saliency-Guided Attention Network for Image-Sentence Matching
Zhong Ji
Haoran Wang
Jiawei Han
Yanwei Pang
69
89
0
20 Apr 2019
Referring to Objects in Videos using Spatio-Temporal Identifying Descriptions
Peratham Wiriyathammabhum
Abhinav Shrivastava
Vlad I. Morariu
L. Davis
60
5
0
08 Apr 2019
Weakly Supervised Video Moment Retrieval From Text Queries
Niluthpol Chowdhury Mithun
S. Paul
Amit K. Roy-Chowdhury
135
194
0
05 Apr 2019
Aiding Intra-Text Representations with Visual Context for Multimodal Named Entity Recognition
Omer Arshad
I. Gallo
Shah Nawaz
Alessandro Calefati
44
43
0
02 Apr 2019
Neural Sequential Phrase Grounding (SeqGROUND)
Pelin Dogan
Leonid Sigal
Markus Gross
ObjD
76
52
0
18 Mar 2019
Graphical Contrastive Losses for Scene Graph Parsing
Ji Zhang
Kevin J. Shih
Ahmed Elgammal
Andrew Tao
Bryan Catanzaro
99
233
0
07 Mar 2019
Geometric Matrix Completion with Deep Conditional Random Fields
Duc Minh Nguyen
A. Calderbank
Nikos Deligiannis
67
7
0
29 Jan 2019
Learning Shared Semantic Space with Correlation Alignment for Cross-modal Event Retrieval
Zhenguo Yang
Zehang Lin
Peipei Kang
Jianming Lv
Qing Li
Wenyin Liu
3DPC
91
26
0
14 Jan 2019
Action2Vec: A Crossmodal Embedding Approach to Action Learning
Meera Hahn
Andrew Silva
James M. Rehg
78
58
0
02 Jan 2019
Scene Graph Reasoning with Prior Visual Relationship for Visual Question Answering
Zhuoqian Yang
Zengchang Qin
Jing Yu
Yue Hu
GNN
80
16
0
23 Dec 2018
Sequential Attention GAN for Interactive Image Editing
Yu Cheng
Zhe Gan
Yitong Li
Jingjing Liu
Jianfeng Gao
77
98
0
20 Dec 2018
Composing Text and Image for Image Retrieval - An Empirical Odyssey
Nam S. Vo
Lu Jiang
Chen Sun
Kevin Patrick Murphy
Li Li
Li Fei-Fei
James Hays
CoGe
71
370
0
18 Dec 2018
Detecting unseen visual relations using analogies
Julia Peyre
Ivan Laptev
Cordelia Schmid
Josef Sivic
58
18
0
13 Dec 2018
Adversarial Learning of Semantic Relevance in Text to Image Synthesis
Miriam Cha
Youngjune Gwon
H. T. Kung
GAN
75
54
0
12 Dec 2018
Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks
Peng Wang
Qi Wu
Jiewei Cao
Chunhua Shen
Lianli Gao
Anton Van Den Hengel
ObjD
95
256
0
12 Dec 2018
Domain-Aware SE Network for Sketch-based Image Retrieval with Multiplicative Euclidean Margin Softmax
Peng Lu
Gao Huang
Hangyu Lin
Wenming Yang
G. Guo
Yanwei Fu
58
16
0
11 Dec 2018
PIRC Net : Using Proposal Indexing, Relationships and Context for Phrase Grounding
Rama Kovvuri
Ram Nevatia
ObjD
77
17
0
07 Dec 2018
Multi-task Learning of Hierarchical Vision-Language Representation
Duy-Kien Nguyen
Takayuki Okatani
105
52
0
03 Dec 2018