Learning Deep Structure-Preserving Image-Text Embeddings

19 November 2015
Liwei Wang
Yin Li
Svetlana Lazebnik

Papers citing "Learning Deep Structure-Preserving Image-Text Embeddings"

50 / 222 papers shown
Cross and Learn: Cross-Modal Self-Supervision
Nawid Sayed
Biagio Brattoli
Bjorn Ommer
SSL
94
78
0
09 Nov 2018
Learning from Large-scale Noisy Web Data with Ubiquitous Reweighting for Image Classification
Jia Li
Yafei Song
Jian-Qi Zhu
Lele Cheng
Ying Su
Lin Ye
P. Yuan
Shumin Han
NoLa
64
23
0
02 Nov 2018
Engaging Image Captioning Via Personality
Kurt Shuster
Samuel Humeau
Hexiang Hu
Antoine Bordes
Jason Weston
87
152
0
25 Oct 2018
Cross-Modal and Hierarchical Modeling of Video and Text
Bowen Zhang
Hexiang Hu
Fei Sha
BDL
AI4TS
75
191
0
16 Oct 2018
Learning Inward Scaled Hypersphere Embedding: Exploring Projections in Higher Dimensions
Muhammad Kamran Janjua
Shah Nawaz
Alessandro Calefati
I. Gallo
23
0
0
16 Oct 2018
Image and Encoded Text Fusion for Multi-Modal Classification
I. Gallo
Alessandro Calefati
Shah Nawaz
Muhammad Kamran Janjua
23
37
0
03 Oct 2018
Exploration on Grounded Word Embedding: Matching Words and Images with Image-Enhanced Skip-Gram Model
Ruixuan Luo
24
0
0
08 Sep 2018
Visual Coreference Resolution in Visual Dialog using Neural Module Networks
Satwik Kottur
José M. F. Moura
Devi Parikh
Dhruv Batra
Marcus Rohrbach
74
165
0
06 Sep 2018
TVQA: Localized, Compositional Video Question Answering
Jie Lei
Licheng Yu
Mohit Bansal
Tamara L. Berg
116
643
0
05 Sep 2018
Learning to Describe Differences Between Pairs of Similar Images
Harsh Jhamtani
Taylor Berg-Kirkpatrick
80
155
0
31 Aug 2018
Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval
Niluthpol Chowdhury Mithun
Yikang Shen
Evangelos E. Papalexakis
Amit K. Roy-Chowdhury
67
77
0
23 Aug 2018
Learning to Learn from Web Data through Deep Semantic Embeddings
Raul Gomez
Lluís Gómez
J. Gibert
Dimosthenis Karatzas
114
22
0
20 Aug 2018
Learning Class Prototypes via Structure Alignment for Zero-Shot Recognition
Huajie Jiang
Ruiping Wang
Shiguang Shan
Xilin Chen
VLM
61
118
0
24 Jul 2018
Learning Multimodal Representations for Unseen Activities
A. Piergiovanni
Michael S. Ryoo
SSL
39
4
0
21 Jun 2018
Bilinear Attention Networks
Jin-Hwa Kim
Jaehyun Jun
Byoung-Tak Zhang
AIMat
104
879
0
21 May 2018
Learnable PINs: Cross-Modal Embeddings for Person Identity
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
SSL
138
141
0
02 May 2018
Dialog-based Interactive Image Retrieval
Xiaoxiao Guo
Hui Wu
Yu Cheng
Steven J. Rennie
Gerald Tesauro
Rogerio Feris
139
208
0
01 May 2018
Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch
S. Dey
Anjan Dutta
Suman K. Ghosh
Ernest Valveny
Josep Lladós
Umapada Pal
116
24
0
28 Apr 2018
Large-Scale Visual Relationship Understanding
Ji Zhang
Yannis Kalantidis
Marcus Rohrbach
Manohar Paluri
Ahmed Elgammal
Mohamed Elhoseiny
67
169
0
27 Apr 2018
Cross-Modal Retrieval with Implicit Concept Association
Yale Song
M. Soleymani
47
4
0
12 Apr 2018
Imagine This! Scripts to Compositions to Videos
Tanmay Gupta
Dustin Schwenk
Ali Farhadi
Derek Hoiem
Aniruddha Kembhavi
CoGe
VGen
148
91
0
10 Apr 2018
Learning a Text-Video Embedding from Incomplete and Heterogeneous Data
Antoine Miech
Ivan Laptev
Josef Sivic
80
235
0
07 Apr 2018
Finding beans in burgers: Deep semantic-visual embedding with localization
Martin Engilberge
Louis Chevallier
P. Pérez
Matthieu Cord
81
96
0
05 Apr 2018
Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts
Raymond A. Yeh
Jinjun Xiong
Wen-mei W. Hwu
Minh Do
Alex Schwing
51
57
0
29 Mar 2018
Unsupervised Textual Grounding: Linking Words to Image Concepts
Raymond A. Yeh
Minh Do
Alex Schwing
54
40
0
29 Mar 2018
Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings
Kevin Chen
Chris Choy
Manolis Savva
Angel X. Chang
Thomas Funkhouser
Silvio Savarese
3DV
67
253
0
22 Mar 2018
Stacked Cross Attention for Image-Text Matching
Kuang-Huei Lee
Xi Chen
G. Hua
Houdong Hu
Xiaodong He
116
1,163
0
21 Mar 2018
Learning Unsupervised Visual Grounding Through Semantic Self-Supervision
Syed Ashar Javed
Shreyas Saxena
Vineet Gandhi
SSL
67
25
0
17 Mar 2018
Discriminability objective for training descriptive captions
Ruotian Luo
Brian L. Price
Scott D. Cohen
Gregory Shakhnarovich
115
203
0
12 Mar 2018
MAttNet: Modular Attention Network for Referring Expression Comprehension
Licheng Yu
Zhe Lin
Xiaohui Shen
Jimei Yang
Xin Lu
Mohit Bansal
Tamara L. Berg
ObjD
125
833
0
24 Jan 2018
Cross-modal Embeddings for Video and Audio Retrieval
Dídac Surís
A. Duarte
Amaia Salvador
Jordi Torres
Xavier Giró-i-Nieto
SSL
71
69
0
07 Jan 2018
Object Referring in Videos with Language and Human Gaze
A. Vasudevan
Dengxin Dai
Luc Van Gool
VOS
104
76
0
04 Jan 2018
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD
VOS
118
530
0
18 Dec 2017
Learning Semantic Concepts and Order for Image and Sentence Matching
Yan Huang
Qi Wu
Liang Wang
VLM
85
305
0
06 Dec 2017
Conditional Image-Text Embedding Networks
Bryan A. Plummer
Paige Kordas
M. Kiapour
Shuai Zheng
Robinson Piramuthu
Svetlana Lazebnik
85
118
0
22 Nov 2017
Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space
Liwei Wang
Alex Schwing
Svetlana Lazebnik
CoGe
111
175
0
19 Nov 2017
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models
Jiuxiang Gu
Jianfei Cai
Shafiq Joty
Li Niu
G. Wang
VLM
108
361
0
17 Nov 2017
Deep Matching Autoencoders
Tanmoy Mukherjee
M. Yamada
Timothy M. Hospedales
66
17
0
16 Nov 2017
Dual-Path Convolutional Image-Text Embeddings with Instance Loss
Zhedong Zheng
Liang Zheng
Michael Garrett
Yi Yang
Mingliang Xu
Yi-Dong Shen
176
479
0
15 Nov 2017
Content-based Representations of audio using Siamese neural networks
Pranay Manocha
Rohan Badlani
Anurag Kumar
Ankit Parag Shah
Benjamin Elizalde
Bhiksha Raj
72
33
0
30 Oct 2017
Predicting Visual Features from Text for Image and Video Caption Retrieval
Jianfeng Dong
Xirong Li
Cees G. M. Snoek
70
226
0
05 Sep 2017
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
Chuang Gan
Yandong Li
Haoxiang Li
Chen Sun
Boqing Gong
101
127
0
15 Aug 2017
Deep Binaries: Encoding Semantic-Rich Cues for Efficient Textual-Visual Cross Retrieval
Yuming Shen
Li Liu
Ling Shao
Jingkuan Song
65
49
0
08 Aug 2017
Identity-Aware Textual-Visual Matching with Latent Co-attention
Shuang Li
Tong Xiao
Hongsheng Li
Wei Yang
Xiaogang Wang
103
229
0
07 Aug 2017
Automatic Spatially-aware Fashion Concept Discovery
Xintong Han
Zuxuan Wu
Phoenix X. Huang
Xiao Zhang
Menglong Zhu
Yuan Li
Yang Zhao
L. Davis
85
272
0
03 Aug 2017
SPEECH-COCO: 600k Visually Grounded Spoken Captions Aligned to MSCOCO Data Set
William N. Havard
Laurent Besacier
O. Rosec
83
28
0
26 Jul 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
136
2,953
0
26 May 2017
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures
Fanyi Xiao
Leonid Sigal
Yong Jae Lee
82
139
0
03 May 2017
Spatio-temporal Person Retrieval via Natural Language Queries
Masataka Yamaguchi
Kuniaki Saito
Yoshitaka Ushiku
Tatsuya Harada
94
58
0
26 Apr 2017
Deep Extreme Multi-label Learning
Wenjie Zhang
Junchi Yan
Xiangfeng Wang
H. Zha
56
123
0
12 Apr 2017