
Learning Deep Structure-Preserving Image-Text Embeddings

19 November 2015

Liwei Wang

Yin Li

Svetlana Lazebnik


Papers citing "Learning Deep Structure-Preserving Image-Text Embeddings"

50 / 222 papers shown

Cross and Learn: Cross-Modal Self-Supervision Nawid Sayed Biagio Brattoli Bjorn Ommer SSL 94 78 0 09 Nov 2018
Learning from Large-scale Noisy Web Data with Ubiquitous Reweighting for Image Classification Jia Li Yafei Song Jian-Qi Zhu Lele Cheng Ying Su Lin Ye P. Yuan Shumin Han NoLa 64 23 0 02 Nov 2018
Engaging Image Captioning Via Personality Kurt Shuster Samuel Humeau Hexiang Hu Antoine Bordes Jason Weston 87 152 0 25 Oct 2018
Cross-Modal and Hierarchical Modeling of Video and Text Bowen Zhang Hexiang Hu Fei Sha BDL AI4TS 75 191 0 16 Oct 2018
Learning Inward Scaled Hypersphere Embedding: Exploring Projections in Higher Dimensions Muhammad Kamran Janjua Shah Nawaz Alessandro Calefati I. Gallo 23 0 0 16 Oct 2018
Image and Encoded Text Fusion for Multi-Modal Classification I. Gallo Alessandro Calefati Shah Nawaz Muhammad Kamran Janjua 23 37 0 03 Oct 2018
Exploration on Grounded Word Embedding: Matching Words and Images with Image-Enhanced Skip-Gram Model Ruixuan Luo 24 0 0 08 Sep 2018
Visual Coreference Resolution in Visual Dialog using Neural Module Networks Satwik Kottur José M. F. Moura Devi Parikh Dhruv Batra Marcus Rohrbach 74 165 0 06 Sep 2018
TVQA: Localized, Compositional Video Question Answering Jie Lei Licheng Yu Mohit Bansal Tamara L. Berg 116 643 0 05 Sep 2018
Learning to Describe Differences Between Pairs of Similar Images Harsh Jhamtani Taylor Berg-Kirkpatrick 80 155 0 31 Aug 2018
Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval Niluthpol Chowdhury Mithun Yikang Shen Evangelos E. Papalexakis Amit K. Roy-Chowdhury 67 77 0 23 Aug 2018
Learning to Learn from Web Data through Deep Semantic Embeddings Raul Gomez Lluís Gómez J. Gibert Dimosthenis Karatzas 114 22 0 20 Aug 2018
Learning Class Prototypes via Structure Alignment for Zero-Shot Recognition Huajie Jiang Ruiping Wang Shiguang Shan Xilin Chen VLM 61 118 0 24 Jul 2018
Learning Multimodal Representations for Unseen Activities A. Piergiovanni Michael S. Ryoo SSL 39 4 0 21 Jun 2018
Bilinear Attention Networks Jin-Hwa Kim Jaehyun Jun Byoung-Tak Zhang AIMat 104 879 0 21 May 2018
Learnable PINs: Cross-Modal Embeddings for Person Identity Arsha Nagrani Samuel Albanie Andrew Zisserman SSL 138 141 0 02 May 2018
Dialog-based Interactive Image Retrieval Xiaoxiao Guo Hui Wu Yu Cheng Steven J. Rennie Gerald Tesauro Rogerio Feris 139 208 0 01 May 2018
Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch S. Dey Anjan Dutta Suman K. Ghosh Ernest Valveny Josep Lladós Umapada Pal 116 24 0 28 Apr 2018
Large-Scale Visual Relationship Understanding Ji Zhang Yannis Kalantidis Marcus Rohrbach Manohar Paluri Ahmed Elgammal Mohamed Elhoseiny 67 169 0 27 Apr 2018
Cross-Modal Retrieval with Implicit Concept Association Yale Song M. Soleymani 47 4 0 12 Apr 2018
Imagine This! Scripts to Compositions to Videos Tanmay Gupta Dustin Schwenk Ali Farhadi Derek Hoiem Aniruddha Kembhavi CoGe VGen 148 91 0 10 Apr 2018
Learning a Text-Video Embedding from Incomplete and Heterogeneous Data Antoine Miech Ivan Laptev Josef Sivic 80 235 0 07 Apr 2018
Finding beans in burgers: Deep semantic-visual embedding with localization Martin Engilberge Louis Chevallier P. Pérez Matthieu Cord 81 96 0 05 Apr 2018
Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts Raymond A. Yeh Jinjun Xiong Wen-mei W. Hwu Minh Do Alex Schwing 51 57 0 29 Mar 2018
Unsupervised Textual Grounding: Linking Words to Image Concepts Raymond A. Yeh Minh Do Alex Schwing 54 40 0 29 Mar 2018
Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings Kevin Chen Chris Choy Manolis Savva Angel X. Chang Thomas Funkhouser Silvio Savarese 3DV 67 253 0 22 Mar 2018
Stacked Cross Attention for Image-Text Matching Kuang-Huei Lee Xi Chen G. Hua Houdong Hu Xiaodong He 116 1,163 0 21 Mar 2018
Learning Unsupervised Visual Grounding Through Semantic Self-Supervision Syed Ashar Javed Shreyas Saxena Vineet Gandhi SSL 67 25 0 17 Mar 2018
Discriminability objective for training descriptive captions Ruotian Luo Brian L. Price Scott D. Cohen Gregory Shakhnarovich 115 203 0 12 Mar 2018
MAttNet: Modular Attention Network for Referring Expression Comprehension Licheng Yu Zhe Lin Xiaohui Shen Jimei Yang Xin Lu Mohit Bansal Tamara L. Berg ObjD 125 833 0 24 Jan 2018
Cross-modal Embeddings for Video and Audio Retrieval Dídac Surís A. Duarte Amaia Salvador Jordi Torres Xavier Giró-i-Nieto SSL 71 69 0 07 Jan 2018
Object Referring in Videos with Language and Human Gaze A. Vasudevan Dengxin Dai Luc Van Gool VOS 104 76 0 04 Jan 2018
Objects that Sound Relja Arandjelović Andrew Zisserman ObjD VOS 118 530 0 18 Dec 2017
Learning Semantic Concepts and Order for Image and Sentence Matching Yan Huang Qi Wu Liang Wang VLM 85 305 0 06 Dec 2017
Conditional Image-Text Embedding Networks Bryan A. Plummer Paige Kordas M. Kiapour Shuai Zheng Robinson Piramuthu Svetlana Lazebnik 85 118 0 22 Nov 2017
Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space Liwei Wang Alex Schwing Svetlana Lazebnik CoGe 111 175 0 19 Nov 2017
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models Jiuxiang Gu Jianfei Cai Shafiq Joty Li Niu G. Wang VLM 108 361 0 17 Nov 2017
Deep Matching Autoencoders Tanmoy Mukherjee M. Yamada Timothy M. Hospedales 66 17 0 16 Nov 2017
Dual-Path Convolutional Image-Text Embeddings with Instance Loss Zhedong Zheng Liang Zheng Michael Garrett Yi Yang Mingliang Xu Yi-Dong Shen 176 479 0 15 Nov 2017
Content-based Representations of audio using Siamese neural networks Pranay Manocha Rohan Badlani Anurag Kumar Ankit Parag Shah Benjamin Elizalde Bhiksha Raj 72 33 0 30 Oct 2017
Predicting Visual Features from Text for Image and Video Caption Retrieval Jianfeng Dong Xirong Li Cees G. M. Snoek 70 226 0 05 Sep 2017
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation Chuang Gan Yandong Li Haoxiang Li Chen Sun Boqing Gong 101 127 0 15 Aug 2017
Deep Binaries: Encoding Semantic-Rich Cues for Efficient Textual-Visual Cross Retrieval Yuming Shen Li Liu Ling Shao Jingkuan Song 65 49 0 08 Aug 2017
Identity-Aware Textual-Visual Matching with Latent Co-attention Shuang Li Tong Xiao Hongsheng Li Wei Yang Xiaogang Wang 103 229 0 07 Aug 2017
Automatic Spatially-aware Fashion Concept Discovery Xintong Han Zuxuan Wu Phoenix X. Huang Xiao Zhang Menglong Zhu Yuan Li Yang Zhao L. Davis 85 272 0 03 Aug 2017
SPEECH-COCO: 600k Visually Grounded Spoken Captions Aligned to MSCOCO Data Set William N. Havard Laurent Besacier O. Rosec 83 28 0 26 Jul 2017
Multimodal Machine Learning: A Survey and Taxonomy T. Baltrušaitis Chaitanya Ahuja Louis-Philippe Morency 136 2,953 0 26 May 2017
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures Fanyi Xiao Leonid Sigal Yong Jae Lee 82 139 0 03 May 2017
Spatio-temporal Person Retrieval via Natural Language Queries Masataka Yamaguchi Kuniaki Saito Yoshitaka Ushiku Tatsuya Harada 94 58 0 26 Apr 2017
Deep Extreme Multi-label Learning Wenjie Zhang Junchi Yan Xiangfeng Wang H. Zha 56 123 0 12 Apr 2017