Learning Deep Structure-Preserving Image-Text Embeddings

19 November 2015

Liwei Wang

Yin Li

Svetlana Lazebnik

Papers citing "Learning Deep Structure-Preserving Image-Text Embeddings"

50 / 222 papers shown

End-to-End Learning of Visual Representations from Uncurated Instructional Videos Antoine Miech Jean-Baptiste Alayrac Lucas Smaira Ivan Laptev Josef Sivic Andrew Zisserman VGen SSL 154 713 0 13 Dec 2019
A Real-time Global Inference Network for One-stage Referring Expression Comprehension Yiyi Zhou Rongrong Ji Gen Luo Xiaoshuai Sun Jinsong Su Xinghao Ding Chia-Wen Lin Q. Tian ObjD 78 63 0 07 Dec 2019
Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences Shizhe Chen Bei Liu Jianlong Fu Ruihua Song Qin Jin Pingping Lin Xiaoyu Qi Chunting Wang Jin Zhou DiffM 69 33 0 24 Nov 2019
Learning Cross-modal Context Graph for Visual Grounding Yongfei Liu Bo Wan Xiao-Dan Zhu Xuming He 91 91 0 20 Nov 2019
Ladder Loss for Coherent Visual-Semantic Embedding Mo Zhou Zhenxing Niu Le Wang Zhanning Gao Qilin Zhang G. Hua 86 40 0 18 Nov 2019
HUSE: Hierarchical Universal Semantic Embeddings P. Narayana Aniket Pednekar A. Krishnamoorthy Kazoo Sone Sugato Basu 42 10 0 14 Nov 2019
Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis Zhongkai Sun P. Sarma W. Sethares Yingyu Liang 89 328 0 13 Nov 2019
Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries Fuwen Tan Paola Cascante-Bonilla Xiaoxiao Guo Hui Wu Song Feng Vicente Ordonez 60 30 0 10 Nov 2019
A Graph-Based Framework to Bridge Movies and Synopses Yu Xiong Chengyi Zhang Lingfeng Guo Hang Zhou Bolei Zhou Dahua Lin 79 63 0 24 Oct 2019
Target-Oriented Deformation of Visual-Semantic Embedding Space Takashi Matsubara 55 7 0 15 Oct 2019
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval Sijin Wang Ruiping Wang Ziwei Yao Shiguang Shan Xilin Chen 3DV 88 213 0 11 Oct 2019
Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations Po-Yao (Bernie) Huang Xiaojun Chang Alexander G. Hauptmann 138 25 0 30 Sep 2019
Cross-Modal Subspace Learning with Scheduled Adaptive Margin Constraints David Semedo João Magalhães 40 11 0 30 Sep 2019
UNITER: UNiversal Image-TExt Representation Learning Yen-Chun Chen Linjie Li Licheng Yu Ahmed El Kholy Faisal Ahmed Zhe Gan Yu Cheng Jingjing Liu VLM OT 129 448 0 25 Sep 2019
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators Kuang-Huei Lee Hamid Palangi Xi Chen Houdong Hu Jianfeng Gao VLM 67 37 0 22 Sep 2019
Video Skimming: Taxonomy and Comprehensive Survey Vivekraj V. K. Debashis Sen Balasubramanian Raman 66 10 0 21 Sep 2019
Dynamic Graph Attention for Referring Expression Comprehension Sibei Yang Guanbin Li Yizhou Yu OCL 86 222 0 18 Sep 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval Zihao Wang Xihui Liu Hongsheng Li Lu Sheng Junjie Yan Xiaogang Wang Jing Shao VLM 91 308 0 12 Sep 2019
Picture What you Read I. Gallo Shah Nawaz Alessandro Calefati Riccardo La Grassa Nicola Landro DiffM 64 0 0 09 Sep 2019
Adversarial Representation Learning for Text-to-Image Matching N. Sarafianos Xiang Xu I. Kakadiaris GAN 117 188 0 28 Aug 2019
A Fast and Accurate One-Stage Approach to Visual Grounding Zhengyuan Yang Boqing Gong Liwei Wang Wenbing Huang Dong Yu Jiebo Luo ObjD 60 366 0 18 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training Gen Li Nan Duan Yuejian Fang Ming Gong Daxin Jiang Ming Zhou SSL VLM MLLM 216 907 0 16 Aug 2019
Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking Tan Wang Xing Xu Yang Yang Alan Hanjalic Heng Tao Shen Jingkuan Song 56 149 0 12 Aug 2019
Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings Michael Wray Diane Larlus G. Csurka Dima Damen 117 154 0 09 Aug 2019
Semi Supervised Phrase Localization in a Bidirectional Caption-Image Retrieval Framework Deepan Das Noor Mohammed Ghouse Shashank Verma Yin Li 23 0 0 08 Aug 2019
Task-Driven Common Representation Learning via Bridge Neural Network Yao Xu Xueshuang Xiang Meiyu Huang SSL 41 5 0 26 Jun 2019
ParNet: Position-aware Aggregated Relation Network for Image-Text matching Yaxian Xia Lun Huang Wenmin Wang Xiao-Yong Wei Jie Chen 121 1 0 17 Jun 2019
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval Yale Song M. Soleymani 72 247 0 11 Jun 2019
Joint Visual Grounding with Language Scene Graphs Daqing Liu Hanwang Zhang Zhengjun Zha Meng Wang Qianru Sun 69 6 0 09 Jun 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips Antoine Miech Dimitri Zhukov Jean-Baptiste Alayrac Makarand Tapaswi Ivan Laptev Josef Sivic VGen 130 1,211 0 07 Jun 2019
Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video Zhenfang Chen Lin Ma Wenhan Luo Kwan-Yee K. Wong 95 103 0 06 Jun 2019
Learning to Compose and Reason with Language Tree Structures for Visual Grounding Richang Hong Daqing Liu Xiaoyu Mo Xiangnan He Hanwang Zhang ReLM LRM 98 165 0 05 Jun 2019
Saliency-Guided Attention Network for Image-Sentence Matching Zhong Ji Haoran Wang Jiawei Han Yanwei Pang 69 89 0 20 Apr 2019
Referring to Objects in Videos using Spatio-Temporal Identifying Descriptions Peratham Wiriyathammabhum Abhinav Shrivastava Vlad I. Morariu L. Davis 60 5 0 08 Apr 2019
Weakly Supervised Video Moment Retrieval From Text Queries Niluthpol Chowdhury Mithun S. Paul Amit K. Roy-Chowdhury 135 194 0 05 Apr 2019
Aiding Intra-Text Representations with Visual Context for Multimodal Named Entity Recognition Omer Arshad I. Gallo Shah Nawaz Alessandro Calefati 44 43 0 02 Apr 2019
Neural Sequential Phrase Grounding (SeqGROUND) Pelin Dogan Leonid Sigal Markus Gross ObjD 76 52 0 18 Mar 2019
Graphical Contrastive Losses for Scene Graph Parsing Ji Zhang Kevin J. Shih Ahmed Elgammal Andrew Tao Bryan Catanzaro 99 233 0 07 Mar 2019
Geometric Matrix Completion with Deep Conditional Random Fields Duc Minh Nguyen A. Calderbank Nikos Deligiannis 67 7 0 29 Jan 2019
Learning Shared Semantic Space with Correlation Alignment for Cross-modal Event Retrieval Zhenguo Yang Zehang Lin Peipei Kang Jianming Lv Qing Li Wenyin Liu 3DPC 91 26 0 14 Jan 2019
Action2Vec: A Crossmodal Embedding Approach to Action Learning Meera Hahn Andrew Silva James M. Rehg 78 58 0 02 Jan 2019
Scene Graph Reasoning with Prior Visual Relationship for Visual Question Answering Zhuoqian Yang Zengchang Qin Jing Yu Yue Hu GNN 80 16 0 23 Dec 2018
Sequential Attention GAN for Interactive Image Editing Yu Cheng Zhe Gan Yitong Li Jingjing Liu Jianfeng Gao 77 98 0 20 Dec 2018
Composing Text and Image for Image Retrieval - An Empirical Odyssey Nam S. Vo Lu Jiang Chen Sun Kevin Patrick Murphy Li Li Li Fei-Fei James Hays CoGe 71 370 0 18 Dec 2018
Detecting unseen visual relations using analogies Julia Peyre Ivan Laptev Cordelia Schmid Josef Sivic 58 18 0 13 Dec 2018
Adversarial Learning of Semantic Relevance in Text to Image Synthesis Miriam Cha Youngjune Gwon H. T. Kung GAN 75 54 0 12 Dec 2018
Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks Peng Wang Qi Wu Jiewei Cao Chunhua Shen Lianli Gao Anton Van Den Hengel ObjD 95 256 0 12 Dec 2018
Domain-Aware SE Network for Sketch-based Image Retrieval with Multiplicative Euclidean Margin Softmax Peng Lu Gao Huang Hangyu Lin Wenming Yang G. Guo Yanwei Fu 58 16 0 11 Dec 2018
PIRC Net : Using Proposal Indexing, Relationships and Context for Phrase Grounding Rama Kovvuri Ram Nevatia ObjD 77 17 0 07 Dec 2018
Multi-task Learning of Hierarchical Vision-Language Representation Duy-Kien Nguyen Takayuki Okatani 105 52 0 03 Dec 2018