Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models

10 November 2014

Papers citing "Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models"


Interactive Image Manipulation with Natural Language Instruction Commands Seitaro Shinagawa Koichiro Yoshino S. Sakti Yu Suzuki Satoshi Nakamura 38 14 0 23 Feb 2018
Zero-Shot Question Generation from Knowledge Graphs for Unseen Predicates and Entity Types Hady ElSahar Christophe Gravier F. Laforest BDL 22 80 0 19 Feb 2018
A Neural Multi-sequence Alignment TeCHnique (NeuMATCH) Pelin Dogan Boyang Albert Li Leonid Sigal Markus Gross AI4TS 30 19 0 19 Feb 2018
Describing Semantic Representations of Brain Activity Evoked by Visual Stimuli Eri Matsuo Ichiro Kobayashi Shinji Nishimoto S. Nishida H. Asoh 21 14 0 19 Jan 2018
DeepStyle: Multimodal Search Engine for Fashion and Interior Design Ivona Tautkute Tomasz Trzciński Aleksander P. Skorupa Łukasz Brocki K. Marasek 27 55 0 08 Jan 2018
Cross-modal Embeddings for Video and Audio Retrieval Dídac Surís A. Duarte Amaia Salvador Jordi Torres Xavier Giró-i-Nieto SSL 21 69 0 07 Jan 2018
Video Object Detection with an Aligned Spatial-Temporal Memory Fanyi Xiao Yong Jae Lee 49 189 0 18 Dec 2017
HP-GAN: Probabilistic 3D human motion prediction via GAN Emad Barsoum J. Kender Zicheng Liu 3DH 56 330 0 27 Nov 2017
A Neural-Symbolic Approach to Design of CAPTCHA Qiuyuan Huang P. Smolensky Xiaodong He Li Deng D. Wu AAML 36 1 0 29 Oct 2017
Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning Yang Xian Yingli Tian VLM 30 22 0 15 Sep 2017
Link the head to the "beak": Zero Shot Learning from Noisy Text Description at Part Precision Mohamed Elhoseiny Yizhe Zhu Han Zhang Ahmed Elgammal VLM 38 132 0 04 Sep 2017
Reasoning about Fine-grained Attribute Phrases using Reference Games Jong-Chyi Su Chenyun Wu Huaizu Jiang Subhransu Maji 34 16 0 29 Aug 2017
Open-World Visual Recognition Using Knowledge Graphs V. Lonij Ambrish Rawat Maria-Irina Nicolae 37 15 0 28 Aug 2017
Fluency-Guided Cross-Lingual Image Captioning Weiyu Lan Xirong Li Jianfeng Dong 19 93 0 15 Aug 2017
What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator? Marc Tanti Albert Gatt K. Camilleri 24 56 0 07 Aug 2017
Automatic Spatially-aware Fashion Concept Discovery Xintong Han Zuxuan Wu Phoenix X. Huang Xiao Zhang Menglong Zhu Yuan Li Yang Zhao L. Davis 47 267 0 03 Aug 2017
Learning Audio - Sheet Music Correspondences for Score Identification and Offline Alignment Matthias Dorfer A. Arzt Gerhard Widmer 41 43 0 31 Jul 2017
Deep Interactive Region Segmentation and Captioning Ali Sharifi Boroujerdi M. Khanian M. Breuß 24 7 0 26 Jul 2017
Semantic Image Synthesis via Adversarial Learning Hao Dong Simiao Yu Chao Wu Yike Guo GAN 20 265 0 21 Jul 2017
VSE++: Improving Visual-Semantic Embeddings with Hard Negatives Fartash Faghri David J. Fleet J. Kiros Sanja Fidler VLM 11 181 0 18 Jul 2017
DeepStory: Video Story QA by Deep Embedded Memory Networks Kyung-Min Kim Min-Oh Heo Seongho Choi Byoung-Tak Zhang 26 174 0 04 Jul 2017
Multimodal Machine Learning: A Survey and Taxonomy T. Baltrušaitis Chaitanya Ahuja Louis-Philippe Morency 15 2,868 0 26 May 2017
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures Fanyi Xiao Leonid Sigal Yong Jae Lee 35 139 0 03 May 2017
Query-adaptive Video Summarization via Quality-aware Relevance Estimation A. Vasudevan Michael Gygli Anna Volokitin Luc Van Gool 40 93 0 01 May 2017
Deep Reinforcement Learning-based Image Captioning with Embedding Reward Zhou Ren Xiaoyu Wang Ning Zhang Xutao Lv Li Li 34 324 0 12 Apr 2017
Learning Two-Branch Neural Networks for Image-Text Matching Tasks Liwei Wang Yin Li Jing-ling Huang Svetlana Lazebnik VLM 27 494 0 11 Apr 2017
AMC: Attention guided Multi-modal Correlation Learning for Image Search Kan Chen Trung Bui Chen Fang Zhaowen Wang Ram Nevatia 37 38 0 03 Apr 2017
Where to put the Image in an Image Caption Generator Marc Tanti Albert Gatt K. Camilleri 47 96 0 27 Mar 2017
I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation Hao Dong Jingqing Zhang Douglas McIlwraith Yike Guo 35 58 0 20 Mar 2017
Gated Multimodal Units for Information Fusion John Arevalo Thamar Solorio Manuel Montes-y-Gómez Fabio Gonzalez 33 373 0 07 Feb 2017
Multilingual Multi-modal Embeddings for Natural Language Processing Iacer Calixto Qun Liu N. Campbell 24 19 0 03 Feb 2017
Incorporating Global Visual Features into Attention-Based Neural Machine Translation Iacer Calixto Qun Liu Nick Campbell 32 154 0 23 Jan 2017
Learning Visual N-Grams from Web Data Ang Li Allan Jabri Armand Joulin Laurens van der Maaten VLM 20 136 0 29 Dec 2016
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering Yash Goyal Tejas Khot D. Summers-Stay Dhruv Batra Devi Parikh CoGe 164 3,136 0 02 Dec 2016
Video Captioning with Multi-Faceted Attention Xiang Long Chuang Gan Gerard de Melo 30 88 0 01 Dec 2016
Dense Captioning with Joint Inference and Visual Context L. Yang K. Tang Jianchao Yang Li Li VLM 30 169 0 21 Nov 2016
Recurrent Memory Addressing for describing videos A. Jain Abhinav Agarwalla Kumar Krishna Agrawal Pabitra Mitra 38 10 0 20 Nov 2016
Instance-aware Image and Sentence Matching with Selective Multimodal LSTM Yan Huang Wei Wang Liang Wang 26 222 0 17 Nov 2016
Zero-resource Machine Translation by Multimodal Encoder-decoder Network with Multimedia Pivot Hideki Nakayama Noriki Nishida 32 62 0 14 Nov 2016
Dual Attention Networks for Multimodal Reasoning and Matching Hyeonseob Nam Jung-Woo Ha Jeonghee Kim 45 664 0 02 Nov 2016
Learning What and Where to Draw Scott E. Reed Zeynep Akata S. Mohan Samuel Tenka Bernt Schiele Honglak Lee DRL GAN 30 618 0 08 Oct 2016
A Survey of Multi-View Representation Learning Yingming Li Ming Yang Zhongfei Zhang AI4TS 3DV 37 509 0 03 Oct 2016
Learning Language-Visual Embedding for Movie Understanding with Natural-Language Atousa Torabi Niket Tandon Leonid Sigal 22 97 0 26 Sep 2016
Image-embodied Knowledge Representation Learning Ruobing Xie Zhiyuan Liu Huanbo Luan Maosong Sun 122 211 0 22 Sep 2016
Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge Oriol Vinyals Alexander Toshev Samy Bengio D. Erhan 30 851 0 21 Sep 2016
Measuring Machine Intelligence Through Visual Question Answering C. L. Zitnick Aishwarya Agrawal Stanislaw Antol Margaret Mitchell Dhruv Batra Devi Parikh 27 37 0 31 Aug 2016
Linking Image and Text with 2-Way Nets Aviv Eisenschtat Lior Wolf 27 176 0 29 Aug 2016
Learning to generalize to new compositions in image understanding Yuval Atzmon Jonathan Berant Vahid Kezami Amir Globerson Gal Chechik 26 67 0 27 Aug 2016
phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning Y. Tan Chee Seng Chan VLM 22 29 0 20 Aug 2016
Modeling Context in Referring Expressions Licheng Yu Patrick Poirson Shan Yang Alexander C. Berg Tamara L. Berg 53 1,233 0 31 Jul 2016