Title
Learning to Guide Decoding for Image Captioning Wenhao Jiang Lin Ma Xinpeng Chen Hanwang Zhang Wen Liu 16 69 0 03 Apr 2018
Neural Baby Talk Jiasen Lu Jianwei Yang Dhruv Batra Devi Parikh VLM 200 434 0 27 Mar 2018
Stacked Cross Attention for Image-Text Matching Kuang-Huei Lee Xi Chen G. Hua Houdong Hu Xiaodong He 30 1,140 0 21 Mar 2018
Attentive Tensor Product Learning Qiuyuan Huang Li Deng D. Wu Chang Liu Xiaodong He 24 23 0 20 Feb 2018
Netizen-Style Commenting on Fashion Photos: Dataset and Diversity Measures Wen Hua Lin Kuan-Ting Chen HungYueh Chiang Winston H. Hsu 34 10 0 31 Jan 2018
Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions Qing Li Jianlong Fu D. Yu Tao Mei Jiebo Luo FAtt XAI CoGe 51 60 0 27 Jan 2018
Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning Hongge Chen Huan Zhang Pin-Yu Chen Jinfeng Yi Cho-Jui Hsieh GAN AAML 35 49 0 06 Dec 2017
On the Automatic Generation of Medical Imaging Reports Baoyu Jing P. Xie Eric P. Xing MedIm 35 503 0 22 Nov 2017
AI Challenger : A Large-scale Dataset for Going Deeper in Image Understanding Jiahong Wu He Zheng Bo Zhao Yixin Li Baoming Yan ... Shipei Zhou G. Lin Yanwei Fu Yizhou Wang Yonggang Wang VLM 38 149 0 17 Nov 2017
Semantic speech retrieval with a visually grounded model of untranscribed speech Herman Kamper Gregory Shakhnarovich Karen Livescu 29 53 0 05 Oct 2017
Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning Yang Xian Yingli Tian VLM 25 22 0 15 Sep 2017
Link the head to the "beak": Zero Shot Learning from Noisy Text Description at Part Precision Mohamed Elhoseiny Yizhe Zhu Han Zhang Ahmed Elgammal VLM 38 132 0 04 Sep 2017
Fluency-Guided Cross-Lingual Image Captioning Weiyu Lan Xirong Li Jianfeng Dong 19 93 0 15 Aug 2017
Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge Damien Teney Peter Anderson Xiaodong He Anton Van Den Hengel 50 380 0 09 Aug 2017
Dual-Glance Model for Deciphering Social Relationships Junnan Li Yongkang Wong Qi Zhao Mohan Kankanhalli 16 78 0 02 Aug 2017
Scene Graph Generation from Objects, Phrases and Region Captions Yikang Li Wanli Ouyang Bolei Zhou Kun Wang Xiaogang Wang 21 499 0 31 Jul 2017
Deep Interactive Region Segmentation and Captioning Ali Sharifi Boroujerdi M. Khanian M. Breuß 24 7 0 26 Jul 2017
OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts Xuwang Yin Vicente Ordonez VLM 40 55 0 22 Jul 2017
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model Jiasen Lu A. Kannan Jianwei Yang Devi Parikh Dhruv Batra BDL 38 136 0 05 Jun 2017
Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning Jingkuan Song Zhao Guo Lianli Gao Wu Liu Dongxiang Zhang Heng Tao Shen 42 166 0 05 Jun 2017
Adversarial Ranking for Language Generation Kevin Qinghong Lin Dianqi Li Xiaodong He Zhengyou Zhang Ming-Ting Sun GAN 26 331 0 31 May 2017
Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning Q. Sun Stefan Lee Dhruv Batra BDL 33 43 0 24 May 2017
Query-adaptive Video Summarization via Quality-aware Relevance Estimation A. Vasudevan Michael Gygli Anna Volokitin Luc Van Gool 35 93 0 01 May 2017
Detecting Visual Relationships with Deep Relational Networks Bo Dai Yuqi Zhang Dahua Lin GNN 59 500 0 11 Apr 2017
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation Albert Gatt E. Krahmer LM&MA ELM 27 810 0 29 Mar 2017
Visually grounded learning of keyword prediction from untranscribed speech Herman Kamper Shane Settle Gregory Shakhnarovich Karen Livescu 19 63 0 23 Mar 2017
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning Abhishek Das Satwik Kottur J. M. F. Moura Stefan Lee Dhruv Batra OffRL 31 423 0 20 Mar 2017
VQABQ: Visual Question Answering by Basic Questions Jia-Hong Huang Modar Alfadly Guohao Li 27 24 0 19 Mar 2017
Person Search with Natural Language Description Shuang Li Tong Xiao Hongsheng Li Bolei Zhou Dayu Yue Xiaogang Wang 24 386 0 19 Feb 2017
MAT: A Multimodal Attentive Translator for Image Captioning Chang Liu F. Sun Changhu Wang Feng Wang Alan Yuille 20 58 0 18 Feb 2017
Context-aware Captions from Context-agnostic Supervision Ramakrishna Vedantam Samy Bengio Kevin Patrick Murphy Devi Parikh Gal Chechik 22 152 0 11 Jan 2017
Learning Visual N-Grams from Web Data Ang Li Allan Jabri Armand Joulin L. V. D. van der Maaten VLM 20 136 0 29 Dec 2016
An Empirical Study of Language CNN for Image Captioning Jiuxiang Gu G. Wang Jianfei Cai Tsuhan Chen 31 132 0 21 Dec 2016
Exploring the Design Space of Deep Convolutional Neural Networks at Large Scale F. Iandola 3DV 26 18 0 20 Dec 2016
Multiple Instance Learning: A Survey of Problem Characteristics and Applications M. Carbonneau V. Cheplygina Eric Granger G. Gagnon 14 612 0 11 Dec 2016
Areas of Attention for Image Captioning M. Pedersoli Thomas Lucas Cordelia Schmid Jakob Verbeek 33 205 0 03 Dec 2016
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering Yash Goyal Tejas Khot D. Summers-Stay Dhruv Batra Devi Parikh CoGe 122 3,126 0 02 Dec 2016
Guided Open Vocabulary Image Captioning with Constrained Beam Search Peter Anderson Basura Fernando Mark Johnson Stephen Gould 21 232 0 02 Dec 2016
Visual Dialog Abhishek Das Satwik Kottur Khushi Gupta Avi Singh Deshraj Yadav José M. F. Moura Devi Parikh Dhruv Batra 69 990 0 26 Nov 2016
On Human Intellect and Machine Failures: Troubleshooting Integrative Machine Learning Systems Besmira Nushi Ece Kamar Eric Horvitz Donald Kossmann 42 77 0 24 Nov 2016
Recurrent Attention Models for Depth-Based Person Identification Albert Haque Alexandre Alahi Li Fei-Fei 3DH 44 142 0 22 Nov 2016
Dense Captioning with Joint Inference and Visual Context L. Yang K. Tang Jianchao Yang Li-Jia Li VLM 30 169 0 21 Nov 2016
Recurrent Memory Addressing for describing videos A. Jain Abhinav Agarwalla Kumar Krishna Agrawal Pabitra Mitra 38 10 0 20 Nov 2016
Instance-aware Image and Sentence Matching with Selective Multimodal LSTM Yan Huang Wei Wang Liang Wang 26 222 0 17 Nov 2016
Semantic Regularisation for Recurrent Image Annotation Feng Liu Tao Xiang Timothy M. Hospedales Wankou Yang Changyin Sun 31 103 0 16 Nov 2016
Boosting Image Captioning with Attributes Ting Yao Yingwei Pan Yehao Li Zhaofan Qiu Tao Mei VLM 48 620 0 05 Nov 2016
End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering Youngjae Yu Hyungjin Ko Jongwook Choi Gunhee Kim 14 230 0 10 Oct 2016
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization Ramprasaath R. Selvaraju Michael Cogswell Abhishek Das Ramakrishna Vedantam Devi Parikh Dhruv Batra FAtt 44 19,576 0 07 Oct 2016
A Survey of Multi-View Representation Learning Yingming Li Ming Yang Zhongfei Zhang AI4TS 3DV 37 509 0 03 Oct 2016
Variational Autoencoder for Deep Learning of Images, Labels and Captions Yunchen Pu Zhe Gan Ricardo Henao Xin Yuan Chunyuan Li Andrew Stevens Lawrence Carin BDL CoGe 28 746 0 28 Sep 2016