Show and Tell: A Neural Image Caption Generator

17 November 2014

Papers citing "Show and Tell: A Neural Image Caption Generator"

50 / 2,023 papers shown

Semantic Refinement GRU-based Neural Language Generation for Spoken Dialogue Systems Van-Khanh Tran Le-Minh Nguyen 28 20 0 01 Jun 2017
Emergence of Language with Multi-agent Games: Learning to Communicate with Sequences of Symbols Serhii Havrylov Ivan Titov LLMAG 41 286 0 31 May 2017
Listen, Interact and Talk: Learning to Speak via Interaction Haichao Zhang Haonan Yu Wei Xu 31 13 0 28 May 2017
Human Trajectory Prediction using Spatially aware Deep Attention Models Daksh Varshneya G. Srinivasaraghavan HAI 40 91 0 26 May 2017
Multimodal Machine Learning: A Survey and Taxonomy T. Baltrušaitis Chaitanya Ahuja Louis-Philippe Morency 15 2,867 0 26 May 2017
Who Will Share My Image? Predicting the Content Diffusion Path in Online Social Networks Wenjian Hu Krishna Kumar Singh Fanyi Xiao Jinyoung Han Chen-Nee Chuah Yong Jae Lee GNN DiffM 19 1 0 25 May 2017
Neural Attribute Machines for Program Generation Matthew Amodio Swarat Chaudhuri Thomas W. Reps 19 35 0 25 May 2017
Deep image representations using caption generators Konda Reddy Mopuri Vishal B. Athreya R. Venkatesh Babu VLM SSL 21 1 0 25 May 2017
How a General-Purpose Commonsense Ontology can Improve Performance of Learning-Based Image Retrieval Rodrigo Toro Icarte Jorge A. Baier Cristian Ruz Á. Soto 9 24 0 24 May 2017
Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning Q. Sun Stefan Lee Dhruv Batra BDL 33 43 0 24 May 2017
Better Text Understanding Through Image-To-Text Transfer Karol Kurach Sylvain Gelly M. Jastrzebski Philip Häusser O. Teytaud Damien Vincent Olivier Bousquet VLM 17 6 0 23 May 2017
pix2code: Generating Code from a Graphical User Interface Screenshot Tony Beltramelli 33 267 0 22 May 2017
MUTAN: Multimodal Tucker Fusion for Visual Question Answering H. Ben-younes Rémi Cadène Matthieu Cord Nicolas Thome 67 578 0 18 May 2017
Learning a bidirectional mapping between human whole-body motion and natural language using deep recurrent neural networks Matthias Plappert Christian Mandery Tamim Asfour 3DH 32 129 0 18 May 2017
Re3 : Real-Time Recurrent Regression Networks for Visual Tracking of Generic Objects Daniel Gordon Ali Farhadi Dieter Fox VOT 21 48 0 17 May 2017
Object-Level Context Modeling For Scene Classification with Context-CNN Syed Ashar Javed A. Nelakanti VLM 30 10 0 11 May 2017
You said that? Joon Son Chung A. Jamaludin Andrew Zisserman CVBM 23 258 0 08 May 2017
Spatiotemporal Recurrent Convolutional Networks for Traffic Prediction in Transportation Networks Haiyang Yu Zhihai Wu Shuqin Wang Yunpeng Wang Xiaolei Ma AI4TS GNN 30 540 0 07 May 2017
Image Annotation using Multi-Layer Sparse Coding Amara Tariq H. Foroosh 14 2 0 06 May 2017
ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases Xiaosong Wang Yifan Peng Le Lu Zhiyong Lu M. Bagheri Ronald M. Summers LM&MA 66 2,474 0 05 May 2017
FOIL it! Find One mismatch between Image and Language caption Ravi Shekhar Sandro Pezzelle Yauhen Klimovich Aurélie Herbelot Moin Nabi E. Sangineto Raffaella Bernardi 25 137 0 03 May 2017
Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner Tseng-Hung Chen Yuan-Hong Liao Ching-Yao Chuang W. Hsu Jianlong Fu Min Sun 31 141 0 02 May 2017
STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset Yuya Yoshikawa Yutaro Shigeto A. Takeuchi 3DV 30 118 0 02 May 2017
Speech-Based Visual Question Answering Ted Zhang Dengxin Dai Tinne Tuytelaars Marie-Francine Moens Luc Van Gool 40 24 0 01 May 2017
Punny Captions: Witty Wordplay in Image Descriptions Arjun Chandrasekaran Devi Parikh Mohit Bansal 13 13 0 26 Apr 2017
Paying Attention to Descriptions Generated by Image Captioning Models Hamed R. Tavakoli Rakshith Shetty Ali Borji Jorma T. Laaksonen 29 79 0 24 Apr 2017
Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition Yufei Wang Zhe Lin Xiaohui Shen Scott D. Cohen G. Cottrell 21 105 0 23 Apr 2017
Affect-LM: A Neural Language Model for Customizable Affective Text Generation Sayan Ghosh Mathieu Chollet Eugene Laksana Louis-Philippe Morency Stefan Scherer KELM CVBM 24 190 0 22 Apr 2017
AnchorNet: A Weakly Supervised Network to Learn Geometry-sensitive Features For Semantic Matching David Novotny Diane Larlus Andrea Vedaldi 3DPC 31 65 0 16 Apr 2017
Video Fill In the Blank using LR/RL LSTMs with Spatial-Temporal Attentions Amir Mazaheri Dong Zhang M. Shah 17 12 0 15 Apr 2017
Spatial Memory for Context Reasoning in Object Detection Xinlei Chen Abhinav Gupta ObjD 25 164 0 13 Apr 2017
Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries Y. Zhang Luyao Yuan Yijie Guo Zhiyuan He I-An Huang Honglak Lee ObjD 28 57 0 12 Apr 2017
Deep Reinforcement Learning-based Image Captioning with Embedding Reward Zhou Ren Xiaoyu Wang Ning Zhang Xutao Lv Li-Jia Li 34 324 0 12 Apr 2017
What's in a Question: Using Visual Questions as a Form of Supervision Siddha Ganju Olga Russakovsky Abhinav Gupta 19 16 0 12 Apr 2017
Creativity: Generating Diverse Questions using Variational Autoencoders Unnat Jain Ziyu Zhang Alex Schwing 25 152 0 11 Apr 2017
Learning Two-Branch Neural Networks for Image-Text Matching Tasks Liwei Wang Yin Li Jing Huang Svetlana Lazebnik VLM 27 494 0 11 Apr 2017
Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering V. Kazemi Ali Elqursh OOD 28 183 0 11 Apr 2017
Action Unit Detection with Region Adaptation, Multi-labeling Learning and Optimal Temporal Fusing Wei Li Farnaz Abtahi Zhigang Zhu 8 155 0 10 Apr 2017
Learning Human Motion Models for Long-term Predictions Partha Ghosh Jie Song Emre Aksan Otmar Hilliges 3DH 28 239 0 10 Apr 2017
Generating Descriptions with Grounded and Co-Referenced People Anna Rohrbach Marcus Rohrbach Siyu Tang Seong Joon Oh Bernt Schiele 330 72 0 05 Apr 2017
Weakly Supervised Dense Video Captioning Zhiqiang Shen Jianguo Li Zhou Su Minjun Li Yurong Chen Yu-Gang Jiang Xiangyang Xue 32 134 0 05 Apr 2017
A Genetic Programming Approach to Designing Convolutional Neural Network Architectures Masanori Suganuma Shinichi Shirakawa T. Nagao 27 587 0 03 Apr 2017
Towards Building Large Scale Multimodal Domain-Aware Conversation Systems Amrita Saha Mitesh Khapra Karthik Sankaranarayanan 26 8 0 01 Apr 2017
Towards a Visual Privacy Advisor: Understanding and Predicting Privacy Risks in Images Rakshith Shetty Bernt Schiele Mario Fritz 35 223 0 30 Mar 2017
Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training Rakshith Shetty Marcus Rohrbach Lisa Anne Hendricks Mario Fritz Bernt Schiele 19 142 0 30 Mar 2017
Colors in Context: A Pragmatic Neural Model for Grounded Language Understanding Will Monroe Robert D. Hawkins Noah D. Goodman Christopher Potts 37 122 0 29 Mar 2017
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation Albert Gatt E. Krahmer LM&MA ELM 27 810 0 29 Mar 2017
Towards Automatic Learning of Procedures from Web Instructional Videos Luowei Zhou Chenliang Xu Jason J. Corso EgoV 36 804 0 28 Mar 2017
Where to put the Image in an Image Caption Generator Marc Tanti Albert Gatt K. Camilleri 47 96 0 27 Mar 2017
Sequence-to-Sequence Models Can Directly Translate Foreign Speech Ron J. Weiss J. Chorowski Navdeep Jaitly Yonghui Wu Zhifeng Chen 33 341 0 24 Mar 2017