Explain Images with Multimodal Recurrent Neural Networks

4 October 2014

Yi Yang

Papers citing "Explain Images with Multimodal Recurrent Neural Networks"

50 / 116 papers shown

User-Aware Prefix-Tuning is a Good Learner for Personalized Image Captioning Xuan Wang Guanhong Wang Wenhao Chai Jiayu Zhou Gaoang Wang 37 4 0 08 Dec 2023
Object Recognition as Next Token Prediction Kaiyu Yue Borchun Chen Jonas Geiping Hengduo Li Tom Goldstein Ser-Nam Lim 40 9 0 04 Dec 2023
A Survey on Image-text Multimodal Models Ruifeng Guo Jingxuan Wei Linzhuang Sun Khai Le-Duc Guiyong Chang Dawei Liu Sibo Zhang Zhengbing Yao Mingjun Xu Liping Bu VLM 31 5 0 23 Sep 2023
A Comprehensive Analysis of Real-World Image Captioning and Scene Identification Sai Suprabhanu Nallapaneni Subrahmanyam Konakanchi 30 2 0 05 Aug 2023
Stacked Cross-modal Feature Consolidation Attention Networks for Image Captioning Mozhgan Pourkeshavarz Shahabedin Nabavi Mohsen Moghaddam M. Shamsfard 31 4 0 08 Feb 2023
Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors R. Burgert Kanchana Ranasinghe Xiang Li Michael S. Ryoo DiffM VLM 34 37 0 23 Nov 2022
Language Models Can See: Plugging Visual Controls in Text Generation Yixuan Su Tian Lan Yahui Liu Fangyu Liu Dani Yogatama Yan Wang Lingpeng Kong Nigel Collier VLM MLLM 46 97 0 05 May 2022
Knowledge-enriched Attention Network with Group-wise Semantic for Visual Storytelling Tengpeng Li Hanli Wang Bin He Changan Chen DiffM 21 9 0 10 Mar 2022
Geometry-Entangled Visual Semantic Transformer for Image Captioning Ling Cheng Wei Wei Feida Zhu Yong-jin Liu Chunyan Miao ViT 21 3 0 29 Sep 2021
Heterogeneous Contrastive Learning Lecheng Zheng Jinjun Xiong Yada Zhu Jingrui He 40 21 0 19 May 2021
Characterization and recognition of handwritten digits using Julia Md Asifuzzaman Jishan M. Alam A. Islam I. R. Mazumder K. Mahmud A. K. Azad 19 0 0 24 Feb 2021
Image to Bengali Caption Generation Using Deep CNN and Bidirectional Gated Recurrent Unit Albay Faruk Hasan Al Faraby M. M. Azad Md. Riduyan Fedous Md. Kishor Morol 17 15 0 22 Dec 2020
Intrinsic Image Captioning Evaluation Chao Zeng Sam Kwong 21 0 0 14 Dec 2020
Watch and Learn: Mapping Language and Noisy Real-world Videos with Self-supervision Yujie Zhong Linhai Xie Sen Wang Lucia Specia Yishu Miao SSL 11 0 0 19 Nov 2020
TextMage: The Automated Bangla Caption Generator Based On Deep Learning Abrar Hasin Kamal Md Asifuzzaman Jishan N. Mansoor VLM 8 17 0 15 Oct 2020
X-Linear Attention Networks for Image Captioning Yingwei Pan Ting Yao Yehao Li Tao Mei 20 509 0 31 Mar 2020
Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis Alexander Schindler 15 8 0 01 Feb 2020
On Architectures for Including Visual Information in Neural Language Models for Image Description Marc Tanti Albert Gatt K. Camilleri VLM 30 2 0 09 Nov 2019
Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style Hongwei Ge Zehang Yan Kai Zhang Mingde Zhao Liang Sun 30 24 0 15 Oct 2019
Visually Grounded Generation of Entailments from Premises Somayeh Jafaritazehjani Albert Gatt Marc Tanti LRM 27 1 0 21 Sep 2019
Using Clinical Notes with Time Series Data for ICU Management Swaraj Khadanga Karan Aggarwal Chenyu You Jaideep Srivastava 8 55 0 12 Sep 2019
Conditional Text Generation for Harmonious Human-Machine Interaction Bin Guo Hao Wang Yasan Ding Wei Wu Shaoyang Hao Yueqi Sun Zhiwen Yu 21 4 0 08 Sep 2019
Stack-VS: Stacked Visual-Semantic Attention for Image Caption Generation Wei Wei Ling Cheng Xian-Ling Mao Guangyou Zhou Feida Zhu DiffM 22 19 0 05 Sep 2019
A Semantics-Assisted Video Captioning Model Trained with Scheduled Sampling Haoran Chen Ke Lin A. Maye Jianmin Li Xiaoling Hu 25 47 0 31 Aug 2019
Modeling question asking using neural program generation ZiYun Wang Brenden M. Lake 16 7 0 23 Jul 2019
Kite: Automatic speech recognition for unmanned aerial vehicles Dan Oneaţă H. Cucu 21 13 0 02 Jul 2019
AI-Powered Text Generation for Harmonious Human-Machine Interaction: Current State and Future Directions Qiuyun Zhang Bin Guo Hao Wang Yunji Liang Shaoyang Hao Zhiwen Yu 15 6 0 01 May 2019
Improving Image Captioning by Leveraging Knowledge Graphs Yimin Zhou Yiwei Sun Vasant Honavar VLM 14 54 0 25 Jan 2019
Transfer learning from language models to image caption generators: Better models may not transfer better Marc Tanti Albert Gatt K. Camilleri VLM 23 3 0 01 Jan 2019
A Comprehensive Survey of Deep Learning for Image Captioning Md Zakir Hossain Ferdous Sohel M. Shiratuddin Hamid Laga VLM 3DV 45 760 0 06 Oct 2018
LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts Shuming Ma Lei Cui Damai Dai Furu Wei Xu Sun VGen 23 61 0 13 Sep 2018
Diverse and Coherent Paragraph Generation from Images Moitreya Chatterjee A. Schwing 19 66 0 03 Sep 2018
Live Video Comment Generation Based on Surrounding Frames and Live Comments Damai Dai VGen 8 0 0 13 Aug 2018
Doubly Attentive Transformer Machine Translation Hasan Sait Arslan Mark Fishel G. Anbarjafari 32 13 0 30 Jul 2018
Improving Image Captioning with Conditional Generative Adversarial Nets Chen Chen Shuai Mu Wanpeng Xiao Zexiong Ye Liesi Wu Qi Ju GAN 29 90 0 18 May 2018
Video Object Detection with an Aligned Spatial-Temporal Memory Fanyi Xiao Yong Jae Lee 49 189 0 18 Dec 2017
Learning Semantic Concepts and Order for Image and Sentence Matching Yan Huang Qi Wu Liang Wang VLM 8 302 0 06 Dec 2017
Learning Functional Causal Models with Generative Neural Networks Hugo Jair Escalante Sergio Escalera Xavier Baro Isabelle M Guyon Umut Güçlü Marcel van Gerven CML BDL 20 107 0 15 Sep 2017
What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator? Marc Tanti Albert Gatt K. Camilleri 21 56 0 07 Aug 2017
Identity-Aware Textual-Visual Matching with Latent Co-attention Shuang Li Tong Xiao Hongsheng Li Wei Yang Xiaogang Wang 22 227 0 07 Aug 2017
Deep Interactive Region Segmentation and Captioning Ali Sharifi Boroujerdi M. Khanian M. Breuß 24 7 0 26 Jul 2017
Multimedia Semantic Integrity Assessment Using Joint Embedding Of Images And Text Ayush Jaiswal Ekraam Sabir Wael Abd-Almageed Premkumar Natarajan 16 44 0 06 Jul 2017
MirBot: A collaborative object recognition system for smartphones using convolutional neural networks A. Pertusa Antonio Javier Gallego Marisa Bernabeu ObjD 14 13 0 09 Jun 2017
Query-adaptive Video Summarization via Quality-aware Relevance Estimation A. Vasudevan Michael Gygli Anna Volokitin Luc Van Gool 32 93 0 01 May 2017
Where to put the Image in an Image Caption Generator Marc Tanti Albert Gatt K. Camilleri 47 96 0 27 Mar 2017
A New Evaluation Protocol and Benchmarking Results for Extendable Cross-media Retrieval Ruoyu Liu Yao Zhao Liang Zheng Shikui Wei Yi Yang 25 12 0 10 Mar 2017
Gated Multimodal Units for Information Fusion John Arevalo Thamar Solorio Manuel Montes-y-Gómez Fabio Gonzalez 33 371 0 07 Feb 2017
Doubly-Attentive Decoder for Multi-modal Neural Machine Translation Iacer Calixto Qun Liu N. Campbell 40 179 0 04 Feb 2017
Incorporating Global Visual Features into Attention-Based Neural Machine Translation Iacer Calixto Qun Liu Nick Campbell 32 154 0 23 Jan 2017
Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering Hao Liu Yang Yang Fumin Shen Lixin Duan Heng Tao Shen 30 9 0 15 Dec 2016