Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

20 December 2014

Yi Yang

Papers citing "Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)"

50 / 417 papers shown

Automatic Rule Induction for Interpretable Semi-Supervised Learning Reid Pryzant Ziyi Yang Yichong Xu Chenguang Zhu Michael Zeng 81 10 0 18 May 2022
Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning Chia-Wen Kuo Z. Kira 97 55 0 09 May 2022
Diverse Image Captioning with Grounded Style Franz Klein Shweta Mahajan S. Roth 72 7 0 03 May 2022
X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks Zhaowei Cai Gukyeong Kwon Avinash Ravichandran Erhan Bas Zhuowen Tu Rahul Bhotika Stefano Soatto ObjD MLLM VLM 67 50 0 12 Apr 2022
On Distinctive Image Captioning via Comparing and Reweighting Jiuniu Wang Wenjia Xu Qingzhong Wang Antoni B. Chan 87 16 0 08 Apr 2022
Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition Peipei Zhu Tianlin Li Yong Luo Zhenglong Sun Wei-Shi Zheng Yaowei Wang Chen Chen 102 12 0 07 Mar 2022
Vision-Language Intelligence: Tasks, Representation Learning, and Large Models Feng Li Hao Zhang Yi-Fan Zhang Shixuan Liu Jian Guo L. Ni Pengchuan Zhang Lei Zhang AI4TS VLM 79 37 0 03 Mar 2022
Inference of captions from histopathological patches M. Tsuneki F. Kanavati 84 32 0 07 Feb 2022
EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators Lois Orosa Skanda Koppula Yaman Umuroglu Konstantinos Kanellopoulos Juan Gómez Luna Michaela Blott K. Vissers O. Mutlu 82 4 0 04 Feb 2022
Multi-Label Classification on Remote-Sensing Images A. Singh B. Uma Shankar 38 0 0 06 Jan 2022
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic Yoad Tewel Yoav Shalev Idan Schwartz Lior Wolf VLM 122 197 0 29 Nov 2021
Contrastive Learning of Visual-Semantic Embeddings Anurag Jain Yashaswi Verma SSL 66 1 0 17 Oct 2021
Geometry Attention Transformer with Position-aware LSTMs for Image Captioning Chi-Yin Wang Yulin Shen Luping Ji ViT 106 53 0 01 Oct 2021
Cross Modification Attention Based Deliberation Model for Image Captioning Zheng Lian Yanan Zhang Haichang Li Rui Wang Xiaohui Hu 64 5 0 17 Sep 2021
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention Katsuyuki Nakamura Hiroki Ohashi Mitsuhiro Okada EgoV 94 13 0 07 Sep 2021
Group-based Distinctive Image Captioning with Memory Attention Jiuniu Wang Wenjia Xu Qingzhong Wang Antoni B. Chan 100 18 0 20 Aug 2021
Caption Generation on Scenes with Seen and Unseen Object Categories B. Demirel R. G. Cinbis VLM 115 1 0 13 Aug 2021
A Better Loss for Visual-Textual Grounding Davide Rigoni Luciano Serafini A. Sperduti ObjD 60 3 0 11 Aug 2021
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning Bryan Wang Gang Li Xin Zhou Zhourong Chen Tovi Grossman Yang Li 207 160 0 07 Aug 2021
Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval Xuri Ge Fuhai Chen J. Jose Zhilong Ji Zhongqin Wu Xiao-Chang Liu 72 57 0 05 Aug 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning Matteo Stefanini Marcella Cornia Lorenzo Baraldi S. Cascianelli G. Fiameni Rita Cucchiara 3DV VLM MLLM 153 270 0 14 Jul 2021
A comparison of LSTM and GRU networks for learning symbolic sequences Roberto Cahuantzi Xinye Chen S. Güttel 96 143 0 05 Jul 2021
Parts2Words: Learning Joint Embedding of Point Clouds and Texts by Bidirectional Matching between Parts and Words Chuan Tang Xi Yang Bojian Wu Zhizhong Han Yi Chang 3DPC 91 13 0 05 Jul 2021
Case Relation Transformer: A Crossmodal Language Generation Model for Fetching Instructions Motonari Kambara K. Sugiura ViT 62 6 0 02 Jul 2021
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation Jing Liu Xinxin Zhu Fei Liu Longteng Guo Zijia Zhao ... Weining Wang Hanqing Lu Shiyu Zhou Jiajun Zhang Jinqiao Wang 82 38 0 01 Jul 2021
New Encoder Learning for Captioning Heavy Rain Images via Semantic Visual Feature Matching Chang-Hwan Son Pung-Hwi Ye 123 3 0 28 May 2021
Writing by Memorizing: Hierarchical Retrieval-based Medical Report Generation Xingyi Yang Muchao Ye Quanzeng You Fenglong Ma MedIm 57 38 0 25 May 2021
Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval K. Ueki 45 4 0 16 May 2021
End-to-End Attention-based Image Captioning Carola Sundaramoorthy Lin Ziwen Kelvin Mahak Sarin Shubham Gupta ViT 57 6 0 30 Apr 2021
Multi-view Deep One-class Classification: A Systematic Exploration Siqi Wang Jiyuan Liu Guang Yu Xinwang Liu Sihang Zhou En Zhu Yuexiang Yang Jianping Yin 24 1 0 27 Apr 2021
Towards Open-World Text-Guided Face Image Generation and Manipulation Weihao Xia Yujiu Yang Jing-Hao Xue Baoyuan Wu DiffM 69 42 0 18 Apr 2021
Integrating Information Theory and Adversarial Learning for Cross-modal Retrieval Wei Chen Yu Liu E. Bakker M. Lew GAN 41 27 0 11 Apr 2021
A Comprehensive Review of the Video-to-Text Problem Jesus Perez-Martin B. Bustos S. Guimarães I. Sipiran Jorge A. Pérez Grethel Coello Said 71 17 0 27 Mar 2021
Sequential Learning on Liver Tumor Boundary Semantics and Prognostic Biomarker Mining Jieneng Chen K. Yan Yu-Dong Zhang Youbao Tang Xun Xu ... Lingyun Huang Jing Xiao Alan Yuille Ya Zhang Le Lu 30 2 0 09 Mar 2021
Analysis of Convolutional Decoder for Image Caption Generation Sulabh Katiyar S. Borgohain 52 0 0 08 Mar 2021
A Universal Model for Cross Modality Mapping by Relational Reasoning Zun Li Congyan Lang Liqian Liang Tao Wang Songhe Feng Jun Wu Yidong Li 56 2 0 26 Feb 2021
Comparative evaluation of CNN architectures for Image Caption Generation Sulabh Katiyar S. Borgohain 74 24 0 23 Feb 2021
Image Captioning using Deep Stacked LSTMs, Contextual Word Embeddings and Data Augmentation Sulabh Katiyar S. Borgohain VLM 59 14 0 22 Feb 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units Wei-Ning Hsu David Harwath Christopher Song James R. Glass CLIP 90 67 0 31 Dec 2020
SubICap: Towards Subword-informed Image Captioning Naeha Sharif Bennamoun Wei Liu Syed Afaq Ali Shah 45 2 0 24 Dec 2020
AutoCaption: Image Captioning with Neural Architecture Search Xinxin Zhu Weining Wang Longteng Guo Jing Liu 102 9 0 16 Dec 2020
StacMR: Scene-Text Aware Cross-Modal Retrieval Andrés Mafla Rafael Sampaio de Rezende Lluís Gómez Diane Larlus Dimosthenis Karatzas 3DV 102 14 0 08 Dec 2020
TediGAN: Text-Guided Diverse Face Image Generation and Manipulation Weihao Xia Yujiu Yang Jing-Hao Xue Baoyuan Wu DiffM 118 23 0 06 Dec 2020
Robust Image Captioning Daniel Yarnell Xian Wang 46 0 0 06 Dec 2020
Understanding Guided Image Captioning Performance across Domains Edwin G. Ng Bo Pang P. Sharma Radu Soricut 118 25 0 04 Dec 2020
BERT-hLSTMs: BERT and Hierarchical LSTMs for Visual Storytelling Jing Su Qingyun Dai Frank Guerin Mian Zhou 70 24 0 03 Dec 2020
Diverse Image Captioning with Context-Object Split Latent Spaces Shweta Mahajan Stefan Roth 64 42 0 02 Nov 2020
Personalized Multimodal Feedback Generation in Education Haochen Liu Zitao Liu Zhongqin Wu Jiliang Tang 54 9 0 31 Oct 2020
DialogueTRM: Exploring the Intra- and Inter-Modal Emotional Behaviors in the Conversation Yuzhao Mao Qi Sun Guang Liu Xiaojie Wang Weiguo Gao Xuan Li Jianping Shen 75 26 0 15 Oct 2020
Spatial Attention as an Interface for Image Captioning Models P. Sadler 53 0 0 29 Sep 2020