1412.6632
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
20 December 2014
Junhua Mao
Wenyuan Xu
Yi Yang
Jiang Wang
Zhiheng Huang
Alan Yuille
VLM
Papers citing "Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)"
50 / 417 papers shown
Title
Automatic Rule Induction for Interpretable Semi-Supervised Learning
Reid Pryzant
Ziyi Yang
Yichong Xu
Chenguang Zhu
Michael Zeng
81
10
0
18 May 2022
Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning
Chia-Wen Kuo
Z. Kira
97
55
0
09 May 2022
Diverse Image Captioning with Grounded Style
Franz Klein
Shweta Mahajan
S. Roth
72
7
0
03 May 2022
X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks
Zhaowei Cai
Gukyeong Kwon
Avinash Ravichandran
Erhan Bas
Zhuowen Tu
Rahul Bhotika
Stefano Soatto
ObjD
MLLM
VLM
67
50
0
12 Apr 2022
On Distinctive Image Captioning via Comparing and Reweighting
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
87
16
0
08 Apr 2022
Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition
Peipei Zhu
Tianlin Li
Yong Luo
Zhenglong Sun
Wei-Shi Zheng
Yaowei Wang
Chen Chen
102
12
0
07 Mar 2022
Vision-Language Intelligence: Tasks, Representation Learning, and Large Models
Feng Li
Hao Zhang
Yi-Fan Zhang
Shixuan Liu
Jian Guo
L. Ni
Pengchuan Zhang
Lei Zhang
AI4TS
VLM
79
37
0
03 Mar 2022
Inference of captions from histopathological patches
M. Tsuneki
F. Kanavati
84
32
0
07 Feb 2022
EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators
Lois Orosa
Skanda Koppula
Yaman Umuroglu
Konstantinos Kanellopoulos
Juan Gómez Luna
Michaela Blott
K. Vissers
O. Mutlu
82
4
0
04 Feb 2022
Multi-Label Classification on Remote-Sensing Images
A. Singh
B. Uma Shankar
38
0
0
06 Jan 2022
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Yoad Tewel
Yoav Shalev
Idan Schwartz
Lior Wolf
VLM
122
197
0
29 Nov 2021
Contrastive Learning of Visual-Semantic Embeddings
Anurag Jain
Yashaswi Verma
SSL
66
1
0
17 Oct 2021
Geometry Attention Transformer with Position-aware LSTMs for Image Captioning
Chi-Yin Wang
Yulin Shen
Luping Ji
ViT
106
53
0
01 Oct 2021
Cross Modification Attention Based Deliberation Model for Image Captioning
Zheng Lian
Yanan Zhang
Haichang Li
Rui Wang
Xiaohui Hu
64
5
0
17 Sep 2021
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention
Katsuyuki Nakamura
Hiroki Ohashi
Mitsuhiro Okada
EgoV
94
13
0
07 Sep 2021
Group-based Distinctive Image Captioning with Memory Attention
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
100
18
0
20 Aug 2021
Caption Generation on Scenes with Seen and Unseen Object Categories
B. Demirel
R. G. Cinbis
VLM
115
1
0
13 Aug 2021
A Better Loss for Visual-Textual Grounding
Davide Rigoni
Luciano Serafini
A. Sperduti
ObjD
60
3
0
11 Aug 2021
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
Bryan Wang
Gang Li
Xin Zhou
Zhourong Chen
Tovi Grossman
Yang Li
207
160
0
07 Aug 2021
Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval
Xuri Ge
Fuhai Chen
J. Jose
Zhilong Ji
Zhongqin Wu
Xiao-Chang Liu
72
57
0
05 Aug 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
153
270
0
14 Jul 2021
A comparison of LSTM and GRU networks for learning symbolic sequences
Roberto Cahuantzi
Xinye Chen
S. Güttel
96
143
0
05 Jul 2021
Parts2Words: Learning Joint Embedding of Point Clouds and Texts by Bidirectional Matching between Parts and Words
Chuan Tang
Xi Yang
Bojian Wu
Zhizhong Han
Yi Chang
3DPC
91
13
0
05 Jul 2021
Case Relation Transformer: A Crossmodal Language Generation Model for Fetching Instructions
Motonari Kambara
K. Sugiura
ViT
62
6
0
02 Jul 2021
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation
Jing Liu
Xinxin Zhu
Fei Liu
Longteng Guo
Zijia Zhao
...
Weining Wang
Hanqing Lu
Shiyu Zhou
Jiajun Zhang
Jinqiao Wang
82
38
0
01 Jul 2021
New Encoder Learning for Captioning Heavy Rain Images via Semantic Visual Feature Matching
Chang-Hwan Son
Pung-Hwi Ye
123
3
0
28 May 2021
Writing by Memorizing: Hierarchical Retrieval-based Medical Report Generation
Xingyi Yang
Muchao Ye
Quanzeng You
Fenglong Ma
MedIm
57
38
0
25 May 2021
Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval
K. Ueki
45
4
0
16 May 2021
End-to-End Attention-based Image Captioning
Carola Sundaramoorthy
Lin Ziwen Kelvin
Mahak Sarin
Shubham Gupta
ViT
57
6
0
30 Apr 2021
Multi-view Deep One-class Classification: A Systematic Exploration
Siqi Wang
Jiyuan Liu
Guang Yu
Xinwang Liu
Sihang Zhou
En Zhu
Yuexiang Yang
Jianping Yin
24
1
0
27 Apr 2021
Towards Open-World Text-Guided Face Image Generation and Manipulation
Weihao Xia
Yujiu Yang
Jing-Hao Xue
Baoyuan Wu
DiffM
69
42
0
18 Apr 2021
Integrating Information Theory and Adversarial Learning for Cross-modal Retrieval
Wei Chen
Yu Liu
E. Bakker
M. Lew
GAN
41
27
0
11 Apr 2021
A Comprehensive Review of the Video-to-Text Problem
Jesus Perez-Martin
B. Bustos
S. Guimarães
I. Sipiran
Jorge A. Pérez
Grethel Coello Said
71
17
0
27 Mar 2021
Sequential Learning on Liver Tumor Boundary Semantics and Prognostic Biomarker Mining
Jieneng Chen
K. Yan
Yu-Dong Zhang
Youbao Tang
Xun Xu
...
Lingyun Huang
Jing Xiao
Alan Yuille
Ya Zhang
Le Lu
30
2
0
09 Mar 2021
Analysis of Convolutional Decoder for Image Caption Generation
Sulabh Katiyar
S. Borgohain
52
0
0
08 Mar 2021
A Universal Model for Cross Modality Mapping by Relational Reasoning
Zun Li
Congyan Lang
Liqian Liang
Tao Wang
Songhe Feng
Jun Wu
Yidong Li
56
2
0
26 Feb 2021
Comparative evaluation of CNN architectures for Image Caption Generation
Sulabh Katiyar
S. Borgohain
74
24
0
23 Feb 2021
Image Captioning using Deep Stacked LSTMs, Contextual Word Embeddings and Data Augmentation
Sulabh Katiyar
S. Borgohain
VLM
59
14
0
22 Feb 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
Wei-Ning Hsu
David Harwath
Christopher Song
James R. Glass
CLIP
90
67
0
31 Dec 2020
SubICap: Towards Subword-informed Image Captioning
Naeha Sharif
Bennamoun
Wei Liu
Syed Afaq Ali Shah
45
2
0
24 Dec 2020
AutoCaption: Image Captioning with Neural Architecture Search
Xinxin Zhu
Weining Wang
Longteng Guo
Jing Liu
102
9
0
16 Dec 2020
StacMR: Scene-Text Aware Cross-Modal Retrieval
Andrés Mafla
Rafael Sampaio de Rezende
Lluís Gómez
Diane Larlus
Dimosthenis Karatzas
3DV
102
14
0
08 Dec 2020
TediGAN: Text-Guided Diverse Face Image Generation and Manipulation
Weihao Xia
Yujiu Yang
Jing-Hao Xue
Baoyuan Wu
DiffM
118
23
0
06 Dec 2020
Robust Image Captioning
Daniel Yarnell
Xian Wang
46
0
0
06 Dec 2020
Understanding Guided Image Captioning Performance across Domains
Edwin G. Ng
Bo Pang
P. Sharma
Radu Soricut
118
25
0
04 Dec 2020
BERT-hLSTMs: BERT and Hierarchical LSTMs for Visual Storytelling
Jing Su
Qingyun Dai
Frank Guerin
Mian Zhou
70
24
0
03 Dec 2020
Diverse Image Captioning with Context-Object Split Latent Spaces
Shweta Mahajan
Stefan Roth
64
42
0
02 Nov 2020
Personalized Multimodal Feedback Generation in Education
Haochen Liu
Zitao Liu
Zhongqin Wu
Jiliang Tang
54
9
0
31 Oct 2020
DialogueTRM: Exploring the Intra- and Inter-Modal Emotional Behaviors in the Conversation
Yuzhao Mao
Qi Sun
Guang Liu
Xiaojie Wang
Weiguo Gao
Xuan Li
Jianping Shen
75
26
0
15 Oct 2020
Spatial Attention as an Interface for Image Captioning Models
P. Sadler
53
0
0
29 Sep 2020