Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1411.5654
Cited By
Learning a Recurrent Visual Representation for Image Caption Generation
20 November 2014
Xinlei Chen
C. L. Zitnick
SSL
GAN
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Learning a Recurrent Visual Representation for Image Caption Generation"
41 / 41 papers shown
Title
Think Hierarchically, Act Dynamically: Hierarchical Multi-modal Fusion and Reasoning for Vision-and-Language Navigation
Junrong Yue
Wenjie Qu
Chuan Qin
Jing Chen
Xiaomin Lie
Xinlei Yu
Wenxin Zhang
Zhendong Zhao
54
1
0
23 Apr 2025
AutoAD: Movie Description in Context
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
29
34
0
29 Mar 2023
Multi-modal gated recurrent units for image description
Xuelong Li
Aihong Yuan
Xiaoqiang Lu
GAN
21
26
0
20 Apr 2019
Semantically Invariant Text-to-Image Generation
Shagan Sah
D. Peri
Ameya Shringi
Chi Zhang
Miguel Domínguez
Andreas E. Savakis
R. Ptucha
EGVM
25
9
0
27 Sep 2018
LUCSS: Language-based User-customized Colourization of Scene Sketches
C. Zou
Haoran Mo
Ruofei Du
Xing Wu
Chengying Gao
Hongbo Fu
30
8
0
30 Aug 2018
DeepSIC: Deep Semantic Image Compression
Sihui Luo
Yezhou Yang
Xiuming Zhang
33
46
0
29 Jan 2018
From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning
Jingkuan Song
Yuyu Guo
Lianli Gao
Xuelong Li
Alan Hanjalic
Heng Tao Shen
40
219
0
08 Aug 2017
Scene Graph Generation from Objects, Phrases and Region Captions
Yikang Li
Wanli Ouyang
Bolei Zhou
Kun Wang
Xiaogang Wang
21
499
0
31 Jul 2017
Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning
Jingkuan Song
Zhao Guo
Lianli Gao
Wu Liu
Dongxiang Zhang
Heng Tao Shen
48
166
0
05 Jun 2017
Query-adaptive Video Summarization via Quality-aware Relevance Estimation
A. Vasudevan
Michael Gygli
Anna Volokitin
Luc Van Gool
38
93
0
01 May 2017
Where to put the Image in an Image Caption Generator
Marc Tanti
Albert Gatt
K. Camilleri
47
96
0
27 Mar 2017
Person Search with Natural Language Description
Shuang Li
Tong Xiao
Hongsheng Li
Bolei Zhou
Dayu Yue
Xiaogang Wang
30
386
0
19 Feb 2017
Video Captioning with Multi-Faceted Attention
Xiang Long
Chuang Gan
Gerard de Melo
27
88
0
01 Dec 2016
A Survey of Multi-View Representation Learning
Yingming Li
Ming Yang
Zhongfei Zhang
AI4TS
3DV
37
509
0
03 Oct 2016
Learning to generalize to new compositions in image understanding
Yuval Atzmon
Jonathan Berant
Vahid Kezami
Amir Globerson
Gal Chechik
26
67
0
27 Aug 2016
phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning
Y. Tan
Chee Seng Chan
VLM
22
29
0
20 Aug 2016
Learning Visual Storylines with Skipping Recurrent Neural Networks
Gunnar A. Sigurdsson
Xinlei Chen
Abhinav Gupta
29
38
0
14 Apr 2016
Dynamic Memory Networks for Visual and Textual Question Answering
Caiming Xiong
Stephen Merity
R. Socher
34
753
0
04 Mar 2016
Natural Language Understanding with Distributed Representation
Kyunghyun Cho
GNN
BDL
21
55
0
24 Nov 2015
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
Justin Johnson
A. Karpathy
Li Fei-Fei
VLM
74
1,160
0
24 Nov 2015
Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes
Satwik Kottur
Ramakrishna Vedantam
José M. F. Moura
Devi Parikh
VLM
38
85
0
22 Nov 2015
Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks
Haonan Yu
Jiang Wang
Zhiheng Huang
Yi Yang
Wenyuan Xu
44
560
0
26 Oct 2015
Learning Contextual Dependencies with Convolutional Hierarchical Recurrent Neural Networks
Zhen Zuo
Bing Shuai
G. Wang
Xiao Liu
Xingxing Wang
Bernie Wang
19
93
0
13 Sep 2015
Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
A. Kumar
Ozan Irsoy
Peter Ondruska
Mohit Iyyer
James Bradbury
Ishaan Gulrajani
Victor Zhong
Romain Paulus
R. Socher
54
1,176
0
24 Jun 2015
Learning language through pictures
Grzegorz Chrupała
Ákos Kádár
A. Alishahi
VLM
SSL
35
65
0
11 Jun 2015
What value do explicit high level concepts have in vision to language problems?
Qi Wu
Chunhua Shen
Lingqiao Liu
A. Dick
Anton Van Den Hengel
33
443
0
03 Jun 2015
Learning to Answer Questions From Image Using Convolutional Neural Network
Lin Ma
Zhengdong Lu
Hang Li
27
261
0
01 Jun 2015
Visual Madlibs: Fill in the blank Image Generation and Question Answering
Licheng Yu
Eunbyung Park
Alexander C. Berg
Tamara L. Berg
VLM
MLLM
32
97
0
31 May 2015
Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering
Haoyuan Gao
Junhua Mao
Jie Zhou
Zhiheng Huang
Lei Wang
Wenyuan Xu
32
496
0
21 May 2015
Language Models for Image Captioning: The Quirks and What Works
Jacob Devlin
Hao Cheng
Hao Fang
Saurabh Gupta
Li Deng
Xiaodong He
Geoffrey Zweig
Margaret Mitchell
32
281
0
07 May 2015
Sequence to Sequence -- Video to Text
Subhashini Venugopalan
Marcus Rohrbach
Jeff Donahue
Raymond J. Mooney
Trevor Darrell
Kate Saenko
57
1,416
0
03 May 2015
Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images
Junhua Mao
Xu Wei
Yi Yang
Jiang Wang
Zhiheng Huang
Alan Yuille
25
154
0
25 Apr 2015
Multimodal Convolutional Neural Networks for Matching Image and Sentence
Lin Ma
Zhengdong Lu
Lifeng Shang
Hang Li
38
337
0
23 Apr 2015
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
97
2,434
0
01 Apr 2015
Image Specificity
M. Jas
Devi Parikh
32
40
0
16 Feb 2015
Phrase-based Image Captioning
R. Lebret
Pedro H. O. Pinheiro
R. Collobert
VLM
31
120
0
12 Feb 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
142
10,011
0
10 Feb 2015
A Dataset for Movie Description
Anna Rohrbach
Marcus Rohrbach
Niket Tandon
Bernt Schiele
VGen
54
497
0
12 Jan 2015
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
Junhua Mao
Wenyuan Xu
Yi Yang
Jiang Wang
Zhiheng Huang
Alan Yuille
VLM
86
1,235
0
20 Dec 2014
Deep Visual-Semantic Alignments for Generating Image Descriptions
A. Karpathy
Li Fei-Fei
24
5,559
0
07 Dec 2014
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
103
4,412
0
20 Nov 2014
1