ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1411.5654
  4. Cited By
Learning a Recurrent Visual Representation for Image Caption Generation

Learning a Recurrent Visual Representation for Image Caption Generation

20 November 2014
Xinlei Chen
C. L. Zitnick
    SSL
    GAN
ArXivPDFHTML

Papers citing "Learning a Recurrent Visual Representation for Image Caption Generation"

41 / 41 papers shown
Title
Think Hierarchically, Act Dynamically: Hierarchical Multi-modal Fusion and Reasoning for Vision-and-Language Navigation
Think Hierarchically, Act Dynamically: Hierarchical Multi-modal Fusion and Reasoning for Vision-and-Language Navigation
Junrong Yue
Wenjie Qu
Chuan Qin
Jing Chen
Xiaomin Lie
Xinlei Yu
Wenxin Zhang
Zhendong Zhao
54
1
0
23 Apr 2025
AutoAD: Movie Description in Context
AutoAD: Movie Description in Context
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
29
34
0
29 Mar 2023
Multi-modal gated recurrent units for image description
Multi-modal gated recurrent units for image description
Xuelong Li
Aihong Yuan
Xiaoqiang Lu
GAN
21
26
0
20 Apr 2019
Semantically Invariant Text-to-Image Generation
Semantically Invariant Text-to-Image Generation
Shagan Sah
D. Peri
Ameya Shringi
Chi Zhang
Miguel Domínguez
Andreas E. Savakis
R. Ptucha
EGVM
25
9
0
27 Sep 2018
LUCSS: Language-based User-customized Colourization of Scene Sketches
LUCSS: Language-based User-customized Colourization of Scene Sketches
C. Zou
Haoran Mo
Ruofei Du
Xing Wu
Chengying Gao
Hongbo Fu
30
8
0
30 Aug 2018
DeepSIC: Deep Semantic Image Compression
DeepSIC: Deep Semantic Image Compression
Sihui Luo
Yezhou Yang
Xiuming Zhang
33
46
0
29 Jan 2018
From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video
  Captioning
From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning
Jingkuan Song
Yuyu Guo
Lianli Gao
Xuelong Li
Alan Hanjalic
Heng Tao Shen
40
219
0
08 Aug 2017
Scene Graph Generation from Objects, Phrases and Region Captions
Scene Graph Generation from Objects, Phrases and Region Captions
Yikang Li
Wanli Ouyang
Bolei Zhou
Kun Wang
Xiaogang Wang
21
499
0
31 Jul 2017
Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning
Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning
Jingkuan Song
Zhao Guo
Lianli Gao
Wu Liu
Dongxiang Zhang
Heng Tao Shen
48
166
0
05 Jun 2017
Query-adaptive Video Summarization via Quality-aware Relevance
  Estimation
Query-adaptive Video Summarization via Quality-aware Relevance Estimation
A. Vasudevan
Michael Gygli
Anna Volokitin
Luc Van Gool
38
93
0
01 May 2017
Where to put the Image in an Image Caption Generator
Where to put the Image in an Image Caption Generator
Marc Tanti
Albert Gatt
K. Camilleri
47
96
0
27 Mar 2017
Person Search with Natural Language Description
Person Search with Natural Language Description
Shuang Li
Tong Xiao
Hongsheng Li
Bolei Zhou
Dayu Yue
Xiaogang Wang
30
386
0
19 Feb 2017
Video Captioning with Multi-Faceted Attention
Video Captioning with Multi-Faceted Attention
Xiang Long
Chuang Gan
Gerard de Melo
24
88
0
01 Dec 2016
A Survey of Multi-View Representation Learning
A Survey of Multi-View Representation Learning
Yingming Li
Ming Yang
Zhongfei Zhang
AI4TS
3DV
37
509
0
03 Oct 2016
Learning to generalize to new compositions in image understanding
Learning to generalize to new compositions in image understanding
Yuval Atzmon
Jonathan Berant
Vahid Kezami
Amir Globerson
Gal Chechik
26
67
0
27 Aug 2016
phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning
phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning
Y. Tan
Chee Seng Chan
VLM
22
29
0
20 Aug 2016
Learning Visual Storylines with Skipping Recurrent Neural Networks
Learning Visual Storylines with Skipping Recurrent Neural Networks
Gunnar A. Sigurdsson
Xinlei Chen
Abhinav Gupta
29
38
0
14 Apr 2016
Dynamic Memory Networks for Visual and Textual Question Answering
Dynamic Memory Networks for Visual and Textual Question Answering
Caiming Xiong
Stephen Merity
R. Socher
34
753
0
04 Mar 2016
Natural Language Understanding with Distributed Representation
Natural Language Understanding with Distributed Representation
Kyunghyun Cho
GNN
BDL
21
55
0
24 Nov 2015
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
Justin Johnson
A. Karpathy
Li Fei-Fei
VLM
74
1,160
0
24 Nov 2015
Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings
  Using Abstract Scenes
Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes
Satwik Kottur
Ramakrishna Vedantam
José M. F. Moura
Devi Parikh
VLM
35
85
0
22 Nov 2015
Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks
Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks
Haonan Yu
Jiang Wang
Zhiheng Huang
Yi Yang
Wenyuan Xu
44
560
0
26 Oct 2015
Learning Contextual Dependencies with Convolutional Hierarchical
  Recurrent Neural Networks
Learning Contextual Dependencies with Convolutional Hierarchical Recurrent Neural Networks
Zhen Zuo
Bing Shuai
G. Wang
Xiao Liu
Xingxing Wang
Bernie Wang
19
93
0
13 Sep 2015
Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
A. Kumar
Ozan Irsoy
Peter Ondruska
Mohit Iyyer
James Bradbury
Ishaan Gulrajani
Victor Zhong
Romain Paulus
R. Socher
54
1,176
0
24 Jun 2015
Learning language through pictures
Learning language through pictures
Grzegorz Chrupała
Ákos Kádár
A. Alishahi
VLM
SSL
35
65
0
11 Jun 2015
What value do explicit high level concepts have in vision to language
  problems?
What value do explicit high level concepts have in vision to language problems?
Qi Wu
Chunhua Shen
Lingqiao Liu
A. Dick
Anton Van Den Hengel
33
443
0
03 Jun 2015
Learning to Answer Questions From Image Using Convolutional Neural
  Network
Learning to Answer Questions From Image Using Convolutional Neural Network
Lin Ma
Zhengdong Lu
Hang Li
27
261
0
01 Jun 2015
Visual Madlibs: Fill in the blank Image Generation and Question
  Answering
Visual Madlibs: Fill in the blank Image Generation and Question Answering
Licheng Yu
Eunbyung Park
Alexander C. Berg
Tamara L. Berg
VLM
MLLM
32
97
0
31 May 2015
Are You Talking to a Machine? Dataset and Methods for Multilingual Image
  Question Answering
Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering
Haoyuan Gao
Junhua Mao
Jie Zhou
Zhiheng Huang
Lei Wang
Wenyuan Xu
32
496
0
21 May 2015
Language Models for Image Captioning: The Quirks and What Works
Language Models for Image Captioning: The Quirks and What Works
Jacob Devlin
Hao Cheng
Hao Fang
Saurabh Gupta
Li Deng
Xiaodong He
Geoffrey Zweig
Margaret Mitchell
32
281
0
07 May 2015
Sequence to Sequence -- Video to Text
Sequence to Sequence -- Video to Text
Subhashini Venugopalan
Marcus Rohrbach
Jeff Donahue
Raymond J. Mooney
Trevor Darrell
Kate Saenko
57
1,416
0
03 May 2015
Learning like a Child: Fast Novel Visual Concept Learning from Sentence
  Descriptions of Images
Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images
Junhua Mao
Xu Wei
Yi Yang
Jiang Wang
Zhiheng Huang
Alan Yuille
25
154
0
25 Apr 2015
Multimodal Convolutional Neural Networks for Matching Image and Sentence
Multimodal Convolutional Neural Networks for Matching Image and Sentence
Lin Ma
Zhengdong Lu
Lifeng Shang
Hang Li
38
337
0
23 Apr 2015
Microsoft COCO Captions: Data Collection and Evaluation Server
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
97
2,434
0
01 Apr 2015
Image Specificity
Image Specificity
M. Jas
Devi Parikh
32
40
0
16 Feb 2015
Phrase-based Image Captioning
Phrase-based Image Captioning
R. Lebret
Pedro H. O. Pinheiro
R. Collobert
VLM
31
120
0
12 Feb 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual
  Attention
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
142
10,011
0
10 Feb 2015
A Dataset for Movie Description
A Dataset for Movie Description
Anna Rohrbach
Marcus Rohrbach
Niket Tandon
Bernt Schiele
VGen
54
497
0
12 Jan 2015
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
Junhua Mao
Wenyuan Xu
Yi Yang
Jiang Wang
Zhiheng Huang
Alan Yuille
VLM
86
1,235
0
20 Dec 2014
Deep Visual-Semantic Alignments for Generating Image Descriptions
Deep Visual-Semantic Alignments for Generating Image Descriptions
A. Karpathy
Li Fei-Fei
24
5,559
0
07 Dec 2014
CIDEr: Consensus-based Image Description Evaluation
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
100
4,412
0
20 Nov 2014
1