Explain Images with Multimodal Recurrent Neural Networks

4 October 2014
Junhua Mao
Wenyuan Xu
Yi Yang
Jiang Wang
Alan Yuille
    VLM
    GAN
arXiv:1410.1090

Papers citing "Explain Images with Multimodal Recurrent Neural Networks"

50 / 116 papers shown
Commonly Uncommon: Semantic Sparsity in Situation Recognition
Mark Yatskar
Vicente Ordonez
Luke Zettlemoyer
Ali Farhadi
VLM
17
42
0
03 Dec 2016
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
155
3,136
0
02 Dec 2016
Video Captioning with Transferred Semantic Attributes
Yingwei Pan
Ting Yao
Houqiang Li
Tao Mei
27
329
0
23 Nov 2016
Dense Captioning with Joint Inference and Visual Context
L. Yang
K. Tang
Jianchao Yang
Li-Jia Li
VLM
30
169
0
21 Nov 2016
Instance-aware Image and Sentence Matching with Selective Multimodal LSTM
Yan Huang
Wei Wang
Liang Wang
26
222
0
17 Nov 2016
A Semi-supervised Framework for Image Captioning
Wenhu Chen
Aurelien Lucchi
Thomas Hofmann
37
9
0
16 Nov 2016
Boosting Image Captioning with Attributes
Ting Yao
Yingwei Pan
Yehao Li
Zhaofan Qiu
Tao Mei
VLM
48
620
0
05 Nov 2016
Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge
Oriol Vinyals
Alexander Toshev
Samy Bengio
D. Erhan
30
851
0
21 Sep 2016
GeThR-Net: A Generalized Temporally Hybrid Recurrent Neural Network for Multimodal Information Fusion
Ankit Gandhi
Arjun Sharma
Arijit Biswas
Om Deshmukh
AI4TS
21
12
0
17 Sep 2016
Linking Image and Text with 2-Way Nets
Aviv Eisenschtat
Lior Wolf
27
176
0
29 Aug 2016
Learning to generalize to new compositions in image understanding
Yuval Atzmon
Jonathan Berant
Vahid Kezami
Amir Globerson
Gal Chechik
26
67
0
27 Aug 2016
DeepDiary: Automatic Caption Generation for Lifelogging Image Streams
Chenyou Fan
David J. Crandall
DiffM
14
5
0
12 Aug 2016
Multilingual Visual Sentiment Concept Matching
Nikolaos Pappas
Miriam Redi
Mercan Topkara
Brendan Jou
Hongyi Liu
Tao Chen
Shih-Fu Chang
CVBM
26
14
0
07 Jun 2016
Automated Image Captioning for Rapid Prototyping and Resource Constrained Environments
Karan Sharma
Arun C. S. Kumar
S. Bhandarkar
20
0
0
04 Jun 2016
Annotation Order Matters: Recurrent Image Annotator for Arbitrary Length Image Tagging
Jiren Jin
Hideki Nakayama
3DV
VLM
30
69
0
18 Apr 2016
Generating Visual Explanations
Lisa Anne Hendricks
Zeynep Akata
Marcus Rohrbach
Jeff Donahue
Bernt Schiele
Trevor Darrell
VLM
FAtt
47
618
0
28 Mar 2016
Learning to Read Chest X-Rays: Recurrent Neural Cascade Model for Automated Image Annotation
Hoo-Chang Shin
Kirk Roberts
Le Lu
Dina Demner-Fushman
Jianhua Yao
Ronald M. Summers
24
347
0
28 Mar 2016
Content-based Video Indexing and Retrieval Using Corr-LDA
R. Iyer
Sanjeel Parekh
Vikas Mohandoss
Anush Ramsurat
Bhiksha Raj
Rita Singh
16
22
0
27 Feb 2016
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
...
Yannis Kalantidis
Li-Jia Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
108
5,663
0
23 Feb 2016
Generate Image Descriptions based on Deep RNN and Memory Cells for Images Features
Shijian Tang
Song Han
VLM
20
1
0
05 Feb 2016
Event Specific Multimodal Pattern Mining with Image-Caption Pairs
Hongzhi Li
Joseph G. Ellis
Shih-Fu Chang
6
2
0
31 Dec 2015
RNN Fisher Vectors for Action Recognition and Image Annotation
Guy Lev
Gil Sadeh
Benjamin Klein
Lior Wolf
19
163
0
12 Dec 2015
Neural Self Talk: Image Understanding via Continuous Questioning and Answering
Yezhou Yang
Yi Li
Cornelia Fermuller
Yiannis Aloimonos
19
24
0
10 Dec 2015
Natural Language Understanding with Distributed Representation
Kyunghyun Cho
GNN
BDL
21
55
0
24 Nov 2015
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
Justin Johnson
A. Karpathy
Li Fei-Fei
VLM
74
1,160
0
24 Nov 2015
Where To Look: Focus Regions for Visual Question Answering
Kevin J. Shih
Saurabh Singh
Derek Hoiem
34
456
0
23 Nov 2015
Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes
Satwik Kottur
Ramakrishna Vedantam
José M. F. Moura
Devi Parikh
VLM
38
85
0
22 Nov 2015
Asymmetrically Weighted CCA And Hierarchical Kernel Sentence Embedding For Image & Text Retrieval
Youssef Mroueh
E. Marcheret
Vaibhava Goel
21
3
0
19 Nov 2015
Recurrent Neural Networks Hardware Implementation on FPGA
Andre Xian Ming Chang
B. Martini
Eugenio Culurciello
27
126
0
17 Nov 2015
Yin and Yang: Balancing and Answering Binary Visual Questions
Peng Zhang
Yash Goyal
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
37
349
0
16 Nov 2015
From Images to Sentences through Scene Description Graphs using Commonsense Reasoning and Knowledge
Somak Aditya
Yezhou Yang
Chitta Baral
Cornelia Fermuller
Yiannis Aloimonos
3DV
19
69
0
10 Nov 2015
Automatic Concept Discovery from Parallel Text and Visual Corpora
Chen Sun
Chuang Gan
Ram Nevatia
CoGe
12
107
0
24 Sep 2015
Image Representations and New Domains in Neural Image Captioning
Jack Hessel
Nicolas Savva
Michael J. Wilber
VLM
30
16
0
09 Aug 2015
Describing Multimedia Content using Attention-based Encoder-Decoder Networks
Kyunghyun Cho
Aaron Courville
Yoshua Bengio
32
411
0
04 Jul 2015
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books
Yukun Zhu
Ryan Kiros
R. Zemel
Ruslan Salakhutdinov
R. Urtasun
Antonio Torralba
Sanja Fidler
60
2,517
0
22 Jun 2015
Aligning where to see and what to tell: image caption with region-based attention and scene factorization
Junqi Jin
Kun Fu
Runpeng Cui
Fei Sha
Changshui Zhang
34
117
0
20 Jun 2015
Learning language through pictures
Grzegorz Chrupała
Ákos Kádár
A. Alishahi
VLM
SSL
35
65
0
11 Jun 2015
Learning to Answer Questions From Image Using Convolutional Neural Network
Lin Ma
Zhengdong Lu
Hang Li
27
261
0
01 Jun 2015
A Multi-scale Multiple Instance Video Description Network
Huijuan Xu
Subhashini Venugopalan
Vasili Ramanishka
Marcus Rohrbach
Kate Saenko
40
64
0
21 May 2015
Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering
Haoyuan Gao
Junhua Mao
Jie Zhou
Zhiheng Huang
Lei Wang
Wenyuan Xu
32
496
0
21 May 2015
Visual Semantic Role Labeling
Saurabh Gupta
Jitendra Malik
29
404
0
17 May 2015
Exploring Nearest Neighbor Approaches for Image Captioning
Jacob Devlin
Saurabh Gupta
Ross B. Girshick
Margaret Mitchell
C. L. Zitnick
27
195
0
17 May 2015
Exploring Models and Data for Image Question Answering
Mengye Ren
Ryan Kiros
R. Zemel
44
711
0
08 May 2015
Jointly Modeling Embedding and Translation to Bridge Video and Language
Yingwei Pan
Tao Mei
Ting Yao
Houqiang Li
Y. Rui
41
535
0
07 May 2015
Language Models for Image Captioning: The Quirks and What Works
Jacob Devlin
Hao Cheng
Hao Fang
Saurabh Gupta
Li Deng
Xiaodong He
Geoffrey Zweig
Margaret Mitchell
32
281
0
07 May 2015
VQA: Visual Question Answering
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
96
5,383
0
03 May 2015
Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images
Junhua Mao
Xu Wei
Yi Yang
Jiang Wang
Zhiheng Huang
Alan Yuille
25
154
0
25 Apr 2015
Multimodal Convolutional Neural Networks for Matching Image and Sentence
Lin Ma
Zhengdong Lu
Lifeng Shang
Hang Li
38
337
0
23 Apr 2015
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
97
2,434
0
01 Apr 2015
Generating Multi-Sentence Lingual Descriptions of Indoor Scenes
Dahua Lin
Chen Kong
Sanja Fidler
R. Urtasun
3DV
18
27
0
28 Feb 2015