Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Versions: v1, v2, v3 (latest)

22 July 2019
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
    VLM

Papers citing "Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods"

50 / 294 papers shown
Title
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning
Jiasen Lu
Caiming Xiong
Devi Parikh
R. Socher
130
1,454
0
06 Dec 2016
Areas of Attention for Image Captioning
M. Pedersoli
Thomas Lucas
Cordelia Schmid
Jakob Verbeek
79
206
0
03 Dec 2016
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
352
3,270
0
02 Dec 2016
Guided Open Vocabulary Image Captioning with Constrained Beam Search
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
79
237
0
02 Dec 2016
Self-critical Sequence Training for Image Captioning
Steven J. Rennie
E. Marcheret
Youssef Mroueh
Jerret Ross
Vaibhava Goel
109
1,890
0
02 Dec 2016
Improved Image Captioning via Policy Gradient optimization of SPIDEr
Siqi Liu
Zhenhai Zhu
Ning Ye
S. Guadarrama
Kevin Patrick Murphy
155
446
0
01 Dec 2016
Video Captioning with Multi-Faceted Attention
Xiang Long
Chuang Gan
Gerard de Melo
73
88
0
01 Dec 2016
Modeling Relationships in Referential Expressions with Compositional Modular Networks
Ronghang Hu
Marcus Rohrbach
Jacob Andreas
Trevor Darrell
Kate Saenko
82
406
0
30 Nov 2016
Visual Dialog
Abhishek Das
Satwik Kottur
Khushi Gupta
Avi Singh
Deshraj Yadav
José M. F. Moura
Devi Parikh
Dhruv Batra
149
1,002
0
26 Nov 2016
GuessWhat?! Visual object discovery through multi-modal dialogue
Harm de Vries
Florian Strub
A. Chandar
Olivier Pietquin
Hugo Larochelle
Aaron Courville
VLM
108
428
0
23 Nov 2016
Video Captioning with Transferred Semantic Attributes
Yingwei Pan
Ting Yao
Houqiang Li
Tao Mei
77
329
0
23 Nov 2016
Image-to-Image Translation with Conditional Adversarial Networks
Phillip Isola
Jun-Yan Zhu
Tinghui Zhou
Alexei A. Efros
SSeg
331
19,690
0
21 Nov 2016
Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues
Bryan A. Plummer
Arun Mallya
Christopher M. Cervantes
Julia Hockenmaier
Svetlana Lazebnik
89
189
0
21 Nov 2016
A Hierarchical Approach for Generating Descriptive Image Paragraphs
J. Krause
Justin Johnson
Ranjay Krishna
Li Fei-Fei
VLM
95
378
0
20 Nov 2016
Leveraging Video Descriptions to Learn Video Question Answering
Kuo-Hao Zeng
Tseng-Hung Chen
Ching-Yao Chuang
Yuan-Hong Liao
Juan Carlos Niebles
Min Sun
99
179
0
12 Nov 2016
End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering
Youngjae Yu
Hyungjin Ko
Jongwook Choi
Gunhee Kim
126
231
0
10 Oct 2016
Learning What and Where to Draw
Scott E. Reed
Zeynep Akata
S. Mohan
Samuel Tenka
Bernt Schiele
Honglak Lee
DRL, GAN
78
620
0
08 Oct 2016
Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models
Ashwin K. Vijayakumar
Michael Cogswell
Ramprasaath R. Selvaraju
Q. Sun
Stefan Lee
David J. Crandall
Dhruv Batra
91
555
0
07 Oct 2016
Visual Question Answering: Datasets, Algorithms, and Future Challenges
Kushal Kafle
Christopher Kanan
OOD
82
243
0
05 Oct 2016
Semi-Supervised Classification with Graph Convolutional Networks
Thomas Kipf
Max Welling
GNN, SSL
659
29,154
0
09 Sep 2016
Title Generation for User Generated Videos
Kuo-Hao Zeng
Tseng-Hung Chen
Juan Carlos Niebles
Min Sun
73
69
0
25 Aug 2016
Densely Connected Convolutional Networks
Gao Huang
Zhuang Liu
Laurens van der Maaten
Kilian Q. Weinberger
PINN, 3DV
790
36,881
0
25 Aug 2016
Modeling Context in Referring Expressions
Licheng Yu
Patrick Poirson
Shan Yang
Alexander C. Berg
Tamara L. Berg
131
1,275
0
31 Jul 2016
SPICE: Semantic Propositional Image Caption Evaluation
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
108
1,919
0
29 Jul 2016
Visual Question Answering: A Survey of Methods and Datasets
Qi Wu
Damien Teney
Peng Wang
Chunhua Shen
A. Dick
Anton Van Den Hengel
107
418
0
20 Jul 2016
Captioning Images with Diverse Objects
Subhashini Venugopalan
Lisa Anne Hendricks
Marcus Rohrbach
Raymond J. Mooney
Trevor Darrell
Kate Saenko
VLM
55
178
0
24 Jun 2016
Sort Story: Sorting Jumbled Images and Captions into Stories
Harsh Agrawal
Arjun Chandrasekaran
Dhruv Batra
Devi Parikh
Mohit Bansal
56
60
0
23 Jun 2016
Improved Techniques for Training GANs
Tim Salimans
Ian Goodfellow
Wojciech Zaremba
Vicki Cheung
Alec Radford
Xi Chen
GAN
486
9,067
0
10 Jun 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
308
1,466
0
06 Jun 2016
Deep Reinforcement Learning for Dialogue Generation
Jiwei Li
Will Monroe
Alan Ritter
Michel Galley
Jianfeng Gao
Dan Jurafsky
285
1,339
0
05 Jun 2016
Fully Convolutional Networks for Semantic Segmentation
Evan Shelhamer
Jonathan Long
Trevor Darrell
VOS, SSeg
747
37,890
0
20 May 2016
Generative Adversarial Text to Image Synthesis
Scott E. Reed
Zeynep Akata
Xinchen Yan
Lajanugen Logeswaran
Bernt Schiele
Honglak Lee
GAN
207
3,148
0
17 May 2016
Improving LSTM-based Video Description with Linguistic Knowledge Mined from Text
Subhashini Venugopalan
Lisa Anne Hendricks
Raymond J. Mooney
Kate Saenko
VLM
56
117
0
06 Apr 2016
Image Captioning with Deep Bidirectional LSTMs
Cheng Wang
Haojin Yang
Christian Bartz
Christoph Meinel
VLM
60
279
0
04 Apr 2016
Image Captioning with Semantic Attention
Quanzeng You
Hailin Jin
Zhaowen Wang
Chen Fang
Jiebo Luo
VLM
174
1,662
0
12 Mar 2016
Image Captioning and Visual Question Answering Based on Attributes and External Knowledge
Qi Wu
Chunhua Shen
Anton Van Den Hengel
Peng Wang
A. Dick
66
362
0
09 Mar 2016
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
...
Yannis Kalantidis
Li-Jia Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
225
5,762
0
23 Feb 2016
Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond
Ramesh Nallapati
Bowen Zhou
Cicero Nogueira dos Santos
Çağlar Gülçehre
Bing Xiang
AIMat
279
2,566
0
19 Feb 2016
Multimodal Pivots for Image Caption Translation
Julian Hitschler
Shigehiko Schamoni
Stefan Riezler
107
97
0
15 Jan 2016
Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures
Raffaella Bernardi
Ruken Cakici
Desmond Elliott
Aykut Erdem
Erkut Erdem
Nazli Ikizler-Cinbis
Frank Keller
A. Muscat
Barbara Plank
EGVM, VLM
75
364
0
15 Jan 2016
Learning to Compose Neural Networks for Question Answering
Jacob Andreas
Marcus Rohrbach
Trevor Darrell
Dan Klein
NAI, KELM, BDL, CoGe
99
568
0
07 Jan 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
2.2K
194,426
0
10 Dec 2015
MovieQA: Understanding Stories in Movies through Question-Answering
Makarand Tapaswi
Yukun Zhu
Rainer Stiefelhagen
Antonio Torralba
R. Urtasun
Sanja Fidler
115
752
0
09 Dec 2015
Rethinking the Inception Architecture for Computer Vision
Christian Szegedy
Vincent Vanhoucke
Sergey Ioffe
Jonathon Shlens
Z. Wojna
3DV, BDL
886
27,416
0
02 Dec 2015
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
Justin Johnson
A. Karpathy
Li Fei-Fei
VLM
131
1,170
0
24 Nov 2015
Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems
Jesse Dodge
Andreea Gane
Xiang Zhang
Antoine Bordes
S. Chopra
Alexander H. Miller
Arthur Szlam
Jason Weston
ELM
89
198
0
21 Nov 2015
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Alec Radford
Luke Metz
Soumith Chintala
GAN, OOD
271
14,023
0
19 Nov 2015
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data
Lisa Anne Hendricks
Subhashini Venugopalan
Marcus Rohrbach
Raymond J. Mooney
Kate Saenko
Trevor Darrell
CoGe
64
284
0
17 Nov 2015
Yin and Yang: Balancing and Answering Binary Visual Questions
Peng Zhang
Yash Goyal
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
87
352
0
16 Nov 2015
Natural Language Object Retrieval
Ronghang Hu
Huazhe Xu
Marcus Rohrbach
Jiashi Feng
Kate Saenko
Trevor Darrell
ObjD
99
554
0
13 Nov 2015