ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1907.09358
  4. Cited By
Trends in Integration of Vision and Language Research: A Survey of
  Tasks, Datasets, and Methods
v1v2v3 (latest)

Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods

22 July 2019
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
    VLM
ArXiv (abs)PDFHTML

Papers citing "Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods"

50 / 294 papers shown
Title
Visual Reference Resolution using Attention Memory for Visual Dialog
Visual Reference Resolution using Attention Memory for Visual Dialog
Paul Hongsuck Seo
Andreas M. Lehrmann
Bohyung Han
Leonid Sigal
77
123
0
23 Sep 2017
FiLM: Visual Reasoning with a General Conditioning Layer
FiLM: Visual Reasoning with a General Conditioning Layer
Ethan Perez
Florian Strub
H. D. Vries
Vincent Dumoulin
Aaron Courville
FAttAIMatOffRLAI4CE
363
2,233
0
22 Sep 2017
Stack-Captioning: Coarse-to-Fine Learning for Image Captioning
Stack-Captioning: Coarse-to-Fine Learning for Image Captioning
Jiuxiang Gu
Jianfei Cai
G. Wang
Tsuhan Chen
75
180
0
11 Sep 2017
Incorporating Copying Mechanism in Image Captioning for Learning Novel
  Objects
Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects
Ting Yao
Yingwei Pan
Yehao Li
Tao Mei
VLM
60
147
0
17 Aug 2017
Hierarchically-Attentive RNN for Album Summarization and Storytelling
Hierarchically-Attentive RNN for Album Summarization and Storytelling
Licheng Yu
Joey Tianyi Zhou
Tamara L. Berg
64
66
0
09 Aug 2017
Reinforced Video Captioning with Entailment Rewards
Reinforced Video Captioning with Entailment Rewards
Ramakanth Pasunuru
Joey Tianyi Zhou
63
115
0
07 Aug 2017
Localizing Moments in Video with Natural Language
Localizing Moments in Video with Natural Language
Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Sivic
Trevor Darrell
Bryan C. Russell
123
949
0
04 Aug 2017
Bottom-Up and Top-Down Attention for Image Captioning and Visual
  Question Answering
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
123
4,221
0
25 Jul 2017
OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts
OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts
Xuwang Yin
Vicente Ordonez
VLM
83
55
0
22 Jul 2017
An empirical study on the effectiveness of images in Multimodal Neural
  Machine Translation
An empirical study on the effectiveness of images in Multimodal Neural Machine Translation
Jean-Benoit Delbrouck
Stéphane Dupont
68
39
0
04 Jul 2017
Attention Is All You Need
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
781
132,363
0
12 Jun 2017
Best of Both Worlds: Transferring Knowledge from Discriminative Learning
  to a Generative Visual Dialog Model
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
Jiasen Lu
A. Kannan
Jianwei Yang
Devi Parikh
Dhruv Batra
BDL
74
137
0
05 Jun 2017
A simple neural network module for relational reasoning
A simple neural network module for relational reasoning
Adam Santoro
David Raposo
David Barrett
Mateusz Malinowski
Razvan Pascanu
Peter W. Battaglia
Timothy Lillicrap
GNNNAI
189
1,615
0
05 Jun 2017
Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning
Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning
Jingkuan Song
Zhao Guo
Lianli Gao
Wu Liu
Dongxiang Zhang
Heng Tao Shen
80
166
0
05 Jun 2017
Multimodal Machine Learning: A Survey and Taxonomy
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
111
2,937
0
26 May 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
240
8,038
0
22 May 2017
Imagination improves Multimodal Translation
Imagination improves Multimodal Translation
Desmond Elliott
Ákos Kádár
155
137
0
11 May 2017
Inferring and Executing Programs for Visual Reasoning
Inferring and Executing Programs for Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Judy Hoffman
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
NAI
89
545
0
10 May 2017
Show, Adapt and Tell: Adversarial Training of Cross-domain Image
  Captioner
Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner
Tseng-Hung Chen
Yuan-Hong Liao
Ching-Yao Chuang
W. Hsu
Jianlong Fu
Min Sun
93
142
0
02 May 2017
STAIR Captions: Constructing a Large-Scale Japanese Image Caption
  Dataset
STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset
Yuya Yoshikawa
Yutaro Shigeto
A. Takeuchi
3DV
60
118
0
02 May 2017
Dense-Captioning Events in Videos
Dense-Captioning Events in Videos
Ranjay Krishna
Kenji Hata
F. Ren
Li Fei-Fei
Juan Carlos Niebles
144
1,249
0
02 May 2017
Multi-Task Video Captioning with Video and Entailment Generation
Multi-Task Video Captioning with Video and Entailment Generation
Ramakanth Pasunuru
Joey Tianyi Zhou
57
117
0
24 Apr 2017
An Analysis of Action Recognition Datasets for Language and Vision Tasks
An Analysis of Action Recognition Datasets for Language and Vision Tasks
Spandana Gella
Frank Keller
ObjD
38
11
0
24 Apr 2017
Attention Strategies for Multi-Source Sequence-to-Sequence Learning
Attention Strategies for Multi-Source Sequence-to-Sequence Learning
Jindrich Libovický
Jindřich Helcl
AIMat
76
183
0
21 Apr 2017
Learning to Reason: End-to-End Module Networks for Visual Question
  Answering
Learning to Reason: End-to-End Module Networks for Visual Question Answering
Ronghang Hu
Jacob Andreas
Marcus Rohrbach
Trevor Darrell
Kate Saenko
KELMGNNReLMLRM
129
579
0
18 Apr 2017
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
Y. Jang
Yale Song
Youngjae Yu
Youngjin Kim
Gunhee Kim
87
561
0
14 Apr 2017
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Liwei Wang
Yin Li
Jing-ling Huang
Svetlana Lazebnik
VLM
77
498
0
11 Apr 2017
Towards a Visual Privacy Advisor: Understanding and Predicting Privacy
  Risks in Images
Towards a Visual Privacy Advisor: Understanding and Predicting Privacy Risks in Images
Rakshith Shetty
Bernt Schiele
Mario Fritz
102
228
0
30 Mar 2017
Survey of the State of the Art in Natural Language Generation: Core
  tasks, applications and evaluation
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
Albert Gatt
E. Krahmer
LM&MAELM
124
820
0
29 Mar 2017
Towards Automatic Learning of Procedures from Web Instructional Videos
Towards Automatic Learning of Procedures from Web Instructional Videos
Luowei Zhou
Chenliang Xu
Jason J. Corso
EgoV
75
830
0
28 Mar 2017
Multimodal Compact Bilinear Pooling for Multimodal Neural Machine
  Translation
Multimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation
Jean-Benoit Delbrouck
Stéphane Dupont
77
30
0
23 Mar 2017
Recurrent Multimodal Interaction for Referring Image Segmentation
Recurrent Multimodal Interaction for Referring Image Segmentation
Chenxi Liu
Zhe Lin
Xiaohui Shen
Jimei Yang
Xin Lu
Alan Yuille
EgoV
73
240
0
23 Mar 2017
Recurrent Topic-Transition GAN for Visual Paragraph Generation
Recurrent Topic-Transition GAN for Visual Paragraph Generation
Xiaodan Liang
Zhiting Hu
Huatian Zhang
Chuang Gan
Eric Xing
GAN
69
202
0
21 Mar 2017
Learning Cooperative Visual Dialog Agents with Deep Reinforcement
  Learning
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
Abhishek Das
Satwik Kottur
J. M. F. Moura
Stefan Lee
Dhruv Batra
OffRL
124
425
0
20 Mar 2017
TAC-GAN - Text Conditioned Auxiliary Classifier Generative Adversarial
  Network
TAC-GAN - Text Conditioned Auxiliary Classifier Generative Adversarial Network
Ayushman Dash
J. Gamboa
Sheraz Ahmed
Marcus Liwicki
Muhammad Zeshan Afzal
GAN
75
143
0
19 Mar 2017
Towards Diverse and Natural Image Descriptions via a Conditional GAN
Towards Diverse and Natural Image Descriptions via a Conditional GAN
Bo Dai
Sanja Fidler
R. Urtasun
Dahua Lin
GAN
82
454
0
17 Mar 2017
Learning Robust Visual-Semantic Embeddings
Learning Robust Visual-Semantic Embeddings
Yao-Hung Hubert Tsai
Liang-Kang Huang
Ruslan Salakhutdinov
SSLAI4TS
67
166
0
17 Mar 2017
End-to-end optimization of goal-driven and visually grounded dialogue
  systems
End-to-end optimization of goal-driven and visually grounded dialogue systems
Florian Strub
H. D. Vries
Jérémie Mary
Bilal Piot
Aaron Courville
Olivier Pietquin
OffRL
59
138
0
15 Mar 2017
Bilateral Multi-Perspective Matching for Natural Language Sentences
Bilateral Multi-Perspective Matching for Natural Language Sentences
Zhiguo Wang
Wael Hamza
Radu Florian
90
804
0
13 Feb 2017
Representations of language in a model of visually grounded speech
  signal
Representations of language in a model of visually grounded speech signal
Grzegorz Chrupała
Lieke Gelderloos
Afra Alishahi
78
131
0
07 Feb 2017
Doubly-Attentive Decoder for Multi-modal Neural Machine Translation
Doubly-Attentive Decoder for Multi-modal Neural Machine Translation
Iacer Calixto
Qun Liu
N. Campbell
133
183
0
04 Feb 2017
Image-Grounded Conversations: Multimodal Context for Natural Question
  and Response Generation
Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation
N. Mostafazadeh
Chris Brockett
W. Dolan
Michel Galley
Jianfeng Gao
Georgios P. Spithourakis
Lucy Vanderwende
73
183
0
28 Jan 2017
Incorporating Global Visual Features into Attention-Based Neural Machine
  Translation
Incorporating Global Visual Features into Attention-Based Neural Machine Translation
Iacer Calixto
Qun Liu
Nick Campbell
115
156
0
23 Jan 2017
Comprehension-guided referring expressions
Comprehension-guided referring expressions
Ruotian Luo
Gregory Shakhnarovich
ObjD
97
171
0
12 Jan 2017
Attention-Based Multimodal Fusion for Video Description
Attention-Based Multimodal Fusion for Video Description
Chiori Hori
Takaaki Hori
Teng-Yok Lee
Kazuhiro Sumi
J. Hershey
Tim K. Marks
78
360
0
11 Jan 2017
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
Licheng Yu
Hao Tan
Joey Tianyi Zhou
Tamara L. Berg
ObjD
94
275
0
30 Dec 2016
YOLO9000: Better, Faster, Stronger
YOLO9000: Better, Faster, Stronger
Joseph Redmon
Ali Farhadi
VLMObjD
183
15,633
0
25 Dec 2016
An Empirical Study of Language CNN for Image Captioning
An Empirical Study of Language CNN for Image Captioning
Jiuxiang Gu
G. Wang
Jianfei Cai
Tsuhan Chen
74
133
0
21 Dec 2016
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary
  Visual Reasoning
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
313
2,387
0
20 Dec 2016
StackGAN: Text to Photo-realistic Image Synthesis with Stacked
  Generative Adversarial Networks
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
Han Zhang
Tao Xu
Hongsheng Li
Shaoting Zhang
Xiaogang Wang
Xiaolei Huang
Dimitris N. Metaxas
GAN
122
2,728
0
10 Dec 2016
Previous
123456
Next