ResearchTrend.AI

Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods

22 July 2019
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
    VLM

Papers citing "Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods"

50 / 294 papers shown
Title
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features
Chiori Hori
Huda AlAmri
Jue Wang
Gordon Wichern
Takaaki Hori
...
Raphael Gontijo-Lopes
Abhishek Das
Irfan Essa
Dhruv Batra
Devi Parikh
VGen
66
125
0
21 Jun 2018
Grounded Textual Entailment
H. Vu
Claudio Greco
A. Erofeeva
Somayeh Jafaritazehjan
Guido M. Linders
Marc Tanti
A. Testoni
Raffaella Bernardi
Albert Gatt
72
29
0
14 Jun 2018
Speaker-Follower Models for Vision-and-Language Navigation
Daniel Fried
Ronghang Hu
Volkan Cirik
Anna Rohrbach
Jacob Andreas
Louis-Philippe Morency
Taylor Berg-Kirkpatrick
Kate Saenko
Dan Klein
Trevor Darrell
LM&Ro, LRM
317
504
0
07 Jun 2018
Video Description: A Survey of Methods, Datasets and Evaluation Metrics
Nayyer Aafaq
Ajmal Mian
Wen Liu
Syed Zulqarnain Gilani
Mubarak Shah
70
92
0
01 Jun 2018
Using Inter-Sentence Diverse Beam Search to Reduce Redundancy in Visual Storytelling
Chao-Chun Hsu
Szu-Min Chen
Ming-Hsun Hsieh
Lun-Wei Ku
DiffM
41
17
0
30 May 2018
Visual Referring Expression Recognition: What Do Systems Actually Learn?
Volkan Cirik
Louis-Philippe Morency
Taylor Berg-Kirkpatrick
73
63
0
30 May 2018
Using Syntax to Ground Referring Expressions in Natural Images
Volkan Cirik
Taylor Berg-Kirkpatrick
Louis-Philippe Morency
ObjD, NAI
39
82
0
26 May 2018
Toward Abstractive Summarization Using Semantic Representations
Fei Liu
Jeffrey Flanigan
Sam Thomson
Norman M. Sadeh
Noah A. Smith
50
302
0
25 May 2018
Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation
Qiuyuan Huang
Zhe Gan
Asli Celikyilmaz
D. Wu
Jianfeng Wang
Xiaodong He
BDL
73
92
0
21 May 2018
SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text
A. Mathews
Lexing Xie
Xuming He
VLM
73
115
0
18 May 2018
No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling
Xin Eric Wang
Wenhu Chen
Yuan-fang Wang
William Yang Wang
59
159
0
24 Apr 2018
Object Counts! Bringing Explicit Detections Back into Image Captioning
Josiah Wang
Pranava Madhyastha
Lucia Specia
ObjD
43
37
0
23 Apr 2018
Phrase-Based & Neural Unsupervised Machine Translation
Guillaume Lample
Myle Ott
Alexis Conneau
Ludovic Denoyer
Marc'Aurelio Ranzato
91
682
0
20 Apr 2018
Imagine This! Scripts to Compositions to Videos
Tanmay Gupta
Dustin Schwenk
Ali Farhadi
Derek Hoiem
Aniruddha Kembhavi
CoGe, VGen
146
91
0
10 Apr 2018
Image Generation from Scene Graphs
Justin Johnson
Agrim Gupta
Li Fei-Fei
GNN
303
820
0
04 Apr 2018
End-to-End Dense Video Captioning with Masked Transformer
Luowei Zhou
Yingbo Zhou
Jason J. Corso
R. Socher
Caiming Xiong
94
529
0
03 Apr 2018
Visual Question Reasoning on General Dependency Tree
Qingxing Cao
Xiaodan Liang
Bailin Li
Guanbin Li
Liang Lin
CoGe
59
37
0
31 Mar 2018
Reconstruction Network for Video Captioning
Bairui Wang
Lin Ma
Wei Zhang
Wen Liu
121
318
0
30 Mar 2018
Two can play this Game: Visual Dialog with Discriminative Question Generation and Answering
Unnat Jain
Svetlana Lazebnik
Alex Schwing
MLLM
63
81
0
29 Mar 2018
Neural Baby Talk
Jiasen Lu
Jianwei Yang
Dhruv Batra
Devi Parikh
VLM
230
435
0
27 Mar 2018
Video Object Segmentation with Language Referring Expressions
Anna Khoreva
Anna Rohrbach
Bernt Schiele
VOS
74
196
0
21 Mar 2018
Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation
Xin Eric Wang
Wenhan Xiong
Hongmin Wang
William Yang Wang
76
201
0
21 Mar 2018
A Dataset and Architecture for Visual Reasoning with a Working Memory
G. R. Yang
Igor Ganichev
Xiao-Jing Wang
Jonathon Shlens
David Sussillo
61
54
0
16 Mar 2018
Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning
David Mascharka
Philip Tran
Ryan Soklaski
Arjun Majumdar
116
207
0
14 Mar 2018
Compositional Attention Networks for Machine Reasoning
Drew A. Hudson
Christopher D. Manning
BDL, OOD, LRM
193
577
0
08 Mar 2018
Annotation Artifacts in Natural Language Inference Data
Suchin Gururangan
Swabha Swayamdipta
Omer Levy
Roy Schwartz
Samuel R. Bowman
Noah A. Smith
155
1,180
0
06 Mar 2018
Joint Event Detection and Description in Continuous Video Streams
Huijuan Xu
Boyang Albert Li
Vasili Ramanishka
Leonid Sigal
Kate Saenko
33
53
0
28 Feb 2018
Photographic Text-to-Image Synthesis with a Hierarchically-nested Adversarial Network
Zizhao Zhang
Yuanpu Xie
Ling Yang
EGVM
95
305
0
26 Feb 2018
Learning to Count Objects in Natural Images for Visual Question Answering
Yan Zhang
Jonathon S. Hare
Adam Prugel-Bennett
OOD
68
207
0
15 Feb 2018
MAttNet: Modular Attention Network for Referring Expression Comprehension
Licheng Yu
Zhe Lin
Xiaohui Shen
Jimei Yang
Xin Lu
Joey Tianyi Zhou
Tamara L. Berg
ObjD
111
831
0
24 Jan 2018
Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis
Seunghoon Hong
Dingdong Yang
Jongwook Choi
Honglak Lee
EGVM
110
337
0
16 Jan 2018
Visual Text Correction
Amir Mazaheri
M. Shah
83
11
0
06 Jan 2018
VSE-ens: Visual-Semantic Embeddings with Efficient Negative Sampling
G. Guo
Songlin Zhai
Fajie Yuan
Yuan Liu
Xingwei Wang
VLM
31
11
0
05 Jan 2018
Object Referring in Videos with Language and Human Gaze
A. Vasudevan
Dengxin Dai
Luc Van Gool
VOS
72
75
0
04 Jan 2018
IQA: Visual Question Answering in Interactive Environments
Daniel Gordon
Aniruddha Kembhavi
Mohammad Rastegari
Joseph Redmon
Dieter Fox
Ali Farhadi
LM&Ro
93
391
0
09 Dec 2017
Broadcasting Convolutional Network for Visual Relational Reasoning
Simyung Chang
John Yang
Seonguk Park
Nojun Kwak
46
19
0
07 Dec 2017
Grounding Referring Expressions in Images by Variational Context
Hanwang Zhang
Yulei Niu
Shih-Fu Chang
BDL, ObjD
64
222
0
05 Dec 2017
Learning by Asking Questions
Ishan Misra
Ross B. Girshick
Rob Fergus
M. Hebert
Abhinav Gupta
Laurens van der Maaten
63
84
0
04 Dec 2017
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
Aishwarya Agrawal
Dhruv Batra
Devi Parikh
Aniruddha Kembhavi
OOD
155
586
0
01 Dec 2017
Embodied Question Answering
Abhishek Das
Samyak Datta
Georgia Gkioxari
Stefan Lee
Devi Parikh
Dhruv Batra
LM&Ro
100
651
0
30 Nov 2017
Video Captioning via Hierarchical Reinforcement Learning
Xin Eric Wang
Wenhu Chen
Jiawei Wu
Yuan-fang Wang
William Yang Wang
88
229
0
29 Nov 2017
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks
Tao Xu
Pengchuan Zhang
Qiuyuan Huang
Han Zhang
Zhe Gan
Xiaolei Huang
Xiaodong He
GAN, ViT
115
1,722
0
28 Nov 2017
Convolutional Image Captioning
J. Aneja
Aditya Deshpande
Alex Schwing
VLM
135
361
0
24 Nov 2017
Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning
Qi Wu
Peng Wang
Chunhua Shen
Ian Reid
Anton Van Den Hengel
GAN
62
129
0
21 Nov 2017
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
Peter Anderson
Qi Wu
Damien Teney
Jake Bruce
Mark Johnson
Niko Sünderhauf
Ian Reid
Stephen Gould
Anton Van Den Hengel
LM&Ro
106
1,322
0
20 Nov 2017
Parallel Attention: A Unified Framework for Visual Object Discovery through Dialogs and Queries
Bohan Zhuang
Qi Wu
Chunhua Shen
Ian Reid
Anton Van Den Hengel
ObjD
63
134
0
17 Nov 2017
Evaluation of Automatic Video Captioning Using Direct Assessment
Yvette Graham
G. Awad
Alan F. Smeaton
45
30
0
29 Oct 2017
StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks
Han Zhang
Tao Xu
Hongsheng Li
Shaoting Zhang
Xiaogang Wang
Xiaolei Huang
Dimitris N. Metaxas
GAN
99
1,062
0
19 Oct 2017
Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description
Desmond Elliott
Stella Frank
Loïc Barrault
Fethi Bougares
Lucia Specia
VLM
79
220
0
19 Oct 2017
Contrastive Learning for Image Captioning
Bo Dai
Dahua Lin
SSL, VLM
79
194
0
06 Oct 2017