arXiv:2010.09522

Multimodal Research in Vision and Language: A Review of Current and Emerging Trends

19 October 2020
Shagun Uppal
Sarthak Bhagat
Devamanyu Hazarika
Navonil Majumder
Soujanya Poria
Roger Zimmermann
Amir Zadeh

Papers citing "Multimodal Research in Vision and Language: A Review of Current and Emerging Trends"

50 / 180 papers shown
Learning to Compose Dynamic Tree Structures for Visual Contexts
Kaihua Tang
Hanwang Zhang
Baoyuan Wu
Wenhan Luo
Wei Liu
75
502
0
05 Dec 2018
Explainable and Explicit Visual Reasoning over Scene Graphs
Jiaxin Shi
Hanwang Zhang
Juan-Zi Li
OCL
196
234
0
05 Dec 2018
Multi-task Learning of Hierarchical Vision-Language Representation
Duy-Kien Nguyen
Takayuki Okatani
88
52
0
03 Dec 2018
Modality-based Factorization for Multimodal Fusion
Elham J. Barezi
Peyman Momeni
Pascale Fung
101
36
0
30 Nov 2018
Unsupervised Multi-modal Neural Machine Translation
Yuanhang Su
Kai Fan
Nguyen Bach
C.-C. Jay Kuo
Fei Huang
122
63
0
28 Nov 2018
From Recognition to Cognition: Visual Commonsense Reasoning
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
LRM, BDL, OCL, ReLM
158
881
0
27 Nov 2018
Unsupervised Image Captioning
Yang Feng
Lin Ma
Wei Liu
Jiebo Luo
VLM, SSL
74
202
0
27 Nov 2018
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
DiffM
70
175
0
26 Nov 2018
Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors
Yansen Wang
Ying Shen
Zhun Liu
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
68
402
0
23 Nov 2018
Zero-Shot Transfer VQA Dataset
Yuanpeng Li
Yi Yang
Jianyu Wang
Wei Xu
31
9
0
02 Nov 2018
A Corpus for Reasoning About Natural Language Grounded in Photographs
Alane Suhr
Stephanie Zhou
Ally Zhang
Iris Zhang
Huajun Bai
Yoav Artzi
LRM
103
604
0
01 Nov 2018
Engaging Image Captioning Via Personality
Kurt Shuster
Samuel Humeau
Hexiang Hu
Antoine Bordes
Jason Weston
79
152
0
25 Oct 2018
Interpretable Visual Question Answering by Reasoning on Dependency Trees
Qingxing Cao
Bailin Li
Xiaodan Liang
Liang Lin
63
56
0
06 Sep 2018
A Visual Attention Grounding Neural Model for Multimodal Machine Translation
Mingyang Zhou
Runxiang Cheng
Yong Jae Lee
Zhou Yu
84
79
0
24 Aug 2018
Explainable Neural Computation via Stack Neural Module Networks
Ronghang Hu
Jacob Andreas
Trevor Darrell
Kate Saenko
LRM, OCL
76
199
0
23 Jul 2018
"Factual" or "Emotional": Stylized Image Captioning with Adaptive Learning and Attention
Tianlang Chen
Zhongping Zhang
Quanzeng You
Chen Fang
Zhaowen Wang
Hailin Jin
Jiebo Luo
77
87
0
10 Jul 2018
Glow: Generative Flow with Invertible 1x1 Convolutions
Diederik P. Kingma
Prafulla Dhariwal
BDL, DRL
295
3,134
0
09 Jul 2018
Learning to Evaluate Image Captioning
Yin Cui
Guandao Yang
Andreas Veit
Xun Huang
Serge J. Belongie
68
148
0
17 Jun 2018
Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling
Navonil Majumder
Devamanyu Hazarika
Alexander Gelbukh
Erik Cambria
Soujanya Poria
59
323
0
16 Jun 2018
Fast, Diverse and Accurate Image Captioning Guided By Part-of-Speech
Aditya Deshpande
J. Aneja
Liwei Wang
Alex Schwing
David A. Forsyth
68
148
0
31 May 2018
Multimodal Hierarchical Reinforcement Learning Policy for Task-Oriented Visual Dialog
Jiaping Zhang
Tiancheng Zhao
Zhou Yu
49
40
0
08 May 2018
To Create What You Tell: Generating Videos from Captions
Yingwei Pan
Zhaofan Qiu
Ting Yao
Houqiang Li
Tao Mei
GAN
80
154
0
23 Apr 2018
Attention U-Net: Learning Where to Look for the Pancreas
Ozan Oktay
Jo Schlemper
Loic Le Folgoc
M. J. Lee
M. Heinrich
...
Jingyu Sun
Nils Y. Hammerla
Bernhard Kainz
Ben Glocker
Daniel Rueckert
SSeg
159
5,049
0
11 Apr 2018
Image Generation from Scene Graphs
Justin Johnson
Agrim Gupta
Li Fei-Fei
GNN
300
820
0
04 Apr 2018
End-to-End Dense Video Captioning with Masked Transformer
Luowei Zhou
Yingbo Zhou
Jason J. Corso
R. Socher
Caiming Xiong
92
529
0
03 Apr 2018
Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning
Jingwen Wang
Wenhao Jiang
Lin Ma
Wei Liu
Yong-mei Xu
75
206
0
31 Mar 2018
Two can play this Game: Visual Dialog with Discriminative Question Generation and Answering
Unnat Jain
Svetlana Lazebnik
Alex Schwing
MLLM
63
81
0
29 Mar 2018
Attributes as Operators: Factorizing Unseen Attribute-Object Compositions
Tushar Nagarajan
Kristen Grauman
OCL, CoGe
53
56
0
27 Mar 2018
Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation
Xin Eric Wang
Wenhan Xiong
Hongmin Wang
William Yang Wang
71
200
0
21 Mar 2018
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence
Dong Huk Park
Lisa Anne Hendricks
Zeynep Akata
Anna Rohrbach
Bernt Schiele
Trevor Darrell
Marcus Rohrbach
75
422
0
15 Feb 2018
Diagnose like a Radiologist: Attention Guided Convolutional Neural Network for Thorax Disease Classification
Q. Guan
Yaping Huang
Zhun Zhong
Zhedong Zheng
Liang Zheng
Yi Yang
58
260
0
30 Jan 2018
Interpretable Counting for Visual Question Answering
Alexander R. Trott
Caiming Xiong
R. Socher
74
71
0
23 Dec 2017
StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks
Han Zhang
Tao Xu
Hongsheng Li
Shaoting Zhang
Xiaogang Wang
Xiaolei Huang
Dimitris N. Metaxas
GAN
79
1,061
0
19 Oct 2017
Contrastive Learning for Image Captioning
Bo Dai
Dahua Lin
SSL, VLM
79
194
0
06 Oct 2017
Automatic Spatially-aware Fashion Concept Discovery
Xintong Han
Zuxuan Wu
Phoenix X. Huang
Xiao Zhang
Menglong Zhu
Yuan Li
Yang Zhao
L. Davis
73
270
0
03 Aug 2017
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
121
4,216
0
25 Jul 2017
Tensor Fusion Network for Multimodal Sentiment Analysis
Amir Zadeh
Minghai Chen
Soujanya Poria
Erik Cambria
Louis-Philippe Morency
82
1,236
0
23 Jul 2017
Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning
Jingkuan Song
Zhao Guo
Lianli Gao
Wu Liu
Dongxiang Zhang
Heng Tao Shen
67
166
0
05 Jun 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
101
2,932
0
26 May 2017
Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner
Tseng-Hung Chen
Yuan-Hong Liao
Ching-Yao Chuang
W. Hsu
Jianlong Fu
Min Sun
93
142
0
02 May 2017
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
Abhishek Das
Satwik Kottur
J. M. F. Moura
Stefan Lee
Dhruv Batra
OffRL
116
425
0
20 Mar 2017
Re-evaluating Automatic Metrics for Image Captioning
Mert Kilickaya
Aykut Erdem
Nazli Ikizler-Cinbis
Erkut Erdem
54
181
0
22 Dec 2016
Self-critical Sequence Training for Image Captioning
Steven J. Rennie
E. Marcheret
Youssef Mroueh
Jerret Ross
Vaibhava Goel
107
1,887
0
02 Dec 2016
Sync-DRAW: Automatic Video Generation using Deep Recurrent Attentive Architectures
Gaurav Mittal
Tanya Marwah
V. Balasubramanian
VGen, DiffM
86
67
0
30 Nov 2016
Visual Dialog
Abhishek Das
Satwik Kottur
Khushi Gupta
Avi Singh
Deshraj Yadav
José M. F. Moura
Devi Parikh
Dhruv Batra
142
998
0
26 Nov 2016
GRAM: Graph-based Attention Model for Healthcare Representation Learning
Edward Choi
M. T. Bahadori
Le Song
Walter F. Stewart
Jimeng Sun
GNN
97
677
0
21 Nov 2016
Multimodal Memory Modelling for Video Captioning
Junbo Wang
Wei Wang
Yan Huang
Liang Wang
Tieniu Tan
76
142
0
17 Nov 2016
Zero-Shot Visual Question Answering
Damien Teney
Anton Van Den Hengel
58
74
0
17 Nov 2016
Leveraging Video Descriptions to Learn Video Question Answering
Kuo-Hao Zeng
Tseng-Hung Chen
Ching-Yao Chuang
Yuan-Hong Liao
Juan Carlos Niebles
Min Sun
92
179
0
12 Nov 2016
Learning to Navigate in Complex Environments
Piotr Wojciech Mirowski
Razvan Pascanu
Fabio Viola
Hubert Soyer
Andy Ballard
...
Ross Goroshin
Laurent Sifre
Koray Kavukcuoglu
D. Kumaran
R. Hadsell
107
880
0
11 Nov 2016