Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2010.09522
Cited By
v1
v2 (latest)
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends
19 October 2020
Shagun Uppal
Sarthak Bhagat
Devamanyu Hazarika
Navonil Majumdar
Soujanya Poria
Roger Zimmermann
Amir Zadeh
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Multimodal Research in Vision and Language: A Review of Current and Emerging Trends"
50 / 180 papers shown
Title
Towards Explainable Artificial Intelligence
Wojciech Samek
K. Müller
XAI
75
442
0
26 Sep 2019
Compact Trilinear Interaction for Visual Question Answering
Tuong Khanh Long Do
Thanh-Toan Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
57
59
0
26 Sep 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
352
941
0
24 Sep 2019
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning
Tanzila Rahman
Bicheng Xu
Leonid Sigal
66
80
0
22 Sep 2019
Adaptively Aligned Image Captioning via Adaptive Attention Time
Lun Huang
Wenmin Wang
Yaxian Xia
Jie Chen
41
62
0
19 Sep 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Zihao Wang
Xihui Liu
Hongsheng Li
Lu Sheng
Junjie Yan
Xiaogang Wang
Jing Shao
VLM
70
304
0
12 Sep 2019
Hierarchy Parsing for Image Captioning
Ting Yao
Yingwei Pan
Yehao Li
Tao Mei
VLM
61
165
0
09 Sep 2019
Supervised Multimodal Bitransformers for Classifying Images and Text
Douwe Kiela
Suvrat Bhooshan
Hamed Firooz
Ethan Perez
Davide Testuggine
140
247
0
06 Sep 2019
TIGEr: Text-to-Image Grounding for Image Caption Evaluation
Ming Jiang
Qiuyuan Huang
Lei Zhang
Xin Eric Wang
Pengchuan Zhang
Zhe Gan
Jana Diesner
Jianfeng Gao
88
67
0
04 Sep 2019
Reflective Decoding Network for Image Captioning
Lei Ke
Wenjie Pei
Ruiyu Li
Xiaoyong Shen
Yu-Wing Tai
ObjD
44
93
0
30 Aug 2019
Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network
Bairui Wang
Lin Ma
Wei Zhang
Wenhao Jiang
Jingwen Wang
Wei Liu
105
163
0
27 Aug 2019
Towards Unsupervised Image Captioning with Shared Multimodal Embeddings
Iro Laina
Christian Rupprecht
Nassir Navab
SSL
58
103
0
25 Aug 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
158
1,666
0
22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
247
2,483
0
20 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL
VLM
MLLM
202
905
0
16 Aug 2019
Fusion of Detected Objects in Text for Visual Question Answering
Chris Alberti
Jeffrey Ling
Michael Collins
David Reitter
62
173
0
14 Aug 2019
Why Does a Visual Question Have Different Answers?
Nilavra Bhattacharya
Qing Li
Danna Gurari
50
65
0
12 Aug 2019
Multimodal Unified Attention Networks for Vision-and-Language Interactions
Zhou Yu
Yuhao Cui
Jun Yu
Dacheng Tao
Q. Tian
76
38
0
12 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
141
1,955
0
09 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
231
3,684
0
06 Aug 2019
Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters
Federico Landi
Lorenzo Baraldi
M. Corsini
Rita Cucchiara
LM&Ro
75
26
0
05 Jul 2019
Deep Modular Co-Attention Networks for Visual Question Answering
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
87
806
0
25 Jun 2019
RUBi: Reducing Unimodal Biases in Visual Question Answering
Rémi Cadène
Corentin Dancette
H. Ben-younes
Matthieu Cord
Devi Parikh
CML
96
373
0
24 Jun 2019
Distilling Translations with Visual Awareness
Julia Ive
Pranava Madhyastha
Lucia Specia
VLM
139
76
0
18 Jun 2019
Image Captioning: Transforming Objects into Words
Simão Herdade
Armin Kappeler
K. Boakye
Joao Soares
ViT
116
470
0
14 Jun 2019
Multi-scale self-guided attention for medical image segmentation
Ashish Sinha
Jose Dolz
SSeg
66
417
0
07 Jun 2019
Scene Text Visual Question Answering
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Lluís Gómez
Marçal Rusiñol
Ernest Valveny
C. V. Jawahar
Dimosthenis Karatzas
103
356
0
31 May 2019
Attention Is (not) All You Need for Commonsense Reasoning
T. Klein
Moin Nabi
LRM
61
37
0
31 May 2019
Self-Critical Reasoning for Robust Visual Question Answering
Jialin Wu
Raymond J. Mooney
OOD
NAI
71
161
0
24 May 2019
Multi-task Learning for Multi-modal Emotion Recognition and Sentiment Analysis
Md. Shad Akhtar
Dushyant Singh Chauhan
Deepanway Ghosal
Soujanya Poria
Asif Ekbal
P. Bhattacharyya
83
167
0
14 May 2019
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments
Yuankai Qi
Qi Wu
Peter Anderson
Xinze Wang
Wenjie Wang
Chunhua Shen
Anton Van Den Hengel
LM&Ro
91
324
0
23 Apr 2019
UR-FUNNY: A Multimodal Language Dataset for Understanding Humor
M. Hasan
Wasifur Rahman
Amir Zadeh
Jianyuan Zhong
Md. Iftekhar Tanveer
Louis-Philippe Morency
Mohammed
Ehsan Hoque
69
185
0
14 Apr 2019
Factor Graph Attention
Idan Schwartz
Seunghak Yu
Tamir Hazan
Alex Schwing
75
110
0
11 Apr 2019
Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout
Hao Tan
Licheng Yu
Joey Tianyi Zhou
SSL
88
318
0
08 Apr 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
101
551
0
06 Apr 2019
DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis
Minfeng Zhu
Pingbo Pan
Wei Chen
Yi Yang
GAN
54
582
0
02 Apr 2019
Multimodal Machine Translation with Embedding Prediction
Tosho Hirasawa
Hayahide Yamagishi
Yukio Matsumura
Mamoru Komachi
30
16
0
01 Apr 2019
Information Maximizing Visual Question Generation
Ranjay Krishna
Michael S. Bernstein
Li Fei-Fei
94
95
0
27 Mar 2019
Unpaired Image Captioning via Scene Graph Alignments
Jiuxiang Gu
Shafiq Joty
Jianfei Cai
Handong Zhao
Xu Yang
G. Wang
GNN
64
174
0
26 Mar 2019
Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning
Dong-Jin Kim
Jinsoo Choi
Tae-Hyun Oh
In So Kweon
54
84
0
14 Mar 2019
MirrorGAN: Learning Text-to-image Generation by Redescription
Tingting Qiao
Jing Zhang
Duanqing Xu
Dacheng Tao
VLM
GAN
61
541
0
14 Mar 2019
Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation
Liyiming Ke
Xiujun Li
Yonatan Bisk
Ari Holtzman
Zhe Gan
Jingjing Liu
Jianfeng Gao
Yejin Choi
S. Srinivasa
86
168
0
06 Mar 2019
The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation
Chih-Yao Ma
Zuxuan Wu
G. Al-Regib
Caiming Xiong
Z. Kira
LM&Ro
85
174
0
05 Mar 2019
Object-driven Text-to-Image Synthesis via Adversarial Training
Wenbo Li
Pengchuan Zhang
Lei Zhang
Qiuyuan Huang
Xiaodong He
Siwei Lyu
Jianfeng Gao
GAN
71
302
0
27 Feb 2019
Dual Attention Networks for Visual Reference Resolution in Visual Dialog
Gi-Cheon Kang
Jaeseo Lim
Byoung-Tak Zhang
41
73
0
25 Feb 2019
Composing Text and Image for Image Retrieval - An Empirical Odyssey
Nam S. Vo
Lu Jiang
Chen Sun
Kevin Patrick Murphy
Li Li
Li Fei-Fei
James Hays
CoGe
54
368
0
18 Dec 2018
Recent Advances in Autoencoder-Based Representation Learning
Michael Tschannen
Olivier Bachem
Mario Lucic
OOD
SSL
DRL
69
445
0
12 Dec 2018
Recursive Visual Attention in Visual Dialog
Yulei Niu
Hanwang Zhang
Manli Zhang
Jianhong Zhang
Zhiwu Lu
Ji-Rong Wen
83
119
0
06 Dec 2018
Auto-Encoding Scene Graphs for Image Captioning
Xu Yang
Kaihua Tang
Hanwang Zhang
Jianfei Cai
156
699
0
06 Dec 2018
Counterfactual Critic Multi-Agent Training for Scene Graph Generation
Long Chen
Hanwang Zhang
Jun Xiao
Xiangnan He
Shiliang Pu
Shih-Fu Chang
89
159
0
06 Dec 2018
Previous
1
2
3
4
Next