ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2010.09522
  4. Cited By
Multimodal Research in Vision and Language: A Review of Current and
  Emerging Trends
v1v2 (latest)

Multimodal Research in Vision and Language: A Review of Current and Emerging Trends

19 October 2020
Shagun Uppal
Sarthak Bhagat
Devamanyu Hazarika
Navonil Majumdar
Soujanya Poria
Roger Zimmermann
Amir Zadeh
ArXiv (abs)PDFHTML

Papers citing "Multimodal Research in Vision and Language: A Review of Current and Emerging Trends"

50 / 180 papers shown
Title
Towards Explainable Artificial Intelligence
Towards Explainable Artificial Intelligence
Wojciech Samek
K. Müller
XAI
75
442
0
26 Sep 2019
Compact Trilinear Interaction for Visual Question Answering
Compact Trilinear Interaction for Visual Question Answering
Tuong Khanh Long Do
Thanh-Toan Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
57
59
0
26 Sep 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLMVLM
352
941
0
24 Sep 2019
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event
  Captioning
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning
Tanzila Rahman
Bicheng Xu
Leonid Sigal
66
80
0
22 Sep 2019
Adaptively Aligned Image Captioning via Adaptive Attention Time
Adaptively Aligned Image Captioning via Adaptive Attention Time
Lun Huang
Wenmin Wang
Yaxian Xia
Jie Chen
41
62
0
19 Sep 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Zihao Wang
Xihui Liu
Hongsheng Li
Lu Sheng
Junjie Yan
Xiaogang Wang
Jing Shao
VLM
70
304
0
12 Sep 2019
Hierarchy Parsing for Image Captioning
Hierarchy Parsing for Image Captioning
Ting Yao
Yingwei Pan
Yehao Li
Tao Mei
VLM
61
165
0
09 Sep 2019
Supervised Multimodal Bitransformers for Classifying Images and Text
Supervised Multimodal Bitransformers for Classifying Images and Text
Douwe Kiela
Suvrat Bhooshan
Hamed Firooz
Ethan Perez
Davide Testuggine
140
247
0
06 Sep 2019
TIGEr: Text-to-Image Grounding for Image Caption Evaluation
TIGEr: Text-to-Image Grounding for Image Caption Evaluation
Ming Jiang
Qiuyuan Huang
Lei Zhang
Xin Eric Wang
Pengchuan Zhang
Zhe Gan
Jana Diesner
Jianfeng Gao
88
67
0
04 Sep 2019
Reflective Decoding Network for Image Captioning
Reflective Decoding Network for Image Captioning
Lei Ke
Wenjie Pei
Ruiyu Li
Xiaoyong Shen
Yu-Wing Tai
ObjD
44
93
0
30 Aug 2019
Controllable Video Captioning with POS Sequence Guidance Based on Gated
  Fusion Network
Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network
Bairui Wang
Lin Ma
Wei Zhang
Wenhao Jiang
Jingwen Wang
Wei Liu
105
163
0
27 Aug 2019
Towards Unsupervised Image Captioning with Shared Multimodal Embeddings
Towards Unsupervised Image Captioning with Shared Multimodal Embeddings
Iro Laina
Christian Rupprecht
Nassir Navab
SSL
58
103
0
25 Aug 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLMMLLMSSL
158
1,666
0
22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from
  Transformers
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLMMLLM
247
2,483
0
20 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal
  Pre-training
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSLVLMMLLM
202
905
0
16 Aug 2019
Fusion of Detected Objects in Text for Visual Question Answering
Fusion of Detected Objects in Text for Visual Question Answering
Chris Alberti
Jeffrey Ling
Michael Collins
David Reitter
62
173
0
14 Aug 2019
Why Does a Visual Question Have Different Answers?
Why Does a Visual Question Have Different Answers?
Nilavra Bhattacharya
Qing Li
Danna Gurari
50
65
0
12 Aug 2019
Multimodal Unified Attention Networks for Vision-and-Language
  Interactions
Multimodal Unified Attention Networks for Vision-and-Language Interactions
Zhou Yu
Yuhao Cui
Jun Yu
Dacheng Tao
Q. Tian
76
38
0
12 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
141
1,955
0
09 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for
  Vision-and-Language Tasks
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSLVLM
231
3,684
0
06 Aug 2019
Embodied Vision-and-Language Navigation with Dynamic Convolutional
  Filters
Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters
Federico Landi
Lorenzo Baraldi
M. Corsini
Rita Cucchiara
LM&Ro
75
26
0
05 Jul 2019
Deep Modular Co-Attention Networks for Visual Question Answering
Deep Modular Co-Attention Networks for Visual Question Answering
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
87
806
0
25 Jun 2019
RUBi: Reducing Unimodal Biases in Visual Question Answering
RUBi: Reducing Unimodal Biases in Visual Question Answering
Rémi Cadène
Corentin Dancette
H. Ben-younes
Matthieu Cord
Devi Parikh
CML
96
373
0
24 Jun 2019
Distilling Translations with Visual Awareness
Distilling Translations with Visual Awareness
Julia Ive
Pranava Madhyastha
Lucia Specia
VLM
139
76
0
18 Jun 2019
Image Captioning: Transforming Objects into Words
Image Captioning: Transforming Objects into Words
Simão Herdade
Armin Kappeler
K. Boakye
Joao Soares
ViT
116
470
0
14 Jun 2019
Multi-scale self-guided attention for medical image segmentation
Multi-scale self-guided attention for medical image segmentation
Ashish Sinha
Jose Dolz
SSeg
66
417
0
07 Jun 2019
Scene Text Visual Question Answering
Scene Text Visual Question Answering
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Lluís Gómez
Marçal Rusiñol
Ernest Valveny
C. V. Jawahar
Dimosthenis Karatzas
103
356
0
31 May 2019
Attention Is (not) All You Need for Commonsense Reasoning
Attention Is (not) All You Need for Commonsense Reasoning
T. Klein
Moin Nabi
LRM
61
37
0
31 May 2019
Self-Critical Reasoning for Robust Visual Question Answering
Self-Critical Reasoning for Robust Visual Question Answering
Jialin Wu
Raymond J. Mooney
OODNAI
71
161
0
24 May 2019
Multi-task Learning for Multi-modal Emotion Recognition and Sentiment
  Analysis
Multi-task Learning for Multi-modal Emotion Recognition and Sentiment Analysis
Md. Shad Akhtar
Dushyant Singh Chauhan
Deepanway Ghosal
Soujanya Poria
Asif Ekbal
P. Bhattacharyya
83
167
0
14 May 2019
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor
  Environments
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments
Yuankai Qi
Qi Wu
Peter Anderson
Xinze Wang
Wenjie Wang
Chunhua Shen
Anton Van Den Hengel
LM&Ro
91
324
0
23 Apr 2019
UR-FUNNY: A Multimodal Language Dataset for Understanding Humor
UR-FUNNY: A Multimodal Language Dataset for Understanding Humor
M. Hasan
Wasifur Rahman
Amir Zadeh
Jianyuan Zhong
Md. Iftekhar Tanveer
Louis-Philippe Morency
Mohammed
Ehsan Hoque
69
185
0
14 Apr 2019
Factor Graph Attention
Factor Graph Attention
Idan Schwartz
Seunghak Yu
Tamir Hazan
Alex Schwing
75
110
0
11 Apr 2019
Learning to Navigate Unseen Environments: Back Translation with
  Environmental Dropout
Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout
Hao Tan
Licheng Yu
Joey Tianyi Zhou
SSL
88
318
0
08 Apr 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for
  Video-and-Language Research
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
101
551
0
06 Apr 2019
DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image
  Synthesis
DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis
Minfeng Zhu
Pingbo Pan
Wei Chen
Yi Yang
GAN
54
582
0
02 Apr 2019
Multimodal Machine Translation with Embedding Prediction
Multimodal Machine Translation with Embedding Prediction
Tosho Hirasawa
Hayahide Yamagishi
Yukio Matsumura
Mamoru Komachi
30
16
0
01 Apr 2019
Information Maximizing Visual Question Generation
Information Maximizing Visual Question Generation
Ranjay Krishna
Michael S. Bernstein
Li Fei-Fei
94
95
0
27 Mar 2019
Unpaired Image Captioning via Scene Graph Alignments
Unpaired Image Captioning via Scene Graph Alignments
Jiuxiang Gu
Shafiq Joty
Jianfei Cai
Handong Zhao
Xu Yang
G. Wang
GNN
64
174
0
26 Mar 2019
Dense Relational Captioning: Triple-Stream Networks for
  Relationship-Based Captioning
Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning
Dong-Jin Kim
Jinsoo Choi
Tae-Hyun Oh
In So Kweon
54
84
0
14 Mar 2019
MirrorGAN: Learning Text-to-image Generation by Redescription
MirrorGAN: Learning Text-to-image Generation by Redescription
Tingting Qiao
Jing Zhang
Duanqing Xu
Dacheng Tao
VLMGAN
61
541
0
14 Mar 2019
Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language
  Navigation
Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation
Liyiming Ke
Xiujun Li
Yonatan Bisk
Ari Holtzman
Zhe Gan
Jingjing Liu
Jianfeng Gao
Yejin Choi
S. Srinivasa
86
168
0
06 Mar 2019
The Regretful Agent: Heuristic-Aided Navigation through Progress
  Estimation
The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation
Chih-Yao Ma
Zuxuan Wu
G. Al-Regib
Caiming Xiong
Z. Kira
LM&Ro
85
174
0
05 Mar 2019
Object-driven Text-to-Image Synthesis via Adversarial Training
Object-driven Text-to-Image Synthesis via Adversarial Training
Wenbo Li
Pengchuan Zhang
Lei Zhang
Qiuyuan Huang
Xiaodong He
Siwei Lyu
Jianfeng Gao
GAN
71
302
0
27 Feb 2019
Dual Attention Networks for Visual Reference Resolution in Visual Dialog
Dual Attention Networks for Visual Reference Resolution in Visual Dialog
Gi-Cheon Kang
Jaeseo Lim
Byoung-Tak Zhang
41
73
0
25 Feb 2019
Composing Text and Image for Image Retrieval - An Empirical Odyssey
Composing Text and Image for Image Retrieval - An Empirical Odyssey
Nam S. Vo
Lu Jiang
Chen Sun
Kevin Patrick Murphy
Li Li
Li Fei-Fei
James Hays
CoGe
54
368
0
18 Dec 2018
Recent Advances in Autoencoder-Based Representation Learning
Recent Advances in Autoencoder-Based Representation Learning
Michael Tschannen
Olivier Bachem
Mario Lucic
OODSSLDRL
69
445
0
12 Dec 2018
Recursive Visual Attention in Visual Dialog
Recursive Visual Attention in Visual Dialog
Yulei Niu
Hanwang Zhang
Manli Zhang
Jianhong Zhang
Zhiwu Lu
Ji-Rong Wen
83
119
0
06 Dec 2018
Auto-Encoding Scene Graphs for Image Captioning
Auto-Encoding Scene Graphs for Image Captioning
Xu Yang
Kaihua Tang
Hanwang Zhang
Jianfei Cai
156
699
0
06 Dec 2018
Counterfactual Critic Multi-Agent Training for Scene Graph Generation
Counterfactual Critic Multi-Agent Training for Scene Graph Generation
Long Chen
Hanwang Zhang
Jun Xiao
Xiangnan He
Shiliang Pu
Shih-Fu Chang
89
159
0
06 Dec 2018
Previous
1234
Next