Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1906.10770
Cited By
Deep Modular Co-Attention Networks for Visual Question Answering
25 June 2019
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Deep Modular Co-Attention Networks for Visual Question Answering"
19 / 119 papers shown
Title
End-to-End Object Detection with Adaptive Clustering Transformer
Minghang Zheng
Peng Gao
Renrui Zhang
Kunchang Li
Xiaogang Wang
Hongsheng Li
Hao Dong
ViT
22
193
0
18 Nov 2020
An Improved Attention for Visual Question Answering
Tanzila Rahman
Shih-Han Chou
Leonid Sigal
Giuseppe Carenini
13
42
0
04 Nov 2020
Bayesian Attention Modules
Xinjie Fan
Shujian Zhang
Bo Chen
Mingyuan Zhou
117
59
0
20 Oct 2020
Modulated Fusion using Transformer for Linguistic-Acoustic Emotion Recognition
Jean-Benoit Delbrouck
Noé Tits
Stéphane Dupont
33
20
0
05 Oct 2020
AiR: Attention with Reasoning Capability
Shi Chen
Ming Jiang
Jinhui Yang
Qi Zhao
LRM
13
36
0
28 Jul 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD
VLM
35
488
0
11 Jun 2020
Estimating semantic structure for the VQA answer space
Corentin Kervadec
G. Antipov
M. Baccouche
Christian Wolf
23
4
0
10 Jun 2020
History for Visual Dialog: Do we really need it?
Shubham Agarwal
Trung Bui
Joon-Young Lee
Ioannis Konstas
Verena Rieser
VLM
19
69
0
08 May 2020
MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond
Duy-Kien Nguyen
Vedanuj Goswami
Xinlei Chen
36
23
0
24 Apr 2020
Visual Question Answering for Cultural Heritage
P. Bongini
Federico Becattini
Andrew D. Bagdanov
A. Bimbo
190
22
0
22 Mar 2020
Normalized and Geometry-Aware Self-Attention Network for Image Captioning
Longteng Guo
Jing Liu
Xinxin Zhu
Peng Yao
Shichen Lu
Hanqing Lu
ViT
120
189
0
19 Mar 2020
Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models
M. Farazi
Salman H. Khan
Nick Barnes
23
17
0
20 Jan 2020
A Multimodal Target-Source Classifier with Attention Branches to Understand Ambiguous Instructions for Fetching Daily Objects
A. Magassouba
K. Sugiura
Hisashi Kawai
38
9
0
23 Dec 2019
Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks
Corentin Kervadec
G. Antipov
M. Baccouche
Christian Wolf
21
15
0
06 Dec 2019
Modulated Self-attention Convolutional Network for VQA
Jean-Benoit Delbrouck
Antoine Maiorca
Nathan Hubens
Stéphane Dupont
18
1
0
08 Oct 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
75
2,450
0
20 Aug 2019
Multimodal Unified Attention Networks for Vision-and-Language Interactions
Zhou Yu
Yuhao Cui
Jun Yu
Dacheng Tao
Q. Tian
27
38
0
12 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
35
1,913
0
09 Aug 2019
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
158
1,464
0
06 Jun 2016
Previous
1
2
3