Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1804.00775
Cited By
Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
3 April 2018
Duy-Kien Nguyen
Takayuki Okatani
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering"
45 / 45 papers shown
Title
Joint Adaptive Representations for Image-Language Learning
A. Piergiovanni
A. Angelova
VLM
34
0
0
31 May 2023
EAML: Ensemble Self-Attention-based Mutual Learning Network for Document Image Classification
Souhail Bakkali
Zuheng Ming
Mickael Coustaty
Marçal Rusiñol
10
6
0
11 May 2023
Co-attention Propagation Network for Zero-Shot Video Object Segmentation
Gensheng Pei
Yazhou Yao
Fumin Shen
Daniel Huang
Xing-Rui Huang
Hengtao Shen
VOS
38
12
0
08 Apr 2023
SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering
Feiqi Cao
Siwen Luo
F. Núñez
Zean Wen
Josiah Poon
Caren Han
GNN
26
4
0
16 Dec 2022
AlignVE: Visual Entailment Recognition Based on Alignment Relations
Biwei Cao
Jiuxin Cao
Jie Gui
Jiayun Shen
Bo Liu
Lei He
Yuan Yan Tang
James T. Kwok
26
7
0
16 Nov 2022
Localizing Anatomical Landmarks in Ocular Images using Zoom-In Attentive Networks
Xiaofeng Lei
Shaohua Li
Xinxing Xu
Huazhu Fu
Yong Liu
...
Mingrui Tan
Yanyu Xu
Jocelyn Hui Lin Goh
Rick Siow Mong Goh
Ching-Yu Cheng
23
1
0
25 Sep 2022
Changer: Feature Interaction is What You Need for Change Detection
Sheng Fang
Kaiyu Li
Zhe Li
45
170
0
17 Sep 2022
MMKGR: Multi-hop Multi-modal Knowledge Graph Reasoning
Shangfei Zheng
Weiqing Wang
Jianfeng Qu
Hongzhi Yin
Wei Chen
Lei Zhao
LRM
21
22
0
03 Sep 2022
Structured Two-stream Attention Network for Video Question Answering
Lianli Gao
Pengpeng Zeng
Jingkuan Song
Yuan-Fang Li
Wu Liu
Tao Mei
Heng Tao Shen
43
68
0
02 Jun 2022
Co-VQA : Answering by Interactive Sub Question Sequence
Ruonan Wang
Yuxi Qian
Fangxiang Feng
Xiaojie Wang
Huixing Jiang
LRM
29
16
0
02 Apr 2022
Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding
Jiabo Ye
Junfeng Tian
Ming Yan
Xiaoshan Yang
Xuwu Wang
Ji Zhang
Liang He
Xin Lin
ObjD
13
61
0
29 Mar 2022
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering
Yang Ding
Jing Yu
Bangchang Liu
Yue Hu
Mingxin Cui
Qi Wu
13
62
0
17 Mar 2022
Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering
Jianjian Cao
Xiameng Qin
Sanyuan Zhao
Jianbing Shen
31
20
0
14 Dec 2021
Show, Write, and Retrieve: Entity-aware Article Generation and Retrieval
Zhongping Zhang
Yiwen Gu
Bryan A. Plummer
48
2
0
11 Dec 2021
MMIU: Dataset for Visual Intent Understanding in Multimodal Assistants
Alkesh Patel
Joel Ruben Antony Moniz
R. Nguyen
Nicholas Tzou
Hadas Kotek
Vincent Renkens
VGen
21
1
0
13 Oct 2021
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
Ekta Sood
Fabian Kögel
Florian Strohm
Prajit Dhar
Andreas Bulling
40
19
0
27 Sep 2021
Auto-Parsing Network for Image Captioning and Visual Question Answering
Xu Yang
Chongyang Gao
Hanwang Zhang
Jianfei Cai
24
35
0
24 Aug 2021
DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering
Jianyu Wang
Bingkun Bao
Changsheng Xu
19
75
0
10 Jul 2021
Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos
Sijie Song
Xudong Lin
Jiaying Liu
Zongming Guo
Shih-Fu Chang
ObjD
21
16
0
23 Mar 2021
Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries
Swathikiran Sudhakaran
Sergio Escalera
Oswald Lanz
EgoV
27
15
0
16 Feb 2021
End-to-End Object Detection with Adaptive Clustering Transformer
Minghang Zheng
Peng Gao
Renrui Zhang
Kunchang Li
Xiaogang Wang
Hongsheng Li
Hao Dong
ViT
41
193
0
18 Nov 2020
An Improved Attention for Visual Question Answering
Tanzila Rahman
Shih-Han Chou
Leonid Sigal
Giuseppe Carenini
13
42
0
04 Nov 2020
New Ideas and Trends in Deep Multimodal Content Understanding: A Review
Wei Chen
Weiping Wang
Li Liu
M. Lew
VLM
118
31
0
16 Oct 2020
Multi-Task Learning with Deep Neural Networks: A Survey
M. Crawshaw
CVBM
53
609
0
10 Sep 2020
DCR-Net: A Deep Co-Interactive Relation Network for Joint Dialog Act Recognition and Sentiment Classification
Libo Qin
Wanxiang Che
Yangming Li
Minheng Ni
Ting Liu
24
93
0
16 Aug 2020
Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention
Bin Duan
Hao Tang
Wei Wang
Ziliang Zong
Guowei Yang
Yan Yan
33
59
0
14 Aug 2020
REXUP: I REason, I EXtract, I UPdate with Structured Compositional Reasoning for Visual Question Answering
Siwen Luo
S. Han
Kaiyuan Sun
Josiah Poon
CoGe
LRM
ReLM
26
4
0
27 Jul 2020
Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder
K. Gouthaman
Anurag Mittal
50
78
0
13 Jul 2020
History for Visual Dialog: Do we really need it?
Shubham Agarwal
Trung Bui
Joon-Young Lee
Ioannis Konstas
Verena Rieser
VLM
19
69
0
08 May 2020
MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond
Duy-Kien Nguyen
Vedanuj Goswami
Xinlei Chen
39
23
0
24 Apr 2020
CQ-VQA: Visual Question Answering on Categorized Questions
Aakansha Mishra
A. Anand
Prithwijit Guha
33
6
0
17 Feb 2020
See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks
Xiankai Lu
Wenguan Wang
Chao Ma
Jianbing Shen
Ling Shao
Fatih Porikli
VOS
19
462
0
19 Jan 2020
A Multimodal Target-Source Classifier with Attention Branches to Understand Ambiguous Instructions for Fetching Daily Objects
A. Magassouba
K. Sugiura
Hisashi Kawai
38
9
0
23 Dec 2019
Compact Trilinear Interaction for Visual Question Answering
Tuong Khanh Long Do
Thanh-Toan Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
36
59
0
26 Sep 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Zihao Wang
Xihui Liu
Hongsheng Li
Lu Sheng
Junjie Yan
Xiaogang Wang
Jing Shao
VLM
25
299
0
12 Sep 2019
Multimodal Unified Attention Networks for Vision-and-Language Interactions
Zhou Yu
Yuhao Cui
Jun Yu
Dacheng Tao
Q. Tian
27
38
0
12 Aug 2019
Multi-modality Latent Interaction Network for Visual Question Answering
Peng Gao
Haoxuan You
Zhanpeng Zhang
Xiaogang Wang
Hongsheng Li
25
82
0
10 Aug 2019
Deep Modular Co-Attention Networks for Visual Question Answering
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
36
797
0
25 Jun 2019
Multimodal Transformer with Multi-View Visual Representation for Image Captioning
Jun-chen Yu
Jing Li
Zhou Yu
Qingming Huang
ViT
27
377
0
20 May 2019
Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations
Fenglin Liu
Yuanxin Liu
Xuancheng Ren
Xiaodong He
Xu Sun
VLM
34
81
0
15 May 2019
Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing
Xihui Liu
Zihao Wang
Jing Shao
Xiaogang Wang
Hongsheng Li
ObjD
19
180
0
03 Mar 2019
Answer Them All! Toward Universal Visual Question Answering Models
Robik Shrestha
Kushal Kafle
Christopher Kanan
25
82
0
01 Mar 2019
Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering
Peng Gao
Zhengkai Jiang
Haoxuan You
Pan Lu
Steven C. H. Hoi
Xiaogang Wang
Hongsheng Li
AIMat
24
363
0
13 Dec 2018
Multi-task Learning of Hierarchical Vision-Language Representation
Duy-Kien Nguyen
Takayuki Okatani
25
51
0
03 Dec 2018
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
167
1,465
0
06 Jun 2016
1