1606.01847
Cited By
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
6 June 2016
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
Papers citing
"Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding"
50 / 225 papers shown
VU-BERT: A Unified framework for Visual Dialog
Tong Ye
Shijing Si
Jianzong Wang
Rui Wang
Ning Cheng
Jing Xiao
MLLM
35
5
0
22 Feb 2022
Deep Learning Approaches on Image Captioning: A Review
Taraneh Ghandi
H. Pourreza
H. Mahyar
VLM
16
89
0
31 Jan 2022
Scaling Open-Vocabulary Image Segmentation with Image-Level Labels
Golnaz Ghiasi
Xiuye Gu
Huayu Chen
Nayeon Lee
VLM
35
371
0
22 Dec 2021
Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering
Jianjian Cao
Xiameng Qin
Sanyuan Zhao
Jianbing Shen
31
20
0
14 Dec 2021
Quality-Aware Multimodal Biometric Recognition
Sobhan Soleymani
Ali Dabouei
Fariborz Taherkhani
Seyed Mehdi Iranmanesh
J. Dawson
Nasser M. Nasrabadi
CVBM
27
3
0
10 Dec 2021
Consensus Graph Representation Learning for Better Grounded Image Captioning
Wenqiao Zhang
Haochen Shi
Siliang Tang
Jun Xiao
Qiang Yu
Yueting Zhuang
15
54
0
02 Dec 2021
Relational Graph Learning for Grounded Video Description Generation
Wenqiao Zhang
Qing Guo
Siliang Tang
Haizhou Shi
Haochen Shi
Jun Xiao
Yueting Zhuang
Luu Anh Tuan
19
33
0
02 Dec 2021
Achieving Human Parity on Visual Question Answering
Ming Yan
Haiyang Xu
Chenliang Li
Junfeng Tian
Bin Bi
...
Ji Zhang
Songfang Huang
Fei Huang
Luo Si
Rong Jin
26
12
0
17 Nov 2021
Temporal-attentive Covariance Pooling Networks for Video Recognition
Zilin Gao
Qilong Wang
Bingbing Zhang
Q. Hu
P. Li
21
24
0
27 Oct 2021
Single-Modal Entropy based Active Learning for Visual Question Answering
Dong-Jin Kim
Jae-Won Cho
Jinsoo Choi
Yunjae Jung
In So Kweon
25
12
0
21 Oct 2021
DeepVecFont: Synthesizing High-quality Vector Fonts via Dual-modality Learning
Yizhi Wang
Zheng Lian
3DV
32
20
0
13 Oct 2021
Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering
Long Chen
Yuhang Zheng
Yulei Niu
Hanwang Zhang
Jun Xiao
AAML
OOD
15
36
0
03 Oct 2021
Multimodality in Meta-Learning: A Comprehensive Survey
Yao Ma
Shilin Zhao
Weixiao Wang
Yaoman Li
Irwin King
50
53
0
28 Sep 2021
How to find a good image-text embedding for remote sensing visual question answering?
Christel Chappuis
Sylvain Lobry
B. Kellenberger
Bertrand Le Saux
D. Tuia
40
20
0
24 Sep 2021
Towards Joint Intent Detection and Slot Filling via Higher-order Attention
Dongsheng Chen
Zhiqi Huang
Xian Wu
Shen Ge
Yuexian Zou
29
20
0
18 Sep 2021
MM-ViT: Multi-Modal Video Transformer for Compressed Video Action Recognition
Jiawei Chen
C. Ho
ViT
26
77
0
20 Aug 2021
Identifying Illicit Drug Dealers on Instagram with Large-scale Multimodal Data Fusion
Chuanbo Hu
Minglei Yin
Bing Liu
Xin Li
Yanfang Ye
22
9
0
18 Aug 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Joey Tianyi Zhou
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
199
405
0
13 Jul 2021
DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering
Jianyu Wang
Bingkun Bao
Changsheng Xu
19
75
0
10 Jul 2021
Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs
Daniel Reich
F. Putze
Tanja Schultz
27
2
0
28 Jun 2021
Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions
Daniel Rosenberg
Itai Gat
Amir Feder
Roi Reichart
AAML
34
16
0
08 Jun 2021
Multiple Meta-model Quantifying for Medical Visual Question Answering
Tuong Khanh Long Do
Binh X. Nguyen
Erman Tjiputra
Minh-Ngoc Tran
Quang-Dieu Tran
A. Nguyen
38
98
0
19 May 2021
A Review on Explainability in Multimodal Deep Neural Nets
Gargi Joshi
Rahee Walambe
K. Kotecha
29
139
0
17 May 2021
gComm: An environment for investigating generalization in Grounded Language Acquisition
Rishi Hazra
Sonu Dixit
23
0
0
09 May 2021
e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
Maxime Kayser
Oana-Maria Camburu
Leonard Salewski
Cornelius Emde
Virginie Do
Zeynep Akata
Thomas Lukasiewicz
VLM
26
100
0
08 May 2021
A First Look: Towards Explainable TextVQA Models via Visual and Textual Explanations
Varun Nagaraj Rao
Xingjian Zhen
K. Hovsepian
Mingwei Shen
29
17
0
29 Apr 2021
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks
Hung Le
Nancy F. Chen
S. Hoi
MLLM
23
19
0
16 Apr 2021
"Subverting the Jewtocracy": Online Antisemitism Detection Using Multimodal Deep Learning
Mohit Chandra
D. Pailla
Himanshu Bhatia
AadilMehdi J. Sanchawala
Manish Gupta
Manish Shrivastava
Ponnurangam Kumaraguru
11
38
0
13 Apr 2021
Biomedical Question Answering: A Survey of Approaches and Challenges
Qiao Jin
Zheng Yuan
Guangzhi Xiong
Qian Yu
Huaiyuan Ying
Chuanqi Tan
Mosha Chen
Songfang Huang
Xiaozhong Liu
Sheng Yu
23
95
0
10 Feb 2021
Answer Questions with Right Image Regions: A Visual Attention Regularization Approach
Y. Liu
Yangyang Guo
Jianhua Yin
Xuemeng Song
Weifeng Liu
Liqiang Nie
29
28
0
03 Feb 2021
Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs
Antoni Rosinol
Andrew Violette
Marcus Abate
Nathan Hughes
Yun Chang
J. Shi
Arjun Gupta
Luca Carlone
3DV
36
220
0
18 Jan 2021
Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding
Qingxing Cao
Bailin Li
Xiaodan Liang
Keze Wang
Liang Lin
44
36
0
14 Dec 2020
Look Before you Speak: Visually Contextualized Utterances
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
21
66
0
10 Dec 2020
Intrinsically Motivated Compositional Language Emergence
Rishi Hazra
Sonu Dixit
Sayambhu Sen
11
1
0
09 Dec 2020
An Improved Attention for Visual Question Answering
Tanzila Rahman
Shih-Han Chou
Leonid Sigal
Giuseppe Carenini
13
42
0
04 Nov 2020
Where to Look and How to Describe: Fashion Image Retrieval with an Attentional Heterogeneous Bilinear Network
Haibo Su
Peng Wang
Lingqiao Liu
Hui Li
Zhuguo Li
Yanning Zhang
27
27
0
26 Oct 2020
Combination of Deep Speaker Embeddings for Diarisation
Guangzhi Sun
Chao Zhang
P. Woodland
17
20
0
22 Oct 2020
Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies
Itai Gat
Idan Schwartz
A. Schwing
Tamir Hazan
55
89
0
21 Oct 2020
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues
Hung Le
Doyen Sahoo
Nancy F. Chen
S. Hoi
40
30
0
20 Oct 2020
New Ideas and Trends in Deep Multimodal Content Understanding: A Review
Wei Chen
Weiping Wang
Li Liu
M. Lew
VLM
115
31
0
16 Oct 2020
Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval
Andrés Mafla
S. Dey
Ali Furkan Biten
Lluís Gómez
Dimosthenis Karatzas
27
25
0
21 Sep 2020
A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports
Yikuan Li
Hanyin Wang
Yuan Luo
16
63
0
03 Sep 2020
A Survey of Evaluation Metrics Used for NLG Systems
Ananya B. Sai
Akash Kumar Mohankumar
Mitesh M. Khapra
ELM
33
228
0
27 Aug 2020
Enriching Video Captions With Contextual Text
Philipp Rimle
Pelin Dogan
Markus Gross
30
3
0
29 Jul 2020
AiR: Attention with Reasoning Capability
Shi Chen
Ming Jiang
Jinhui Yang
Qi Zhao
LRM
13
36
0
28 Jul 2020
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
Shaoxiang Chen
Wenhao Jiang
Wei Liu
Yu-Gang Jiang
25
101
0
28 Jul 2020
Approximated Bilinear Modules for Temporal Modeling
Xinqi Zhu
Chang Xu
Langwen Hui
Cewu Lu
Dacheng Tao
17
23
0
25 Jul 2020
Second-Order Pooling for Graph Neural Networks
Zhengyang Wang
Shuiwang Ji
GNN
19
80
0
20 Jul 2020
Referring Expression Comprehension: A Survey of Methods and Datasets
Yanyuan Qiao
Chaorui Deng
Qi Wu
ObjD
50
93
0
19 Jul 2020
Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder
K. Gouthaman
Anurag Mittal
44
78
0
13 Jul 2020