Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1708.03619
Cited By
Beyond Bilinear: Generalized Multimodal Factorized High-order Pooling for Visual Question Answering
10 August 2017
Zhou Yu
Jun-chen Yu
Chenchao Xiang
Jianping Fan
Dacheng Tao
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Beyond Bilinear: Generalized Multimodal Factorized High-order Pooling for Visual Question Answering"
50 / 52 papers shown
Title
Hadamard product in deep learning: Introduction, Advances and Challenges
Grigorios G. Chrysos
Yongtao Wu
Razvan Pascanu
Philip Torr
V. Cevher
AAML
98
1
0
17 Apr 2025
Generalizable Prompt Learning of CLIP: A Brief Overview
Fangming Cui
Yonggang Zhang
Xuan Wang
Xule Wang
Liang Xiao
VPVLM
VLM
227
0
0
03 Mar 2025
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
56
9
0
16 Oct 2024
Disentangling Knowledge-based and Visual Reasoning by Question Decomposition in KB-VQA
Elham J. Barezi
Parisa Kordjamshidi
CoGe
37
0
0
27 Jun 2024
EAML: Ensemble Self-Attention-based Mutual Learning Network for Document Image Classification
Souhail Bakkali
Zuheng Ming
Mickael Coustaty
Marçal Rusiñol
10
6
0
11 May 2023
SurgicalGPT: End-to-End Language-Vision GPT for Visual Question Answering in Surgery
Lalithkumar Seenivasan
Mobarakol Islam
Gokul Kannan
Hongliang Ren
24
40
0
19 Apr 2023
AutoFraudNet: A Multimodal Network to Detect Fraud in the Auto Insurance Industry
Azin Asgarian
Rohit Saha
Daniel Jakubovitz
Julia Peyre
32
2
0
15 Jan 2023
Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection
Yanxin Long
Jianhua Han
Runhu Huang
Xu Hang
Yi Zhu
Chunjing Xu
Xiaodan Liang
VLM
ObjD
35
18
0
02 Nov 2022
What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility?
Yang Trista Cao
Kyle Seelman
Kyungjun Lee
Hal Daumé
20
5
0
26 Oct 2022
Locate before Answering: Answer Guided Question Localization for Video Question Answering
Tianwen Qian
Ran Cui
Jingjing Chen
Pai Peng
Xiao-Wei Guo
Yu-Gang Jiang
34
17
0
05 Oct 2022
MMKGR: Multi-hop Multi-modal Knowledge Graph Reasoning
Shangfei Zheng
Weiqing Wang
Jianfeng Qu
Hongzhi Yin
Wei Chen
Lei Zhao
LRM
21
22
0
03 Sep 2022
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering
Yang Ding
Jing Yu
Bangchang Liu
Yue Hu
Mingxin Cui
Qi Wu
13
62
0
17 Mar 2022
Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for Knowledge-based Visual Question Answering
Mingxiao Li
Marie-Francine Moens
17
12
0
06 Mar 2022
Recent, rapid advancement in visual question answering architecture: a review
V. Kodali
Daniel Berleant
40
9
0
02 Mar 2022
3D Question Answering
Shuquan Ye
Dongdong Chen
Songfang Han
Jing Liao
ViT
31
47
0
15 Dec 2021
Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering
Jianjian Cao
Xiameng Qin
Sanyuan Zhao
Jianbing Shen
31
20
0
14 Dec 2021
Distribution Knowledge Embedding for Graph Pooling
Kaixuan Chen
Mingli Song
Shunyu Liu
Na Yu
Zunlei Feng
Gengshi Han
Xiuming Zhang
GNN
47
22
0
29 Sep 2021
How to find a good image-text embedding for remote sensing visual question answering?
Christel Chappuis
Sylvain Lobry
B. Kellenberger
Bertrand Le Saux
D. Tuia
40
20
0
24 Sep 2021
Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering
Jihyung Kil
Cheng Zhang
D. Xuan
Wei-Lun Chao
61
20
0
13 Sep 2021
TxT: Crossmodal End-to-End Learning with Transformers
Jan-Martin O. Steitz
Jonas Pfeiffer
Iryna Gurevych
Stefan Roth
LRM
21
2
0
09 Sep 2021
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Yuhao Cui
Zhou Yu
Chunqi Wang
Zhongzhou Zhao
Ji Zhang
Meng Wang
Jun-chen Yu
VLM
27
53
0
16 Aug 2021
DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering
Jianyu Wang
Bingkun Bao
Changsheng Xu
19
75
0
10 Jul 2021
Biomedical Question Answering: A Survey of Approaches and Challenges
Qiao Jin
Zheng Yuan
Guangzhi Xiong
Qian Yu
Huaiyuan Ying
Chuanqi Tan
Mosha Chen
Songfang Huang
Xiaozhong Liu
Sheng Yu
29
95
0
10 Feb 2021
Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding
Qingxing Cao
Bailin Li
Xiaodan Liang
Keze Wang
Liang Lin
44
36
0
14 Dec 2020
After All, Only The Last Neuron Matters: Comparing Multi-modal Fusion Functions for Scene Graph Generation
Mohamed Karim Belaid
31
1
0
09 Nov 2020
An Improved Attention for Visual Question Answering
Tanzila Rahman
Shih-Han Chou
Leonid Sigal
Giuseppe Carenini
13
42
0
04 Nov 2020
New Ideas and Trends in Deep Multimodal Content Understanding: A Review
Wei Chen
Weiping Wang
Li Liu
M. Lew
VLM
118
31
0
16 Oct 2020
Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models
M. Farazi
Salman H. Khan
Nick Barnes
23
17
0
20 Jan 2020
In Defense of Grid Features for Visual Question Answering
Huaizu Jiang
Ishan Misra
Marcus Rohrbach
Erik Learned-Miller
Xinlei Chen
OOD
ObjD
23
318
0
10 Jan 2020
Modulated Self-attention Convolutional Network for VQA
Jean-Benoit Delbrouck
Antoine Maiorca
Nathan Hubens
Stéphane Dupont
23
1
0
08 Oct 2019
Adaptively Denoising Proposal Collection for Weakly Supervised Object Localization
Dianbo Liu
Yuanwei Wu
Wenchi Ma
Guanghui Wang
28
13
0
04 Oct 2019
DNN-based cross-lingual voice conversion using Bottleneck Features
M. K. Reddy
K. S. Rao
26
4
0
09 Sep 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
93
2,456
0
20 Aug 2019
Zero-Shot Grounding of Objects from Natural Language Queries
Arka Sadhu
Kan Chen
Ram Nevatia
ObjD
30
157
0
20 Aug 2019
Attentional Feature-Pair Relation Networks for Accurate Face Recognition
Bong-Nam Kang
Yonghyun Kim
Bongjin Jun
Daijin Kim
CVBM
24
37
0
17 Aug 2019
Multimodal Unified Attention Networks for Vision-and-Language Interactions
Zhou Yu
Yuhao Cui
Jun Yu
Dacheng Tao
Q. Tian
27
38
0
12 Aug 2019
LoRMIkA: Local rule-based model interpretability with k-optimal associations
Dilini Sewwandi Rajapaksha
Christoph Bergmeir
Wray L. Buntine
35
31
0
11 Aug 2019
Multi-modality Latent Interaction Network for Visual Question Answering
Peng Gao
Haoxuan You
Zhanpeng Zhang
Xiaogang Wang
Hongsheng Li
25
82
0
10 Aug 2019
Question-Agnostic Attention for Visual Question Answering
M. Farazi
Salman H Khan
Nick Barnes
13
10
0
09 Aug 2019
An Empirical Study on Leveraging Scene Graphs for Visual Question Answering
Cheng Zhang
Wei-Lun Chao
D. Xuan
23
50
0
28 Jul 2019
Deep Modular Co-Attention Networks for Visual Question Answering
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
36
797
0
25 Jun 2019
RUBi: Reducing Unimodal Biases in Visual Question Answering
Rémi Cadène
Corentin Dancette
H. Ben-younes
Matthieu Cord
Devi Parikh
CML
19
369
0
24 Jun 2019
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering
Zhou Yu
D. Xu
Jun-chen Yu
Ting Yu
Zhou Zhao
Yueting Zhuang
Dacheng Tao
24
439
0
06 Jun 2019
Frontal Low-rank Random Tensors for Fine-grained Action Segmentation
Yan Zhang
Krikamol Muandet
Qianli Ma
Heiko Neumann
Siyu Tang
37
3
0
03 Jun 2019
Multimodal Transformer with Multi-View Visual Representation for Image Captioning
Jun-chen Yu
Jing Li
Zhou Yu
Qingming Huang
ViT
27
377
0
20 May 2019
MUREL: Multimodal Relational Reasoning for Visual Question Answering
Rémi Cadène
H. Ben-younes
Matthieu Cord
Nicolas Thome
LRM
19
271
0
25 Feb 2019
AU R-CNN: Encoding Expert Prior Knowledge into R-CNN for Action Unit Detection
Chen Ma
Li Chen
Jun-hai Yong
11
86
0
14 Dec 2018
Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering
Peng Gao
Zhengkai Jiang
Haoxuan You
Pan Lu
Steven C. H. Hoi
Xiaogang Wang
Hongsheng Li
AIMat
24
363
0
13 Dec 2018
Visual Reasoning by Progressive Module Networks
Seung Wook Kim
Makarand Tapaswi
Sanja Fidler
ReLM
LRM
36
13
0
06 Jun 2018
Joint Image Captioning and Question Answering
Jialin Wu
Zeyuan Hu
Raymond J. Mooney
24
12
0
22 May 2018
1
2
Next