ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1606.01847
  4. Cited By
Multimodal Compact Bilinear Pooling for Visual Question Answering and
  Visual Grounding

Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding

6 June 2016
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
ArXivPDFHTML

Papers citing "Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding"

50 / 225 papers shown
Title
Tensor Sketch: Fast and Scalable Polynomial Kernel Approximation
Tensor Sketch: Fast and Scalable Polynomial Kernel Approximation
Ninh Pham
Rasmus Pagh
27
0
0
13 May 2025
TACFN: Transformer-based Adaptive Cross-modal Fusion Network for Multimodal Emotion Recognition
TACFN: Transformer-based Adaptive Cross-modal Fusion Network for Multimodal Emotion Recognition
Feng Liu
Ziwang Fu
Yixuan Wang
Qijian Zheng
40
4
0
10 May 2025
Hadamard product in deep learning: Introduction, Advances and Challenges
Hadamard product in deep learning: Introduction, Advances and Challenges
Grigorios G. Chrysos
Yongtao Wu
Razvan Pascanu
Philip Torr
V. Cevher
AAML
98
0
0
17 Apr 2025
A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis
Xiang Liu
Zhaoxiang Liu
Huan Hu
Zezhou Chen
Kohou Wang
Ning Wang
Kai Wang
43
1
0
10 Mar 2025
LOVA3: Learning to Visual Question Answering, Asking and Assessment
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao
Pan Zhou
Difei Gao
Zechen Bai
Mike Zheng Shou
82
8
0
21 Feb 2025
Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention
Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention
Joe Dhanith
Shravan Venkatraman
Modigari Narendra
Vigya Sharma
Santhosh Malarvannan
81
0
0
20 Feb 2025
MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion
  Recognition
MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition
Peihao Xiang
Chaohao Lin
Kaida Wu
Ou Bai
34
3
0
28 Apr 2024
ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images
ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images
Quan Van Nguyen
Dan Quang Tran
Huy Quang Pham
Thang Kien-Bao Nguyen
Nghia Hieu Nguyen
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
CoGe
39
3
0
16 Apr 2024
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot
  Interaction
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction
Jie Xu
Hanbo Zhang
Xinghang Li
Huaping Liu
Xuguang Lan
Tao Kong
LM&Ro
35
3
0
19 Feb 2024
Convincing Rationales for Visual Question Answering Reasoning
Convincing Rationales for Visual Question Answering Reasoning
Kun Li
G. Vosselman
Michael Ying Yang
44
1
0
06 Feb 2024
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric
  Robotic Manipulation
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation
Xiaoqi Li
Mingxu Zhang
Yiran Geng
Haoran Geng
Yuxing Long
Yan Shen
Renrui Zhang
Jiaming Liu
Hao Dong
LM&Ro
LRM
37
78
0
24 Dec 2023
Multimodality of AI for Education: Towards Artificial General
  Intelligence
Multimodality of AI for Education: Towards Artificial General Intelligence
Gyeong-Geon Lee
Lehong Shi
Ehsan Latif
Yizhu Gao
Arne Bewersdorff
...
Zheng Liu
Hui Wang
Gengchen Mai
Tiaming Liu
Xiaoming Zhai
24
38
0
10 Dec 2023
3D-Aware Visual Question Answering about Parts, Poses and Occlusions
3D-Aware Visual Question Answering about Parts, Poses and Occlusions
Xingrui Wang
Wufei Ma
Zhuowan Li
Adam Kortylewski
Alan L. Yuille
CoGe
27
12
0
27 Oct 2023
Gramian Attention Heads are Strong yet Efficient Vision Learners
Gramian Attention Heads are Strong yet Efficient Vision Learners
Jongbin Ryu
Dongyoon Han
J. Lim
32
1
0
25 Oct 2023
Divert More Attention to Vision-Language Object Tracking
Divert More Attention to Vision-Language Object Tracking
Mingzhe Guo
Zhipeng Zhang
Li Jing
Haibin Ling
Heng Fan
VLM
37
3
0
19 Jul 2023
Cross-modal Place Recognition in Image Databases using Event-based
  Sensors
Cross-modal Place Recognition in Image Databases using Event-based Sensors
Xiangli Ji
Jiaxin Wei
Yifu Wang
Huiliang Shang
L. Kneip
57
1
0
03 Jul 2023
Modularized Zero-shot VQA with Pre-trained Models
Modularized Zero-shot VQA with Pre-trained Models
Rui Cao
Jing Jiang
LRM
27
2
0
27 May 2023
EAML: Ensemble Self-Attention-based Mutual Learning Network for Document
  Image Classification
EAML: Ensemble Self-Attention-based Mutual Learning Network for Document Image Classification
Souhail Bakkali
Zuheng Ming
Mickael Coustaty
Marçal Rusiñol
10
6
0
11 May 2023
Sketch-based Video Object Localization
Sketch-based Video Object Localization
Sangmin Woo
So-Yeong Jeon
Jinyoung Park
Minji Son
Sumin Lee
Changick Kim
16
0
0
02 Apr 2023
Reading and Reasoning over Chart Images for Evidence-based Automated
  Fact-Checking
Reading and Reasoning over Chart Images for Evidence-based Automated Fact-Checking
Mubashara Akhtar
O. Cocarascu
Elena Simperl
21
25
0
27 Jan 2023
Tensor Networks Meet Neural Networks: A Survey and Future Perspectives
Tensor Networks Meet Neural Networks: A Survey and Future Perspectives
Maolin Wang
Y. Pan
Zenglin Xu
Xiangli Yang
Guangxi Li
A. Cichocki
Andrzej Cichocki
53
19
0
22 Jan 2023
UnICLAM:Contrastive Representation Learning with Adversarial Masking for
  Unified and Interpretable Medical Vision Question Answering
UnICLAM:Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering
Chenlu Zhan
Peng Peng
Hongsen Wang
Tao Chen
Hongwei Wang
MedIm
23
3
0
21 Dec 2022
InterMulti:Multi-view Multimodal Interactions with Text-dominated
  Hierarchical High-order Fusion for Emotion Analysis
InterMulti:Multi-view Multimodal Interactions with Text-dominated Hierarchical High-order Fusion for Emotion Analysis
Feng Qiu
Wanzeng Kong
Yu-qiong Ding
34
2
0
20 Dec 2022
Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual
  Reasoning
Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning
Zhuowan Li
Xingrui Wang
Elias Stengel-Eskin
Adam Kortylewski
Wufei Ma
Benjamin Van Durme
Max Planck Institute for Informatics
OOD
LRM
26
57
0
01 Dec 2022
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and
  Grounding
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding
Siyi Liu
Yaoyuan Liang
Feng Li
Shijia Huang
Hao Zhang
Hang Su
Jun Zhu
Lei Zhang
ObjD
50
25
0
28 Nov 2022
Versatile Diffusion: Text, Images and Variations All in One Diffusion
  Model
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model
Xingqian Xu
Zhangyang Wang
Eric Zhang
Kai Wang
Humphrey Shi
DiffM
35
183
0
15 Nov 2022
YORO -- Lightweight End to End Visual Grounding
YORO -- Lightweight End to End Visual Grounding
Chih-Hui Ho
Srikar Appalaraju
Bhavan A. Jasani
R. Manmatha
Nuno Vasconcelos
ObjD
21
21
0
15 Nov 2022
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing
  Data
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data
Yangfan Zhan
Zhitong Xiong
Yuan. Yuan
74
106
0
23 Oct 2022
MUG: Interactive Multimodal Grounding on User Interfaces
MUG: Interactive Multimodal Grounding on User Interfaces
Tao Li
Gang Li
Jingjie Zheng
Purple Wang
Yang Li
LLMAG
33
8
0
29 Sep 2022
DM$^2$S$^2$: Deep Multi-Modal Sequence Sets with Hierarchical Modality
  Attention
DM2^22S2^22: Deep Multi-Modal Sequence Sets with Hierarchical Modality Attention
Shunsuke Kitada
Yuki Iwazaki
Riku Togashi
Hitoshi Iyatomi
21
1
0
07 Sep 2022
MMKGR: Multi-hop Multi-modal Knowledge Graph Reasoning
MMKGR: Multi-hop Multi-modal Knowledge Graph Reasoning
Shangfei Zheng
Weiqing Wang
Jianfeng Qu
Hongzhi Yin
Wei Chen
Lei Zhao
LRM
21
22
0
03 Sep 2022
FashionVQA: A Domain-Specific Visual Question Answering System
FashionVQA: A Domain-Specific Visual Question Answering System
Min Wang
A. Mahjoubfar
Anupama Joshi
29
3
0
24 Aug 2022
Large-Scale Traffic Congestion Prediction based on Multimodal Fusion and
  Representation Mapping
Large-Scale Traffic Congestion Prediction based on Multimodal Fusion and Representation Mapping
Bo Zhou
Jiahui Liu
Songyi Cui
Yaping Zhao
26
5
0
23 Aug 2022
M2HF: Multi-level Multi-modal Hybrid Fusion for Text-Video Retrieval
M2HF: Multi-level Multi-modal Hybrid Fusion for Text-Video Retrieval
Shuo Liu
Weize Quan
Mingyuan Zhou
Sihong Chen
Jian Kang
Zhenlan Zhao
Chen Chen
Dong-Ming Yan
28
0
0
16 Aug 2022
Visual Perturbation-aware Collaborative Learning for Overcoming the
  Language Prior Problem
Visual Perturbation-aware Collaborative Learning for Overcoming the Language Prior Problem
Yudong Han
Liqiang Nie
Jianhua Yin
Jianlong Wu
Yan Yan
24
12
0
24 Jul 2022
Chunk-aware Alignment and Lexical Constraint for Visual Entailment with
  Natural Language Explanations
Chunk-aware Alignment and Lexical Constraint for Visual Entailment with Natural Language Explanations
Qian Yang
Yunxin Li
Baotian Hu
Lin Ma
Yuxin Ding
Min Zhang
27
10
0
23 Jul 2022
Divert More Attention to Vision-Language Tracking
Divert More Attention to Vision-Language Tracking
Mingzhe Guo
Zhipeng Zhang
Heng Fan
Li Jing
29
53
0
03 Jul 2022
From Pixels to Objects: Cubic Visual Attention for Visual Question
  Answering
From Pixels to Objects: Cubic Visual Attention for Visual Question Answering
Jingkuan Song
Pengpeng Zeng
Lianli Gao
Heng Tao Shen
32
62
0
04 Jun 2022
Structured Two-stream Attention Network for Video Question Answering
Structured Two-stream Attention Network for Video Question Answering
Lianli Gao
Pengpeng Zeng
Jingkuan Song
Yuan-Fang Li
Wu Liu
Tao Mei
Heng Tao Shen
37
68
0
02 Jun 2022
V-Doc : Visual questions answers with Documents
V-Doc : Visual questions answers with Documents
Yihao Ding
Zhe Huang
Runlin Wang
Yanhang Zhang
Xianru Chen
Yuzhong Ma
Hyunsuk Chung
S. Han
31
15
0
27 May 2022
VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks
  for Visual Question Answering
VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering
Yanan Wang
Michihiro Yasunaga
Hongyu Ren
Shinya Wada
J. Leskovec
29
17
0
23 May 2022
Learning to Answer Visual Questions from Web Videos
Learning to Answer Visual Questions from Web Videos
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
34
33
0
10 May 2022
Adapting CLIP For Phrase Localization Without Further Training
Adapting CLIP For Phrase Localization Without Further Training
Jiahao Li
G. Shakhnarovich
Raymond A. Yeh
VLM
CLIP
30
25
0
07 Apr 2022
An Algebraic Approach to Learning and Grounding
An Algebraic Approach to Learning and Grounding
Johanna Björklund
Adam Dahlgren Lindström
F. Drewes
24
0
0
06 Apr 2022
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
Leonard Salewski
A. Sophia Koepke
Hendrik P. A. Lensch
Zeynep Akata
LRM
NAI
27
20
0
05 Apr 2022
Shifting More Attention to Visual Backbone: Query-modulated Refinement
  Networks for End-to-End Visual Grounding
Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding
Jiabo Ye
Junfeng Tian
Ming Yan
Xiaoshan Yang
Xuwu Wang
Ji Zhang
Liang He
Xin Lin
ObjD
11
61
0
29 Mar 2022
Large-scale Bilingual Language-Image Contrastive Learning
Large-scale Bilingual Language-Image Contrastive Learning
ByungSoo Ko
Geonmo Gu
VLM
32
14
0
28 Mar 2022
REX: Reasoning-aware and Grounded Explanation
REX: Reasoning-aware and Grounded Explanation
Shi Chen
Qi Zhao
25
18
0
11 Mar 2022
NLX-GPT: A Model for Natural Language Explanations in Vision and
  Vision-Language Tasks
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks
Fawaz Sammani
Tanmoy Mukherjee
Nikos Deligiannis
MILM
ELM
LRM
18
67
0
09 Mar 2022
Recent, rapid advancement in visual question answering architecture: a
  review
Recent, rapid advancement in visual question answering architecture: a review
V. Kodali
Daniel Berleant
34
9
0
02 Mar 2022
12345
Next