ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1805.07932
  4. Cited By
Bilinear Attention Networks

Bilinear Attention Networks

21 May 2018
Jin-Hwa Kim
Jaehyun Jun
Byoung-Tak Zhang
    AIMat
ArXivPDFHTML

Papers citing "Bilinear Attention Networks"

50 / 164 papers shown
Title
Question-Driven Graph Fusion Network For Visual Question Answering
Question-Driven Graph Fusion Network For Visual Question Answering
Yuxi Qian
Yuncong Hu
Ruonan Wang
Fangxiang Feng
Xiaojie Wang
GNN
21
10
0
03 Apr 2022
Co-VQA : Answering by Interactive Sub Question Sequence
Co-VQA : Answering by Interactive Sub Question Sequence
Ruonan Wang
Yuxi Qian
Fangxiang Feng
Xiaojie Wang
Huixing Jiang
LRM
29
16
0
02 Apr 2022
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Zaid Khan
B. Vijaykumar
Xiang Yu
S. Schulter
Manmohan Chandraker
Y. Fu
CLIP
VLM
27
16
0
27 Mar 2022
MuKEA: Multimodal Knowledge Extraction and Accumulation for
  Knowledge-based Visual Question Answering
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering
Yang Ding
Jing Yu
Bangchang Liu
Yue Hu
Mingxin Cui
Qi Wu
13
62
0
17 Mar 2022
REX: Reasoning-aware and Grounded Explanation
REX: Reasoning-aware and Grounded Explanation
Shi Chen
Qi Zhao
25
18
0
11 Mar 2022
Can Open Domain Question Answering Systems Answer Visual Knowledge
  Questions?
Can Open Domain Question Answering Systems Answer Visual Knowledge Questions?
Jiawen Zhang
Abhijit Mishra
Avinesh P.V.S
Siddharth Patwardhan
Sachin Agarwal
24
0
0
09 Feb 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLM
BDL
VLM
CLIP
392
4,185
0
28 Jan 2022
Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular
  Vision-Language Pre-training
Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training
Yehao Li
Jiahao Fan
Yingwei Pan
Ting Yao
Weiyao Lin
Tao Mei
MLLM
ObjD
33
19
0
11 Jan 2022
Does CLIP Benefit Visual Question Answering in the Medical Domain as
  Much as it Does in the General Domain?
Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain?
Sedigheh Eslami
Gerard de Melo
Christoph Meinel
CLIP
MedIm
24
116
0
27 Dec 2021
Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in
  Visual Question Answering
Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering
Jianjian Cao
Xiameng Qin
Sanyuan Zhao
Jianbing Shen
36
20
0
14 Dec 2021
Grounded Language-Image Pre-training
Grounded Language-Image Pre-training
Liunian Harold Li
Pengchuan Zhang
Haotian Zhang
Jianwei Yang
Chunyuan Li
...
Lu Yuan
Lei Zhang
Lei Li
Kai-Wei Chang
Jianfeng Gao
ObjD
VLM
31
1,025
0
07 Dec 2021
MoCA: Incorporating Multi-stage Domain Pretraining and Cross-guided
  Multimodal Attention for Textbook Question Answering
MoCA: Incorporating Multi-stage Domain Pretraining and Cross-guided Multimodal Attention for Textbook Question Answering
Fangzhi Xu
Qika Lin
Jing Liu
Lingling Zhang
Tianzhe Zhao
Qianyi Chai
Yudai Pan
21
2
0
06 Dec 2021
Video-Text Pre-training with Learned Regions
Video-Text Pre-training with Learned Regions
Rui Yan
Mike Zheng Shou
Yixiao Ge
Alex Jinpeng Wang
Xudong Lin
Guanyu Cai
Jinhui Tang
33
23
0
02 Dec 2021
ML-Decoder: Scalable and Versatile Classification Head
ML-Decoder: Scalable and Versatile Classification Head
T. Ridnik
Gilad Sharir
Avi Ben-Cohen
Emanuel Ben-Baruch
Asaf Noy
VLM
32
102
0
25 Nov 2021
Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language
  Modeling
Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling
Renrui Zhang
Rongyao Fang
Wei Zhang
Peng Gao
Kunchang Li
Jifeng Dai
Yu Qiao
Hongsheng Li
VLM
196
387
0
06 Nov 2021
RF-Net: a Unified Meta-learning Framework for RF-enabled One-shot Human
  Activity Recognition
RF-Net: a Unified Meta-learning Framework for RF-enabled One-shot Human Activity Recognition
Shuya Ding
Zhe Chen
Tianyue Zheng
Jun Luo
26
111
0
29 Oct 2021
Perceptual Score: What Data Modalities Does Your Model Perceive?
Perceptual Score: What Data Modalities Does Your Model Perceive?
Itai Gat
Idan Schwartz
Alex Schwing
44
30
0
27 Oct 2021
IconQA: A New Benchmark for Abstract Diagram Understanding and Visual
  Language Reasoning
IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning
Pan Lu
Liang Qiu
Jiaqi Chen
Tony Xia
Yizhou Zhao
Wei Zhang
Zhou Yu
Xiaodan Liang
Song-Chun Zhu
AIMat
41
184
0
25 Oct 2021
Single-Modal Entropy based Active Learning for Visual Question Answering
Single-Modal Entropy based Active Learning for Visual Question Answering
Dong-Jin Kim
Jae-Won Cho
Jinsoo Choi
Yunjae Jung
In So Kweon
25
12
0
21 Oct 2021
CLIP-Adapter: Better Vision-Language Models with Feature Adapters
CLIP-Adapter: Better Vision-Language Models with Feature Adapters
Peng Gao
Shijie Geng
Renrui Zhang
Teli Ma
Rongyao Fang
Yongfeng Zhang
Hongsheng Li
Yu Qiao
VLM
CLIP
104
992
0
09 Oct 2021
Counterfactual Samples Synthesizing and Training for Robust Visual
  Question Answering
Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering
Long Chen
Yuhang Zheng
Yulei Niu
Hanwang Zhang
Jun Xiao
AAML
OOD
21
36
0
03 Oct 2021
Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real
  Images
Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images
Zhuowan Li
Elias Stengel-Eskin
Yixiao Zhang
Cihang Xie
Q. Tran
Benjamin Van Durme
Alan Yuille
VLM
24
15
0
01 Oct 2021
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual
  Question Answering
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
Ekta Sood
Fabian Kögel
Florian Strohm
Prajit Dhar
Andreas Bulling
40
19
0
27 Sep 2021
Feature Correlation Aggregation: on the Path to Better Graph Neural
  Networks
Feature Correlation Aggregation: on the Path to Better Graph Neural Networks
Jieming Zhou
Tong Zhang
Pengfei Fang
L. Petersson
Mehrtash Harandi
GNN
31
1
0
20 Sep 2021
Towards Joint Intent Detection and Slot Filling via Higher-order Attention
Dongsheng Chen
Zhiqi Huang
Xian Wu
Shen Ge
Yuexian Zou
31
20
0
18 Sep 2021
Weakly Supervised Relative Spatial Reasoning for Visual Question
  Answering
Weakly Supervised Relative Spatial Reasoning for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
LRM
30
18
0
04 Sep 2021
On the Significance of Question Encoder Sequence Model in the
  Out-of-Distribution Performance in Visual Question Answering
On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering
K. Gouthaman
Anurag Mittal
CML
45
0
0
28 Aug 2021
Auto-Parsing Network for Image Captioning and Visual Question Answering
Auto-Parsing Network for Image Captioning and Visual Question Answering
Xu Yang
Chongyang Gao
Hanwang Zhang
Jianfei Cai
24
35
0
24 Aug 2021
X-modaler: A Versatile and High-performance Codebase for Cross-modal
  Analytics
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics
Yehao Li
Yingwei Pan
Jingwen Chen
Ting Yao
Tao Mei
VLM
19
31
0
18 Aug 2021
MOI-Mixer: Improving MLP-Mixer with Multi Order Interactions in
  Sequential Recommendation
MOI-Mixer: Improving MLP-Mixer with Multi Order Interactions in Sequential Recommendation
Hojoon Lee
Dongyoon Hwang
Sunghwan Hong
Changyeon Kim
Seungryong Kim
Jaegul Choo
27
10
0
17 Aug 2021
Align before Fuse: Vision and Language Representation Learning with
  Momentum Distillation
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Chenyu You
Caiming Xiong
Guosheng Lin
FaML
83
1,893
0
16 Jul 2021
Zero-shot Visual Question Answering using Knowledge Graph
Zero-shot Visual Question Answering using Knowledge Graph
Zhuo Chen
Jiaoyan Chen
Yuxia Geng
Jeff Z. Pan
Zonggang Yuan
Huajun Chen
23
70
0
12 Jul 2021
Probing Inter-modality: Visual Parsing with Self-Attention for
  Vision-Language Pre-training
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training
Hongwei Xue
Yupan Huang
Bei Liu
Houwen Peng
Jianlong Fu
Houqiang Li
Jiebo Luo
33
89
0
25 Jun 2021
Attend What You Need: Motion-Appearance Synergistic Networks for Video
  Question Answering
Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering
Ahjeong Seo
Gi-Cheon Kang
J. Park
Byoung-Tak Zhang
18
53
0
19 Jun 2021
Supervising the Transfer of Reasoning Patterns in VQA
Supervising the Transfer of Reasoning Patterns in VQA
Corentin Kervadec
Christian Wolf
G. Antipov
M. Baccouche
Madiha Nadri Wolf
30
10
0
10 Jun 2021
PAM: Understanding Product Images in Cross Product Category Attribute
  Extraction
PAM: Understanding Product Images in Cross Product Category Attribute Extraction
Rongmei Lin
Xiang He
J. Feng
Nasser Zalmout
Yan Liang
Li Xiong
Xin Luna Dong
36
35
0
08 Jun 2021
Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused
  Interventions
Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions
Daniel Rosenberg
Itai Gat
Amir Feder
Roi Reichart
AAML
44
16
0
08 Jun 2021
DoubleField: Bridging the Neural Surface and Radiance Fields for
  High-fidelity Human Reconstruction and Rendering
DoubleField: Bridging the Neural Surface and Radiance Fields for High-fidelity Human Reconstruction and Rendering
Ruizhi Shao
Hongwen Zhang
He Zhang
Mingjia Chen
Yan-Pei Cao
Tao Yu
Yebin Liu
3DH
22
64
0
07 Jun 2021
Multiple Meta-model Quantifying for Medical Visual Question Answering
Multiple Meta-model Quantifying for Medical Visual Question Answering
Tuong Khanh Long Do
Binh X. Nguyen
Erman Tjiputra
Minh-Ngoc Tran
Quang-Dieu Tran
A. Nguyen
38
99
0
19 May 2021
Exemplar-Based Open-Set Panoptic Segmentation Network
Exemplar-Based Open-Set Panoptic Segmentation Network
Jaedong Hwang
Seoung Wug Oh
Joon-Young Lee
Bohyung Han
VLM
33
50
0
18 May 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
90
864
0
26 Apr 2021
Robust Sensor Fusion Algorithms Against Voice Command Attacks in
  Autonomous Vehicles
Robust Sensor Fusion Algorithms Against Voice Command Attacks in Autonomous Vehicles
Jiwei Guan
Xi Zheng
Chen Wang
Yipeng Zhou
A. Jolfaei
AAML
23
5
0
20 Apr 2021
Augmenting Deep Classifiers with Polynomial Neural Networks
Augmenting Deep Classifiers with Polynomial Neural Networks
Grigorios G. Chrysos
Markos Georgopoulos
Jiankang Deng
Jean Kossaifi
Yannis Panagakis
Anima Anandkumar
24
18
0
16 Apr 2021
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language
  Representation Learning
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
Zhicheng Huang
Zhaoyang Zeng
Yupan Huang
Bei Liu
Dongmei Fu
Jianlong Fu
VLM
ViT
51
271
0
07 Apr 2021
Energy-Based Learning for Scene Graph Generation
Energy-Based Learning for Scene Graph Generation
M. Suhail
Abhay Mittal
Behjat Siddiquie
Chris Broaddus
J. Eledath
Gérard Medioni
Leonid Sigal
45
158
0
03 Mar 2021
Unifying Vision-and-Language Tasks via Text Generation
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Joey Tianyi Zhou
MLLM
277
525
0
04 Feb 2021
Latent Variable Models for Visual Question Answering
Latent Variable Models for Visual Question Answering
Zixu Wang
Yishu Miao
Lucia Specia
25
5
0
16 Jan 2021
WeaQA: Weak Supervision via Captions for Visual Question Answering
WeaQA: Weak Supervision via Captions for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
25
35
0
04 Dec 2020
Language-guided Navigation via Cross-Modal Grounding and Alternate
  Adversarial Learning
Language-guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning
Weixia Zhang
Chao Ma
Qi Wu
Xiaokang Yang
39
44
0
22 Nov 2020
On the spatial attention in Spatio-Temporal Graph Convolutional Networks
  for skeleton-based human action recognition
On the spatial attention in Spatio-Temporal Graph Convolutional Networks for skeleton-based human action recognition
Negar Heidari
Alexandros Iosifidis
GNN
19
12
0
07 Nov 2020
Previous
1234
Next