MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering

27 October 2020
Aisha Urooj Khan
Amir Mazaheri
N. Lobo
M. Shah
arXiv: 2010.14095

Papers citing "MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering"

27 / 27 papers shown
Leveraging Static Relationships for Intra-Type and Inter-Type Message Passing in Video Question Answering
Lili Liang
Guanglu Sun
03 Apr 2025
The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering
Anupam Pandey
Deepjyoti Bodo
Arpan Phukan
Asif Ekbal
13 Jan 2025
Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models
Jean Park
Kuk Jin Jang
Basam Alasaly
Sriharsha Mopidevi
Andrew Zolensky
Eric Eaton
Insup Lee
Kevin Johnson
22 Aug 2024
End-to-End Video Question Answering with Frame Scoring Mechanisms and Adaptive Sampling
Jianxin Liang
Xiaojun Meng
Yueqian Wang
Chang Liu
Qun Liu
Dongyan Zhao
21 Jul 2024
Self-Regulated Data-Free Knowledge Amalgamation for Text Classification
Prashanth Vijayaraghavan
Hongzhi Wang
Luyao Shi
Tyler Baldwin
David Beymer
Ehsan Degan
16 Jun 2024
Neural-Symbolic VideoQA: Learning Compositional Spatio-Temporal Reasoning for Real-world Video Question Answering
Lili Liang
Guanglu Sun
Jin Qiu
Lizhong Zhang
NAI
05 Apr 2024
Can Model Fusing Help Transformers in Long Document Classification? An Empirical Study
Damith Premasiri
Tharindu Ranasinghe
R. Mitkov
VLM
18 Jul 2023
HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language
Shantipriya Parida
Idris Abdulmumin
Shamsuddeen Hassan Muhammad
Aneesh Bose
Guneet Singh Kohli
I. Ahmad
Ketan Kotwal
S. Sarkar
Ondrej Bojar
Habeebah Adamu Kakudi
28 May 2023
ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos
Zhou Yu
Lixiang Zheng
Zhou Zhao
A. Fedoseev
Jianping Fan
Kui Ren
Jun Yu
CoGe
04 May 2023
Learning Situation Hyper-Graphs for Video Question Answering
Aisha Urooj Khan
Hilde Kuehne
Bo Wu
Kim Chheu
Walid Bousselham
Chuang Gan
N. Lobo
M. Shah
18 Apr 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Xiao Wang
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
20 Feb 2023
M-SENSE: Modeling Narrative Structure in Short Personal Narratives Using Protagonist's Mental Representations
Prashanth Vijayaraghavan
D. Roy
18 Feb 2023
SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering
Feiqi Cao
Siwen Luo
F. Núñez
Zean Wen
Josiah Poon
Caren Han
GNN
16 Dec 2022
ParCNetV2: Oversized Kernel with Enhanced Attention
Ruihan Xu
Haokui Zhang
Wenze Hu
Shiliang Zhang
Xiaoyu Wang
ViT
14 Nov 2022
DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention
Fenglin Liu
Xian Wu
Shen Ge
Xuancheng Ren
Wei Fan
Xu Sun
Yuexian Zou
VLM
28 Oct 2022
Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling
Hsin-Ying Lee
Hung-Ting Su
Bing-Chen Tsai
Tsung-Han Wu
Jia-Fong Yeh
Winston H. Hsu
08 Oct 2022
Semantic Structure based Query Graph Prediction for Question Answering over Knowledge Graph
Mingchen Li
Jonathan Shihao Ji
15 Apr 2022
Knowledge Mining with Scene Text for Fine-Grained Recognition
Hao Wang
Junchao Liao
Tianheng Cheng
Zewen Gao
Hao Liu
Bo Ren
X. Bai
Wenyu Liu
27 Mar 2022
Local-Global Context Aware Transformer for Language-Guided Video Segmentation
Chen Liang
Wenguan Wang
Tianfei Zhou
Jiaxu Miao
Yawei Luo
Yi Yang
VOS
18 Mar 2022
Video Question Answering: Datasets, Algorithms and Challenges
Yaoyao Zhong
Junbin Xiao
Wei Ji
Yicong Li
Wei Deng
Tat-Seng Chua
02 Mar 2022
VLP: A Survey on Vision-Language Pre-training
Feilong Chen
Duzhen Zhang
Minglun Han
Xiuyi Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
18 Feb 2022
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
Ekta Sood
Fabian Kögel
Florian Strohm
Prajit Dhar
Andreas Bulling
27 Sep 2021
Improved RAMEN: Towards Domain Generalization for Visual Question Answering
Bhanuka Gamage
Lim Chern Hong
06 Sep 2021
Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules
Aisha Urooj Khan
Hilde Kuehne
Kevin Duarte
Chuang Gan
N. Lobo
M. Shah
11 May 2021
TubeR: Tubelet Transformer for Video Action Detection
Jiaojiao Zhao
Yanyi Zhang
Xinyu Li
Hao Chen
Shuai Bing
...
Yuanjun Xiong
Davide Modolo
I. Marsic
Cees G. M. Snoek
Joseph Tighe
ViT
02 Apr 2021
Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Linbo Jin
Ben Chen
Hao Zhou
Minghui Qiu
Ling Shao
VLM
30 Mar 2021
On the hidden treasure of dialog in video question answering
Deniz Engin
François Schnitzler
Ngoc Q. K. Duong
Yannis Avrithis
26 Mar 2021