ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1606.01455
  4. Cited By
Multimodal Residual Learning for Visual QA

Multimodal Residual Learning for Visual QA

5 June 2016
Jin-Hwa Kim
Sang-Woo Lee
Donghyun Kwak
Min-Oh Heo
Jeonghee Kim
Jung-Woo Ha
Byoung-Tak Zhang
ArXivPDFHTML

Papers citing "Multimodal Residual Learning for Visual QA"

50 / 55 papers shown
Title
Hadamard product in deep learning: Introduction, Advances and Challenges
Hadamard product in deep learning: Introduction, Advances and Challenges
Grigorios G. Chrysos
Yongtao Wu
Razvan Pascanu
Philip Torr
V. Cevher
AAML
98
1
0
17 Apr 2025
PinLanding: Content-First Keyword Landing Page Generation via Multi-Modal AI for Web-Scale Discovery
Faye Zhang
Jasmine Wan
Qianyu Cheng
Jinfeng Rao
44
0
0
01 Mar 2025
Leveraging Large Language Models for Multimodal Search
Leveraging Large Language Models for Multimodal Search
Oriol Barbany
Michael Huang
Xinliang Zhu
Arnab Dhua
33
9
0
24 Apr 2024
ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images
ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images
Quan Van Nguyen
Dan Quang Tran
Huy Quang Pham
Thang Kien-Bao Nguyen
Nghia Hieu Nguyen
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
CoGe
39
3
0
16 Apr 2024
Language Guided Local Infiltration for Interactive Image Retrieval
Language Guided Local Infiltration for Interactive Image Retrieval
Fuxiang Huang
Lei Zhang
26
5
0
16 Apr 2023
Effective Multimodal Reinforcement Learning with Modality Alignment and
  Importance Enhancement
Effective Multimodal Reinforcement Learning with Modality Alignment and Importance Enhancement
Jinming Ma
Feng Wu
Yingfeng Chen
Xianpeng Ji
Yu-qiong Ding
OffRL
33
4
0
18 Feb 2023
Composed Image Retrieval with Text Feedback via Multi-grained
  Uncertainty Regularization
Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization
Yiyang Chen
Zhedong Zheng
Wei Ji
Leigang Qu
Tat-Seng Chua
39
37
0
14 Nov 2022
Multimodal Feature Extraction for Memes Sentiment Classification
Multimodal Feature Extraction for Memes Sentiment Classification
Sofiane Ouaari
Tsegaye Misikir Tashu
Tomáš Horváth
20
6
0
07 Jul 2022
Training and challenging models for text-guided fashion image retrieval
Training and challenging models for text-guided fashion image retrieval
Eric Dodds
Jack Culpepper
Gaurav Srivastava
23
8
0
23 Apr 2022
Attention Mechanism based Cognition-level Scene Understanding
Attention Mechanism based Cognition-level Scene Understanding
Xuejiao Tang
Tai Le Quy
LRM
35
0
0
17 Apr 2022
MoCA: Incorporating Multi-stage Domain Pretraining and Cross-guided
  Multimodal Attention for Textbook Question Answering
MoCA: Incorporating Multi-stage Domain Pretraining and Cross-guided Multimodal Attention for Textbook Question Answering
Fangzhi Xu
Qika Lin
Jing Liu
Lingling Zhang
Tianzhe Zhao
Qianyi Chai
Yudai Pan
21
2
0
06 Dec 2021
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual
  Question Answering
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
Ekta Sood
Fabian Kögel
Florian Strohm
Prajit Dhar
Andreas Bulling
40
19
0
27 Sep 2021
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language
  Models
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
Zheyuan Liu
Cristian Rodriguez-Opazo
Damien Teney
Stephen Gould
VLM
22
191
0
09 Aug 2021
Bridge to Answer: Structure-aware Graph Interaction Network for Video
  Question Answering
Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering
Jungin Park
Jiyoung Lee
Kwanghoon Sohn
167
100
0
29 Apr 2021
Adaptive Offline Quintuplet Loss for Image-Text Matching
Adaptive Offline Quintuplet Loss for Image-Text Matching
Tianlang Chen
Jiajun Deng
Jiebo Luo
181
68
0
07 Mar 2020
Anatomy-aware 3D Human Pose Estimation with Bone-based Pose
  Decomposition
Anatomy-aware 3D Human Pose Estimation with Bone-based Pose Decomposition
Tianlang Chen
Chen Fang
Xiaohui Shen
Yiheng Zhu
Zhili Chen
Jiebo Luo
3DH
MedIm
27
23
0
24 Feb 2020
Residual Knowledge Distillation
Residual Knowledge Distillation
Mengya Gao
Yujun Shen
Quanquan Li
Chen Change Loy
22
28
0
21 Feb 2020
Multi-modal Deep Analysis for Multimedia
Multi-modal Deep Analysis for Multimedia
Wenwu Zhu
Xin Eric Wang
Hongzhi Li
29
38
0
11 Oct 2019
Compact Trilinear Interaction for Visual Question Answering
Compact Trilinear Interaction for Visual Question Answering
Tuong Khanh Long Do
Thanh-Toan Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
36
59
0
26 Sep 2019
OmniNet: A unified architecture for multi-modal multi-task learning
OmniNet: A unified architecture for multi-modal multi-task learning
Subhojeet Pramanik
Priyanka Agrawal
A. Hussain
27
41
0
17 Jul 2019
Deep Modular Co-Attention Networks for Visual Question Answering
Deep Modular Co-Attention Networks for Visual Question Answering
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
36
798
0
25 Jun 2019
Question Guided Modular Routing Networks for Visual Question Answering
Question Guided Modular Routing Networks for Visual Question Answering
Yanze Wu
Qiang Sun
Jianqi Ma
Bin Li
Yanwei Fu
Yao Peng
Xiangyang Xue
23
1
0
17 Apr 2019
Dual Attention Networks for Visual Reference Resolution in Visual Dialog
Dual Attention Networks for Visual Reference Resolution in Visual Dialog
Gi-Cheon Kang
Jaeseo Lim
Byoung-Tak Zhang
22
72
0
25 Feb 2019
Conditional Transfer with Dense Residual Attention: Synthesizing traffic
  signs from street-view imagery
Conditional Transfer with Dense Residual Attention: Synthesizing traffic signs from street-view imagery
Clint Sebastian
R. Uittenbogaard
J. Vijverberg
B. Boom
Peter H. N. de With
ViT
23
7
0
05 Sep 2018
Learning Visual Knowledge Memory Networks for Visual Question Answering
Learning Visual Knowledge Memory Networks for Visual Question Answering
Zhou Su
Chen Zhu
Yinpeng Dong
Dongqi Cai
Yurong Chen
Jianguo Li
34
62
0
13 Jun 2018
GLAC Net: GLocal Attention Cascading Networks for Multi-image Cued Story
  Generation
GLAC Net: GLocal Attention Cascading Networks for Multi-image Cued Story Generation
Taehyeong Kim
Min-Oh Heo
Seonil Son
Kyoung-Wha Park
Byoung-Tak Zhang
31
75
0
28 May 2018
R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual
  Question Answering
R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering
Pan Lu
Lei Ji
Wei Zhang
Nan Duan
M. Zhou
Jianyong Wang
CoGe
25
79
0
24 May 2018
Deep Multimodal Subspace Clustering Networks
Deep Multimodal Subspace Clustering Networks
Mahdi Abavisani
Vishal M. Patel
33
163
0
17 Apr 2018
Improved Fusion of Visual and Language Representations by Dense
  Symmetric Co-Attention for Visual Question Answering
Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
Duy-Kien Nguyen
Takayuki Okatani
30
279
0
03 Apr 2018
Motion-Appearance Co-Memory Networks for Video Question Answering
Motion-Appearance Co-Memory Networks for Video Question Answering
J. Gao
Runzhou Ge
Kan Chen
Ram Nevatia
41
240
0
29 Mar 2018
Audio-Visual Event Localization in Unconstrained Videos
Audio-Visual Event Localization in Unconstrained Videos
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
51
426
0
23 Mar 2018
Dual Recurrent Attention Units for Visual Question Answering
Dual Recurrent Attention Units for Visual Question Answering
Ahmed Osman
Wojciech Samek
36
30
0
01 Feb 2018
Don't Just Assume; Look and Answer: Overcoming Priors for Visual
  Question Answering
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
Aishwarya Agrawal
Dhruv Batra
Devi Parikh
Aniruddha Kembhavi
OOD
76
582
0
01 Dec 2017
Exploring Human-like Attention Supervision in Visual Question Answering
Exploring Human-like Attention Supervision in Visual Question Answering
Tingting Qiao
Jianfeng Dong
Duanqing Xu
19
104
0
19 Sep 2017
VQS: Linking Segmentations to Questions and Answers for Supervised
  Attention in VQA and Question-Focused Semantic Segmentation
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
Chuang Gan
Yandong Li
Haoxiang Li
Chen Sun
Boqing Gong
27
126
0
15 Aug 2017
Structured Attentions for Visual Question Answering
Structured Attentions for Visual Question Answering
Chen Zhu
Yanpeng Zhao
Shuaiyi Huang
Kewei Tu
Yi Ma
FAtt
32
106
0
07 Aug 2017
Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for
  Visual Question Answering
Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering
Zhou Yu
Jun-chen Yu
Jianping Fan
Dacheng Tao
41
663
0
04 Aug 2017
DeepStory: Video Story QA by Deep Embedded Memory Networks
DeepStory: Video Story QA by Deep Embedded Memory Networks
Kyung-Min Kim
Min-Oh Heo
Seongho Choi
Byoung-Tak Zhang
26
174
0
04 Jul 2017
Modulating early visual processing by language
Modulating early visual processing by language
H. D. Vries
Florian Strub
Jérémie Mary
Hugo Larochelle
Olivier Pietquin
Aaron Courville
31
484
0
02 Jul 2017
MUTAN: Multimodal Tucker Fusion for Visual Question Answering
MUTAN: Multimodal Tucker Fusion for Visual Question Answering
H. Ben-younes
Rémi Cadène
Matthieu Cord
Nicolas Thome
67
578
0
18 May 2017
The Promise of Premise: Harnessing Question Premises in Visual Question
  Answering
The Promise of Premise: Harnessing Question Premises in Visual Question Answering
Aroma Mahendru
Viraj Prabhu
Akrit Mohapatra
Dhruv Batra
Stefan Lee
NAI
37
38
0
01 May 2017
Residual Attention Network for Image Classification
Residual Attention Network for Image Classification
Fei Wang
Mengqing Jiang
Chao Qian
Shuo Yang
Cheng Li
Honggang Zhang
Xiaogang Wang
Xiaoou Tang
66
3,289
0
23 Apr 2017
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
Y. Jang
Yale Song
Youngjae Yu
Youngjin Kim
Gunhee Kim
34
547
0
14 Apr 2017
Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question
  Answering
Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering
V. Kazemi
Ali Elqursh
OOD
28
184
0
11 Apr 2017
An Analysis of Visual Question Answering Algorithms
An Analysis of Visual Question Answering Algorithms
Kushal Kafle
Christopher Kanan
30
231
0
28 Mar 2017
Recurrent Multimodal Interaction for Referring Image Segmentation
Recurrent Multimodal Interaction for Referring Image Segmentation
Chenxi Liu
Zhe Lin
Xiaohui Shen
Jimei Yang
Xin Lu
Alan Yuille
EgoV
36
234
0
23 Mar 2017
Task-driven Visual Saliency and Attention-based Visual Question
  Answering
Task-driven Visual Saliency and Attention-based Visual Question Answering
Yuetan Lin
Zhangyang Pang
Donghui Wang
Yueting Zhuang
35
26
0
22 Feb 2017
The VQA-Machine: Learning How to Use Existing Vision Algorithms to
  Answer New Questions
The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions
Peng Wang
Qi Wu
Chunhua Shen
Anton Van Den Hengel
OOD
39
86
0
16 Dec 2016
Making the V in VQA Matter: Elevating the Role of Image Understanding in
  Visual Question Answering
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
155
3,136
0
02 Dec 2016
Dual Attention Networks for Multimodal Reasoning and Matching
Dual Attention Networks for Multimodal Reasoning and Matching
Hyeonseob Nam
Jung-Woo Ha
Jeonghee Kim
45
664
0
02 Nov 2016
12
Next