arXiv:1606.01455
Multimodal Residual Learning for Visual QA
5 June 2016
Jin-Hwa Kim
Sang-Woo Lee
Donghyun Kwak
Min-Oh Heo
Jeonghee Kim
Jung-Woo Ha
Byoung-Tak Zhang
Papers citing "Multimodal Residual Learning for Visual QA"
50 / 55 papers shown
Hadamard product in deep learning: Introduction, Advances and Challenges
Grigorios G. Chrysos
Yongtao Wu
Razvan Pascanu
Philip Torr
V. Cevher
AAML
98
1
0
17 Apr 2025
PinLanding: Content-First Keyword Landing Page Generation via Multi-Modal AI for Web-Scale Discovery
Faye Zhang
Jasmine Wan
Qianyu Cheng
Jinfeng Rao
44
0
0
01 Mar 2025
Leveraging Large Language Models for Multimodal Search
Oriol Barbany
Michael Huang
Xinliang Zhu
Arnab Dhua
33
9
0
24 Apr 2024
ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images
Quan Van Nguyen
Dan Quang Tran
Huy Quang Pham
Thang Kien-Bao Nguyen
Nghia Hieu Nguyen
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
CoGe
39
3
0
16 Apr 2024
Language Guided Local Infiltration for Interactive Image Retrieval
Fuxiang Huang
Lei Zhang
26
5
0
16 Apr 2023
Effective Multimodal Reinforcement Learning with Modality Alignment and Importance Enhancement
Jinming Ma
Feng Wu
Yingfeng Chen
Xianpeng Ji
Yu-qiong Ding
OffRL
33
4
0
18 Feb 2023
Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization
Yiyang Chen
Zhedong Zheng
Wei Ji
Leigang Qu
Tat-Seng Chua
39
37
0
14 Nov 2022
Multimodal Feature Extraction for Memes Sentiment Classification
Sofiane Ouaari
Tsegaye Misikir Tashu
Tomáš Horváth
20
6
0
07 Jul 2022
Training and challenging models for text-guided fashion image retrieval
Eric Dodds
Jack Culpepper
Gaurav Srivastava
23
8
0
23 Apr 2022
Attention Mechanism based Cognition-level Scene Understanding
Xuejiao Tang
Tai Le Quy
LRM
35
0
0
17 Apr 2022
MoCA: Incorporating Multi-stage Domain Pretraining and Cross-guided Multimodal Attention for Textbook Question Answering
Fangzhi Xu
Qika Lin
Jing Liu
Lingling Zhang
Tianzhe Zhao
Qianyi Chai
Yudai Pan
21
2
0
06 Dec 2021
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
Ekta Sood
Fabian Kögel
Florian Strohm
Prajit Dhar
Andreas Bulling
40
19
0
27 Sep 2021
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
Zheyuan Liu
Cristian Rodriguez-Opazo
Damien Teney
Stephen Gould
VLM
22
191
0
09 Aug 2021
Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering
Jungin Park
Jiyoung Lee
Kwanghoon Sohn
167
100
0
29 Apr 2021
Adaptive Offline Quintuplet Loss for Image-Text Matching
Tianlang Chen
Jiajun Deng
Jiebo Luo
181
68
0
07 Mar 2020
Anatomy-aware 3D Human Pose Estimation with Bone-based Pose Decomposition
Tianlang Chen
Chen Fang
Xiaohui Shen
Yiheng Zhu
Zhili Chen
Jiebo Luo
3DH
MedIm
27
23
0
24 Feb 2020
Residual Knowledge Distillation
Mengya Gao
Yujun Shen
Quanquan Li
Chen Change Loy
22
28
0
21 Feb 2020
Multi-modal Deep Analysis for Multimedia
Wenwu Zhu
Xin Eric Wang
Hongzhi Li
29
38
0
11 Oct 2019
Compact Trilinear Interaction for Visual Question Answering
Tuong Khanh Long Do
Thanh-Toan Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
36
59
0
26 Sep 2019
OmniNet: A unified architecture for multi-modal multi-task learning
Subhojeet Pramanik
Priyanka Agrawal
A. Hussain
27
41
0
17 Jul 2019
Deep Modular Co-Attention Networks for Visual Question Answering
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
36
798
0
25 Jun 2019
Question Guided Modular Routing Networks for Visual Question Answering
Yanze Wu
Qiang Sun
Jianqi Ma
Bin Li
Yanwei Fu
Yao Peng
Xiangyang Xue
23
1
0
17 Apr 2019
Dual Attention Networks for Visual Reference Resolution in Visual Dialog
Gi-Cheon Kang
Jaeseo Lim
Byoung-Tak Zhang
22
72
0
25 Feb 2019
Conditional Transfer with Dense Residual Attention: Synthesizing traffic signs from street-view imagery
Clint Sebastian
R. Uittenbogaard
J. Vijverberg
B. Boom
Peter H. N. de With
ViT
23
7
0
05 Sep 2018
Learning Visual Knowledge Memory Networks for Visual Question Answering
Zhou Su
Chen Zhu
Yinpeng Dong
Dongqi Cai
Yurong Chen
Jianguo Li
34
62
0
13 Jun 2018
GLAC Net: GLocal Attention Cascading Networks for Multi-image Cued Story Generation
Taehyeong Kim
Min-Oh Heo
Seonil Son
Kyoung-Wha Park
Byoung-Tak Zhang
31
75
0
28 May 2018
R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering
Pan Lu
Lei Ji
Wei Zhang
Nan Duan
M. Zhou
Jianyong Wang
CoGe
25
79
0
24 May 2018
Deep Multimodal Subspace Clustering Networks
Mahdi Abavisani
Vishal M. Patel
33
163
0
17 Apr 2018
Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
Duy-Kien Nguyen
Takayuki Okatani
30
279
0
03 Apr 2018
Motion-Appearance Co-Memory Networks for Video Question Answering
J. Gao
Runzhou Ge
Kan Chen
Ram Nevatia
41
240
0
29 Mar 2018
Audio-Visual Event Localization in Unconstrained Videos
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
51
426
0
23 Mar 2018
Dual Recurrent Attention Units for Visual Question Answering
Ahmed Osman
Wojciech Samek
36
30
0
01 Feb 2018
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
Aishwarya Agrawal
Dhruv Batra
Devi Parikh
Aniruddha Kembhavi
OOD
76
582
0
01 Dec 2017
Exploring Human-like Attention Supervision in Visual Question Answering
Tingting Qiao
Jianfeng Dong
Duanqing Xu
19
104
0
19 Sep 2017
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
Chuang Gan
Yandong Li
Haoxiang Li
Chen Sun
Boqing Gong
27
126
0
15 Aug 2017
Structured Attentions for Visual Question Answering
Chen Zhu
Yanpeng Zhao
Shuaiyi Huang
Kewei Tu
Yi Ma
FAtt
32
106
0
07 Aug 2017
Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering
Zhou Yu
Jun-chen Yu
Jianping Fan
Dacheng Tao
41
663
0
04 Aug 2017
DeepStory: Video Story QA by Deep Embedded Memory Networks
Kyung-Min Kim
Min-Oh Heo
Seongho Choi
Byoung-Tak Zhang
26
174
0
04 Jul 2017
Modulating early visual processing by language
H. de Vries
Florian Strub
Jérémie Mary
Hugo Larochelle
Olivier Pietquin
Aaron Courville
31
484
0
02 Jul 2017
MUTAN: Multimodal Tucker Fusion for Visual Question Answering
H. Ben-younes
Rémi Cadène
Matthieu Cord
Nicolas Thome
67
578
0
18 May 2017
The Promise of Premise: Harnessing Question Premises in Visual Question Answering
Aroma Mahendru
Viraj Prabhu
Akrit Mohapatra
Dhruv Batra
Stefan Lee
NAI
37
38
0
01 May 2017
Residual Attention Network for Image Classification
Fei Wang
Mengqing Jiang
Chao Qian
Shuo Yang
Cheng Li
Honggang Zhang
Xiaogang Wang
Xiaoou Tang
66
3,289
0
23 Apr 2017
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
Y. Jang
Yale Song
Youngjae Yu
Youngjin Kim
Gunhee Kim
34
547
0
14 Apr 2017
Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering
V. Kazemi
Ali Elqursh
OOD
28
184
0
11 Apr 2017
An Analysis of Visual Question Answering Algorithms
Kushal Kafle
Christopher Kanan
30
231
0
28 Mar 2017
Recurrent Multimodal Interaction for Referring Image Segmentation
Chenxi Liu
Zhe Lin
Xiaohui Shen
Jimei Yang
Xin Lu
Alan Yuille
EgoV
36
234
0
23 Mar 2017
Task-driven Visual Saliency and Attention-based Visual Question Answering
Yuetan Lin
Zhangyang Pang
Donghui Wang
Yueting Zhuang
35
26
0
22 Feb 2017
The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions
Peng Wang
Qi Wu
Chunhua Shen
Anton van den Hengel
OOD
39
86
0
16 Dec 2016
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
155
3,136
0
02 Dec 2016
Dual Attention Networks for Multimodal Reasoning and Matching
Hyeonseob Nam
Jung-Woo Ha
Jeonghee Kim
45
664
0
02 Nov 2016