Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1708.02711
Cited By
Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge
9 August 2017
Damien Teney
Peter Anderson
Xiaodong He
A. Hengel
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge"
50 / 55 papers shown
Title
Hadamard product in deep learning: Introduction, Advances and Challenges
Grigorios G. Chrysos
Yongtao Wu
Razvan Pascanu
Philip Torr
V. Cevher
AAML
98
0
0
17 Apr 2025
Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning
Minheng Ni
Yutao Fan
Lei Zhang
Wangmeng Zuo
LRM
AI4CE
31
6
0
04 Oct 2024
Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling
Cristian Rodriguez-Opazo
Ehsan Abbasnejad
Damien Teney
Edison Marrese-Taylor
Hamed Damirchi
A. Hengel
VLM
40
1
0
27 May 2024
Convincing Rationales for Visual Question Answering Reasoning
Kun Li
G. Vosselman
Michael Ying Yang
44
1
0
06 Feb 2024
LOIS: Looking Out of Instance Semantics for Visual Question Answering
Siyu Zhang
Ye Chen
Yaoru Sun
Fang Wang
Haibo Shi
Haoran Wang
25
4
0
26 Jul 2023
Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature
Ana Claudia Akemi Matsuki de Faria
Felype de Castro Bastos
Jose Victor Nogueira Alves da Silva
Vitor Lopes Fabris
Valeska Uchôa
Décio Gonccalves de Aguiar Neto
C. F. G. Santos
30
22
0
18 May 2023
Scaling Language-Image Pre-training via Masking
Yanghao Li
Haoqi Fan
Ronghang Hu
Christoph Feichtenhofer
Kaiming He
CLIP
VLM
27
318
0
01 Dec 2022
Leveraging Graph-based Cross-modal Information Fusion for Neural Sign Language Translation
Jiangbin Zheng
Siyuan Li
Cheng Tan
Chong Wu
Yidong Chen
Stan Z. Li
SLR
24
7
0
01 Nov 2022
Predicting is not Understanding: Recognizing and Addressing Underspecification in Machine Learning
Damien Teney
Maxime Peyrard
Ehsan Abbasnejad
35
29
0
06 Jul 2022
Structured Two-stream Attention Network for Video Question Answering
Lianli Gao
Pengpeng Zeng
Jingkuan Song
Yuan-Fang Li
Wu Liu
Tao Mei
Heng Tao Shen
35
68
0
02 Jun 2022
Learning to Retrieve Videos by Asking Questions
Avinash Madasu
Junier Oliva
Gedas Bertasius
VGen
30
16
0
11 May 2022
Optimal quadratic binding for relational reasoning in vector symbolic neural architectures
Naoki Hiratani
H. Sompolinsky
27
5
0
14 Apr 2022
ICDAR 2021 Competition on Document VisualQuestion Answering
Rubèn Pérez Tito
Minesh Mathew
C. V. Jawahar
Ernest Valveny
Dimosthenis Karatzas
35
23
0
10 Nov 2021
TxT: Crossmodal End-to-End Learning with Transformers
Jan-Martin O. Steitz
Jonas Pfeiffer
Iryna Gurevych
Stefan Roth
LRM
16
2
0
09 Sep 2021
Auto-Parsing Network for Image Captioning and Visual Question Answering
Xu Yang
Chongyang Gao
Hanwang Zhang
Jianfei Cai
22
35
0
24 Aug 2021
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
Zheyuan Liu
Cristian Rodriguez-Opazo
Damien Teney
Stephen Gould
VLM
19
192
0
09 Aug 2021
Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs
Daniel Reich
F. Putze
Tanja Schultz
22
2
0
28 Jun 2021
Contextual Dropout: An Efficient Sample-Dependent Dropout Module
Xinjie Fan
Shujian Zhang
Korawat Tanwisuth
Xiaoning Qian
Mingyuan Zhou
OOD
BDL
UQCV
27
27
0
06 Mar 2021
Causal Attention for Vision-Language Tasks
Xu Yang
Hanwang Zhang
Guojun Qi
Jianfei Cai
CML
28
148
0
05 Mar 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
79
110
0
31 Jan 2021
Latent Variable Models for Visual Question Answering
Zixu Wang
Yishu Miao
Lucia Specia
25
5
0
16 Jan 2021
Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding
Qingxing Cao
Bailin Li
Xiaodan Liang
Keze Wang
Liang Lin
44
36
0
14 Dec 2020
An Improved Attention for Visual Question Answering
Tanzila Rahman
Shih-Han Chou
Leonid Sigal
Giuseppe Carenini
13
42
0
04 Nov 2020
Bayesian Attention Modules
Xinjie Fan
Shujian Zhang
Bo Chen
Mingyuan Zhou
114
59
0
20 Oct 2020
New Ideas and Trends in Deep Multimodal Content Understanding: A Review
Wei-Neng Chen
Weiping Wang
Li Liu
M. Lew
VLM
112
31
0
16 Oct 2020
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data
Michael Cogswell
Jiasen Lu
Rishabh Jain
Stefan Lee
Devi Parikh
Dhruv Batra
VLM
EgoV
28
15
0
24 Jul 2020
A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation
Yongjing Yin
Fandong Meng
Jinsong Su
Chulun Zhou
Zhengyuan Yang
Jie Zhou
Jiebo Luo
25
138
0
17 Jul 2020
Cross-Modality Relevance for Reasoning on Language and Vision
Chen Zheng
Quan Guo
Parisa Kordjamshidi
LRM
35
36
0
12 May 2020
History for Visual Dialog: Do we really need it?
Shubham Agarwal
Trung Bui
Joon-Young Lee
Ioannis Konstas
Verena Rieser
VLM
11
69
0
08 May 2020
Explainable Deep Learning: A Field Guide for the Uninitiated
Gabrielle Ras
Ning Xie
Marcel van Gerven
Derek Doran
AAML
XAI
38
370
0
30 Apr 2020
A Study on Multimodal and Interactive Explanations for Visual Question Answering
Kamran Alipour
J. Schulze
Yi Yao
Avi Ziskind
Giedrius Burachas
24
27
0
01 Mar 2020
A Real-time Global Inference Network for One-stage Referring Expression Comprehension
Yiyi Zhou
Rongrong Ji
Gen Luo
Xiaoshuai Sun
Jinsong Su
Xinghao Ding
Chia-Wen Lin
Q. Tian
ObjD
24
60
0
07 Dec 2019
Compact Trilinear Interaction for Visual Question Answering
Tuong Khanh Long Do
Thanh-Toan Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
33
59
0
26 Sep 2019
Why Does a Visual Question Have Different Answers?
Nilavra Bhattacharya
Qing Li
Danna Gurari
23
65
0
12 Aug 2019
Multimodal Unified Attention Networks for Vision-and-Language Interactions
Zhou Yu
Yuhao Cui
Jun Yu
Dacheng Tao
Q. Tian
19
38
0
12 Aug 2019
Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection
Keren Ye
Mingda Zhang
Adriana Kovashka
Wei Li
Danfeng Qin
Jesse Berent
25
60
0
23 Jul 2019
Deep Modular Co-Attention Networks for Visual Question Answering
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
22
796
0
25 Jun 2019
Self-Critical Reasoning for Robust Visual Question Answering
Jialin Wu
Raymond J. Mooney
OOD
NAI
24
159
0
24 May 2019
Reasoning Visual Dialogs with Structural and Partial Observations
Zilong Zheng
Wenguan Wang
Siyuan Qi
Song-Chun Zhu
39
117
0
11 Apr 2019
Learning To Follow Directions in Street View
Karl Moritz Hermann
Mateusz Malinowski
Piotr Wojciech Mirowski
Andras Banki-Horvath
Keith Anderson
R. Hadsell
SSL
16
66
0
01 Mar 2019
Answer Them All! Toward Universal Visual Question Answering Models
Robik Shrestha
Kushal Kafle
Christopher Kanan
17
82
0
01 Mar 2019
Generating Natural Language Explanations for Visual Question Answering using Scene Graphs and Visual Attention
Shalini Ghosh
Giedrius Burachas
Arijit Ray
Avi Ziskind
11
65
0
15 Feb 2019
Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog
Zhe Gan
Yu Cheng
Ahmed El Kholy
Linjie Li
Jingjing Liu
Jianfeng Gao
11
104
0
01 Feb 2019
Visual Entailment: A Novel Task for Fine-Grained Image Understanding
Ning Xie
Farley Lai
Derek Doran
Asim Kadav
CoGe
31
322
0
20 Jan 2019
Fusion Strategies for Learning User Embeddings with Neural Networks
Philipp Blandfort
Tushar Karayil
Federico Raue
Jörn Hees
Andreas Dengel
FedML
22
9
0
08 Jan 2019
Multi-task Learning of Hierarchical Vision-Language Representation
Duy-Kien Nguyen
Takayuki Okatani
23
51
0
03 Dec 2018
Interpretable Visual Question Answering by Reasoning on Dependency Trees
Qingxing Cao
Bailin Li
Xiaodan Liang
Liang Lin
27
55
0
06 Sep 2018
Learning Conditioned Graph Structures for Interpretable Visual Question Answering
Will Norcliffe-Brown
Efstathios Vafeias
Sarah Parisot
GNN
13
236
0
19 Jun 2018
Learning Visual Knowledge Memory Networks for Visual Question Answering
Zhou Su
Chen Zhu
Yinpeng Dong
Dongqi Cai
Yurong Chen
Jianguo Li
34
62
0
13 Jun 2018
Joint Image Captioning and Question Answering
Jialin Wu
Zeyuan Hu
Raymond J. Mooney
24
12
0
22 May 2018
1
2
Next