2307.14142
LOIS: Looking Out of Instance Semantics for Visual Question Answering
26 July 2023
Siyu Zhang
Ye Chen
Yaoru Sun
Fang Wang
Haibo Shi
Haoran Wang
Papers citing
"LOIS: Looking Out of Instance Semantics for Visual Question Answering"
34 / 34 papers shown
Title
From Pixels to Objects: Cubic Visual Attention for Visual Question Answering
Jingkuan Song
Pengpeng Zeng
Lianli Gao
Heng Tao Shen
38
62
0
04 Jun 2022
Grounding Answers for Visual Questions Asked by Visually Impaired People
Chongyan Chen
Samreen Anjum
Danna Gurari
48
50
0
04 Feb 2022
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Mohit Bansal
MLLM
308
529
0
04 Feb 2021
Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View
Yangyang Guo
Liqiang Nie
Zhiyong Cheng
Q. Tian
Min Zhang
88
69
0
30 Oct 2020
A Novel Attention-based Aggregation Function to Combine Vision and Language
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
34
9
0
27 Apr 2020
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
Zhicheng Huang
Zhaoyang Zeng
Bei Liu
Dongmei Fu
Jianlong Fu
ViT
122
437
0
02 Apr 2020
A Question-Centric Model for Visual Question Answering in Medical Imaging
Minh H. Vu
Tommy Löfstedt
T. Nyholm
Raphael Sznitman
MedIm
39
59
0
02 Mar 2020
In Defense of Grid Features for Visual Question Answering
Huaizu Jiang
Ishan Misra
Marcus Rohrbach
Erik Learned-Miller
Xinlei Chen
OOD
ObjD
50
320
0
10 Jan 2020
BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation
Hao Chen
Kunyang Sun
Zhi Tian
Chunhua Shen
Yongming Huang
Youliang Yan
ISeg
115
487
0
02 Jan 2020
PolarMask: Single Shot Instance Segmentation with Polar Representation
Enze Xie
Pei Sun
Xiaoge Song
Wenhai Wang
Ding Liang
Chunhua Shen
Ping Luo
ISeg
67
539
0
29 Sep 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
130
1,657
0
22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Tan
Mohit Bansal
VLM
MLLM
214
2,467
0
20 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
120
1,939
0
09 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
205
3,659
0
06 Aug 2019
Deep Modular Co-Attention Networks for Visual Question Answering
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
79
802
0
25 Jun 2019
Multimodal Transformer with Multi-View Visual Representation for Image Captioning
Jun-chen Yu
Jing Li
Zhou Yu
Qingming Huang
ViT
52
380
0
20 May 2019
Relation-Aware Graph Attention Network for Visual Question Answering
Linjie Li
Zhe Gan
Yu Cheng
Jingjing Liu
GNN
136
343
0
29 Mar 2019
Answer Them All! Toward Universal Visual Question Answering Models
Robik Shrestha
Kushal Kafle
Christopher Kanan
48
82
0
01 Mar 2019
MUREL: Multimodal Relational Reasoning for Visual Question Answering
Rémi Cadène
H. Ben-younes
Matthieu Cord
Nicolas Thome
LRM
55
273
0
25 Feb 2019
BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection
H. Ben-younes
Rémi Cadène
Nicolas Thome
Matthieu Cord
43
218
0
31 Jan 2019
Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering
Peng Gao
Zhengkai Jiang
Haoxuan You
Pan Lu
Steven C. H. Hoi
Xiaogang Wang
Hongsheng Li
AIMat
69
364
0
13 Dec 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.2K
93,936
0
11 Oct 2018
Generative Image Inpainting with Contextual Attention
Jiahui Yu
Zhe Lin
Jimei Yang
Xiaohui Shen
Xin Lu
Thomas S. Huang
GAN
DiffM
77
2,255
0
24 Jan 2018
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
Aishwarya Agrawal
Dhruv Batra
Devi Parikh
Aniruddha Kembhavi
OOD
136
585
0
01 Dec 2017
Co-attending Free-form Regions and Detections with Multi-modal Multiplicative Feature Embedding for Visual Question Answering
Pan Lu
Hongsheng Li
Wei Zhang
Jianyong Wang
Xiaogang Wang
57
80
0
18 Nov 2017
Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge
Damien Teney
Peter Anderson
Xiaodong He
Anton Van Den Hengel
86
382
0
09 Aug 2017
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
104
4,201
0
25 Jul 2017
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
304
3,187
0
02 Dec 2016
Visual Question Answering: Datasets, Algorithms, and Future Challenges
Kushal Kafle
Christopher Kanan
OOD
65
238
0
05 Oct 2016
Stacked Attention Networks for Image Question Answering
Zichao Yang
Xiaodong He
Jianfeng Gao
Li Deng
Alex Smola
BDL
101
1,875
0
07 Nov 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat
ObjD
423
61,900
0
04 Jun 2015
Exploring Models and Data for Image Question Answering
Mengye Ren
Ryan Kiros
R. Zemel
80
713
0
08 May 2015
Ask Your Neurons: A Neural-based Approach to Answering Questions about Images
Mateusz Malinowski
Marcus Rohrbach
Mario Fritz
100
597
0
05 May 2015
VQA: Visual Question Answering
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
162
5,421
0
03 May 2015