1612.06890
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
20 December 2016
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
Papers citing
"CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning"
50 / 1,475 papers shown
Generalization and Robustness Implications in Object-Centric Learning
Andrea Dittadi
Samuele Papa
Michele De Vita
Bernhard Schölkopf
Ole Winther
Francesco Locatello
OCL
OOD
27
74
0
01 Jul 2021
Weakly Supervised Temporal Adjacent Network for Language Grounding
Yuechen Wang
Jiajun Deng
Wen-gang Zhou
Houqiang Li
31
67
0
30 Jun 2021
Unified Questioner Transformer for Descriptive Question Generation in Goal-Oriented Visual Dialogue
Shoya Matsumori
Kosuke Shingyouchi
Yukikoko Abe
Yosuke Fukuchi
K. Sugiura
M. Imai
44
16
0
29 Jun 2021
Building a Video-and-Language Dataset with Human Actions for Multimodal Logical Inference
Riko Suzuki
Hitomi Yanaka
K. Mineshima
D. Bekki
VGen
MLLM
21
1
0
27 Jun 2021
CausalCity: Complex Simulations with Agency for Causal Discovery and Reasoning
Daniel J. McDuff
Yale Song
Jiyoung Lee
Vibhav Vineet
Sai H. Vemprala
N. Gyde
Hadi Salman
Shuang Ma
Kwanghoon Sohn
Ashish Kapoor
CML
33
28
0
25 Jun 2021
Leveraging Language to Learn Program Abstractions and Search Heuristics
Catherine Wong
Kevin Ellis
J. Tenenbaum
Jacob Andreas
27
54
0
18 Jun 2021
Grounding Spatio-Temporal Language with Transformers
Tristan Karch
Laetitia Teodorescu
Katja Hofmann
Clément Moulin-Frier
Pierre-Yves Oudeyer
LM&Ro
27
11
0
16 Jun 2021
How Modular Should Neural Module Networks Be for Systematic Generalization?
Vanessa D’Amario
Tomotake Sasaki
Xavier Boix
15
17
0
15 Jun 2021
Styleformer: Transformer based Generative Adversarial Networks with Style Vector
Jeeseung Park
Younggeun Kim
ViT
29
48
0
13 Jun 2021
NAAQA: A Neural Architecture for Acoustic Question Answering
Jerome Abdelnour
Jean Rouat
G. Salvi
6
4
0
11 Jun 2021
Learning to See by Looking at Noise
Manel Baradad
Jonas Wulff
Tongzhou Wang
Phillip Isola
Antonio Torralba
33
89
0
10 Jun 2021
Scaling Vision with Sparse Mixture of Experts
C. Riquelme
J. Puigcerver
Basil Mustafa
Maxim Neumann
Rodolphe Jenatton
André Susano Pinto
Daniel Keysers
N. Houlsby
MoE
29
579
0
10 Jun 2021
Group Equivariant Subsampling
Jin Xu
Hyunjik Kim
Tom Rainforth
Yee Whye Teh
26
21
0
10 Jun 2021
Spatially Invariant Unsupervised 3D Object-Centric Learning and Scene Decomposition
Tianyu Wang
Miaomiao Liu
K. S. Ng
3DPC
OCL
42
2
0
10 Jun 2021
Supervising the Transfer of Reasoning Patterns in VQA
Corentin Kervadec
Christian Wolf
G. Antipov
M. Baccouche
Madiha Nadri Wolf
35
10
0
10 Jun 2021
Simulated Adversarial Testing of Face Recognition Models
Nataniel Ruiz
Adam Kortylewski
Weichao Qiu
Cihang Xie
Sarah Adel Bargal
Alan Yuille
Stan Sclaroff
AAML
CVBM
27
15
0
08 Jun 2021
Prediction or Comparison: Toward Interpretable Qualitative Reasoning
Mucheng Ren
Heyan Huang
Yang Gao
LRM
29
0
0
04 Jun 2021
Human-Adversarial Visual Question Answering
Sasha Sheng
Amanpreet Singh
Vedanuj Goswami
Jose Alberto Lopez Magana
Wojciech Galuba
Devi Parikh
Douwe Kiela
OOD
EgoV
AAML
26
60
0
04 Jun 2021
Grounding Complex Navigational Instructions Using Scene Graphs
Michiel de Jong
Satyapriya Krishna
Anuva Agarwal
LM&Ro
15
0
0
03 Jun 2021
Independent Prototype Propagation for Zero-Shot Compositionality
Frank Ruis
Gertjan J. Burghouts
Doina Bucur
22
53
0
01 Jun 2021
On Compositional Generalization of Neural Machine Translation
Yafu Li
Yongjing Yin
Yulong Chen
Yue Zhang
164
45
0
31 May 2021
GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning
Jiaqi Chen
Jianheng Tang
Jinghui Qin
Xiaodan Liang
Lingbo Liu
Eric Xing
Liang Lin
AIMat
22
160
0
30 May 2021
TexRel: a Green Family of Datasets for Emergent Communications on Relations
Hugh Perkins
37
2
0
26 May 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review
Jabeen Summaira
Xi Li
Amin Muhammad Shoib
Songyuan Li
Abdul Jabbar
HAI
28
55
0
24 May 2021
Inclusion of Domain-Knowledge into GNNs using Mode-Directed Inverse Entailment
T. Dash
A. Srinivasan
A. Baskar
41
13
0
22 May 2021
A Review on Explainability in Multimodal Deep Neural Nets
Gargi Joshi
Rahee Walambe
K. Kotecha
34
140
0
17 May 2021
Show Why the Answer is Correct! Towards Explainable AI using Compositional Temporal Attention
Nihar Bendre
K. Desai
Peyman Najafirad
CoGe
31
6
0
15 May 2021
Designing Multimodal Datasets for NLP Challenges
James Pustejovsky
E. Holderness
Jingxuan Tu
Parker Glenn
Kyeongmin Rim
Kelley Lynch
R. Brutti
31
5
0
12 May 2021
Image interpretation by iterative bottom-up top-down processing
S. Ullman
Liav Assif
Alona Strugatski
B. Vatashsky
Hila Levy
Aviv Netanyahu
A. Yaari
27
5
0
12 May 2021
Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules
Aisha Urooj Khan
Hilde Kuehne
Kevin Duarte
Chuang Gan
N. Lobo
M. Shah
23
36
0
11 May 2021
Autoencoder Based Inter-Vehicle Generalization for In-Cabin Occupant Classification
S. Cruz
B. Taetz
Oliver Wasenmüller
Thomas Stifter
D. Stricker
10
4
0
07 May 2021
Iterated learning for emergent systematicity in VQA
Ankit Vani
Max Schwarzer
Yucheng Lu
Eeshan Gunesh Dhekane
Rameswar Panda
72
24
0
03 May 2021
A survey on VQA: Datasets and Approaches
Yeyun Zou
Qiyu Xie
50
18
0
02 May 2021
Unsupervised Layered Image Decomposition into Object Prototypes
Tom Monnier
Elliot Vincent
Jean Ponce
Mathieu Aubry
OCL
16
53
0
29 Apr 2021
A First Look: Towards Explainable TextVQA Models via Visual and Textual Explanations
Varun Nagaraj Rao
Xingjian Zhen
K. Hovsepian
Mingwei Shen
37
18
0
29 Apr 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
93
864
0
26 Apr 2021
Ask & Explore: Grounded Question Answering for Curiosity-Driven Exploration
Jivat Neet Kaur
Yiding Jiang
Paul Pu Liang
LRM
16
2
0
24 Apr 2021
Weakly-supervised Multi-task Learning for Multimodal Affect Recognition
Wenliang Dai
Samuel Cahyawijaya
Yejin Bang
Pascale Fung
CVBM
41
11
0
23 Apr 2021
Towards Solving Multimodal Comprehension
Pritish Sahu
Karan Sikka
Ajay Divakaran
11
2
0
20 Apr 2021
Constrained Language Models Yield Few-Shot Semantic Parsers
Richard Shin
C. H. Lin
Sam Thomson
Charles C. Chen
Subhro Roy
Emmanouil Antonios Platanios
Adam Pauls
Dan Klein
J. Eisner
Benjamin Van Durme
311
199
0
18 Apr 2021
Question Decomposition with Dependency Graphs
Matan Hasson
Jonathan Berant
GNN
42
9
0
17 Apr 2021
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks
Hung Le
Nancy F. Chen
Guosheng Lin
MLLM
33
19
0
16 Apr 2021
Self-supervised Video Object Segmentation by Motion Grouping
Charig Yang
Hala Lamdouar
Erika Lu
Andrew Zisserman
Weidi Xie
VOS
OCL
30
157
0
15 Apr 2021
Jointly Learning Truth-Conditional Denotations and Groundings using Parallel Attention
Leon Bergen
Dzmitry Bahdanau
Timothy J. O'Donnell
FedML
20
1
0
14 Apr 2021
Neuro-Symbolic VQA: A review from the perspective of AGI desiderata
Ian Berlot-Attwell
16
3
0
13 Apr 2021
MultiModalQA: Complex Question Answering over Text, Tables and Images
Alon Talmor
Ori Yoran
Amnon Catav
Dan Lahav
Yizhong Wang
Akari Asai
Gabriel Ilharco
Hannaneh Hajishirzi
Jonathan Berant
LMTD
32
150
0
13 Apr 2021
CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images
Shailaja Keyur Sampat
Akshay Kumar
Yezhou Yang
Chitta Baral
29
26
0
13 Apr 2021
SpartQA: A Textual Question Answering Benchmark for Spatial Reasoning
Roshanak Mirzaee
Hossein Rajaby Faghihi
Qiang Ning
Parisa Kordjamshidi
26
77
0
12 Apr 2021
Object-Centric Representation Learning for Video Question Answering
Long Hoang Dang
T. Le
Vuong Le
T. Tran
27
7
0
12 Apr 2021
Towards a Collective Agenda on AI for Earth Science Data Analysis
D. Tuia
R. Roscher
Jan Dirk Wegner
Nathan Jacobs
Xiaoxiang Zhu
Gustau Camps-Valls
AI4CE
44
68
0
11 Apr 2021