arXiv: 1612.06890
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
20 December 2016
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
Papers citing
"CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning"
50 / 1,475 papers shown
EgoTaskQA: Understanding Human Tasks in Egocentric Videos
Baoxiong Jia
Ting Lei
Song-Chun Zhu
Siyuan Huang
EgoV
37
61
0
08 Oct 2022
Promising or Elusive? Unsupervised Object Segmentation from Real-world Single Images
Yafei Yang
Bo Yang
OCL
111
17
0
05 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
Ye Zhu
Yuehua Wu
N. Sebe
Yan Yan
40
16
0
05 Oct 2022
RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank
Q. Garrido
Randall Balestriero
Laurent Najman
Yann LeCun
SSL
68
74
0
05 Oct 2022
Differentiable Mathematical Programming for Object-Centric Representation Learning
Adeel Pervez
Phillip Lippe
E. Gavves
OCL
49
5
0
05 Oct 2022
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Chongyang Gao
Jianfei Cai
MLLM
45
10
0
04 Oct 2022
Extending Compositional Attention Networks for Social Reasoning in Videos
Christina Sartzetaki
Georgios Paraskevopoulos
Alexandros Potamianos
LRM
31
3
0
03 Oct 2022
Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach
Georgios Tziafas
Hamidreza Kasaei
LM&Ro
20
3
0
03 Oct 2022
Unsupervised Multi-View Object Segmentation Using Radiance Field Propagation
Xinhang Liu
Jiaben Chen
Huai Yu
Yu-Wing Tai
Chi-Keung Tang
95
28
0
02 Oct 2022
Multimodal Analogical Reasoning over Knowledge Graphs
Ningyu Zhang
Lei Li
Xiang Chen
Xiaozhuan Liang
Shumin Deng
Huajun Chen
64
26
0
01 Oct 2022
Compositional Semantic Parsing with Large Language Models
Andrew Drozdov
Nathanael Schärli
Ekin Akyürek
Nathan Scales
Xinying Song
Xinyun Chen
Olivier Bousquet
Denny Zhou
ReLM
LRM
208
92
0
29 Sep 2022
A Multiagent Framework for the Asynchronous and Collaborative Extension of Multitask ML Systems
Andrea Gesmundo
31
2
0
29 Sep 2022
On the visual analytic intelligence of neural networks
Stanisław Woźniak
Hlynur Jónsson
G. Cherubini
A. Pantazi
E. Eleftheriou
25
0
0
28 Sep 2022
Towards Faithful Model Explanation in NLP: A Survey
Qing Lyu
Marianna Apidianaki
Chris Callison-Burch
XAI
120
110
0
22 Sep 2022
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering
Hao Li
Jinfa Huang
Peng Jin
Guoli Song
Qi Wu
Jie Chen
44
21
0
21 Sep 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
211
1,134
0
20 Sep 2022
A Continual Development Methodology for Large-scale Multitask Dynamic ML Systems
Andrea Gesmundo
23
18
0
15 Sep 2022
The Embeddings World and Artificial General Intelligence
M. H. Chehreghani
19
1
0
14 Sep 2022
StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation
A. Maharana
Darryl Hannan
Mohit Bansal
DiffM
39
78
0
13 Sep 2022
MaXM: Towards Multilingual Visual Question Answering
Soravit Changpinyo
Linting Xue
Michal Yarom
Ashish V. Thapliyal
Idan Szpektor
J. Amelot
Xi Chen
Radu Soricut
33
8
0
12 Sep 2022
Ask Before You Act: Generalising to Novel Environments by Asking Questions
Ross Murphy
S. Mosesov
Javier Leguina Peral
Thymo ter Doest
LRM
32
0
0
10 Sep 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
20
63
0
07 Sep 2022
Benchmarking Multimodal Variational Autoencoders: CdSprites+ Dataset and Toolkit
G. Sejnova
M. Vavrecka
Karla Stepanova
VGen
33
0
0
07 Sep 2022
Trust in Language Grounding: a new AI challenge for human-robot teams
David M. Bossens
C. Evers
44
1
0
05 Sep 2022
Injecting Image Details into CLIP's Feature Space
Zilun Zhang
Cuifeng Shen
Yuan-Chung Shen
Huixin Xiong
Xinyu Zhou
VLM
CLIP
32
0
0
31 Aug 2022
Shaken, and Stirred: Long-Range Dependencies Enable Robust Outlier Detection with PixelCNN++
Barath Mohan Umapathi
Kushal Chauhan
Pradeep Shenoy
D. Sridharan
42
0
0
29 Aug 2022
LogicRank: Logic Induced Reranking for Generative Text-to-Image Systems
Björn Deiseroth
P. Schramowski
Hikaru Shindo
Devendra Singh Dhami
Kristian Kersting
EGVM
DiffM
24
1
0
29 Aug 2022
Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task
Stan Weixian Lei
Difei Gao
Jay Zhangjie Wu
Yuxuan Wang
Wei Liu
Meng Zhang
Mike Zheng Shou
25
35
0
24 Aug 2022
Neuro-Symbolic Visual Dialog
Adnen Abdessaied
Mihai Bâce
Andreas Bulling
NAI
21
3
0
22 Aug 2022
ILLUME: Rationalizing Vision-Language Models through Human Interactions
Manuel Brack
P. Schramowski
Björn Deiseroth
Kristian Kersting
VLM
MLLM
27
3
0
17 Aug 2022
Patching open-vocabulary models by interpolating weights
Gabriel Ilharco
Mitchell Wortsman
S. Gadre
Shuran Song
Hannaneh Hajishirzi
Simon Kornblith
Ali Farhadi
Ludwig Schmidt
VLM
KELM
37
169
0
10 Aug 2022
CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning
Adam Dahlgren Lindström
Savitha Sam Abraham
19
50
0
10 Aug 2022
ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding
Bingning Wang
Feiya Lv
Ting Yao
Yiming Yuan
Jin Ma
Yu Luo
Haijin Liang
31
3
0
05 Aug 2022
Generative Bias for Robust Visual Question Answering
Jae-Won Cho
Dong-Jin Kim
H. Ryu
In So Kweon
OOD
CML
41
19
0
01 Aug 2022
Testing Relational Understanding in Text-Guided Image Generation
C. Conwell
T. Ullman
EGVM
160
65
0
29 Jul 2022
DoRO: Disambiguation of referred object for embodied agents
Pradip Pramanick
Chayan Sarkar
S. Paul
R. Roychoudhury
Brojeshwar Bhowmick
LM&Ro
20
14
0
28 Jul 2022
Unit Testing for Concepts in Neural Networks
Charles Lovering
Ellie Pavlick
25
28
0
28 Jul 2022
Break and Make: Interactive Structural Understanding Using LEGO Bricks
Aaron Walsman
Muru Zhang
Klemen Kotar
Karthik Desingh
Ali Farhadi
Dieter Fox
40
10
0
27 Jul 2022
Neural Groundplans: Persistent Neural Scene Representations from a Single Image
Prafull Sharma
A. Tewari
Yilun Du
Sergey Zakharov
Rares Andrei Ambrus
Adrien Gaidon
William T. Freeman
F. Durand
J. Tenenbaum
Vincent Sitzmann
SSL
OCL
29
16
0
22 Jul 2022
The Neural Race Reduction: Dynamics of Abstraction in Gated Networks
Andrew M. Saxe
Shagun Sodhani
Sam Lewallen
AI4CE
34
34
0
21 Jul 2022
Semantic-aware Modular Capsule Routing for Visual Question Answering
Yudong Han
Jianhua Yin
Jianlong Wu
Yin-wei Wei
Liqiang Nie
35
7
0
21 Jul 2022
Semantic uncertainty intervals for disentangled latent spaces
S. Sankaranarayanan
Anastasios Nikolas Angelopoulos
Stephen Bates
Yaniv Romano
Phillip Isola
UQCV
48
21
0
20 Jul 2022
Rethinking Data Augmentation for Robust Visual Question Answering
Long Chen
Yuhang Zheng
Jun Xiao
OOD
40
42
0
18 Jul 2022
Semantic Novelty Detection via Relational Reasoning
Francesco Cappio Borlino
S. Bucci
Tatiana Tommasi
17
4
0
18 Jul 2022
Sparse Relational Reasoning with Object-Centric Representations
Alex F Spies
Alessandra Russo
Murray Shanahan
OCL
NAI
25
3
0
15 Jul 2022
Convolutional Bypasses Are Better Vision Transformer Adapters
Shibo Jie
Zhi-Hong Deng
VPVLM
21
132
0
14 Jul 2022
3D Concept Grounding on Neural Fields
Yining Hong
Yilun Du
Chun-Tse Lin
J. Tenenbaum
Chuang Gan
31
19
0
13 Jul 2022
Fine-grained Activities of People Worldwide
J. Byrne
Greg Castañón
Zhongheng Li
G. Ettinger
24
3
0
11 Jul 2022
CoSIm: Commonsense Reasoning for Counterfactual Scene Imagination
Hyounghun Kim
Abhaysinh Zala
Mohit Bansal
22
6
0
08 Jul 2022
Knowing Earlier what Right Means to You: A Comprehensive VQA Dataset for Grounding Relative Directions via Multi-Task Learning
Kyra Ahrens
Matthias Kerzel
Jae Hee Lee
C. Weber
S. Wermter
21
0
0
06 Jul 2022