1612.06890
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
20 December 2016
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
Papers citing "CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning"
50 / 1,475 papers shown
Title
OCTScenes: A Versatile Real-World Dataset of Tabletop Scenes for Object-Centric Learning
Yin-Tao Huang
Tonglin Chen
Zhimeng Shen
Jinghao Huang
Bin Li
Xiangyang Xue
OCL
40
1
0
16 Jun 2023
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering
Rabiul Awal
Le Zhang
Aishwarya Agrawal
LRM
48
12
0
16 Jun 2023
Modularity Trumps Invariance for Compositional Robustness
I. Mason
Anirban Sarkar
Tomotake Sasaki
Xavier Boix
OOD
26
1
0
15 Jun 2023
LOVM: Language-Only Vision Model Selection
O. Zohar
Shih-Cheng Huang
Kuan-Chieh Wang
Serena Yeung
MLLM
47
13
0
15 Jun 2023
Linguistic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment
Royi Rassin
Eran Hirsch
Daniel Glickman
Shauli Ravfogel
Yoav Goldberg
Gal Chechik
DiffM
48
100
0
15 Jun 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
Lingxi Xie
Longhui Wei
Xiaopeng Zhang
Kaifeng Bi
Xiaotao Gu
Jianlong Chang
Qi Tian
46
7
0
14 Jun 2023
Scalable Neural-Probabilistic Answer Set Programming
Arseny Skryagin
Daniel Ochs
Devendra Singh Dhami
Kristian Kersting
47
5
0
14 Jun 2023
Urania: Visualizing Data Analysis Pipelines for Natural Language-Based Data Exploration
Yi Guo
Nana Cao
Xiaoyu Qi
Haoyang Li
Danqing Shi
Jing Zhang
Qing Chen
Daniel Weiskopf
40
4
0
13 Jun 2023
V-LoL: A Diagnostic Dataset for Visual Logical Learning
Lukas Helff
Wolfgang Stammer
Hikaru Shindo
Devendra Singh Dhami
Kristian Kersting
NAI
29
3
0
13 Jun 2023
Generating Language Corrections for Teaching Physical Control Tasks
Megha Srivastava
Noah D. Goodman
Dorsa Sadigh
36
5
0
12 Jun 2023
DDLP: Unsupervised Object-Centric Video Prediction with Deep Dynamic Latent Particles
Tal Daniel
Aviv Tamar
DiffM
35
8
0
09 Jun 2023
Multimodal Explainable Artificial Intelligence: A Comprehensive Review of Methodological Advances and Future Research Directions
N. Rodis
Christos Sardianos
Panagiotis I. Radoglou-Grammatikis
Panagiotis G. Sarigiannidis
Iraklis Varlamis
Georgios Th. Papadopoulos
33
22
0
09 Jun 2023
Dealing with Semantic Underspecification in Multimodal NLP
Sandro Pezzelle
29
9
0
08 Jun 2023
M³IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning
Lei Li
Yuwei Yin
Shicheng Li
Liang Chen
Peiyi Wang
...
Yazheng Yang
Jingjing Xu
Xu Sun
Lingpeng Kong
Qi Liu
MLLM
VLM
27
115
0
07 Jun 2023
Multimodal Fusion Interactions: A Study of Human and Automatic Quantification
Paul Pu Liang
Yun Cheng
Ruslan Salakhutdinov
Louis-Philippe Morency
25
6
0
07 Jun 2023
Infusing Lattice Symmetry Priors in Attention Mechanisms for Sample-Efficient Abstract Geometric Reasoning
Mattia Atzeni
Mrinmaya Sachan
Andreas Loukas
LRM
30
3
0
05 Jun 2023
Systematic Visual Reasoning through Object-Centric Relational Abstraction
Taylor Webb
S. S. Mondal
Jonathan D. Cohen
OCL
32
24
0
04 Jun 2023
TimelineQA: A Benchmark for Question Answering over Timelines
W. Tan
Jane Dwivedi-Yu
Yuliang Li
Lambert Mathias
Marzieh Saeidi
J. Yan
A. Halevy
LMTD
37
10
0
01 Jun 2023
MEWL: Few-shot multimodal word learning with referential uncertainty
Guangyuan Jiang
Manjie Xu
Shiji Xin
Weihan Liang
Yujia Peng
Chi Zhang
Yixin Zhu
OffRL
41
16
0
01 Jun 2023
Sensitivity of Slot-Based Object-Centric Models to their Number of Slots
Roland S. Zimmermann
Sjoerd van Steenkiste
Mehdi S. M. Sajjadi
Thomas Kipf
Klaus Greff
OCL
42
5
0
30 May 2023
Autoencoding Conditional Neural Processes for Representation Learning
Victor Prokhorov
Ivan Titov
N. Siddharth
BDL
20
0
0
29 May 2023
Multi-Scale Attention for Audio Question Answering
Guangyao Li
Yixin Xu
Di Hu
30
16
0
29 May 2023
InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion
Haobo Yang
Wenyu Wang
Zexin Cao
Zhekai Duan
Xuchen Liu
VLM
29
0
0
28 May 2023
HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language
Shantipriya Parida
Idris Abdulmumin
Shamsuddeen Hassan Muhammad
Aneesh Bose
Guneet Singh Kohli
Ibrahim Said Ahmad
Ketan Kotwal
S. Sarkar
Ondrej Bojar
Habeebah Adamu Kakudi
31
5
0
28 May 2023
Im-Promptu: In-Context Composition from Image Prompts
Bhishma Dedhia
Michael Chang
Jake C. Snell
Thomas Griffiths
N. Jha
LRM
MLLM
37
1
0
26 May 2023
Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets
Brandon Smith
Miguel Farinha
S. Hall
Hannah Rose Kirk
Aleksandar Shtedritski
Max Bain
44
19
0
24 May 2023
Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from Examples
P. Sadler
David Schlangen
29
2
0
24 May 2023
Text encoders bottleneck compositionality in contrastive vision-language models
Amita Kamath
Jack Hessel
Kai-Wei Chang
CoGe
CLIP
VLM
35
19
0
24 May 2023
NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario
Tianwen Qian
Jingjing Chen
Linhai Zhuo
Yang Jiao
Yueping Jiang
29
138
0
24 May 2023
Image Manipulation via Multi-Hop Instructions -- A New Dataset and Weakly-Supervised Neuro-Symbolic Approach
Harman Singh
Poorva Garg
M. Gupta
Kevin Shah
Ashish Goswami
A. Mondal
Arnab Kumar Mondal
Dinesh Khandelwal
Dinesh Garg
Parag Singla
LM&Ro
21
1
0
23 May 2023
SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models
Ziyi Wu
Jingyu Hu
Wuyue Lu
Igor Gilitschenski
Animesh Garg
DiffM
OCL
41
45
0
18 May 2023
Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature
Ana Claudia Akemi Matsuki de Faria
Felype de Castro Bastos
Jose Victor Nogueira Alves da Silva
Vitor Lopes Fabris
Valeska Uchôa
Décio Gonçalves de Aguiar Neto
C. F. G. Santos
35
23
0
18 May 2023
Probing the Role of Positional Information in Vision-Language Models
Philipp J. Rösch
Jindrich Libovický
24
8
0
17 May 2023
HICO-DET-SG and V-COCO-SG: New Data Splits for Evaluating the Systematic Generalization Performance of Human-Object Interaction Detection Models
Kenta Takemoto
Moyuru Yamada
Tomotake Sasaki
H. Akima
42
0
0
17 May 2023
Motion Question Answering via Modular Motion Programs
Mark Endo
Joy Hsu
Jiaman Li
Jiajun Wu
LRM
30
14
0
15 May 2023
Neurosymbolic AI and its Taxonomy: a survey
Wandemberg Gibaut
Leonardo Pereira
Fabio Grassiotto
Alexandre Osorio
Eder Gadioli
Amparo Munoz
Sildolfo Gomes
Claudio dos Santos
NAI
AI4CE
37
5
0
12 May 2023
A Memory Model for Question Answering from Streaming Data Supported by Rehearsal and Anticipation of Coreference Information
Vladimir Araujo
Alvaro Soto
Marie-Francine Moens
KELM
22
2
0
12 May 2023
Combo of Thinking and Observing for Outside-Knowledge VQA
Q. Si
Yuchen Mo
Zheng Lin
Huishan Ji
Weiping Wang
51
13
0
10 May 2023
MultiModal-GPT: A Vision and Language Model for Dialogue with Humans
T. Gong
Chengqi Lyu
Shilong Zhang
Yudong Wang
Miao Zheng
Qianmengke Zhao
Kuikun Liu
Wenwei Zhang
Ping Luo
Kai-xiang Chen
MLLM
34
254
0
08 May 2023
COLA: A Benchmark for Compositional Text-to-image Retrieval
Arijit Ray
Filip Radenovic
Abhimanyu Dubey
Bryan A. Plummer
Ranjay Krishna
Kate Saenko
CoGe
VLM
41
35
0
05 May 2023
Continual Reasoning: Non-Monotonic Reasoning in Neurosymbolic AI using Continual Learning
Sofoklis Kyriakopoulos
Artur Garcez
NAI
LRM
26
0
0
03 May 2023
Visual Transformation Telling
Wanqing Cui
Mustafa Nasir-Moin
Yanyan Lan
Viola J. Chen
Jiafeng Guo
Xueqi Cheng
LRM
67
1
0
03 May 2023
Visual Reasoning: from State to Transformation
Xin Hong
Yanyan Lan
Liang Pang
Jiafeng Guo
Xueqi Cheng
LRM
27
4
0
02 May 2023
Multimodal Graph Transformer for Multimodal Question Answering
Xuehai He
Xin Eric Wang
41
7
0
30 Apr 2023
Energy-based Models are Zero-Shot Planners for Compositional Scene Rearrangement
N. Gkanatsios
Ayush Jain
Zhou Xian
Yunchu Zhang
C. Atkeson
Katerina Fragkiadaki
LM&Ro
98
31
0
27 Apr 2023
DataComp: In search of the next generation of multimodal datasets
S. Gadre
Gabriel Ilharco
Alex Fang
J. Hayase
Georgios Smyrnis
...
A. Dimakis
J. Jitsev
Y. Carmon
Vaishaal Shankar
Ludwig Schmidt
VLM
33
415
0
27 Apr 2023
PVP: Pre-trained Visual Parameter-Efficient Tuning
Zhao Song
Ke Yang
Naiyang Guan
Junjie Zhu
Peng Qiao
Qingyong Hu
VPVLM
VLM
40
3
0
26 Apr 2023
Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models
Jason J. Yu
Fereshteh Forghani
Konstantinos G. Derpanis
Marcus A. Brubaker
DiffM
37
45
0
21 Apr 2023
Hyperbolic Image-Text Representations
Karan Desai
Maximilian Nickel
Tanmay Rajpurohit
Justin Johnson
Ramakrishna Vedantam
VLM
47
57
0
18 Apr 2023
Learning Situation Hyper-Graphs for Video Question Answering
Aisha Urooj Khan
Hilde Kuehne
Bo Wu
Kim Chheu
Walid Bousselham
Chuang Gan
N. Lobo
M. Shah
41
15
0
18 Apr 2023