1612.06890
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
20 December 2016
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
Papers citing
"CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning"
50 / 1,475 papers shown
Title
A Data Source for Reasoning Embodied Agents
Jack Lanchantin
Sainbayar Sukhbaatar
Gabriel Synnaeve
Yuxuan Sun
Kavya Srinet
Arthur Szlam
LM&Ro
LRM
35
5
0
14 Sep 2023
Dynamic MOdularized Reasoning for Compositional Structured Explanation Generation
Xiyan Fu
Anette Frank
LRM
41
1
0
14 Sep 2023
Hydra: Multi-head Low-rank Adaptation for Parameter Efficient Fine-tuning
Sanghyeon Kim
Hyunmo Yang
Younghyun Kim
Youngjoon Hong
Eunbyung Park
AI4CE
32
16
0
13 Sep 2023
STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning
Palaash Agrawal
Haidi Azaman
Cheston Tan
56
3
0
13 Sep 2023
Compositional Learning of Visually-Grounded Concepts Using Reinforcement
Zijun Lin
Haidi Azaman
M Ganesh Kumar
Cheston Tan
CoGe
OffRL
30
3
0
08 Sep 2023
DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using Determiners
Clarence Lee
M Ganesh Kumar
Cheston Tan
28
3
0
07 Sep 2023
Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embodied Reference Understanding
Cheng Shi
Sibei Yang
LRM
29
6
0
03 Sep 2023
Iterative Multi-granular Image Editing using Diffusion Models
K. J. Joseph
Prateksha Udhayanan
Tripti Shukla
Aishwarya Agarwal
Srikrishna Karanam
Koustava Goswami
Balaji Vasan Srinivasan
DiffM
38
16
0
01 Sep 2023
RobustCLEVR: A Benchmark and Framework for Evaluating Robustness in Object-centric Learning
Nathan G. Drenkow
Mathias Unberath
38
5
0
28 Aug 2023
StoryBench: A Multifaceted Benchmark for Continuous Story Visualization
Emanuele Bugliarello
Hernan Moraldo
Ruben Villegas
Mohammad Babaeizadeh
M. Saffar
Han Zhang
D. Erhan
V. Ferrari
Pieter-Jan Kindermans
P. Voigtlaender
VGen
43
10
0
22 Aug 2023
Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models
Navid Rajabi
Jana Kosecka
VLM
34
11
0
18 Aug 2023
Tree-of-Mixed-Thought: Combining Fast and Slow Thinking for Multi-hop Visual Reasoning
Pengbo Hu
Jingxian Qi
Xingyu Li
Hong Li
Xinqi Wang
Bing Quan
Ruiyu Wang
Yi Zhou
LRM
LLMAG
38
15
0
18 Aug 2023
Learning the meanings of function words from grounded language using a visual question answering model
Eva Portelance
Michael C. Frank
Dan Jurafsky
NAI
38
7
0
16 Aug 2023
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
Kaicheng Yang
Jiankang Deng
Xiang An
Jiawei Li
Ziyong Feng
Jia Guo
Jing Yang
Tongliang Liu
VLM
CLIP
48
46
0
16 Aug 2023
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
Yonatan Bitton
Hritik Bansal
Jack Hessel
Rulin Shao
Wanrong Zhu
Anas Awadalla
Josh Gardner
Rohan Taori
L. Schmidt
VLM
36
77
0
12 Aug 2023
Foundation Model is Efficient Multimodal Multitask Model Selector
Fanqing Meng
Wenqi Shao
Zhanglin Peng
Chong Jiang
Kaipeng Zhang
Yu Qiao
Ping Luo
30
13
0
11 Aug 2023
FunnyBirds: A Synthetic Vision Dataset for a Part-Based Analysis of Explainable AI Methods
Robin Hesse
Simone Schaub-Meyer
Stefan Roth
AAML
37
33
0
11 Aug 2023
When and How Does Known Class Help Discover Unknown Ones? Provable Understanding Through Spectral Analysis
Yiyou Sun
Zhenmei Shi
Yingyu Liang
Yixuan Li
45
19
0
09 Aug 2023
PUG: Photorealistic and Semantically Controllable Synthetic Data for Representation Learning
Florian Bordes
Shashank Shekhar
Mark Ibrahim
Diane Bouchacourt
Pascal Vincent
Ari S. Morcos
36
26
0
08 Aug 2023
ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation
Xuefeng Hu
Ke Zhang
Lu Xia
Albert Y. C. Chen
Jiajia Luo
...
Nan Qiao
Xiao Zeng
Min Sun
Cheng-Hao Kuo
Ram Nevatia
VLM
27
25
0
04 Aug 2023
Stochastic positional embeddings improve masked image modeling
Amir Bar
Florian Bordes
Assaf Shocher
Mahmoud Assran
Pascal Vincent
Nicolas Ballas
Trevor Darrell
Amir Globerson
Yann LeCun
36
3
0
31 Jul 2023
Revisiting the Parameter Efficiency of Adapters from the Perspective of Precision Redundancy
Shibo Jie
Haoqing Wang
Zhiwei Deng
32
31
0
31 Jul 2023
Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks
Kousik Rajesh
Mrigank Raman
M. A. Karim
Pranit Chawla
VLM
25
2
0
31 Jul 2023
Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering
N. Naik
Christopher Potts
Elisa Kreiss
35
3
0
28 Jul 2023
Testing the Depth of ChatGPT's Comprehension via Cross-Modal Tasks Based on ASCII-Art: GPT3.5's Abilities in Regard to Recognizing and Generating ASCII-Art Are Not Totally Lacking
David Bayani
MLLM
38
5
0
28 Jul 2023
Robust Visual Question Answering: Datasets, Methods, and Future Challenges
Jie Ma
Pinghui Wang
Dechen Kong
Zewei Wang
Jun Liu
Hongbin Pei
Junzhou Zhao
OOD
34
18
0
21 Jul 2023
CLR: Channel-wise Lightweight Reprogramming for Continual Learning
Yunhao Ge
Yuecheng Li
Shuo Ni
Jiaping Zhao
Ming-Hsuan Yang
Laurent Itti
CLL
50
11
0
21 Jul 2023
OBJECT 3DIT: Language-guided 3D-aware Image Editing
Oscar Michel
Anand Bhattad
Eli VanderBilt
Ranjay Krishna
Aniruddha Kembhavi
Tanmay Gupta
DiffM
37
38
0
20 Jul 2023
Improving Multimodal Datasets with Image Captioning
Thao Nguyen
S. Gadre
Gabriel Ilharco
Sewoong Oh
Ludwig Schmidt
VLM
19
71
0
19 Jul 2023
Grounded Object Centric Learning
Avinash Kori
Francesco Locatello
Fabio De Sousa Ribeiro
Francesca Toni
Ben Glocker
OCL
22
7
0
18 Jul 2023
Distilling Knowledge from Text-to-Image Generative Models Improves Visio-Linguistic Reasoning in CLIP
S. Basu
S. Hu
Maziar Sanjabi
Daniela Massiceti
S. Feizi
VLM
24
4
0
18 Jul 2023
COLLIE: Systematic Construction of Constrained Text Generation Tasks
Shunyu Yao
Howard Chen
Austin W. Hanjie
Runzhe Yang
Karthik Narasimhan
47
32
0
17 Jul 2023
Does Visual Pretraining Help End-to-End Reasoning?
Chen Sun
Calvin Luo
Xingyi Zhou
Anurag Arnab
Cordelia Schmid
OCL
LRM
ViT
43
3
0
17 Jul 2023
Multi-Object Discovery by Low-Dimensional Object Motion
Sadra Safadoust
Fatma Guney
OCL
31
9
0
16 Jul 2023
IntelliGraphs: Datasets for Benchmarking Knowledge Graph Generation
Thiviyan Thanapalasingam
Emile van Krieken
Peter Bloem
Paul T. Groth
39
1
0
13 Jul 2023
MMBench: Is Your Multi-modal Model an All-around Player?
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Bo Li
Songyang Zhang
...
Jiaqi Wang
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin
29
934
0
12 Jul 2023
Diffusion idea exploration for art generation
N. Verma
DiffM
42
1
0
11 Jul 2023
Compositional Generalization from First Principles
Thaddäus Wiedemer
Prasanna Mayilvahanan
Matthias Bethge
Wieland Brendel
OCL
39
37
0
10 Jul 2023
Weakly-supervised Contrastive Learning for Unsupervised Object Discovery
Yun-Qiu Lv
Jing Zhang
Nick Barnes
Yuchao Dai
36
11
0
07 Jul 2023
Human Inspired Progressive Alignment and Comparative Learning for Grounded Word Acquisition
Yuwei Bao
B. Lattimer
J. Chai
CLL
46
1
0
05 Jul 2023
Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation
Sébastien Lachapelle
Divyat Mahajan
Ioannis Mitliagkas
Simon Lacoste-Julien
44
25
0
05 Jul 2023
SpaceNLI: Evaluating the Consistency of Predicting Inferences in Space
Lasha Abzianidze
J. Zwarts
Yoad Winter
27
2
0
05 Jul 2023
Learning Differentiable Logic Programs for Abstract Visual Reasoning
Hikaru Shindo
Viktor Pfanschilling
Devendra Singh Dhami
Kristian Kersting
NAI
39
6
0
03 Jul 2023
The Drunkard's Odometry: Estimating Camera Motion in Deforming Scenes
D. Recasens
Martin R. Oswald
Marc Pollefeys
Javier Civera
40
3
0
29 Jun 2023
Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering
A. S. Penamakuri
Manish Gupta
Mithun Das Gupta
Anand Mishra
45
7
0
29 Jun 2023
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models
Avinash Madasu
Vasudev Lal
CoGe
44
3
0
28 Jun 2023
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
Ke Chen
Zhao Zhang
Weili Zeng
Richong Zhang
Feng Zhu
Rui Zhao
ObjD
44
603
0
27 Jun 2023
VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution
S. Hall
F. G. Abrantes
Hanwen Zhu
Grace A. Sodunke
Aleksandar Shtedritski
Hannah Rose Kirk
CoGe
39
39
0
21 Jun 2023
Neuro-Symbolic Bi-Directional Translation -- Deep Learning Explainability for Climate Tipping Point Research
C. Ashcraft
Jennifer Sleeman
Caroline Tang
Jay Brett
A. Gnanadesikan
37
1
0
19 Jun 2023
The Psychophysics of Human Three-Dimensional Active Visuospatial Problem-Solving
M. Solbach
John K. Tsotsos
31
0
0
19 Jun 2023