1612.06890
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
20 December 2016
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
Papers citing
"CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning"
50 / 1,475 papers shown
Title
Mini but Mighty: Finetuning ViTs with Mini Adapters
Imad Eddine Marouf
Enzo Tartaglione
Stéphane Lathuilière
41
5
0
07 Nov 2023
A Graph-Theoretic Framework for Understanding Open-World Semi-Supervised Learning
Yiyou Sun
Zhenmei Shi
Yixuan Li
OffRL
43
20
0
06 Nov 2023
PRISM: Progressive Restoration for Scene Graph-based Image Manipulation
Pavel Jahoda
Azade Farshad
Yousef Yeganeh
Ehsan Adeli
Nassir Navab
DiffM
25
2
0
03 Nov 2023
Active Reasoning in an Open-World Environment
Manjie Xu
Guangyuan Jiang
Weihan Liang
Chi Zhang
Yixin Zhu
LLMAG
LRM
23
10
0
03 Nov 2023
ACQUIRED: A Dataset for Answering Counterfactual Questions In Real-Life Videos
Te-Lin Wu
Zi-Yi Dou
Qingyuan Hu
Yu Hou
Nischal Reddy Chandra
Marjorie Freedman
R. Weischedel
Nanyun Peng
44
5
0
02 Nov 2023
VQA-GEN: A Visual Question Answering Benchmark for Domain Generalization
Suraj Jyothi Unni
Raha Moraffah
Huan Liu
43
2
0
01 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
56
36
0
01 Nov 2023
Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone
Zeyinzi Jiang
Chaojie Mao
Ziyuan Huang
Ao Ma
Yiliang Lv
Yujun Shen
Deli Zhao
Jingren Zhou
43
15
0
30 Oct 2023
LILO: Learning Interpretable Libraries by Compressing and Documenting Code
Gabriel Grand
L. Wong
Matthew Bowers
Theo X. Olausson
Muxin Liu
Joshua B. Tenenbaum
Jacob Andreas
21
21
0
30 Oct 2023
What's "up" with vision-language models? Investigating their struggle with spatial reasoning
Amita Kamath
Jack Hessel
Kai-Wei Chang
LRM
CoGe
19
98
0
30 Oct 2023
Explainable Artificial Intelligence (XAI) 2.0: A Manifesto of Open Challenges and Interdisciplinary Research Directions
Luca Longo
Mario Brcic
Federico Cabitza
Jaesik Choi
Roberto Confalonieri
...
Andrés Páez
Wojciech Samek
Johannes Schneider
Timo Speith
Simone Stumpf
36
192
0
30 Oct 2023
EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images
Seongsu Bae
Daeun Kyung
Jaehee Ryu
Eunbyeol Cho
Gyubok Lee
...
Jungwoo Oh
Lei Ji
E. Chang
Tackeun Kim
Edward Choi
49
20
0
28 Oct 2023
ViCLEVR: A Visual Reasoning Dataset and Hybrid Multimodal Fusion Model for Visual Question Answering in Vietnamese
Khiem Vinh Tran
Hao Phu Phan
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
34
5
0
27 Oct 2023
3D-Aware Visual Question Answering about Parts, Poses and Occlusions
Xingrui Wang
Wufei Ma
Zhuowan Li
Adam Kortylewski
Alan Yuille
CoGe
27
12
0
27 Oct 2023
Synthetic Data as Validation
Qixing Hu
Alan Yuille
Zongwei Zhou
SyDa
OOD
26
8
0
24 Oct 2023
What's Left? Concept Grounding with Logic-Enhanced Foundation Models
Joy Hsu
Jiayuan Mao
Joshua B. Tenenbaum
Jiajun Wu
VLM
ReLM
LRM
42
21
0
24 Oct 2023
Large Language Models are Visual Reasoning Coordinators
Liangyu Chen
Bo Li
Sheng Shen
Jingkang Yang
Chunyuan Li
Kurt Keutzer
Trevor Darrell
Ziwei Liu
VLM
LRM
41
51
0
23 Oct 2023
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond
Zhecan Wang
Long Chen
Haoxuan You
Keyang Xu
Yicheng He
Wenhao Li
Noel Codella
Kai-Wei Chang
Shih-Fu Chang
33
3
0
23 Oct 2023
LanPose: Language-Instructed 6D Object Pose Estimation for Robotic Assembly
Bowen Fu
Sek Kun Leong
Yan Di
Jiwen Tang
Xiangyang Ji
37
5
0
20 Oct 2023
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models
Yanyang Guo
Fangkai Jiao
Zhiqi Shen
Liqiang Nie
Mohan S. Kankanhalli
MLLM
33
5
0
17 Oct 2023
GTA: A Geometry-Aware Attention Mechanism for Multi-View Transformers
Takeru Miyato
Bernhard Jaeger
Max Welling
Andreas Geiger
ViT
46
15
0
16 Oct 2023
Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World
Rujie Wu
Xiaojian Ma
Zhenliang Zhang
Wei Wang
Qing Li
Song-Chun Zhu
Yizhou Wang
LRM
VLM
41
7
0
16 Oct 2023
Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task
Maya Okawa
Ekdeep Singh Lubana
Robert P. Dick
Hidenori Tanaka
CoGe
DiffM
39
46
0
13 Oct 2023
Leveraging Image Augmentation for Object Manipulation: Towards Interpretable Controllability in Object-Centric Learning
Jinwoo Kim
Janghyuk Choi
Jaehyun Kang
Changyeon Lee
Ho-Jin Choi
Seon Joo Kim
OCL
40
0
0
13 Oct 2023
AutoVP: An Automated Visual Prompting Framework and Benchmark
Hsi-Ai Tsao
Lei Hsiung
Pin-Yu Chen
Sijia Liu
Tsung-Yi Ho
VLM
21
18
0
12 Oct 2023
Visual Question Generation in Bengali
Mahmud Hasan
Labiba Islam
J. Ruma
T. Mayeesha
Rashedur Rahman
26
1
0
12 Oct 2023
Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning
Junyu Lu
Di Zhang
Xiaojun Wu
Xinyu Gao
Ruyi Gan
Jiaxing Zhang
Yan Song
Pingjian Zhang
VLM
MLLM
22
7
0
12 Oct 2023
State of the Art on Diffusion Models for Visual Computing
Ryan Po
Wang Yifan
Vladislav Golyanik
Kfir Aberman
Jonathan T. Barron
...
Matthias Nießner
Bjorn Ommer
Christian Theobalt
Peter Wonka
Gordon Wetzstein
38
103
0
11 Oct 2023
Self-supervised Object-Centric Learning for Videos
Görkay Aydemir
Weidi Xie
Fatma Guney
OCL
VOS
SSL
38
24
0
10 Oct 2023
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
Letian Zhang
Xiaotong Zhai
Zhongkai Zhao
Yongshuo Zong
Xin Wen
Bingchen Zhao
LRM
16
0
0
10 Oct 2023
What Makes for Robust Multi-Modal Models in the Face of Missing Modalities?
Siting Li
Chenzhuang Du
Yue Zhao
Yu Huang
Hang Zhao
24
4
0
10 Oct 2023
Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models
Archiki Prasad
Elias Stengel-Eskin
Mohit Bansal
ReLM
LRM
36
8
0
09 Oct 2023
Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models
Holy Lovenia
Wenliang Dai
Samuel Cahyawijaya
Ziwei Ji
Pascale Fung
MLLM
36
51
0
09 Oct 2023
CLEVRER-Humans: Describing Physical and Causal Events the Human Way
Jiayuan Mao
Xuelin Yang
Xikun Zhang
Noah D. Goodman
Jiajun Wu
NAI
30
22
0
05 Oct 2023
Learning Hierarchical Relational Representations through Relational Convolutions
Awni Altabaa
John Lafferty
33
2
0
05 Oct 2023
Efficient Federated Prompt Tuning for Black-box Large Pre-trained Models
Zihao Lin
Yan Sun
Yifan Shi
Xueqian Wang
Lifu Huang
Li Shen
Dacheng Tao
49
11
0
04 Oct 2023
ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks
Zejun Li
Ye Wang
Mengfei Du
Qingwen Liu
Binhao Wu
...
Zhihao Fan
Jie Fu
Jingjing Chen
Xuanjing Huang
Zhongyu Wei
35
13
0
04 Oct 2023
Human Mobility Question Answering (Vision Paper)
Hao Xue
Flora D. Salim
32
0
0
02 Oct 2023
Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering
Han Zhou
Xingchen Wan
Lev Proleev
Diana Mincu
Jilin Chen
Katherine A. Heller
Subhrajit Roy
UQLM
33
54
0
29 Sep 2023
Toloka Visual Question Answering Benchmark
Mert Pilanci
Nikita Pavlichenko
Sergey Koshelev
Daniil Likhobaba
Alisa Smirnova
37
4
0
28 Sep 2023
The Devil is in the Details: A Deep Dive into the Rabbit Hole of Data Filtering
Hai-ping Yu
Yu Tian
Sateesh Kumar
Linjie Yang
Heng Wang
VLM
38
17
0
27 Sep 2023
Variational Inference for Scalable 3D Object-centric Learning
Tianyu Wang
K. S. Ng
Miaomiao Liu
OCL
DRL
3DPC
30
0
0
25 Sep 2023
Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation
Ke Fan
Jingshi Lei
Xuelin Qian
Miaopeng Yu
Tianjun Xiao
Tong He
Zheng-Wei Zhang
Yanwei Fu
VOS
23
4
0
23 Sep 2023
Associative Transformer
Yuwei Sun
H. Ochiai
Zhirong Wu
Stephen Lin
Ryota Kanai
ViT
65
0
0
22 Sep 2023
Boolformer: Symbolic Regression of Logic Functions with Transformers
Stéphane d'Ascoli
Samy Bengio
Josh Susskind
Emmanuel Abbe
29
5
0
21 Sep 2023
Improve the efficiency of deep reinforcement learning through semantic exploration guided by natural language
Zhourui Guo
Meng Yao
Yang Yu
Qiyue Yin
OnRL
28
1
0
21 Sep 2023
D3: Data Diversity Design for Systematic Generalization in Visual Question Answering
Amir Rahimi
Vanessa D’Amario
Moyuru Yamada
Kentaro Takemoto
Tomotake Sasaki
Xavier Boix
43
1
0
15 Sep 2023
Viewpoint Integration and Registration with Vision Language Foundation Model for Image Change Understanding
Xiaonan Lu
Jianlong Yuan
Ruigang Niu
Yuan Hu
Fan Wang
26
1
0
15 Sep 2023
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels
Henry Hengyuan Zhao
Pichao Wang
Yuyang Zhao
Hao Luo
F. Wang
Mike Zheng Shou
ViT
42
14
0
15 Sep 2023
Learning by Self-Explaining
Wolfgang Stammer
Felix Friedrich
David Steinmann
Manuel Brack
Hikaru Shindo
Kristian Kersting
39
7
0
15 Sep 2023