Attention over learned object embeddings enables complex visual reasoning

15 December 2020

Papers citing "Attention over learned object embeddings enables complex visual reasoning"

Understanding the Limits of Vision Language Models Through the Lens of the Binding Problem Declan Campbell Sunayana Rane Tyler Giallanza Nicolò De Sabbata Kia Ghods ... Alexander Ku Steven M. Frankland Thomas Griffiths Jonathan D. Cohen Taylor W. Webb 63 15 0 31 Oct 2024
Compositional Physical Reasoning of Objects and Events from Videos Zhenfang Chen Shilong Dong Kexin Yi Yunzhu Li Mingyu Ding Antonio Torralba Joshua B. Tenenbaum Chuang Gan OCL 71 1 0 02 Aug 2024
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models Amir Mohammad Karimi Mamaghan Samuele Papa Karl Henrik Johansson Stefan Bauer Andrea Dittadi OCL 66 7 0 22 Jul 2024
CLEVRER-Humans: Describing Physical and Causal Events the Human Way Jiayuan Mao Xuelin Yang Xikun Zhang Noah D. Goodman Jiajun Wu NAI 39 22 0 05 Oct 2023
SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events Li Xu He Huang Jun Liu ViT LRM 56 83 0 29 Mar 2021
ACRE: Abstract Causal REasoning Beyond Covariation Chi Zhang Baoxiong Jia Mark Edmonds Song-Chun Zhu Yixin Zhu CML 79 48 0 26 Mar 2021
Hopper: Multi-hop Transformer for Spatiotemporal Reasoning Honglu Zhou Asim Kadav Farley Lai Alexandru Niculescu-Mizil Martin Renqiang Min Mubbasir Kapadia H. Graf LRM 51 18 0 19 Mar 2021
Coordination Among Neural Modules Through a Shared Global Workspace Anirudh Goyal Aniket Didolkar Alex Lamb Kartikeya Badola Nan Rosemary Ke Nasim Rahaman Jonathan Binas Charles Blundell Michael C. Mozer Yoshua Bengio 167 98 0 01 Mar 2021
Unsupervised Discovery of 3D Physical Objects from Video Yilun Du Kevin A. Smith Tomer Ullman J. Tenenbaum Jiajun Wu OCL 135 38 0 24 Jul 2020
Object-Centric Learning with Slot Attention Francesco Locatello Dirk Weissenborn Thomas Unterthiner Aravindh Mahendran G. Heigold Jakob Uszkoreit Alexey Dosovitskiy Thomas Kipf OCL 128 832 0 26 Jun 2020
Language Models are Few-Shot Learners Tom B. Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared Kaplan ... Christopher Berner Sam McCandlish Alec Radford Ilya Sutskever Dario Amodei BDL 321 41,106 0 28 May 2020
End-to-End Object Detection with Transformers Nicolas Carion Francisco Massa Gabriel Synnaeve Nicolas Usunier Alexander Kirillov Sergey Zagoruyko ViT 3DV PINN 221 12,847 0 26 May 2020
Learning Object Permanence from Video Aviv Shamsian Ofri Kleinfeld Amir Globerson Gal Chechik SSL 57 31 0 23 Mar 2020
A Simple Framework for Contrastive Learning of Visual Representations Ting Chen Simon Kornblith Mohammad Norouzi Geoffrey E. Hinton SSL 119 18,523 0 13 Feb 2020
SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition Zhixuan Lin Yi-Fu Wu Skand Peri Weihao Sun Gautam Singh Fei Deng Jindong Jiang Sungjin Ahn BDL OCL 3DPC 102 247 0 08 Jan 2020
Deep Learning for Symbolic Mathematics Guillaume Lample François Charton 3DGS 46 406 0 02 Dec 2019
CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning Rohit Girdhar Deva Ramanan 36 177 0 10 Oct 2019
CLEVRER: CoLlision Events for Video REpresentation and Reasoning Kexin Yi Chuang Gan Yunzhu Li Pushmeet Kohli Jiajun Wu Antonio Torralba J. Tenenbaum NAI 60 461 0 03 Oct 2019
Video Representation Learning by Dense Predictive Coding Tengda Han Weidi Xie Andrew Zisserman SSL 53 359 0 10 Sep 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations Weijie Su Xizhou Zhu Yue Cao Bin Li Lewei Lu Furu Wei Jifeng Dai VLM MLLM SSL 97 1,657 0 22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers Hao Tan Mohit Bansal VLM MLLM 177 2,467 0 20 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language Liunian Harold Li Mark Yatskar Da Yin Cho-Jui Hsieh Kai-Wei Chang VLM 92 1,939 0 09 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks Jiasen Lu Dhruv Batra Devi Parikh Stefan Lee SSL VLM 160 3,659 0 06 Aug 2019
Shaping Belief States with Generative Environment Models for RL Karol Gregor Danilo Jimenez Rezende F. Besse Yan Wu Hamza Merzic Aaron van den Oord OffRL AI4CE 55 118 0 21 Jun 2019
Learning Video Representations using Contrastive Bidirectional Transformer Chen Sun Fabien Baradel Kevin Patrick Murphy Cordelia Schmid SSL ViT 88 133 0 13 Jun 2019
VideoBERT: A Joint Model for Video and Language Representation Learning Chen Sun Austin Myers Carl Vondrick Kevin Patrick Murphy Cordelia Schmid VLM SSL 26 1,238 0 03 Apr 2019
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes Yang You Jing Li Sashank J. Reddi Jonathan Hseu Sanjiv Kumar Srinadh Bhojanapalli Xiaodan Song J. Demmel Kurt Keutzer Cho-Jui Hsieh ODL 94 991 0 01 Apr 2019
Multi-Object Representation Learning with Iterative Variational Inference Klaus Greff Raphael Lopez Kaufman Rishabh Kabra Nicholas Watters Christopher P. Burgess Daniel Zoran Loic Matthey M. Botvinick Alexander Lerchner OCL SSL 49 505 0 01 Mar 2019
MONet: Unsupervised Scene Decomposition and Representation Christopher P. Burgess Loic Matthey Nicholas Watters Rishabh Kabra I. Higgins M. Botvinick Alexander Lerchner OCL 44 519 0 22 Jan 2019
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context Zihang Dai Zhilin Yang Yiming Yang J. Carbonell Quoc V. Le Ruslan Salakhutdinov VLM 75 3,707 0 09 Jan 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 603 93,936 0 11 Oct 2018
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding Kexin Yi Jiajun Wu Chuang Gan Antonio Torralba Pushmeet Kohli J. Tenenbaum NAI 57 606 0 04 Oct 2018
Compositional Attention Networks for Machine Reasoning Drew A. Hudson Christopher D. Manning BDL OOD LRM 56 574 0 08 Mar 2018
Object-based reasoning in VQA Mikyas T. Desta Larry Chen Tomasz Kornuta 45 33 0 29 Jan 2018
Attention Is All You Need Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 230 129,831 0 12 Jun 2017
Mask R-CNN Kaiming He Georgia Gkioxari Piotr Dollár Ross B. Girshick ObjD 224 27,018 0 20 Mar 2017
Discovering objects and their relations from entangled scene representations David Raposo Adam Santoro David Barrett Razvan Pascanu Timothy Lillicrap Peter W. Battaglia GNN OCL 41 71 0 16 Feb 2017
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning Justin Johnson B. Hariharan Laurens van der Maaten Li Fei-Fei C. L. Zitnick Ross B. Girshick CoGe 248 2,346 0 20 Dec 2016
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren Kaiming He Ross B. Girshick Jian Sun AIMat ObjD 321 61,900 0 04 Jun 2015
Adam: A Method for Stochastic Optimization Diederik P. Kingma Jimmy Ba ODL 262 149,474 0 22 Dec 2014