Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models

22 July 2024

Amir Mohammad Karimi Mamaghan

Samuele Papa

Karl Henrik Johansson

Stefan Bauer

Andrea Dittadi

OCL

Papers cited by "Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models"

37 / 87 papers shown

Title
Emerging Properties in Self-Supervised Vision Transformers Mathilde Caron Hugo Touvron Ishan Misra Hervé Jégou Julien Mairal Piotr Bojanowski Armand Joulin 703 6,121 0 29 Apr 2021
GENESIS-V2: Inferring Unordered Object Representations without Iterative Refinement Martin Engelcke Oiwi Parker Jones Ingmar Posner OCL 77 119 0 20 Apr 2021
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning Zhenfang Chen Jiayuan Mao Jiajun Wu Kwan-Yee K. Wong J. Tenenbaum Chuang Gan VGen 81 94 0 30 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision Alec Radford Jong Wook Kim Chris Hallacy Aditya A. Ramesh Gabriel Goh ... Amanda Askell Pamela Mishkin Jack Clark Gretchen Krueger Ilya Sutskever CLIP VLM 967 29,731 0 26 Feb 2021
Attention over learned object embeddings enables complex visual reasoning David Ding Felix Hill Adam Santoro Malcolm Reynolds M. Botvinick OCL 102 71 0 15 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai ... Matthias Minderer G. Heigold Sylvain Gelly Jakob Uszkoreit N. Houlsby ViT 667 41,369 0 22 Oct 2020
Improving Generative Imagination in Object-Centric World Models Zhixuan Lin Yi-Fu Wu Skand Peri Bofeng Fu Jindong Jiang Sungjin Ahn OCL 109 81 0 05 Oct 2020
Object-Centric Learning with Slot Attention Francesco Locatello Dirk Weissenborn Thomas Unterthiner Aravindh Mahendran G. Heigold Jakob Uszkoreit Alexey Dosovitskiy Thomas Kipf OCL 225 856 0 26 Jun 2020
Language Models are Few-Shot Learners Tom B. Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared Kaplan ... Christopher Berner Sam McCandlish Alec Radford Ilya Sutskever Dario Amodei BDL 841 42,332 0 28 May 2020
Decision-Making with Auto-Encoding Variational Bayes Romain Lopez Pierre Boyeau Nir Yosef Michael I. Jordan Jeffrey Regier BDL 470 10,591 0 17 Feb 2020
SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition Zhixuan Lin Yi-Fu Wu Skand Peri Weihao Sun Gautam Singh Fei Deng Jindong Jiang Sungjin Ahn BDL OCL 3DPC 168 250 0 08 Jan 2020
Contrastive Learning of Structured World Models Thomas Kipf Elise van der Pol Max Welling OCL DRL 81 285 0 27 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Colin Raffel Noam M. Shazeer Adam Roberts Katherine Lee Sharan Narang Michael Matena Yanqi Zhou Wei Li Peter J. Liu AIMat 456 20,298 0 23 Oct 2019
SCALOR: Generative World Models with Scalable Object Representations Jindong Jiang Sepehr Janghorbani Gerard de Melo Sungjin Ahn OCL DRL 90 133 0 06 Oct 2019
LAVAE: Disentangling Location and Appearance Andrea Dittadi Ole Winther OCL BDL DRL 118 6 0 25 Sep 2019
Recurrent Independent Mechanisms Anirudh Goyal Alex Lamb Jordan Hoffmann Shagun Sodhani Sergey Levine Yoshua Bengio Bernhard Schölkopf 91 337 0 24 Sep 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks Jiasen Lu Dhruv Batra Devi Parikh Stefan Lee SSL VLM 237 3,693 0 06 Aug 2019
GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations Martin Engelcke Adam R. Kosiorek Oiwi Parker Jones Ingmar Posner OCL 124 307 0 30 Jul 2019
Multi-Object Representation Learning with Iterative Variational Inference Klaus Greff Raphael Lopez Kaufman Rishabh Kabra Nicholas Watters Christopher P. Burgess Daniel Zoran Loic Matthey M. Botvinick Alexander Lerchner OCL SSL 106 509 0 01 Mar 2019
MONet: Unsupervised Scene Decomposition and Representation Christopher P. Burgess Loic Matthey Nicholas Watters Rishabh Kabra I. Higgins M. Botvinick Alexander Lerchner OCL 88 529 0 22 Jan 2019
Spatial Broadcast Decoder: A Simple Architecture for Learning Disentangled Representations in VAEs Nicholas Watters Loic Matthey Christopher P. Burgess Alexander Lerchner CoGe 92 169 0 21 Jan 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 1.8K 95,114 0 11 Oct 2018
Relational Deep Reinforcement Learning V. Zambaldi David Raposo Adam Santoro V. Bapst Yujia Li ... Victoria Langston Razvan Pascanu M. Botvinick Oriol Vinyals Peter W. Battaglia OffRL 159 221 0 05 Jun 2018
Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects Adam R. Kosiorek Hyunjik Kim Ingmar Posner Yee Whye Teh BDL 93 258 0 05 Jun 2018
Neural Discrete Representation Learning Aaron van den Oord Oriol Vinyals Koray Kavukcuoglu BDL SSL OCL 228 5,061 0 02 Nov 2017
Neural Expectation Maximization Klaus Greff Sjoerd van Steenkiste Jürgen Schmidhuber OCL 127 286 0 11 Aug 2017
Attention Is All You Need Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 730 132,199 0 12 Jun 2017
Visual Interaction Networks Nicholas Watters Andrea Tacchetti T. Weber Razvan Pascanu Peter W. Battaglia Daniel Zoran PINN 3DH 96 279 0 05 Jun 2017
A simple neural network module for relational reasoning Adam Santoro David Raposo David Barrett Mateusz Malinowski Razvan Pascanu Peter W. Battaglia Timothy Lillicrap GNN NAI 189 1,615 0 05 Jun 2017
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning Justin Johnson B. Hariharan Laurens van der Maaten Li Fei-Fei C. L. Zitnick Ross B. Girshick CoGe 311 2,386 0 20 Dec 2016
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering Yash Goyal Tejas Khot D. Summers-Stay Dhruv Batra Devi Parikh CoGe 345 3,270 0 02 Dec 2016
Attend, Infer, Repeat: Fast Scene Understanding with Generative Models S. M. Ali Eslami N. Heess T. Weber Yuval Tassa David Szepesvari Koray Kavukcuoglu Geoffrey E. Hinton 3DV BDL OCL 129 551 0 28 Mar 2016
Deep Residual Learning for Image Recognition Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun MedIm 2.2K 194,322 0 10 Dec 2015
Exploring Models and Data for Image Question Answering Mengye Ren Ryan Kiros R. Zemel 80 718 0 08 May 2015
VQA: Visual Question Answering Aishwarya Agrawal Jiasen Lu Stanislaw Antol Margaret Mitchell C. L. Zitnick Dhruv Batra Devi Parikh CoGe 214 5,497 0 03 May 2015
DRAW: A Recurrent Neural Network For Image Generation Karol Gregor Ivo Danihelka Alex Graves Danilo Jimenez Rezende Daan Wierstra GAN DRL 173 1,961 0 16 Feb 2015
Microsoft COCO: Common Objects in Context Tsung-Yi Lin Michael Maire Serge J. Belongie Lubomir Bourdev Ross B. Girshick James Hays Pietro Perona Deva Ramanan C. L. Zitnick Piotr Dollár ObjD 422 43,777 0 01 May 2014