ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
22 July 2024
Amir Mohammad Karimi Mamaghan
Samuele Papa
Karl Henrik Johansson
Stefan Bauer
Andrea Dittadi
OCL
arXiv: 2407.15589 (v5)

Papers citing "Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models"

37 / 87 papers shown
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
703
6,121
0
29 Apr 2021
GENESIS-V2: Inferring Unordered Object Representations without Iterative Refinement
Martin Engelcke
Oiwi Parker Jones
Ingmar Posner
OCL
77
119
0
20 Apr 2021
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning
Zhenfang Chen
Jiayuan Mao
Jiajun Wu
Kwan-Yee K. Wong
J. Tenenbaum
Chuang Gan
VGen
81
94
0
30 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIPVLM
967
29,731
0
26 Feb 2021
Attention over learned object embeddings enables complex visual reasoning
David Ding
Felix Hill
Adam Santoro
Malcolm Reynolds
M. Botvinick
OCL
102
71
0
15 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
667
41,369
0
22 Oct 2020
Improving Generative Imagination in Object-Centric World Models
Zhixuan Lin
Yi-Fu Wu
Skand Peri
Bofeng Fu
Jindong Jiang
Sungjin Ahn
OCL
109
81
0
05 Oct 2020
Object-Centric Learning with Slot Attention
Francesco Locatello
Dirk Weissenborn
Thomas Unterthiner
Aravindh Mahendran
G. Heigold
Jakob Uszkoreit
Alexey Dosovitskiy
Thomas Kipf
OCL
225
856
0
26 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
841
42,332
0
28 May 2020
Decision-Making with Auto-Encoding Variational Bayes
Romain Lopez
Pierre Boyeau
Nir Yosef
Michael I. Jordan
Jeffrey Regier
BDL
470
10,591
0
17 Feb 2020
SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition
Zhixuan Lin
Yi-Fu Wu
Skand Peri
Weihao Sun
Gautam Singh
Fei Deng
Jindong Jiang
Sungjin Ahn
BDLOCL3DPC
168
250
0
08 Jan 2020
Contrastive Learning of Structured World Models
Thomas Kipf
Elise van der Pol
Max Welling
OCLDRL
81
285
0
27 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
456
20,298
0
23 Oct 2019
SCALOR: Generative World Models with Scalable Object Representations
Jindong Jiang
Sepehr Janghorbani
Gerard de Melo
Sungjin Ahn
OCLDRL
90
133
0
06 Oct 2019
LAVAE: Disentangling Location and Appearance
Andrea Dittadi
Ole Winther
OCLBDLDRL
118
6
0
25 Sep 2019
Recurrent Independent Mechanisms
Anirudh Goyal
Alex Lamb
Jordan Hoffmann
Shagun Sodhani
Sergey Levine
Yoshua Bengio
Bernhard Schölkopf
91
337
0
24 Sep 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSLVLM
237
3,693
0
06 Aug 2019
GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations
Martin Engelcke
Adam R. Kosiorek
Oiwi Parker Jones
Ingmar Posner
OCL
124
307
0
30 Jul 2019
Multi-Object Representation Learning with Iterative Variational Inference
Klaus Greff
Raphael Lopez Kaufman
Rishabh Kabra
Nicholas Watters
Christopher P. Burgess
Daniel Zoran
Loic Matthey
M. Botvinick
Alexander Lerchner
OCLSSL
106
509
0
01 Mar 2019
MONet: Unsupervised Scene Decomposition and Representation
Christopher P. Burgess
Loic Matthey
Nicholas Watters
Rishabh Kabra
I. Higgins
M. Botvinick
Alexander Lerchner
OCL
88
529
0
22 Jan 2019
Spatial Broadcast Decoder: A Simple Architecture for Learning Disentangled Representations in VAEs
Nicholas Watters
Loic Matthey
Christopher P. Burgess
Alexander Lerchner
CoGe
92
169
0
21 Jan 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLMSSLSSeg
1.8K
95,114
0
11 Oct 2018
Relational Deep Reinforcement Learning
V. Zambaldi
David Raposo
Adam Santoro
V. Bapst
Yujia Li
...
Victoria Langston
Razvan Pascanu
M. Botvinick
Oriol Vinyals
Peter W. Battaglia
OffRL
159
221
0
05 Jun 2018
Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects
Adam R. Kosiorek
Hyunjik Kim
Ingmar Posner
Yee Whye Teh
BDL
93
258
0
05 Jun 2018
Neural Discrete Representation Learning
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDLSSLOCL
228
5,061
0
02 Nov 2017
Neural Expectation Maximization
Klaus Greff
Sjoerd van Steenkiste
Jürgen Schmidhuber
OCL
127
286
0
11 Aug 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
730
132,199
0
12 Jun 2017
Visual Interaction Networks
Nicholas Watters
Andrea Tacchetti
T. Weber
Razvan Pascanu
Peter W. Battaglia
Daniel Zoran
PINN3DH
96
279
0
05 Jun 2017
A simple neural network module for relational reasoning
Adam Santoro
David Raposo
David Barrett
Mateusz Malinowski
Razvan Pascanu
Peter W. Battaglia
Timothy Lillicrap
GNNNAI
189
1,615
0
05 Jun 2017
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
311
2,386
0
20 Dec 2016
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
345
3,270
0
02 Dec 2016
Attend, Infer, Repeat: Fast Scene Understanding with Generative Models
S. M. Ali Eslami
N. Heess
T. Weber
Yuval Tassa
David Szepesvari
Koray Kavukcuoglu
Geoffrey E. Hinton
3DVBDLOCL
129
551
0
28 Mar 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
2.2K
194,322
0
10 Dec 2015
Exploring Models and Data for Image Question Answering
Mengye Ren
Ryan Kiros
R. Zemel
80
718
0
08 May 2015
VQA: Visual Question Answering
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
214
5,497
0
03 May 2015
DRAW: A Recurrent Neural Network For Image Generation
Karol Gregor
Ivo Danihelka
Alex Graves
Danilo Jimenez Rezende
Daan Wierstra
GANDRL
173
1,961
0
16 Feb 2015
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
422
43,777
0
01 May 2014