MetaSlot: Break Through the Fixed Number of Slots in Object-Centric Learning

27 May 2025
Hongjia Liu, Rongzhen Zhao, Haohan Chen, Joni Pajarinen
Topics: OCL, VLM
Main: 9 pages · 6 figures · 6 tables · Bibliography: 5 pages · Appendix: 4 pages
Abstract

Learning object-level, structured representations is widely regarded as a key to better generalization in vision and underpins the design of next-generation Pre-trained Vision Models (PVMs). Mainstream Object-Centric Learning (OCL) methods adopt Slot Attention or its variants to iteratively aggregate objects' super-pixels into a fixed set of query feature vectors, termed slots. However, their reliance on a static slot count means that when the number of objects varies, a single object can end up represented as multiple parts. We introduce MetaSlot, a plug-and-play Slot Attention variant that adapts to variable object counts. MetaSlot (i) maintains a codebook of object prototypes for a dataset by vector-quantizing the aggregated slot representations; (ii) removes duplicate slots from the traditionally aggregated slots by quantizing them with this codebook; and (iii) injects progressively weaker noise into the Slot Attention iterations to accelerate and stabilize aggregation. MetaSlot is a general Slot Attention variant that can be seamlessly integrated into existing OCL architectures. Across multiple public datasets and tasks, including object discovery and recognition, models equipped with MetaSlot achieve significant performance gains and markedly more interpretable slot representations compared with existing Slot Attention variants.
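
The abstract describes three mechanisms layered on top of standard Slot Attention: a vector-quantization codebook of object prototypes, codebook-based removal of duplicate slots, and annealed noise during the attention iterations. The sketch below is a minimal illustration of that combination, not the authors' released implementation; class and argument names (SlotAttentionVQ, codebook_size, noise_init) are assumptions made for the example.

# Hedged sketch: Slot Attention aggregation plus (i) a learned codebook of object
# prototypes, (ii) vector-quantization of aggregated slots against that codebook so
# duplicate slots collapse onto the same prototype, and (iii) noise in the slot
# updates that weakens over iterations. Not the paper's official code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SlotAttentionVQ(nn.Module):
    def __init__(self, num_slots=7, dim=64, codebook_size=128, iters=3, noise_init=0.1):
        super().__init__()
        self.num_slots, self.dim, self.iters, self.noise_init = num_slots, dim, iters, noise_init
        self.scale = dim ** -0.5
        self.slots_mu = nn.Parameter(torch.randn(1, 1, dim))
        self.slots_logsigma = nn.Parameter(torch.zeros(1, 1, dim))
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)
        self.norm_inputs = nn.LayerNorm(dim)
        self.norm_slots = nn.LayerNorm(dim)
        # Codebook holding prototype vectors for objects seen across the dataset.
        self.codebook = nn.Embedding(codebook_size, dim)

    def quantize(self, slots):
        # Snap each slot to its nearest codebook prototype (straight-through estimator).
        protos = self.codebook.weight.unsqueeze(0).expand(slots.size(0), -1, -1)  # (B, K, D)
        idx = torch.cdist(slots, protos).argmin(dim=-1)      # (B, S) prototype index per slot
        quantized = self.codebook(idx)                        # (B, S, D)
        # Forward pass uses the prototype; gradients flow back to the slots.
        return slots + (quantized - slots).detach(), idx

    def forward(self, inputs):
        # inputs: (B, N, D) flattened feature map from an encoder.
        B = inputs.size(0)
        inputs = self.norm_inputs(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)
        sigma = self.slots_logsigma.exp()
        slots = self.slots_mu + sigma * torch.randn(B, self.num_slots, self.dim, device=inputs.device)
        for t in range(self.iters):
            q = self.to_q(self.norm_slots(slots))
            attn = F.softmax(torch.einsum('bsd,bnd->bsn', q, k) * self.scale, dim=1)
            attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-8)  # weighted-mean normalization
            updates = torch.einsum('bsn,bnd->bsd', attn, v)
            slots = self.gru(updates.reshape(-1, self.dim),
                             slots.reshape(-1, self.dim)).view(B, self.num_slots, self.dim)
            # Progressively weaker noise: strongest at the first iteration, ~zero at the last.
            noise_scale = self.noise_init * (1.0 - t / max(self.iters - 1, 1))
            slots = slots + noise_scale * torch.randn_like(slots)
        # Quantize against the codebook; slots mapped to the same prototype can be
        # treated as duplicates and merged or masked by downstream modules.
        return self.quantize(slots)


# Usage: 196 tokens of dimension 64 (e.g. a flattened 14x14 feature map).
features = torch.randn(2, 196, 64)
slots, proto_ids = SlotAttentionVQ()(features)
print(slots.shape, proto_ids.shape)   # torch.Size([2, 7, 64]) torch.Size([2, 7])

In this sketch, duplicate handling is left to the caller: slots that receive the same prototype index can be merged or dropped, which is one plausible reading of how a fixed slot budget is reduced to the actual object count.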

@article{liu2025_2505.20772,
  title={MetaSlot: Break Through the Fixed Number of Slots in Object-Centric Learning},
  author={Hongjia Liu and Rongzhen Zhao and Haohan Chen and Joni Pajarinen},
  journal={arXiv preprint arXiv:2505.20772},
  year={2025}
}