We propose PseudoNeg-MAE, a novel self-supervised learning framework that enhances the global feature representations of point cloud masked autoencoders by making them both discriminative and sensitive to transformations. Traditional contrastive learning methods focus on achieving invariance and therefore discard transformation-specific information. Recent approaches incorporate transformation sensitivity by explicitly modeling relationships between the original and transformed inputs, but they report an invariance-collapse phenomenon in which the predictor degenerates into an identity mapping, leaving latent representations with little variation across transformations. PseudoNeg-MAE employs a parametric network, COPE, which learns the localized displacements that transformations induce in the latent space. Jointly training COPE with the MAE, however, admits an undesirable trivial solution in which COPE's outputs collapse to an identity. To address this, we propose a novel loss based on transformation-conditioned pseudo-negatives that explicitly penalizes such trivial invariant solutions, enabling the network to capture richer transformation cues while preserving discriminative representations. We validate PseudoNeg-MAE on shape classification and relative pose estimation, where it achieves competitive performance on the ModelNet40 and ScanObjectNN datasets under challenging evaluation protocols and demonstrates superior accuracy in estimating relative poses compared to supervised methods.
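To make the pseudo-negative idea concrete, below is a minimal sketch (not the authors' released code) of a transformation-conditioned predictor and loss. It assumes an InfoNCE-style formulation, an MLP architecture for COPE, and Gaussian sampling of transformation parameters; these are illustrative choices not specified in the abstract. The key mechanism matches the description: predictions conditioned on mismatched transformations serve as pseudo-negatives, so a COPE that ignores its conditioning (an identity mapping) cannot separate the positive from the negatives and incurs a high loss.

import torch
import torch.nn as nn
import torch.nn.functional as F

class COPE(nn.Module):
    """Hypothetical conditional predictor: maps an embedding z and
    transformation parameters g to the predicted transformed embedding."""
    def __init__(self, dim, g_dim, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + g_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, z, g):
        return self.net(torch.cat([z, g], dim=-1))

def pseudo_negative_loss(cope, z, z_t, g, num_neg=8, tau=0.1):
    """InfoNCE-style loss: the prediction conditioned on the true
    transformation g is the positive; predictions conditioned on
    randomly sampled transformations g' act as pseudo-negatives,
    penalizing a transformation-ignoring (identity) COPE."""
    z_t = F.normalize(z_t, dim=-1)
    pos = F.normalize(cope(z, g), dim=-1)                    # (B, D)
    logits = [(pos * z_t).sum(-1, keepdim=True) / tau]       # positive logit
    for _ in range(num_neg):
        g_neg = torch.randn_like(g)                          # hypothetical sampler
        neg = F.normalize(cope(z, g_neg), dim=-1)
        logits.append((neg * z_t).sum(-1, keepdim=True) / tau)
    logits = torch.cat(logits, dim=-1)                       # (B, 1 + num_neg)
    target = torch.zeros(z.size(0), dtype=torch.long, device=z.device)
    return F.cross_entropy(logits, target)

Under this sketch, if COPE collapses to an identity, all logits coincide and the loss saturates at log(1 + num_neg), so minimizing it forces the predictions to vary with the conditioning transformation.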
@article{mahendren2025_2409.15832,
  title   = {PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings},
  author  = {Sutharsan Mahendren and Saimunur Rahman and Piotr Koniusz and Tharindu Fernando and Sridha Sridharan and Clinton Fookes and Peyman Moghadam},
  journal = {arXiv preprint arXiv:2409.15832},
  year    = {2025}
}