SIDE: Semantic ID Embedding for effective learning from sequences

20 June 2025
Dinesh Ramasamy
Shakti Kumar
Chris Cadonic
Jiaxin Yang
Sohini Roychowdhury
Esam Abdel Rhman
Srihari Reddy
Main: 5 pages
4 figures
Bibliography: 2 pages
6 tables
Abstract

Sequence-based recommendation models are driving the state of the art in industrial ad-recommendation systems. Such systems typically deal with user histories whose sequence lengths are on the order of O(10^3) to O(10^4) events. While adding embeddings at this scale is manageable in pre-trained models, incorporating them into real-time prediction models is challenging because of both storage and inference costs. To address this scaling challenge, we propose a novel approach that leverages vector quantization (VQ) to inject a compact Semantic ID (SID), instead of a collection of embeddings, as input to the recommendation models. Our method builds on recent work on SIDs by introducing three key innovations: (i) a multi-task VQ-VAE framework, called VQ fusion, that fuses multiple content embeddings and categorical predictions into a single Semantic ID; (ii) a parameter-free, highly granular SID-to-embedding conversion technique, called SIDE, that is validated with two content embedding collections and thereby eliminates the need for a large parameterized lookup table; and (iii) a novel quantization method called Discrete-PCA (DPCA), which generalizes and enhances residual quantization techniques. When applied to a large-scale industrial ads-recommendation system, the proposed enhancements achieve a 2.4X improvement in normalized entropy (NE) gain and a 3X reduction in data footprint compared to traditional SID methods.
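The abstract states that DPCA generalizes residual quantization, the standard way of turning a dense embedding into a multi-level Semantic ID. As a point of reference only, here is a minimal sketch of plain greedy residual quantization. It is not the authors' VQ fusion, DPCA, or SIDE method. The codebooks are small hand-picked values (an assumption; in practice they would be learned, e.g. by a VQ-VAE), and `residual_quantize` and `pack_sid` are hypothetical helper names chosen for illustration. At each level the nearest codeword to the current residual is selected and subtracted, so the per-level indices form the SID and the sum of the chosen codewords approximates the input.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(
    x: Sequence[float], codebooks: List[List[Vector]]
) -> Tuple[List[int], Vector]:
    """Greedy residual quantization (illustrative sketch, not DPCA).

    At each level, pick the codeword nearest to the current residual and
    subtract it. Returns the per-level code indices (the Semantic ID) and
    the reconstruction, i.e. the sum of the chosen codewords.
    """
    residual = list(x)
    recon = [0.0] * len(x)
    codes: List[int] = []
    for book in codebooks:
        # Index of the codeword closest to what is still unexplained.
        idx = min(range(len(book)), key=lambda k: sq_dist(residual, book[k]))
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, book[idx])]
        recon = [s + c for s, c in zip(recon, book[idx])]
    return codes, recon


def pack_sid(codes: List[int], sizes: List[int]) -> int:
    """Pack per-level codes into a single integer (mixed-radix encoding)."""
    sid = 0
    for code, size in zip(codes, sizes):
        sid = sid * size + code
    return sid


# Hypothetical 2-level codebooks for 2-D embeddings: a coarse level and a
# finer level that refines the leftover residual.
codebooks = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, 0.0]],
]

codes, recon = residual_quantize([1.2, 0.9], codebooks)
sid = pack_sid(codes, [len(book) for book in codebooks])
print(codes, recon, sid)  # [3, 1] [1.25, 1.0] 13
```

With four codewords per level, this two-level SID fits in 4 bits instead of a vector of floats, which illustrates in general terms why replacing raw embeddings with SIDs shrinks storage. The specific footprint and NE figures above come from the paper's own method and are not reproduced by this toy example.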

@article{ramasamy2025_2506.16698,
  title={SIDE: Semantic ID Embedding for effective learning from sequences},
  author={Dinesh Ramasamy and Shakti Kumar and Chris Cadonic and Jiaxin Yang and Sohini Roychowdhury and Esam Abdel Rhman and Srihari Reddy},
  journal={arXiv preprint arXiv:2506.16698},
  year={2025}
}