Scaling Transformers for Discriminative Recommendation via Generative Pretraining

4 June 2025
Chunqi Wang
Bingchao Wu
Zheng Chen
Lei Shen
Bing Wang
Xiaoyi Zeng
Main: 8 pages · Appendix: 1 page · Bibliography: 2 pages · 8 figures · 9 tables
Abstract

Discriminative recommendation tasks, such as CTR (click-through rate) and CVR (conversion rate) prediction, play critical roles in the ranking stage of large-scale industrial recommender systems. However, training a discriminative model suffers from significant overfitting induced by data sparsity. Moreover, this overfitting worsens with larger models, causing them to underperform smaller ones. To address the overfitting issue and enhance model scalability, we propose a framework named GPSD (Generative Pretraining for Scalable Discriminative recommendation), drawing inspiration from generative training, which exhibits no evident signs of overfitting. GPSD leverages the parameters learned from a pretrained generative model to initialize a discriminative model, and subsequently applies a sparse parameter freezing strategy. Extensive experiments conducted on both industrial-scale and publicly available datasets demonstrate the superior performance of GPSD. Moreover, it delivers remarkable improvements in online A/B tests. GPSD offers two primary advantages: 1) it substantially narrows the generalization gap in model training, resulting in better test performance; and 2) it leverages the scalability of Transformers, delivering consistent performance gains as models are scaled up. Specifically, we observe consistent performance improvements as the model's dense parameters scale from 13K to 0.3B, closely adhering to power laws. These findings pave the way for unifying the architectures of recommendation models and language models, enabling the direct application of techniques well-established in large language models to recommendation models.
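
The sketch below (not the authors' released code) illustrates the GPSD recipe as described in the abstract, assuming a PyTorch setup: a Transformer backbone is first pretrained generatively with next-item prediction on behavior sequences, its weights then initialize a discriminative CTR model, and a subset of the transferred parameters is frozen before fine-tuning. All module names, hyperparameters, and the specific choice of which parameters to freeze (here, the item embedding table) are illustrative assumptions rather than details from the paper.

# Minimal GPSD-style sketch (illustrative only, not the authors' implementation).
import torch
import torch.nn as nn

ITEM_VOCAB, D_MODEL, N_HEADS, N_LAYERS, MAX_LEN = 10_000, 64, 4, 2, 50


class BehaviorTransformer(nn.Module):
    """Shared backbone: embeds an item-ID sequence and encodes it with self-attention."""

    def __init__(self):
        super().__init__()
        self.item_emb = nn.Embedding(ITEM_VOCAB, D_MODEL)
        self.pos_emb = nn.Embedding(MAX_LEN, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, N_HEADS, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, N_LAYERS)

    def forward(self, item_ids, causal=False):         # item_ids: (B, T) int64
        T = item_ids.size(1)
        pos = torch.arange(T, device=item_ids.device)
        h = self.item_emb(item_ids) + self.pos_emb(pos)
        mask = None
        if causal:                                      # autoregressive attention for pretraining
            mask = nn.Transformer.generate_square_subsequent_mask(T).to(item_ids.device)
        return self.encoder(h, mask=mask)               # (B, T, D_MODEL)


class GenerativeHead(nn.Module):
    """Pretraining head: predict the next item ID at every position."""

    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone
        self.proj = nn.Linear(D_MODEL, ITEM_VOCAB)

    def forward(self, item_ids):
        return self.proj(self.backbone(item_ids, causal=True))   # (B, T, ITEM_VOCAB) logits


class DiscriminativeCTRModel(nn.Module):
    """Downstream head: pool the sequence representation and predict click probability."""

    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone
        self.click_head = nn.Linear(D_MODEL, 1)

    def forward(self, item_ids):
        h = self.backbone(item_ids).mean(dim=1)         # simple mean pooling over the sequence
        return torch.sigmoid(self.click_head(h)).squeeze(-1)


# 1) Generative pretraining: next-item prediction on (toy) behavior sequences.
backbone = BehaviorTransformer()
gen_model = GenerativeHead(backbone)
seqs = torch.randint(0, ITEM_VOCAB, (8, MAX_LEN))
logits = gen_model(seqs[:, :-1])
pretrain_loss = nn.functional.cross_entropy(
    logits.reshape(-1, ITEM_VOCAB), seqs[:, 1:].reshape(-1))
pretrain_loss.backward()

# 2) Transfer + parameter freezing: reuse the pretrained backbone to initialize the
#    discriminative CTR model, freeze the (sparse) embedding table, fine-tune the rest.
ctr_model = DiscriminativeCTRModel(backbone)
for p in ctr_model.backbone.item_emb.parameters():
    p.requires_grad = False                             # frozen transferred parameters
clicks = torch.randint(0, 2, (8,)).float()
ctr_loss = nn.functional.binary_cross_entropy(ctr_model(seqs), clicks)
ctr_loss.backward()

Freezing only the sparse embedding parameters while fine-tuning the dense ones is one plausible reading of the "sparse parameter freezing strategy"; the paper's actual freezing scheme may differ.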

@article{wang2025_2506.03699,
  title={Scaling Transformers for Discriminative Recommendation via Generative Pretraining},
  author={Chunqi Wang and Bingchao Wu and Zheng Chen and Lei Shen and Bing Wang and Xiaoyi Zeng},
  journal={arXiv preprint arXiv:2506.03699},
  year={2025}
}