Enhanced Whole Page Optimization via Mixed-Grained Reward Mechanism-Adapted Language Models

Main: 8 pages · Appendix: 5 pages · Bibliography: 3 pages · 7 figures · 6 tables
Abstract

Optimizing the presentation of search and recommendation results is crucial to enhancing user experience and engagement. Whole Page Optimization (WPO) plays a pivotal role in this process, as it directly influences how information is surfaced to users. While pre-trained Large Language Models (LLMs) have demonstrated remarkable capabilities in generating coherent and contextually relevant content, fine-tuning these models for complex tasks like WPO presents challenges. Specifically, collecting the extensive human-annotated data needed to mitigate issues such as hallucination and model instability can be prohibitively expensive, especially in large-scale systems that interact with millions of items daily. In this work, we address the challenge of fine-tuning LLMs for WPO by using user feedback as supervision. Unlike manually labeled datasets, user feedback is inherently noisy and imprecise. To overcome this, we propose a reward-based fine-tuning approach, PageLLM, which employs a mixed-grained reward mechanism that combines page-level and item-level rewards. The page-level reward evaluates the overall quality and coherence of the page, while the item-level reward focuses on the accuracy and relevance of key recommendations. This dual-reward structure ensures that both the holistic presentation and the critical individual components are optimized. We validate PageLLM on both public and industrial datasets. PageLLM outperforms baselines and achieves a 0.44% GMV increase in an online A/B test with over 10 million users, demonstrating its real-world impact.
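The abstract does not specify the exact form of the reward combination. As a minimal sketch, assuming a simple convex combination of a page-level score and the mean of per-item scores (the weight alpha and the scoring functions score_page and score_item below are hypothetical, not taken from the paper):

from typing import Callable, List

def mixed_grained_reward(
    page: List[str],
    score_page: Callable[[List[str]], float],  # hypothetical page-level scorer (e.g., derived from user feedback)
    score_item: Callable[[str], float],        # hypothetical item-level scorer for individual recommendations
    alpha: float = 0.5,                        # assumed weight balancing page- and item-level signals
) -> float:
    """Combine a holistic page-level reward with fine-grained item-level rewards."""
    page_reward = score_page(page)  # overall quality and coherence of the whole page
    item_reward = sum(score_item(item) for item in page) / len(page)  # average relevance of individual items
    return alpha * page_reward + (1.0 - alpha) * item_reward

Under this reading, alpha trades off the holistic presentation against the accuracy of key individual recommendations; the paper's actual reward design may differ.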

@article{wang2025_2506.09084,
  title={Enhanced Whole Page Optimization via Mixed-Grained Reward Mechanism-Adapted Language Models},
  author={Xinyuan Wang and Liang Wu and Yanjie Fu},
  journal={arXiv preprint arXiv:2506.09084},
  year={2025}
}