Hierarchical Attention Fusion of Visual and Textual Representations for Cross-Domain Sequential Recommendation

21 April 2025
Wangyu Wu
Zhenhong Chen
Siqi Song
Xianglin Qiu
Xiaowei Huang
Fei Ma
Jimin Xiao
Abstract

Cross-Domain Sequential Recommendation (CDSR) predicts user behavior by leveraging historical interactions across multiple domains, focusing on modeling cross-domain preferences through intra- and inter-sequence item relationships. Inspired by human cognitive processes, we propose Hierarchical Attention Fusion of Visual and Textual Representations (HAF-VT), a novel approach integrating visual and textual data to enhance cognitive modeling. Using the frozen CLIP model, we generate image and text embeddings, enriching item representations with multimodal data. A hierarchical attention mechanism jointly learns single-domain and cross-domain preferences, mimicking human information integration. Evaluated on four e-commerce datasets, HAF-VT outperforms existing methods in capturing cross-domain user interests, bridging cognitive principles with computational models and highlighting the role of multimodal data in sequential decision-making.
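As a rough illustration of the first step described above (a frozen CLIP model producing image and text embeddings for each catalog item), the sketch below uses the Hugging Face transformers CLIP implementation. The checkpoint name, item fields, and file paths are illustrative assumptions and do not reflect the authors' exact pipeline.

# Sketch: frozen CLIP image/text embeddings per item (assumed setup, not the paper's code).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def embed_item(image_path: str, title: str) -> tuple[torch.Tensor, torch.Tensor]:
    """Return L2-normalized image and text embeddings for one item."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=[title], images=image, return_tensors="pt",
                       padding=True, truncation=True).to(device)
    img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return img_emb.squeeze(0), txt_emb.squeeze(0)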

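The hierarchical attention mechanism mentioned in the abstract can be read as two stages: fuse each item's ID, image, and text views, then model single-domain and cross-domain preferences over the interaction sequence. The module below is a minimal sketch of that structure only; the layer sizes, fusion scheme, and masking strategy are assumptions, not the paper's exact architecture.

# Minimal sketch of a hierarchical attention fusion block (assumed design).
import torch
import torch.nn as nn

class HierarchicalAttentionFusion(nn.Module):
    """Item-level multimodal fusion followed by single-domain and
    cross-domain sequence attention."""
    def __init__(self, dim: int = 512, heads: int = 4):
        super().__init__()
        # Stage 1: attention over the (id, image, text) views of each item.
        self.modal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Stage 2: self-attention within one domain and across both domains.
        self.intra_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.inter_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, id_emb, img_emb, txt_emb, domain_mask):
        # id_emb, img_emb, txt_emb: (batch, seq_len, dim)
        # domain_mask: (batch, seq_len) bool, True for domain-A items;
        # assumes each sequence mixes items from both domains.
        B, L, D = id_emb.shape
        # Fuse the three views of every item with one attention step.
        views = torch.stack([id_emb, img_emb, txt_emb], dim=2).view(B * L, 3, D)
        fused, _ = self.modal_attn(views, views, views)
        items = fused.mean(dim=1).reshape(B, L, D)

        # Single-domain preference: mask out the other domain's items.
        intra_a, _ = self.intra_attn(items, items, items, key_padding_mask=~domain_mask)
        intra_b, _ = self.intra_attn(items, items, items, key_padding_mask=domain_mask)

        # Cross-domain preference: attend over the full mixed sequence.
        inter, _ = self.inter_attn(items, items, items)
        return intra_a, intra_b, inter

In a full model, the two single-domain outputs and the cross-domain output would presumably feed next-item prediction heads for each domain; the split into a modality-level stage and a sequence-level stage is what the abstract refers to as hierarchical.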
BibTeX
@article{wu2025_2504.15085,
  title={Hierarchical Attention Fusion of Visual and Textual Representations for Cross-Domain Sequential Recommendation},
  author={Wangyu Wu and Zhenhong Chen and Siqi Song and Xianglin Qiu and Xiaowei Huang and Fei Ma and Jimin Xiao},
  journal={arXiv preprint arXiv:2504.15085},
  year={2025}
}