ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2506.11108
18
0

History-Aware Cross-Attention Reinforcement: Self-Supervised Multi Turn and Chain-of-Thought Fine-Tuning with vLLM

8 June 2025
Andrew Kiruluta
Andreas Lemos
Priscilla Burity
    LRM
ArXiv (abs)PDFHTML
Main:14 Pages
2 Figures
Bibliography:2 Pages
4 Tables
Abstract

We present CAGSR-vLLM-MTC, an extension of our Self-Supervised Cross-Attention-Guided Reinforcement (CAGSR) framework, now implemented on the high-performance vLLM runtime, to address both multi-turn dialogue and chain-of-thought reasoning. Building upon our original single-turn approach, we first instrumented vLLM's C++/CUDA kernels to asynchronously capture per-layer, per-head cross-attention weights during generation. We then generalized our self-supervised reward function to accumulate attention signals over entire conversation histories and intermediate chain-of-thought steps. We discuss practical trade-offs, including an entropy-based clamping mechanism to prevent attention collapse on early context, and outline future directions for multi-party dialogues and hierarchical reasoning.

View on arXiv
@article{kiruluta2025_2506.11108,
  title={ History-Aware Cross-Attention Reinforcement: Self-Supervised Multi Turn and Chain-of-Thought Fine-Tuning with vLLM },
  author={ Andrew Kiruluta and Andreas Lemos and Priscilla Burity },
  journal={arXiv preprint arXiv:2506.11108},
  year={ 2025 }
}
Comments on this paper