ResearchTrend.AI

TTPA: Token-level Tool-use Preference Alignment Training Framework with Fine-grained Evaluation

26 May 2025
Chengrui Huang
Shen Gao
Zhengliang Shi
Dongsheng Wang
Shuo Shang
Main: 8 pages · 5 figures · 10 tables · Bibliography: 3 pages · Appendix: 5 pages
Abstract

Existing tool-learning methods usually rely on supervised fine-tuning and often overlook fine-grained optimization of internal tool-call details, leading to limitations in preference alignment and error discrimination. To overcome these challenges, we propose the Token-level Tool-use Preference Alignment Training Framework (TTPA), a training paradigm for constructing token-level tool-use preference datasets that align LLMs with fine-grained preferences using a novel error-oriented scoring mechanism. TTPA first introduces reversed dataset construction, a method for creating high-quality, multi-turn tool-use datasets by reversing the generation flow. Additionally, we propose Token-level Preference Sampling (TPS) to capture fine-grained preferences by modeling token-level differences during generation. To address biases in scoring, we introduce the Error-oriented Scoring Mechanism (ESM), which quantifies tool-call errors and can be used as a training signal. Extensive experiments on three diverse benchmark datasets demonstrate that TTPA significantly improves tool-using performance while showing strong generalization ability across models and datasets.
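To make the two named mechanisms more concrete, here is a minimal Python sketch of how token-level preference differences and an error-oriented score could combine into a single training signal. This is an illustrative assumption, not the paper's actual implementation: the error taxonomy, the penalty values, and every function name below are hypothetical.

```python
# Hypothetical sketch inspired by TTPA's Token-level Preference Sampling (TPS)
# and Error-oriented Scoring Mechanism (ESM). The error categories, penalties,
# and function names are illustrative assumptions, not the paper's method.

# Assumed tool-call error categories, each with an illustrative penalty weight.
ERROR_PENALTIES = {
    "wrong_tool": 1.0,        # called a tool that does not fit the task
    "missing_argument": 0.5,  # omitted a required parameter
    "malformed_value": 0.25,  # argument present but badly formatted
}


def error_score(errors):
    """Quantify a tool call's errors as one penalty (higher = worse)."""
    return sum(ERROR_PENALTIES.get(e, 0.0) for e in errors)


def token_preference_margin(chosen_logprobs, rejected_logprobs):
    """Sum per-token log-probability gaps between a preferred and a
    dispreferred tool-call sequence, truncated to the shorter length."""
    n = min(len(chosen_logprobs), len(rejected_logprobs))
    return sum(c - r for c, r in zip(chosen_logprobs[:n], rejected_logprobs[:n]))


def weighted_preference_signal(chosen_lp, rejected_lp, rejected_errors):
    """Scale the token-level margin by the rejected call's error severity,
    so pairs that expose worse errors contribute a stronger signal."""
    return error_score(rejected_errors) * token_preference_margin(chosen_lp, rejected_lp)
```

For example, a pair whose rejected call chose the wrong tool (`["wrong_tool"]`) would yield a larger weighted signal than one with only a malformed argument value, even at identical per-token margins.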

@article{huang2025_2505.20016,
  title={TTPA: Token-level Tool-use Preference Alignment Training Framework with Fine-grained Evaluation},
  author={Chengrui Huang and Shen Gao and Zhengliang Shi and Dongsheng Wang and Shuo Shang},
  journal={arXiv preprint arXiv:2505.20016},
  year={2025}
}