Instruction tuning has enabled large language models (LLMs) to achieve remarkable performance, but its success depends heavily on the availability of large-scale, high-quality instruction-response pairs. Current methods for scaling up data generation, however, often overlook a crucial aspect: the alignment between instructions and responses. We hypothesize that the quality of an instruction-response pair is determined not by the individual quality of each component, but by how well the two align with each other. To address this, we propose MAIN, a Mutual Alignment Framework that enforces coherence between instruction and response through mutual constraints. Experiments demonstrate that models such as LLaMA and Mistral, fine-tuned within this framework, outperform counterparts trained with conventional methods across multiple benchmarks. These results underscore the critical role of instruction-response alignment in enabling scalable, high-quality instruction tuning for LLMs.
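The abstract does not spell out how "mutual constraints" are measured, so the sketch below is purely illustrative, not the paper's actual procedure. One natural reading is to score each instruction-response pair by how well each side predicts the other under a causal language model and keep only high-scoring pairs. The model choice (gpt2), the function names, and the symmetric averaging rule are all assumptions made for this sketch.

```python
# Hypothetical sketch: score instruction-response pairs by *mutual* alignment,
# i.e., how well each side predicts the other under a causal LM. This is one
# illustrative reading of "mutual constraints", not the authors' exact method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumption: any off-the-shelf causal LM works here
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


@torch.no_grad()
def cond_logprob(context: str, target: str) -> float:
    """Average per-token log P(target | context) under the LM."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    tgt_ids = tokenizer(target, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, tgt_ids], dim=1)
    logits = model(input_ids).logits
    # Logits at position t predict token t+1, so slice out the target span.
    tgt_logits = logits[0, ctx_ids.size(1) - 1 : -1]
    logprobs = torch.log_softmax(tgt_logits, dim=-1)
    token_lp = logprobs.gather(1, tgt_ids[0].unsqueeze(1)).squeeze(1)
    return token_lp.mean().item()


def mutual_alignment(instruction: str, response: str) -> float:
    """Symmetric score: both directions must be likely, not just one."""
    fwd = cond_logprob(instruction, response)  # response given instruction
    bwd = cond_logprob(response, instruction)  # instruction given response
    return 0.5 * (fwd + bwd)


# Usage: rank candidate pairs and keep the best-aligned ones.
pairs = [
    ("Explain why the sky is blue.",
     "Sunlight scatters off air molecules, and blue light scatters the most."),
    ("Explain why the sky is blue.",
     "Bananas are a good source of potassium."),
]
scored = sorted(pairs, key=lambda p: mutual_alignment(*p), reverse=True)
print(scored[0])  # the coherent pair should rank first
```

Averaging both directions penalizes pairs in which the response is fluent in isolation but does not actually ground the instruction, which matches the abstract's claim that per-component quality alone is insufficient.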
@article{yang2025_2504.12913,
  title={MAIN: Mutual Alignment Is Necessary for instruction tuning},
  author={Fanyi Yang and Jianfeng Liu and Xin Zhang and Haoyu Liu and Xixin Cao and Yuefeng Zhan and Hao Sun and Weiwei Deng and Feng Sun and Qi Zhang},
  journal={arXiv preprint arXiv:2504.12913},
  year={2025}
}