Instruction tuning has enabled large language models (LLMs) to achieve remarkable performance, but its success depends heavily on the availability of large-scale, high-quality instruction-response pairs. Current methods for scaling up data generation, however, often overlook a crucial aspect: the alignment between instructions and responses. We hypothesize that the quality of an instruction-response pair is determined not by the individual quality of each component, but by how well the two align with each other. To address this, we propose MAIN, a Mutual Alignment Framework that enforces coherence between instruction and response through mutual constraints. Experiments demonstrate that models such as LLaMA and Mistral, fine-tuned within this framework, outperform counterparts trained with conventional methods across multiple benchmarks. These results underscore the critical role of instruction-response alignment in enabling scalable, high-quality instruction tuning for LLMs.
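The abstract does not spell out how "mutual constraints" are measured, so the sketch below is purely illustrative, not the paper's actual procedure. One natural reading is to score each instruction-response pair by how well each side predicts the other under a causal language model and keep only high-scoring pairs. The model choice (gpt2), the function names, and the symmetric averaging rule are all assumptions made for this sketch.

```python
# Hypothetical sketch: score instruction-response pairs by *mutual* alignment,
# i.e., how well each side predicts the other under a causal LM. This is one
# illustrative reading of "mutual constraints", not the authors' exact method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumption: any off-the-shelf causal LM works here
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


@torch.no_grad()
def cond_logprob(context: str, target: str) -> float:
    """Average per-token log P(target | context) under the LM."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    tgt_ids = tokenizer(target, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, tgt_ids], dim=1)
    logits = model(input_ids).logits
    # Logits at position t predict token t+1, so slice out the target span.
    tgt_logits = logits[0, ctx_ids.size(1) - 1 : -1]
    logprobs = torch.log_softmax(tgt_logits, dim=-1)
    token_lp = logprobs.gather(1, tgt_ids[0].unsqueeze(1)).squeeze(1)
    return token_lp.mean().item()


def mutual_alignment(instruction: str, response: str) -> float:
    """Symmetric score: both directions must be likely, not just one."""
    fwd = cond_logprob(instruction, response)  # response given instruction
    bwd = cond_logprob(response, instruction)  # instruction given response
    return 0.5 * (fwd + bwd)


# Usage: rank candidate pairs and keep the best-aligned ones.
pairs = [
    ("Explain why the sky is blue.",
     "Sunlight scatters off air molecules, and blue light scatters the most."),
    ("Explain why the sky is blue.",
     "Bananas are a good source of potassium."),
]
scored = sorted(pairs, key=lambda p: mutual_alignment(*p), reverse=True)
print(scored[0])  # the coherent pair should rank first
```

Averaging both directions penalizes pairs in which the response is fluent in isolation but does not actually ground the instruction, which matches the abstract's claim that per-component quality alone is insufficient.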
@article{yang2025_2504.12913,
  title={MAIN: Mutual Alignment Is Necessary for instruction tuning},
  author={Fanyi Yang and Jianfeng Liu and Xin Zhang and Haoyu Liu and Xixin Cao and Yuefeng Zhan and Hao Sun and Weiwei Deng and Feng Sun and Qi Zhang},
  journal={arXiv preprint arXiv:2504.12913},
  year={2025}
}