AgentDistill: Training-Free Agent Distillation with Generalizable MCP Boxes

While knowledge distillation has become a mature field for compressing large language models (LLMs) into smaller ones by aligning their outputs or internal representations, the distillation of LLM-based agents, which involve planning, memory, and tool use, remains relatively underexplored. Existing agent distillation methods typically replay full teacher trajectories or imitate step-by-step teacher tool usage, but they often struggle to train student agents to dynamically plan and act in novel environments. We propose AgentDistill, a novel, training-free agent distillation framework that enables efficient and scalable knowledge transfer via direct reuse of Model-Context-Protocols (MCPs), which are structured and reusable task-solving modules autonomously generated by teacher agents. The reuse of these distilled MCPs enables student agents to generalize their capabilities across domains and solve new problems with minimal supervision or human intervention. Experiments on biomedical and mathematical benchmarks demonstrate that our distilled student agents, built on small language models, can achieve performance comparable to advanced systems using large LLMs such as OctoTools (GPT-4o), highlighting the effectiveness of our framework in building scalable and cost-efficient intelligent agents.
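To make the core idea concrete, here is a minimal, hypothetical sketch of what "distill once, reuse everywhere" could look like in Python. Nothing below comes from the paper: the names MCP, MCPBox, and select, and the word-overlap retrieval, are illustrative stand-ins; an actual system would generate MCPs with a teacher LLM and likely match tasks via an LLM or embedding retrieval rather than keyword overlap.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class MCP:
    """A distilled task-solving module: a name, a natural-language
    description the student can match against incoming tasks, and a
    callable that executes the module. (Hypothetical structure.)"""
    name: str
    description: str
    run: Callable[[str], str]  # task input -> result

@dataclass
class MCPBox:
    """An 'MCP box': a library of modules generated by the teacher agent.
    The student reuses these directly, with no gradient-based training."""
    mcps: Dict[str, MCP] = field(default_factory=dict)

    def add(self, mcp: MCP) -> None:
        self.mcps[mcp.name] = mcp

    def select(self, task: str) -> MCP:
        # Toy retrieval: pick the MCP whose description shares the most
        # words with the task. A real system would use a stronger matcher.
        def overlap(mcp: MCP) -> int:
            return len(set(task.lower().split())
                       & set(mcp.description.lower().split()))
        return max(self.mcps.values(), key=overlap)

# Teacher side: an MCP distilled from a solved trajectory (toy example).
box = MCPBox()
box.add(MCP(
    name="unit_convert",
    description="convert a mass in grams to milligrams",
    run=lambda task: f"{float(task.split()[0]) * 1000} mg",
))

# Student side: a small model reuses the box instead of re-planning.
task = "2.5 grams to milligrams"
mcp = box.select(task)
print(mcp.name, "->", mcp.run(task))  # unit_convert -> 2500.0 mg
```

The design point this illustrates is that the transferred artifact is a structured, executable module rather than a teacher trajectory, so the student needs no parameter updates to benefit from it.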
@article{qiu2025_2506.14728,
  title={AgentDistill: Training-Free Agent Distillation with Generalizable MCP Boxes},
  author={Jiahao Qiu and Xinzhe Juan and Yimin Wang and Ling Yang and Xuan Qi and Tongcheng Zhang and Jiacheng Guo and Yifu Lu and Zixin Yao and Hongru Wang and Shilong Liu and Xun Jiang and Liu Leqi and Mengdi Wang},
  journal={arXiv preprint arXiv:2506.14728},
  year={2025}
}