Eliciting Fine-Tuned Transformer Capabilities via Inference-Time Techniques

Abstract

Large language models have transformed natural language processing, yet supervised fine-tuning (SFT) remains computationally intensive. This paper formally proves that capabilities acquired through SFT can be approximated by a base transformer model using inference-time techniques, specifically in-context learning (ICL), without altering model parameters, under idealized assumptions including unbounded computational resources and access to the fine-tuning dataset. We extend these results to practical scenarios with finite context lengths and partial dataset access. For text generation tasks with fixed output length $l$, datasets of size $\mathrm{O}\left( \frac{m V}{\varepsilon^2} \log \frac{m}{\delta} \right)$ or, with bounded context, $\mathrm{O}\left( \frac{l \log V}{\varepsilon^2} \log \frac{1}{\delta} \right)$ suffice to approximate fine-tuned behavior across $m$ contexts within error $\varepsilon$, where $V$ is the vocabulary size and $\delta$ is the failure probability. For linear classification, datasets of size $\mathrm{O}\left( \frac{d}{\varepsilon} \right)$ or, with fixed context, $\mathrm{O}\left( \frac{1}{\varepsilon^2} \log \frac{1}{\delta} \right)$ are sufficient, where $d$ is the input dimension. Grounded in the Turing completeness of transformers, these results provide a theoretical foundation for resource-efficient deployment of large language models, with practical techniques like retrieval-augmented generation bridging theory to real-world applications.
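
To illustrate how the stated sample-complexity bounds scale, the Python sketch below evaluates the four dataset-size expressions from the abstract for hypothetical values of $m$, $V$, $l$, $d$, $\varepsilon$, and $\delta$. The parameter values are assumptions chosen for illustration, and because the bounds are asymptotic (constant factors unspecified), the numbers indicate growth behavior only, not exact dataset sizes.

import math

def text_gen_bound(m, V, eps, delta):
    """O((m*V / eps^2) * log(m/delta)): text generation, unbounded context."""
    return (m * V / eps**2) * math.log(m / delta)

def text_gen_bounded_context(l, V, eps, delta):
    """O((l*log V / eps^2) * log(1/delta)): text generation, bounded context."""
    return (l * math.log(V) / eps**2) * math.log(1 / delta)

def linear_clf_bound(d, eps):
    """O(d/eps): linear classification."""
    return d / eps

def linear_clf_fixed_context(eps, delta):
    """O((1/eps^2) * log(1/delta)): linear classification, fixed context."""
    return (1 / eps**2) * math.log(1 / delta)

if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    m, V, l, d = 100, 32_000, 128, 768
    eps, delta = 0.1, 0.01
    print(f"text generation (unbounded context): {text_gen_bound(m, V, eps, delta):.3e}")
    print(f"text generation (bounded context):   {text_gen_bounded_context(l, V, eps, delta):.3e}")
    print(f"linear classification:               {linear_clf_bound(d, eps):.3e}")
    print(f"linear classification (fixed ctx):   {linear_clf_fixed_context(eps, delta):.3e}")

For these example values, the bounded-context and fixed-context bounds are far smaller than their unbounded counterparts, reflecting the abstract's point that practical context limits change which quantities dominate the required dataset size.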

@article{sharma2025_2506.08060,
  title={Eliciting Fine-Tuned Transformer Capabilities via Inference-Time Techniques},
  author={Asankhaya Sharma},
  journal={arXiv preprint arXiv:2506.08060},
  year={2025}
}