Fast Training Dataset Attribution via In-Context Learning

14 August 2024
Milad Fotouhi, Mohammad Taha Bahadori, Oluwaseyi Feyisetan, Payman Arabshahi, David Heckerman
Abstract

We investigate the use of in-context learning and prompt engineering to estimate the contributions of training data to the outputs of instruction-tuned large language models (LLMs). We propose two novel approaches: (1) a similarity-based approach that measures the difference between LLM outputs with and without the provided context, and (2) a mixture distribution model approach that frames the problem of identifying contribution scores as a matrix factorization task. Our empirical comparison demonstrates that the mixture model approach is more robust to retrieval noise in in-context learning and provides a more reliable estimate of data contributions.
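
To make the two approaches concrete, the following is a minimal Python sketch, not the authors' released code. It makes several assumptions: the similarity-based score is read as the cosine similarity between the output shift induced by the full context and the shift induced by each single candidate document (in some assumed embedding or probability space), and the matrix-factorization view of the mixture model is solved here with nonnegative least squares followed by normalization onto the simplex, which is one plausible way to recover mixture weights. All function names and the synthetic data are hypothetical.

# Illustrative sketch of the two attribution ideas in the abstract.
# Assumptions are noted inline; this is not the paper's implementation.

import numpy as np
from scipy.optimize import nnls


def similarity_scores(out_with_ctx, out_without_ctx, doc_outputs):
    """Approach 1 (similarity-based, hypothetical formulation): score each
    candidate document by how closely the shift it alone induces in the
    LLM's output matches the shift induced by the full retrieved context.

    out_with_ctx:    vector for the LLM output with the full context
    out_without_ctx: vector for the LLM output with no context
    doc_outputs:     (n_docs, dim) vectors for outputs with one doc each
    """
    shift = out_with_ctx - out_without_ctx
    doc_shifts = doc_outputs - out_without_ctx
    num = doc_shifts @ shift
    den = np.linalg.norm(doc_shifts, axis=1) * np.linalg.norm(shift) + 1e-12
    return num / den  # cosine similarity per candidate document


def mixture_weights(q, P):
    """Approach 2 (mixture model): treat the observed output distribution q
    as a convex mixture of per-document output distributions (rows of P)
    and recover the weights -- one way to phrase the matrix-factorization
    view of contribution scores as a constrained least-squares problem.
    """
    w, _ = nnls(P.T, q)            # solve q ~= P.T @ w with w >= 0
    s = w.sum()
    return w / s if s > 0 else w   # normalize onto the probability simplex


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab, n_docs = 50, 4

    # Synthetic per-document output distributions (each row sums to 1).
    P = rng.dirichlet(np.ones(vocab), size=n_docs)
    true_w = np.array([0.6, 0.3, 0.1, 0.0])
    q = true_w @ P                 # observed output under the mixture model

    base = np.full(vocab, 1.0 / vocab)  # stand-in "no context" output
    print("similarity scores:", np.round(similarity_scores(q, base, P), 3))
    print("mixture weights:  ", np.round(mixture_weights(q, P), 3))

In the paper's setting, the per-document outputs would come from actually prompting the instruction-tuned LLM with each retrieved document in context; the abstract's robustness claim concerns how these two estimators behave when the retrieved context includes irrelevant documents.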

View on arXiv: https://arxiv.org/abs/2408.11852
@article{fotouhi2024_2408.11852,
  title={Fast Training Dataset Attribution via In-Context Learning},
  author={Milad Fotouhi and Mohammad Taha Bahadori and Oluwaseyi Feyisetan and Payman Arabshahi and David Heckerman},
  journal={arXiv preprint arXiv:2408.11852},
  year={2024}
}