arXiv:2410.04555

dattri: A Library for Efficient Data Attribution

6 October 2024
Junwei Deng
Ting-Wei Li
Shiyuan Zhang
Shixuan Liu
Yijun Pan
Hao Huang
Xinhe Wang
Pingbang Hu
Xingjian Zhang
Jiaqi W. Ma
Abstract

Data attribution methods aim to quantify the influence of individual training samples on the predictions of artificial intelligence (AI) models. As training data plays an increasingly crucial role in the development of large-scale AI models, data attribution has found broad applications in improving AI performance and safety. However, despite a recent surge of new data attribution methods, there is no comprehensive library that facilitates the development, benchmarking, and deployment of different data attribution methods. In this work, we introduce dattri, an open-source data attribution library that addresses these needs. Specifically, dattri has three novel design features. First, dattri provides a unified and easy-to-use API, allowing users to integrate different data attribution methods into their PyTorch-based machine learning pipelines by changing only a few lines of code. Second, dattri modularizes low-level utility functions that are commonly used in data attribution methods, such as Hessian-vector products, inverse-Hessian-vector products, and random projections, making it easier for researchers to develop new data attribution methods. Third, dattri provides a comprehensive benchmark framework with pre-trained models and ground-truth annotations for a variety of benchmark settings, including generative AI settings. We have implemented a variety of state-of-the-art efficient data attribution methods that can be applied to large-scale neural network models, and we will continue to update the library. Using dattri, we are able to perform a comprehensive and fair benchmark analysis across a wide range of data attribution methods. The source code of dattri is available at https://github.com/TRAIS-Lab/dattri.
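
To make the low-level utilities named in the abstract concrete, the following is a minimal pure-Python sketch of a Hessian-vector product (HVP) and an inverse-Hessian-vector product (iHVP) on a tiny ridge-regression loss. It is not dattri's API: the function names (loss_grad, hvp, ihvp_cg), the toy data, and the choice of finite differences plus conjugate gradient are all assumptions made for illustration. A real implementation would typically use automatic differentiation in PyTorch rather than finite differences.

```python
# Illustrative sketch only; names and data are hypothetical, not dattri's API.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def loss_grad(w, data, l2=0.1):
    """Gradient of mean 0.5*(w.x - y)^2 plus 0.5*l2*||w||^2."""
    n = len(data)
    g = [l2 * wi for wi in w]
    for x, y in data:
        residual = dot(w, x) - y
        for j, xj in enumerate(x):
            g[j] += residual * xj / n
    return g

def hvp(w, v, data, eps=1e-4):
    """Approximate H v by a central difference of gradients along v."""
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus = loss_grad(w_plus, data)
    g_minus = loss_grad(w_minus, data)
    return [(a - b) / (2 * eps) for a, b in zip(g_plus, g_minus)]

def ihvp_cg(w, v, data, iters=50, tol=1e-12):
    """Approximate H^{-1} v by conjugate gradient, using only hvp calls.

    Assumes the Hessian is positive definite (true here because l2 > 0).
    """
    x = [0.0] * len(v)
    r = list(v)          # residual v - H x, with x = 0
    p = list(r)          # search direction
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:     # stop once the squared residual is small
            break
        Hp = hvp(w, p, data)
        alpha = rs / dot(p, Hp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Toy data: (features, target) pairs.
data = [([1.0, 2.0], 1.0), ([0.5, -1.0], 0.0), ([2.0, 0.0], 2.0)]
w = [0.3, -0.2]
v = [1.0, 0.0]

x = ihvp_cg(w, v, data)    # approximates H^{-1} v
check = hvp(w, x, data)    # should be close to v
```

The design point is that the solver never forms the Hessian explicitly; it only needs a routine that multiplies the Hessian by a vector, which is what keeps this family of methods tractable for large models.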
