
Opportunities and Challenges of Frontier Data Governance With Synthetic Data

Main: 4 pages
Bibliography: 4 pages
Abstract

Synthetic data, or data generated by machine learning models, is increasingly emerging as a solution to the data access problem. However, its use introduces significant governance and accountability challenges and potentially undermines existing governance paradigms, such as compute and data governance. In this paper, we identify three key governance and accountability challenges that synthetic data poses: it can enable the increased emergence of malicious actors, spontaneous biases, and value drift. We thus craft three technical mechanisms to address these specific challenges, finding applications for synthetic data in adversarial training, bias mitigation, and value reinforcement. These could not only counteract the risks of synthetic data but also serve as critical levers for governing the frontier in the future.

@article{thakur2025_2503.17414,
  title={Opportunities and Challenges of Frontier Data Governance With Synthetic Data},
  author={Madhavendra Thakur and Jason Hausenloy},
  journal={arXiv preprint arXiv:2503.17414},
  year={2025}
}