The problem of detecting fake data inspires the following seemingly simple mathematical question. Sample a data point $X$ from the standard normal distribution in $\mathbb{R}^n$. An adversary observes $X$ and corrupts it by adding a vector $rt$, where they can choose any vector $t$ from a fixed set $T$ of the adversary's "tricks", and where $r > 0$ is a fixed radius. The adversary's choice of $t = t(X)$ may depend on the true data $X$. The adversary wants to hide the corruption by making the fake data $X + rt$ statistically indistinguishable from the real data $X$. What is the largest radius $r = r(T)$ for which the adversary can create an undetectable fake? We show that for highly symmetric sets $T$, the detectability radius $r(T)$ is approximately twice the scaled Gaussian width of $T$. The upper bound actually holds for arbitrary sets $T$ and generalizes to arbitrary, non-Gaussian distributions of the real data $X$. The lower bound may fail for sets $T$ that are not highly symmetric, but we conjecture that this problem can be solved by considering a focused version of the Gaussian width of $T$, which concentrates on the most important directions of $T$.
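For readers unfamiliar with the quantity in the main result, the following is a minimal sketch of the standard (unscaled) Gaussian width $w(T) = \mathbb{E} \sup_{t \in T} \langle g, t \rangle$, where $g$ is a standard normal vector, estimated by Monte Carlo for a finite set $T$. The function name `gaussian_width`, the sample size, and the example set (the signed coordinate vectors $\pm e_i$) are illustrative assumptions and are not taken from the paper. The abstract does not specify the scaling used in "scaled Gaussian width", so no scaling is applied here.

```python
import numpy as np

def gaussian_width(T, n_samples=20000, seed=0):
    """Monte Carlo estimate of w(T) = E sup_{t in T} <g, t> for a finite set T.

    T is an array of shape (m, n): m candidate vectors in R^n.
    (Hypothetical helper for illustration; not the paper's code.)
    """
    rng = np.random.default_rng(seed)
    T = np.asarray(T, dtype=float)
    # Each row of g is an independent standard normal vector in R^n.
    g = rng.standard_normal((n_samples, T.shape[1]))
    # Row i of g @ T.T holds <g_i, t> for every t in T; take the max over T,
    # then average over the samples.
    return float((g @ T.T).max(axis=1).mean())

# Example set (an assumption): T = {+e_i, -e_i}, so sup <g, t> = max_i |g_i|.
n = 50
eye = np.eye(n)
T = np.vstack([eye, -eye])
w = gaussian_width(T)
print(w)
```

For this example set the estimate should fall below the standard bound $\sqrt{2 \log(2n)}$ for the expected maximum of $2n$ standard Gaussian variables, which gives a quick sanity check on the estimator.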