Maximum entropy distributions with discrete support in m dimensions arise in machine learning, statistics, information theory, and theoretical computer science. While the structural and computational properties of max-entropy distributions have been extensively studied, basic questions such as: Do max-entropy distributions over a large support (e.g., one of size exponential in the dimension) with a specified marginal vector have succinct descriptions (polynomial-size in the input description)? and: Are entropy-maximizing distributions "stable" under perturbations of the marginal vector? have resisted a rigorous resolution. Here we show that these questions are related and resolve both of them. Our main result is a polynomial bound on the bit complexity of ε-optimal dual solutions to the maximum entropy convex program, for very general support sets and with no restriction on the marginal vector. Applications of this result include polynomial time algorithms to compute max-entropy distributions over several new and old polytopes for any marginal vector in a unified manner, as well as a polynomial time algorithm to compute the Brascamp-Lieb constant in the rank-1 case. The proof of this result further allows us to show that perturbing the marginal vector by δ changes the max-entropy distribution, in total variation distance, roughly in proportion to δ -- even when the size of the support set is exponential. Together, our results put max-entropy distributions on a mathematically sound footing: these distributions are robust and computationally feasible models for data.
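To make the objects concrete, here is a minimal sketch of the maximum entropy convex program in its dual form on a toy support, using NumPy and plain gradient descent. This is an illustration of the standard dual formulation, not the paper's algorithm: the dual objective is f(λ) = log Σ_x exp(⟨λ, x⟩) − ⟨λ, θ⟩, and the optimal primal distribution is p(x) ∝ exp(⟨λ*, x⟩). The support set, marginal vector, step size, and perturbation δ below are illustrative choices.

```python
import numpy as np

def max_entropy_dual(support, theta, lr=0.5, steps=5000):
    """Fit the dual variable lam for the max-entropy distribution over a
    finite support whose expectation equals theta, by gradient descent on
    f(lam) = log sum_x exp(<lam, x>) - <lam, theta>.  The primal optimum
    is the exponential-family distribution p(x) proportional to exp(<lam, x>)."""
    lam = np.zeros(support.shape[1])
    for _ in range(steps):
        logits = support @ lam
        logits -= logits.max()            # shift for numerical stability
        p = np.exp(logits)
        p /= p.sum()
        grad = support.T @ p - theta      # gradient of the dual objective
        lam -= lr * grad
    return lam, p

# Toy example: support = vertices of {0,1}^2, marginal theta = (0.3, 0.7),
# which lies in the convex hull of the support points.
support = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
theta = np.array([0.3, 0.7])
lam, p = max_entropy_dual(support, theta)

# Stability in miniature: perturb the marginal by a small delta and
# compare the two max-entropy distributions in total variation distance.
delta = np.array([0.01, -0.01])
_, p_perturbed = max_entropy_dual(support, theta + delta)
tv = 0.5 * np.abs(p - p_perturbed).sum()
```

On this example the fitted distribution matches the requested marginals to high accuracy, and the total variation distance `tv` is small, of the same order as the perturbation δ.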