Subsampled Newton methods approximate Hessian matrices through subsampling techniques, alleviating the cost of forming Hessian matrices while retaining sufficient curvature information. However, previous results require $\Omega(d)$ samples to approximate Hessians, where $d$ is the dimension of the data points, making them less feasible in practice for high-dimensional data. The situation deteriorates when $d$ is comparable to the number of data points $n$, since the required sample size then forces the whole dataset to be taken into account, rendering subsampling useless. This paper theoretically justifies the effectiveness of subsampled Newton methods on high-dimensional data. Specifically, we prove that only $\tilde{\Theta}(d^{\gamma}_{\mathrm{eff}})$ samples are needed to approximate the Hessian matrices, where $d^{\gamma}_{\mathrm{eff}}$ is the $\gamma$-ridge leverage and can be much smaller than $d$ as long as $n\gamma \gg 1$. Additionally, we extend this result so that subsampled Newton methods can work on high-dimensional data for both distributed optimization problems and non-smooth regularized problems.
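For intuition, below is a minimal NumPy sketch of a single subsampled Newton step on a $\gamma$-regularized logistic regression objective: the gradient uses all $n$ points, while the Hessian is estimated from a small subsample. The function name `subsampled_newton_step`, the uniform sampling scheme, and the fixed sample size are illustrative assumptions rather than the paper's exact construction; the abstract's claim concerns how small that sample size can be taken (on the order of the ridge leverage rather than $d$).

```python
import numpy as np

def subsampled_newton_step(w, X, y, gamma, sample_size, rng):
    """One hypothetical subsampled Newton step for gamma-regularized
    logistic regression: full gradient, Hessian from a uniform subsample."""
    n, d = X.shape
    # Full gradient of the regularized logistic loss.
    p = 1.0 / (1.0 + np.exp(-X @ w))
    grad = X.T @ (p - y) / n + gamma * w
    # Hessian approximation built from a subsample of size s << n.
    idx = rng.choice(n, size=sample_size, replace=False)
    Xs = X[idx]
    ps = 1.0 / (1.0 + np.exp(-Xs @ w))
    D = ps * (1.0 - ps)                     # per-sample curvature weights
    H_sub = (Xs * D[:, None]).T @ Xs / sample_size + gamma * np.eye(d)
    # Newton direction computed with the subsampled Hessian.
    return w - np.linalg.solve(H_sub, grad)

# Example usage on synthetic data (assumed setup, for illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))
y = (X @ rng.standard_normal(50) + 0.1 * rng.standard_normal(1000) > 0).astype(float)
w = np.zeros(50)
for _ in range(10):
    w = subsampled_newton_step(w, X, y, gamma=1e-2, sample_size=100, rng=rng)
```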