Input Perturbations for Adaptive Control and Learning

This work studies adaptive algorithms for simultaneous regulation and estimation of MIMO linear dynamical systems. We design and analyze efficient, practical control policies that perturb the input signals. We show that a perturbed greedy algorithm guarantees non-asymptotic regret bounds of (nearly) square-root magnitude with respect to time. More generally, we establish high-probability, finite-time bounds on both the regret and the learning accuracy under arbitrary input perturbations. We also discuss the settings in which greedy policies attain the information-theoretic lower bound of logarithmic regret. The results are obtained by combining state-of-the-art tools from martingale theory with the recently introduced method of policy decomposition.
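To make the idea of a perturbed greedy policy concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a scalar system x_{t+1} = a x_t + b u_t + w_t rather than the MIMO setting, a quadratic cost q x^2 + r u^2, a ridge-regularized least-squares estimator, and Gaussian input perturbations whose variance decays like t^{-1/2}. The function names `lqr_gain` and `simulate`, all default parameter values, and the perturbation schedule are illustrative assumptions.

```python
import random


def lqr_gain(a, b, q=1.0, r=1.0, iters=100):
    """Scalar LQR feedback gain k (for u = k * x), found by iterating
    the Riccati recursion p <- q + a^2 * p * r / (r + b^2 * p)."""
    p = q
    for _ in range(iters):
        p = q + a * a * p * r / (r + b * b * p)
        p = min(p, 1e8)  # numerical guard (assumption) for poor estimates
    return -(a * b * p) / (r + b * b * p)


def simulate(a=0.8, b=1.0, horizon=3000, seed=0, lam=1.0, q=1.0, r=1.0):
    """Run a perturbed greedy (certainty-equivalent) controller.

    Returns the final estimates (a_hat, b_hat) and the average cost.
    """
    rng = random.Random(seed)
    # Regularized Gram matrix of the regressors z = (x, u) and the
    # cross-moment vector with the next state.
    vxx, vxu, vuu = lam, 0.0, lam
    sx, su = 0.0, 0.0
    a_hat, b_hat = 0.0, 0.0
    x, cost = 0.0, 0.0
    for t in range(horizon):
        # Greedy step: act as if the current estimate were the truth.
        k = lqr_gain(a_hat, b_hat, q, r)
        # Input perturbation with standard deviation (t + 1)^(-1/4),
        # i.e. variance decaying like t^(-1/2) (assumed schedule).
        eta = rng.gauss(0.0, 1.0) * (t + 1) ** -0.25
        u = k * x + eta
        cost += q * x * x + r * u * u
        x_next = a * x + b * u + rng.gauss(0.0, 1.0)
        # Update the least-squares statistics with (x, u) -> x_next.
        vxx += x * x
        vxu += x * u
        vuu += u * u
        sx += x * x_next
        su += u * x_next
        # Solve the 2x2 normal equations in closed form.
        det = vxx * vuu - vxu * vxu
        a_hat = (vuu * sx - vxu * su) / det
        b_hat = (vxx * su - vxu * sx) / det
        x = x_next
    return a_hat, b_hat, cost / horizon
```

Calling `simulate()` returns the final parameter estimates and the average cost for the assumed system. Setting the perturbation `eta` to zero recovers a plain greedy policy, so the two variants can be compared on the same seed.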