Efficiently Vectorized MCMC on Modern Accelerators

20 March 2025

Abstract

With the advent of automatic vectorization tools (e.g., JAX's $\texttt{vmap}$), writing multi-chain MCMC algorithms is now often as simple as invoking those tools on single-chain code. Whilst convenient, for various MCMC algorithms this results in a synchronization problem: loosely speaking, at each iteration all chains running in parallel must wait until the last chain has finished drawing its sample. In this work, we show how to design single-chain MCMC algorithms in a way that avoids synchronization overheads when vectorizing with tools like $\texttt{vmap}$, by using the framework of finite state machines (FSMs). Using a simplified model, we derive an exact theoretical expression for the speed-ups obtainable with our approach, and use it to make principled recommendations for optimal algorithm design. We implement several popular MCMC algorithms as FSMs, including Elliptical Slice Sampling, HMC-NUTS, and Delayed Rejection, demonstrating speed-ups of up to an order of magnitude in experiments.
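The synchronization problem described above can be illustrated with a small JAX sketch. It is not taken from the paper: the target density, the fixed proposal interval $[-5, 5]$, and the names `log_density` and `single_chain_step` are all assumptions chosen for illustration. The step draws a slice level and then proposes until a point lands on the slice, so the number of loop iterations differs from chain to chain. When such a step is vectorized with `jax.vmap`, the batched `jax.lax.while_loop` keeps iterating until every chain's condition is false, so all chains pay for the slowest one.

```python
import jax
import jax.numpy as jnp


def log_density(x):
    # Illustrative target (an assumption): standard normal, up to a constant.
    return -0.5 * x ** 2


def single_chain_step(key, x):
    """Toy slice-style update whose loop length depends on the random draws."""
    key, sub = jax.random.split(key)
    # Slice level: log p(x) + log(u), with u ~ Uniform(0, 1).
    log_y = log_density(x) + jnp.log(jax.random.uniform(sub))

    def cond(state):
        # Keep looping while the current proposal lies below the slice level.
        _, proposal, _ = state
        return log_density(proposal) < log_y

    def body(state):
        # Draw a fresh proposal on a fixed interval (hypothetical choice).
        key, _, n_tries = state
        key, sub = jax.random.split(key)
        proposal = jax.random.uniform(sub, minval=-5.0, maxval=5.0)
        return key, proposal, n_tries + 1

    # Make the first proposal, then loop until one is accepted.
    init = body((key, x, jnp.int32(0)))
    _, x_new, n_tries = jax.lax.while_loop(cond, body, init)
    return x_new, n_tries


n_chains = 8
keys = jax.random.split(jax.random.PRNGKey(0), n_chains)
xs = jnp.zeros(n_chains)

# Vectorize the single-chain step across chains.
x_new, n_tries = jax.vmap(single_chain_step)(keys, xs)

# The batched loop runs for max(n_tries) iterations, although the
# average chain only needed mean(n_tries) of them.
steps_executed = int(n_tries.max())
mean_needed = float(n_tries.mean())
print(n_tries, steps_executed, mean_needed)
```

In this sketch the gap between `steps_executed` and `mean_needed` is a rough proxy for the work spent waiting on the slowest chain at each step; it is that kind of overhead that the FSM-based design in the paper aims to avoid.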


@article{dance2025_2503.17405,
  title={Efficiently Vectorized MCMC on Modern Accelerators},
  author={Hugh Dance and Pierre Glaser and Peter Orbanz and Ryan Adams},
  journal={arXiv preprint arXiv:2503.17405},
  year={2025}
}
