Custom Precision Based Architectures for Accelerating Parallel Tempering MCMC in FPGAs

Grigorios Mingas and Christos-Savvas Bouganis
Imperial College London


Abstract

Markov Chain Monte Carlo (MCMC) is a family of algorithms used to draw samples from arbitrary probability distributions in order to estimate otherwise intractable integrals. When the distribution is complex, simple MCMC becomes inefficient and advanced, computationally intensive MCMC methods must be employed to make sampling possible. This work proposes three streaming FPGA architectures to accelerate Parallel Tempering, a widely adopted, advanced MCMC method designed to sample from multimodal distributions. The proposed architectures exploit the characteristics of FPGAs, in particular their flexibility to use custom arithmetic precision. The first architecture is based on the observation that Parallel Tempering (and MCMC in general) can be robust to reductions in the precision used to evaluate the sampling distribution. The other two architectures demonstrate that, even when extremely low precision is employed, sampling errors can be "corrected" or avoided altogether. The use of reduced precision yields area savings, which translate into significant gains in sampling throughput. Speedups of two to three orders of magnitude over software and of up to 2.33x over a GPGPU implementation are achieved, without any compromise in sampling quality, when performing Bayesian inference for a mixture model, opening the way to handling previously intractable problems.
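For readers unfamiliar with the method, the sketch below illustrates the algorithmic structure that the proposed architectures accelerate: a minimal Parallel Tempering sampler written in Python (not part of the paper), in which the log-density can be evaluated in a reduced floating-point precision. NumPy float32 is used here purely as a software stand-in for the custom FPGA precisions studied in this work, and the bimodal target, temperature ladder, step size and iteration count are illustrative choices rather than values taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

def log_target(x, dtype=np.float64):
    # Unnormalised log-density of a bimodal 1-D Gaussian mixture, evaluated in
    # a chosen floating-point precision (software stand-in for custom precision).
    x = dtype(x)
    modes = np.array([-3.0, 3.0], dtype=dtype)
    d = x - modes
    return np.logaddexp(-0.5 * d[0] * d[0], -0.5 * d[1] * d[1])

def parallel_tempering(n_iters=5000, betas=(1.0, 0.5, 0.25, 0.1), step=1.0,
                       dtype=np.float32):
    # Minimal Parallel Tempering: within-chain random-walk Metropolis updates
    # for each tempered chain, followed by an exchange move between adjacent
    # temperatures. Illustrative parameter values only.
    K = len(betas)
    x = np.zeros(K)                      # one state per temperature
    samples = []
    for _ in range(n_iters):
        # Within-chain Metropolis update; chain k targets pi(x)^beta_k.
        for k in range(K):
            prop = x[k] + step * rng.standard_normal()
            log_a = betas[k] * (log_target(prop, dtype) - log_target(x[k], dtype))
            if np.log(rng.random()) < log_a:
                x[k] = prop
        # Exchange move between a randomly chosen pair of adjacent chains.
        k = rng.integers(K - 1)
        log_a = (betas[k] - betas[k + 1]) * (log_target(x[k + 1], dtype)
                                             - log_target(x[k], dtype))
        if np.log(rng.random()) < log_a:
            x[k], x[k + 1] = x[k + 1], x[k]
        samples.append(x[0])             # keep samples from the beta = 1 chain
    return np.array(samples)

samples = parallel_tempering()
print(samples.mean(), samples.std())

Each chain k targets pi(x)^beta_k: the high-temperature (small beta) chains move freely across modes, and the exchange moves propagate those states down to the beta = 1 chain, whose samples are retained. The dtype argument is the only place where precision enters, mirroring the idea that the evaluation of the sampling distribution is the component whose precision can be reduced.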