Fixed Point Lanczos: Sustaining TFLOP-equivalent Performance in FPGAs for Scientific Computing

Juan L Jerez,  George A Constantinides,  Eric C Kerrigan
Imperial College London


Abstract

We consider the problem of enabling fixed-point implementations of linear algebra kernels to match the strengths of the field-programmable gate array (FPGA). Algorithms for solving linear equations or finding eigenvalues or singular values are typically nonlinear and recursive, making it non-trivial to establish analytical bounds on variable dynamic range, and current approaches fail to provide tight bounds for algorithms of this type. As a case study we use one of the most important kernels in scientific computing, the Lanczos iteration, which lies at the heart of well-known methods such as conjugate gradient and minimum residual. We show how the algorithm can be modified so that standard linear algebra analysis yields tight analytical bounds on all variables of the process, regardless of the properties of the original matrix. It is shown that the numerical behaviour of a fixed-point implementation of the modified problem can be made at least as good as that of a double-precision floating-point implementation. Using this approach, it is possible to sustain FPGA performance very close to the peak performance of a general-purpose graphics processing unit (GPGPU) on FPGAs of comparable size when solving a single problem. When several independent problems are solved simultaneously, it is possible to exceed the peak floating-point performance of a GPGPU, obtaining approximately 1, 2 or 4 TFLOPs for error tolerances of 10^-7, 10^-5 and 10^-3, respectively, on a Virtex 7 FPGA.
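For readers unfamiliar with the kernel named above, the following is a minimal sketch of the standard (unmodified) symmetric Lanczos iteration in double-precision NumPy; it is illustrative only and does not reflect the paper's fixed-point modifications. In exact arithmetic it produces an orthonormal basis V of the Krylov subspace and a tridiagonal matrix T = V^T A V whose diagonal and off-diagonal are the alpha and beta coefficients.

```python
import numpy as np

def lanczos(A, v0, m):
    """Plain symmetric Lanczos iteration (three-term recurrence).

    Returns V (n x m, orthonormal columns in exact arithmetic),
    alpha (diagonal of T) and beta (off-diagonal of T), where
    V^T A V = T is tridiagonal. No reorthogonalization is done,
    so orthogonality degrades for large m in finite precision.
    """
    n = A.shape[0]
    V = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m - 1)
    v = v0 / np.linalg.norm(v0)   # first Lanczos vector
    v_prev = np.zeros(n)
    b = 0.0
    for j in range(m):
        V[:, j] = v
        w = A @ v - b * v_prev    # apply A, subtract previous direction
        alpha[j] = v @ w          # Rayleigh-quotient-like coefficient
        w = w - alpha[j] * v      # orthogonalize against current vector
        if j < m - 1:
            b = np.linalg.norm(w)
            beta[j] = b
            v_prev = v
            v = w / b             # normalize to get next Lanczos vector
    return V, alpha, beta
```

The recurrence uses only one matrix-vector product and a handful of vector operations per step, which is what makes it attractive for deeply pipelined FPGA implementations.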