Matrix Inversion on a Many-Core Platform

Zhangfan Zhao
Masters Thesis
VLSI Computation Laboratory
Department of Electrical and Computer Engineering
University of California, Davis
Technical Report ECE-VCL-2021-1, VLSI Computation Laboratory, University of California, Davis, 2021.


Matrix operations are fundamental to scientific and industrial computation and are widely used in many applications. Among them, matrix inversion plays an essential role in multiple-input multiple-output (MIMO) systems, image signal processing, least-squares analysis, etc. With dramatically increasing data sizes, the speed of matrix inversion often becomes the key factor in overall system performance. Therefore, this thesis proposes a many-core matrix inversion method based on Gauss-Jordan elimination (GJE), with two implementations: a 603-processor design that uses only on-chip memory with 16-bit fixed-point data, and a 635-processor design that uses external off-chip memory with 32-bit fixed-point and 32-bit floating-point data. Details of the parallel algorithm based on GJE are presented. All the unique programs loaded onto the many-core platform and the mapping of the parallel architecture are described. The accuracy obtained with the different data types is analyzed. Due to the word length and the computational complexity, the accuracy of the 16-bit fixed-point implementation is 2^-1, that of the 32-bit fixed-point implementation is 2^-6, and that of the 32-bit floating-point implementation is 2^-9. Due to the limited on-chip memory size, the implementation that uses only on-chip memory cannot invert large matrices.

Therefore, the proposed implementation that uses off-chip memory with 32-bit floating-point data is compared against a general-purpose processor (Intel i7-9700K) and a graphics processing unit (GPU) chip (NVIDIA GTX 1070). The designs for the many-core chip, general-purpose processor, and GPU are evaluated using the metrics of throughput per area (MatInv/sec/mm^2) and matrix inversions per energy (MatInv/J). Since different fabrication technologies are used, throughput, area, and energy dissipation for all platforms are scaled to 14 nm. The improvement in throughput per area observed in experiments is 20–60× across all simulated matrices versus the general-purpose processor implementation, and 3.7–19× versus the GPU implementation. The improvement in matrix inversions per energy observed in experiments is 45–131× versus the general-purpose processor implementation, and 8.5–41× versus the GPU implementation.
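
For readers unfamiliar with GJE, the following is a minimal sequential sketch of matrix inversion by Gauss-Jordan elimination in Python. It is an illustration only and does not reproduce the thesis's parallel algorithm, processor mapping, or fixed-point arithmetic. The function name gauss_jordan_inverse, the use of partial pivoting, the singularity threshold, and the use of double-precision floats are all assumptions made for this example.

```python
def gauss_jordan_inverse(a, tol=1e-12):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination.

    Illustrative sketch: sequential, double precision, partial pivoting.
    The tolerance `tol` is an arbitrary choice for this example.
    """
    n = len(a)
    if any(len(row) != n for row in a):
        raise ValueError("matrix must be square")

    # Build the augmented matrix [A | I].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(a)
    ]

    for col in range(n):
        # Partial pivoting: choose the row with the largest magnitude entry.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < tol:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]

        # Normalize the pivot row so the pivot element becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]

        # Eliminate this column from every other row (above and below).
        for r in range(n):
            if r != col:
                f = aug[r][col]
                if f != 0.0:
                    aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]

    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]


if __name__ == "__main__":
    for row in gauss_jordan_inverse([[4, 7], [2, 6]]):
        print(row)
```

In this sketch, the elimination updates for different rows within one pivot step do not depend on each other, which is the kind of independence a parallel implementation can exploit; how the thesis actually distributes this work across processors is described in the thesis itself.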



Zhangfan Zhao, "Matrix Inversion on a Many-Core Platform," Masters Thesis, Technical Report ECE-VCL-2021-1, VLSI Computation Laboratory, ECE Department, University of California, Davis, March 2021.

BibTeX entry

   author      = {Zhangfan Zhao},
   title       = {Matrix Inversion on a Many-Core Platform},
   school      = {University of California, Davis},
   year        = 2021,
   address     = {Davis, CA, USA},
   month       = mar,
   note        = {\url{}}
