High Performance and Energy Efficient Multi-core Systems for DSP Applications

Zhiyi Yu
Ph.D. Dissertation
Computer Engineering Research Laboratory
Department of Electrical and Computer Engineering
University of California, Davis
Technical Report ECE-CE-2007-5, Computer Engineering Research Laboratory, University of California, Davis, 2007.

Abstract

This dissertation investigates the architectural design, physical implementation, result evaluation, and feature analysis of a multi-core processor for DSP applications. The system is composed of a 2-D array of simple single-issue programmable processors interconnected by a reconfigurable mesh network, and processors operate completely asynchronously with respect to each other in a Globally Asynchronous Locally Synchronous fashion. The processor is called Asynchronous Array of simple Processors (AsAP). A 6 x 6 array has been fabricated in a 0.18 μm CMOS technology. The physical design concerns timing issues for robust implementations, and takes full advantages of their potential scalability. Each processor occupies 0.66 mm², is fully functional at a clock rate of 520 to 540 MHz under 1.8 V, and dissipates 94 mW while the clock is 100% active. Compared to the high performance TI C62x DSP processor, AsAP achieves performance 0.8 to 9.6 times greater, energy efficiency 10 to 75 times greater, with an area 7 to 19 times smaller. The system is also easily scalable, and is well-suited to future fabrication technologies.

An asymmetric interprocessor communication architecture is proposed. It assigns different buffer resources to the nearest neighbor interconnect and the long distance interconnect, can reduce the communication circuitry area by approximately 2 to 4 times compared to the traditional Network on Chip (NoC), with similar routing capability. A wide design exploration space is investigated, including supporting long distance communication in GALS systems, static/dynamic routing, varying numbers of ports (buffers) for the processing core, and varying numbers of links at each edge.

The use of GALS style typically introduces performance penalties due to additional communication latency between clock domains. GALS chip multiprocessors with large inter-processor FIFOs as AsAP can inherently hide much of the GALS performance penalty, and the penalty can even be driven to zero. Furthermore, adaptive clock and voltage scaling for each processor provides an approximately 40% power savings without any performance reduction.

Paper

PDF (980 KB)

Reference

Zhiyi Yu, "High Performance and Energy Efficient Multi-core Systems for DSP Applications," Technical Report ECE-CE-2007-5, Computer Engineering Research Laboratory, ECE Department, University of California, Davis, 2007.

BibTeX entry

@phdthesis{zhyyu:phdthesis,
   author      = {Zhiyi Yu},
   title       = {High Performance and Energy Efficient Multi-core Systems for
                  DSP Applications},
   school      = {University of California},
   year        = 2007,
   address     = {Davis, CA, USA},
   month       = Oct,
   note        = {\url{http://www.ece.ucdavis.edu/vcl/pubs/theses/2007-5}}
   }

Support Acknowledgment

This work was supported in part by Intel Corporation, UC MICRO, the National Science Foundation under Grant No. 0430090 and CAREER Award 0546907, SRC GRC Grant 1598.001, IntellaSys Corporation, ST Microelectronics, S Machines, MOSIS, Artisan, and a University of California, Davis, Faculty Research Grant. Any opinions, findings and conclusions or recomendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF) or other sponsors.

VCL Lab | ECE Dept. | UC Davis