Asynchronous Array of Simple Processors (AsAP) project
"You know you have achieved perfection in design, not when you have nothing
more to add, but when you have nothing more to take away"
- Antoine de Saint-Exupéry
"People who are really serious about software should make their
own hardware"
- Alan Kay
Members of the VCL are currently focused on the circuits, functional
units, architecture, interconnection network, algorithms, applications,
and chip design for a high-performance and energy-efficient
processing system targeting computationally demanding applications.
The single-chip processing system consists of a large number of
fine-grained, asynchronously operating programmable processors connected
by a reconfigurable network.
We have designed 36-processor and 167-processor chips that have been
fabricated and found to be fully functional. We believe they are
among the highest-clock-rate fabricated processors designed at any
university; in fact, we believe the 167-core chip is the fastest.
Details of the chips and applications have been presented at or published in ISSCC, ISVLSI,
Hot Chips, ICCD, IEEE MICRO, IEEE TVLSI, IEEE JSSC, ISCAS, the Symposium on
VLSI Circuits, SiPS, Asilomar, EURASIP, and many other venues. A complete
list of publications can be found at:
www.ece.ucdavis.edu/vcl/#Publications
and a concise summary can be found on the
AsAP Wikipedia page.
Target Applications
- Digital signal processing (DSP)
- Embedded
- Multimedia
- Enterprise kernels (project recently started)
- Scientific kernels (project recently started)
Project Objectives
- High energy efficiency (at all times)
- High performance (capable of)
- High throughput
- Low latency
- Small circuit area (relatively)
- Easy to program (relatively)
- Well suited for future fabrication technologies
  - Billions of transistors
  - Many faulty circuits across a chip
  - Large variations across the chip
  - Very high one-time fabrication and design costs
    - High design costs favor tiled homogeneous architectures
    - High fabrication/mask costs per chip tape-out
    - Hardware works well over a broad range of applications
Key features
- Very small processor tiles
- Vast numbers of processors per chip
- Processors can be used for non-traditional purposes in highly efficient implementations
- Processors dissipate very little energy per workload when active
- Processors dissipate very low power when idle
- Essentially no algorithm-specific hardware in the programmable processors
AsAP 1 chip (36 processors)
Key Achievements
- Highest clock rate processor designed in a
university (at the time of publication)
- Fully functional
Architecture
The first-generation AsAP chip contains 36 identical processors with
independent clock domains. Each processor is a reduced-complexity
programmable DSP with small memories, a design choice that can dramatically increase
system area efficiency and energy efficiency. Each processor can
receive data from any two of its neighbors and send data to any of its four
neighbors. The block diagram of the AsAP processor is shown below.
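The neighbor rules above can be checked mechanically. The following minimal Python sketch assumes the 36 processors are laid out as a 6×6 mesh and models a chip configuration as a hypothetical dictionary mapping each destination processor to the neighbors it reads from. The names `neighbors` and `validate` are illustrative only, not part of any AsAP toolchain. The sketch enforces just the two stated rules: inputs come only from nearest neighbors, and at most two of them per processor.

```python
ROWS, COLS = 6, 6  # assumption: the 36 processors form a 6x6 mesh
MAX_INPUTS = 2     # each processor reads from at most two neighbors


def neighbors(r, c):
    """Return the nearest neighbors (up to four) of processor (r, c)."""
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(rr, cc) for rr, cc in candidates
            if 0 <= rr < ROWS and 0 <= cc < COLS]


def validate(config):
    """Check a hypothetical routing table {dest: [source, ...]}.

    Each destination may read from at most MAX_INPUTS sources, and every
    source must be a nearest neighbor. Outputs are not limited here: a
    processor may send to any of its four neighbors.
    """
    for dest, sources in config.items():
        if len(sources) > MAX_INPUTS:
            raise ValueError(f"{dest} reads from {len(sources)} neighbors; "
                             f"the limit is {MAX_INPUTS}")
        for src in sources:
            if src not in neighbors(*dest):
                raise ValueError(f"{src} is not a nearest neighbor of {dest}")
    return True


# A three-stage pipeline along row 0, plus a processor that merges two streams.
pipeline = {(0, 1): [(0, 0)], (0, 2): [(0, 1)], (1, 1): [(0, 1), (1, 0)]}
print(validate(pipeline))
```

Adding a third source to processor (1, 1), for example (1, 2), makes `validate` raise `ValueError`, because that processor would then read from more than two neighbors.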
Below is a photomicrograph of the fully functional single-chip 36-core AsAP
processor array.
Chip design
We used a number of CAD tools from Cadence and Synopsys
for our chip design.
A separate page gives an overview of our
CAD tool flow,
including progress and issues.
Another lists the topics and issues
we considered before tape-out.
Test board
The AsAP test board, shown on the right, is a custom-designed printed
circuit board built to work with a commercial Memec FPGA board,
shown on the left.
Applications
Several DSP tasks and applications, such as an FFT, a JPEG core encoder, and an
802.11a/802.11g wireless transmitter, have been mapped onto the AsAP processor.
An 802.11a/802.11g transmitter implementation using 22 processors is
shown below. It consumes 407 mW at 300 MHz and achieves 30% of the 54 Mb/s
data rate. These results represent roughly 10 times higher performance and
35 to 75 times lower energy dissipation than an 8-way VLIW TI C62x (according to
one implementation reported at ICC02).
Results (First Generation)
The AsAP processor operates at 475 MHz, and each processor dissipates
32 mW while executing applications, 84 mW while 100% active,
and 144 mW in the worst case, at 1.8 V. Most of AsAP's area (66%) is devoted to the
core, which represents high area utilization.
Each processor occupies 0.66 mm²,
which is more than 20 times smaller than
traditional processors such as ARM. The AsAP processor also
achieves more than 5 times higher performance density and energy
efficiency than the other processors compared, as shown below.
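Dividing power by clock rate gives the average energy per clock cycle, which is a convenient way to compare the power figures above. This Python sketch assumes that all three figures were measured at 475 MHz and 1.8 V. The text implies this but does not state it for each case.

```python
CLOCK_HZ = 475e6  # AsAP1 clock rate reported above

# Per-processor power figures reported above, in watts.
# Assumption: all three were measured at 475 MHz and 1.8 V.
power_w = {
    "executing applications": 32e-3,
    "100% active": 84e-3,
    "worst case": 144e-3,
}


def energy_per_cycle_pj(power, freq_hz):
    """Average energy per clock cycle in picojoules: E = P / f."""
    return power / freq_hz * 1e12


for mode, p in power_w.items():
    print(f"{mode:>24}: {energy_per_cycle_pj(p, CLOCK_HZ):6.1f} pJ/cycle")
```

With these inputs, the sketch reports roughly 67, 177, and 303 pJ per cycle, respectively.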
AsAP 2 chip (167 processors)
Key Features
- 164 programmable processors
- Configurable Fast Fourier Transform (FFT) processor
- Configurable Viterbi decoder processor
- Configurable video motion estimation processor
- Three 16 KB shared memories
- Per-processor dynamic voltage scaling
- Per-processor dynamic clock frequency scaling
- All processors and shared memories clocked by fully independent
clock oscillators
- Circuit-switched long-distance-capable inter-processor network
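Per-processor voltage and frequency scaling lets each processor run only as fast as its workload requires. The following Python sketch illustrates one hypothetical selection policy: choose the lowest-power operating point whose clock meets a required rate. The four operating points are the measured single-tile figures from the Key data table. The policy and the names `OperatingPoint` and `choose_point` are assumptions made for illustration. They do not describe how AsAP2 actually sets its clocks and supply voltages.

```python
from typing import NamedTuple


class OperatingPoint(NamedTuple):
    freq_mhz: int
    volts: float
    power_mw: float  # single tile, 100% active


# Measured single-tile operating points from the Key data table.
OPERATING_POINTS = [
    OperatingPoint(66, 0.675, 0.608),
    OperatingPoint(260, 0.75, 3.4),
    OperatingPoint(1060, 1.2, 47.0),
    OperatingPoint(1200, 1.3, 62.0),
]


def choose_point(required_mhz):
    """Hypothetical policy: the lowest-power point that meets required_mhz."""
    feasible = [p for p in OPERATING_POINTS if p.freq_mhz >= required_mhz]
    if not feasible:
        raise ValueError(f"{required_mhz} MHz exceeds the 1.2 GHz maximum")
    return min(feasible, key=lambda p: p.power_mw)


# Three processors with different (hypothetical) required clock rates.
for need in (50, 200, 1100):
    p = choose_point(need)
    print(f"need {need} MHz -> {p.freq_mhz} MHz @ {p.volts} V, {p.power_mw} mW")
```

Under this policy, a processor that needs 200 MHz runs at the 260 MHz, 0.75 V point (3.4 mW) instead of at full speed (62 mW).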
Key Achievements
- Highest clock rate processor designed in a
university (1.2 GHz)
- Fully functional
Below is the die micrograph of the fully functional single-chip 167-processor
AsAP2 array processor.
Key data
| Block | Parameter | Value |
|---|---|---|
| Overall chip | CMOS technology | 65 nm STMicroelectronics low-leakage |
| Overall chip | Transistors | 55 million |
| Overall chip | Area | 39.4 mm² |
| Single programmable processor tile | Transistors | 325,000 |
| Single programmable processor tile | Area | 0.17 mm² |
| Single programmable processor tile | Max clock frequency | 1.2 GHz @ 1.3 V |
| Single programmable processor tile | Power (100% active) | 62 mW @ 1.2 GHz, 1.3 V |
| | | 47 mW @ 1.06 GHz, 1.2 V |
| | | 3.4 mW @ 260 MHz, 0.75 V (equivalent to 1.0 Tera-op/sec @ 6.5 Watts) |
| | | 608 μW @ 66 MHz, 0.675 V |
| Fast Fourier Transform (FFT) accelerator | Area | 1.01 mm² |
| Fast Fourier Transform (FFT) accelerator | Max clock frequency | 866 MHz @ 1.3 V |
| Viterbi decoder accelerator | Area | 0.17 mm² |
| Viterbi decoder accelerator | Max clock frequency | 894 MHz @ 1.3 V |
| Video motion estimation accelerator | Area | 0.67 mm² |
| Video motion estimation accelerator | Max clock frequency | 938 MHz @ 1.3 V |
| (3) 16 KB shared memories | Area | 0.34 mm² |
| (3) 16 KB shared memories | Max clock frequency | 1.3 GHz @ 1.3 V |
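The power figures in the table can be converted to energy per clock cycle using E = P / f. This Python sketch performs that arithmetic for the four measured single-tile operating points. It uses only the table values, but treating one cycle as one unit of work is an approximation.

```python
# (clock frequency in Hz, supply in V, tile power in W) from the Key data table
POINTS = [
    (1.2e9, 1.3, 62e-3),
    (1.06e9, 1.2, 47e-3),
    (260e6, 0.75, 3.4e-3),
    (66e6, 0.675, 608e-6),
]


def energy_per_cycle_pj(power_w, freq_hz):
    """Average energy per clock cycle in picojoules: E = P / f."""
    return power_w / freq_hz * 1e12


for f, v, p in POINTS:
    print(f"{f / 1e6:7.0f} MHz @ {v:5.3f} V: "
          f"{energy_per_cycle_pj(p, f):5.1f} pJ/cycle")
```

By this measure, a tile running at 66 MHz and 0.675 V spends about 9.2 pJ per cycle, compared with about 51.7 pJ at 1.2 GHz and 1.3 V. That is a reduction of roughly 5.6 times, which shows why scaling down lightly loaded processors saves energy.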
Development boards
Work has begun on two development boards: one for high-speed AsAP array
emulation on an FPGA, and the other to host our planned CMOS chip.
Key features for both boards include:
- On-board D/A converter(s)
- On-board A/D converter(s)
- Simple interface to a workstation for programming/configuration
- Simple interface to a workstation for data in and data out (may utilize
the same interface using on-board memory for buffering)
- FPGA-only board: Sufficient on-board RAM for future non-AsAP projects
- FPGA-only board: Sufficient CLBs/slices for at least 9 AsAP processors;
the ideal goal is 19, 20, or 22 (for the 802.11a transmitter)
Information on the AsAP version 1 development board can be found at: http://www.ece.ucdavis.edu/vcl/asap/asap_v1/asap_ver1.shtml.
Measurements and Characterization
We maintain a checklist of things to measure and
characterize in the AsAP1 and AsAP2 chips.
Acknowledgments
This material is based upon work supported by Intel Corporation,
UC MICRO, the National Science Foundation under Grant No. 0430090
and CAREER Grant No. 0546907, and a UCD Faculty Research Grant.
Any opinions, findings and conclusions or recommendations expressed in
this material are those of the author(s) and do not necessarily reflect
the views of the National Science Foundation (NSF).
Last update: April 12, 2013
Keywords:
many-core, multi-core, array processor, homogeneous, heterogeneous,
NoC, network on chip, interconnect, mesh,
GALS, globally asynchronous locally synchronous,
electrical engineering, computer engineering,
university, academic, department, group, lab, laboratory,
research development,
chip, VLSI, CMOS, circuit,
low power, energy efficient, FFT, DCT, viterbi, FIR, IIR,
compression, communication, coding, convolution, correlation, encryption,
image, video, JPEG, multimedia, wireless, OFDM, radar, sonar, medical imaging,
MRI, magnetic resonance imaging, biological imaging,
802.11a, 802.11g, wireless LAN, transmitter, receiver.
"But why would Mr. Jobs even have tried to skirt the law, given how much
was at stake? Mr. Isaacson said that he couldn't comment on specific
cases, but noted that 'over and over, people referred to his reality
distortion field.' Mr. Isaacson added, 'The rules just didn't apply to
him, whether he was getting a license plate that let him use handicapped
parking or building products that people said weren't possible. Most
of the time he was right, and he got away with it.'"
"Mr. Lam, of The Wirecutter, said Mr. Jobs's seeming indifference to the
law wasn't unusual in Silicon Valley. 'Look at Bill Gates,' he said. 'He
was arrested for speeding and driving without a license. And Microsoft
had its problems with antitrust law. It's just a characteristic of
young tech entrepreneurs to look at the rules and question them. You
can't get into this game without a healthy distaste for the status quo.'"
-NYT, "Steve Jobs, a Genius at Pushing Boundaries," 2014/05/02