The chip consists of a 10x8 2D mesh network of cores and nominally operates at 4 GHz. Each core, called a tile, contains a processing engine and a 5-port wormhole-switched router with mesochronous interfaces; the router has a bandwidth of 80 GB/s and a latency of 1.25 ns at 4 GHz. The processing engine in each tile contains two independent, 9-stage pipelined single-precision floating-point multiply-accumulator (FPMAC) units, 3 KB of single-cycle instruction memory and 2 KB of data memory. Each FPMAC unit can perform 2 single-precision floating-point operations per cycle, so each tile has an estimated peak performance of 16 GFLOPS at the standard configuration of 4 GHz. A 96-bit very long instruction word (VLIW) encodes up to eight operations per cycle. The custom instruction set includes instructions to send and receive packets over the chip's network, as well as instructions for putting a particular tile to sleep and waking it.

Underneath each tile, a 256 KB SRAM module was 3D stacked, bringing memory nearer to the processor and increasing overall memory bandwidth to 1 TB/s, at the expense of higher cost, thermal stress and latency, and a small total capacity of 20 MB. The network of Polaris was shown to have a bisection bandwidth of 1.6 Tbit/s at 3.16 GHz and 2.92 Tbit/s at 5.67 GHz.

Other prominent features of the Teraflops Research Chip include fine-grained power management, with 21 independent sleep regions on a tile and dynamic tile sleep, and very high energy efficiency: a theoretical peak of 27 GFLOPS/W at 0.6 V, and an achieved 19.4 GFLOPS/W for a stencil workload at 0.75 V.
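The peak-performance and capacity figures above follow from simple arithmetic on the per-tile numbers. The following is a minimal sketch of that arithmetic, not anything taken from Intel's documentation; the function and constant names are hypothetical, and it assumes every tile is active and both FPMAC units issue on every cycle.

```python
# Figures quoted in the text; all names here are illustrative only.
TILES = 10 * 8              # 10x8 mesh of tiles
FPMACS_PER_TILE = 2         # two FPMAC units per tile
FLOPS_PER_FPMAC_CYCLE = 2   # each FPMAC: 2 single-precision FLOPs per cycle
SRAM_PER_TILE_KB = 256      # 3D-stacked SRAM under each tile


def tile_peak_gflops(freq_ghz: float) -> float:
    """Peak GFLOPS of one tile at the given clock frequency (GHz)."""
    return FPMACS_PER_TILE * FLOPS_PER_FPMAC_CYCLE * freq_ghz


def chip_peak_tflops(freq_ghz: float) -> float:
    """Peak TFLOPS of the whole chip, assuming all tiles are active."""
    return TILES * tile_peak_gflops(freq_ghz) / 1000.0


def stacked_sram_mb() -> float:
    """Total stacked SRAM capacity in MB (1 MB = 1024 KB)."""
    return TILES * SRAM_PER_TILE_KB / 1024.0


if __name__ == "__main__":
    print(tile_peak_gflops(4.0))   # per-tile peak at the nominal 4 GHz
    print(chip_peak_tflops(4.0))   # whole-chip peak at 4 GHz
    print(stacked_sram_mb())       # total stacked SRAM
```

At 4 GHz this gives 16 GFLOPS per tile and 1.28 TFLOPS for the chip, and 80 modules of 256 KB give the 20 MB total quoted above.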
| Instruction type | Latency (cycles) |
|---|---|
| FPMAC | 9 |
| LOAD/STORE | 2 |
| SEND/RECEIVE | 2 |
| JUMP/BRANCH | 1 |
| STALL/WFD | ? |
| SLEEP/WAKE | 6 |
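To relate the instruction latencies to wall-clock time, they can be converted to nanoseconds at a chosen clock frequency. The sketch below assumes the latencies are expressed in clock cycles (consistent with the 9-stage FPMAC pipeline); the dictionary and function names are hypothetical, and the unknown STALL/WFD entry is left as `None` rather than guessed.

```python
from typing import Optional

# Instruction latencies from the table, assumed to be in clock cycles.
# STALL/WFD is not given in the source, so it stays unknown (None).
LATENCY_CYCLES: dict[str, Optional[int]] = {
    "FPMAC": 9,
    "LOAD/STORE": 2,
    "SEND/RECEIVE": 2,
    "JUMP/BRANCH": 1,
    "STALL/WFD": None,
    "SLEEP/WAKE": 6,
}


def cycles_to_ns(cycles: int, freq_ghz: float) -> float:
    """Convert a cycle count to nanoseconds at freq_ghz (1 GHz = 1 cycle/ns)."""
    return cycles / freq_ghz


def ns_to_cycles(ns: float, freq_ghz: float) -> float:
    """Convert a duration in nanoseconds to clock cycles at freq_ghz."""
    return ns * freq_ghz


if __name__ == "__main__":
    for name, cycles in LATENCY_CYCLES.items():
        if cycles is None:
            print(f"{name}: unknown")
        else:
            print(f"{name}: {cycles_to_ns(cycles, 4.0):.2f} ns at 4 GHz")
```

Under the same assumption, the router latency of 1.25 ns at 4 GHz corresponds to `ns_to_cycles(1.25, 4.0)`, that is, 5 cycles.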
| Voltage | Frequency | Performance | Power | Temperature |
|---|---|---|---|---|
| 0.60 V | 1.0 GHz | 0.32 TFLOPS | 11 W | 110 °C |
| 0.675 V | 1.0 GHz | 0.32 TFLOPS | 15.6 W | 80 °C |
| 0.70 V | 1.5 GHz | 0.48 TFLOPS | 25 W | 110 °C |
| 0.70 V | 1.35 GHz | 0.43 TFLOPS | 18 W | 80 °C |
| 0.75 V | 1.6 GHz | 0.51 TFLOPS | 21 W | 80 °C |
| 0.80 V | 2.1 GHz | 0.67 TFLOPS | 42 W | 110 °C |
| 0.80 V | 2.0 GHz | 0.64 TFLOPS | 26 W | 80 °C |
| 0.85 V | 2.4 GHz | 0.77 TFLOPS | 32 W | 80 °C |
| 0.90 V | 2.6 GHz | 0.83 TFLOPS | 70 W | 110 °C |
| 0.90 V | 2.85 GHz | 0.91 TFLOPS | 45 W | 80 °C |
| 0.95 V | 3.16 GHz | 1.0 TFLOPS | 62 W | 80 °C |
| 1.00 V | 3.13 GHz | 1.0 TFLOPS | 98 W | 110 °C |
| 1.00 V | 3.8 GHz | 1.22 TFLOPS | 78 W | 80 °C |
| 1.05 V | 4.2 GHz | 1.34 TFLOPS | 82 W | 80 °C |
| 1.10 V | 3.5 GHz | 1.12 TFLOPS | 135 W | 110 °C |
| 1.10 V | 4.5 GHz | 1.44 TFLOPS | 105 W | 80 °C |
| 1.15 V | 4.8 GHz | 1.54 TFLOPS | 128 W | 80 °C |
| 1.20 V | 4.0 GHz | 1.28 TFLOPS | 181 W | 110 °C |
| 1.20 V | 5.1 GHz | 1.63 TFLOPS | 152 W | 80 °C |
| 1.25 V | 5.3 GHz | 1.70 TFLOPS | 165 W | 80 °C |
| 1.30 V | 4.4 GHz | 1.39 TFLOPS | ? | 110 °C |
| 1.30 V | 5.5 GHz | 1.76 TFLOPS | 210 W | 80 °C |
| 1.35 V | 5.67 GHz | 1.81 TFLOPS | 230 W | 80 °C |
| 1.40 V | 4.8 GHz | 1.52 TFLOPS | ? | 110 °C |
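Energy efficiency at each operating point can be estimated by dividing performance by power. The sketch below does this for three of the operating points listed above; the names are hypothetical, and the two entries with unknown power ("?") are not included. Because the listed values are rounded, the results need not match efficiency figures quoted elsewhere exactly.

```python
# (voltage V, frequency GHz, performance TFLOPS, power W, temperature °C)
# A small selection of the operating points listed above.
OPERATING_POINTS = [
    (0.60, 1.0, 0.32, 11.0, 110),
    (0.95, 3.16, 1.0, 62.0, 80),
    (1.35, 5.67, 1.81, 230.0, 80),
]


def gflops_per_watt(tflops: float, watts: float) -> float:
    """Energy efficiency in GFLOPS/W from performance (TFLOPS) and power (W)."""
    return tflops * 1000.0 / watts


if __name__ == "__main__":
    for volts, ghz, tflops, watts, temp_c in OPERATING_POINTS:
        eff = gflops_per_watt(tflops, watts)
        print(f"{volts:.2f} V, {ghz} GHz, {temp_c} °C: {eff:.1f} GFLOPS/W")
```

For these three points the estimate falls from roughly 29 GFLOPS/W at 0.60 V to roughly 8 GFLOPS/W at 1.35 V, illustrating how efficiency drops as voltage and frequency rise.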
Issues
Intel aimed to help software development for the new, exotic architecture by creating a new programming model, called Ct, specifically for the chip. The model never gained the following Intel had hoped for and was eventually incorporated into Intel Array Building Blocks, a now-defunct C++ library.