Adapteva
Adapteva is a fabless semiconductor company focusing on low-power, many-core microprocessor design. It was the second company to announce a design with 1,000 specialized processing cores on a single integrated circuit.
Adapteva was founded in 2008 with the goal of bringing a tenfold improvement in floating-point performance per watt to the mobile device market. Its products are based on the Epiphany multi-core, multiple instruction, multiple data (MIMD) architecture. In September 2012 the company started the Parallella Kickstarter project, promoting "a supercomputer for everyone".
The company name is a combination of "adapt" and the Hebrew word "teva", meaning "nature".
History
Adapteva was founded in March 2008 by Andreas Olofsson with the goal of bringing a 10× improvement in floating-point energy efficiency to the mobile device market. In May 2009, Olofsson had a prototype of a new type of massively parallel multi-core computer architecture. The initial prototype was implemented in a 65 nm process and had 16 independent microprocessor cores. The prototypes enabled Adapteva to secure US$1.5 million in series-A funding from BittWare, a company based in Concord, New Hampshire, in October 2009. Adapteva's first commercial chip product started sampling to customers in early May 2011, and soon afterwards the company announced the capability to put up to 4,096 cores on a single chip.
The Epiphany III was announced in October 2011, using 28 nm and 65 nm manufacturing processes.
Products
Adapteva's main product family is the Epiphany scalable multi-core MIMD architecture. The Epiphany architecture can accommodate chips with up to 4,096 RISC out-of-order microprocessors, all sharing a single 32-bit flat memory space. Each core is a superscalar RISC microprocessor with a unified register file of 64 32-bit registers, operating at up to 1 GHz and capable of 2 GFLOPS. The cores use a custom instruction set architecture optimised for single-precision floating point, but are programmable in high-level ANSI C using a standard GNU GCC toolchain. Each core has 32 KB of local memory. Code and stack should reside in that local memory, and temporary data should also fit there for full-speed operation. Data can also be accessed in another core's local memory at a speed penalty, or in off-chip RAM at a much larger penalty. Like the Sony/Toshiba/IBM Cell processor, the memory architecture does not employ an explicit hierarchy of hardware caches, but it has the additional benefit of supporting off-chip and inter-core loads and stores. It is a hardware implementation of a partitioned global address space.
This eliminates the need for complex cache-coherency hardware, which places a practical limit on the number of cores in a traditional multicore system. The design lets the programmer exploit advance knowledge of independent data access patterns, avoiding the runtime cost of discovering them in hardware. All processor nodes are connected through a network on chip, allowing efficient message passing.
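To make the partitioned global address space more concrete, the minimal Python sketch below maps a core's grid position plus a local offset to one flat 32-bit address. The bit layout is an assumption for illustration: the upper 12 bits select the core (6 bits of row, 6 bits of column) and the lower 20 bits give an offset within that core's window, of which the 32 KB local memory occupies only a small part. Under this layout 64 × 64 = 4,096 cores fill the 32-bit space, which matches the architecture's stated core limit. The sketch ignores how off-chip RAM is mapped into the same space, the hop count is an illustrative cost model rather than a measured latency, and all function names are hypothetical, not part of Adapteva's SDK.

```python
# Assumed (illustrative) layout: | row: 6 bits | col: 6 bits | offset: 20 bits |
ROW_BITS = 6
COL_BITS = 6
OFFSET_BITS = 20
LOCAL_MEM_BYTES = 32 * 1024  # per-core local memory, as stated in the text
MAX_CORES = 1 << (ROW_BITS + COL_BITS)  # 4,096 addressable cores under this layout


def global_address(row: int, col: int, offset: int) -> int:
    """Combine a core's grid position and a local offset into one flat 32-bit address."""
    if not (0 <= row < (1 << ROW_BITS) and 0 <= col < (1 << COL_BITS)):
        raise ValueError("core coordinates out of range")
    if not 0 <= offset < LOCAL_MEM_BYTES:
        raise ValueError("offset outside the core's 32 KB local memory")
    return (row << (COL_BITS + OFFSET_BITS)) | (col << OFFSET_BITS) | offset


def split_address(addr: int) -> tuple[int, int, int]:
    """Recover (row, col, offset) from a flat 32-bit address."""
    if not 0 <= addr < (1 << 32):
        raise ValueError("not a 32-bit address")
    offset = addr & ((1 << OFFSET_BITS) - 1)
    col = (addr >> OFFSET_BITS) & ((1 << COL_BITS) - 1)
    row = addr >> (COL_BITS + OFFSET_BITS)
    return row, col, offset


def is_local(addr: int, my_row: int, my_col: int) -> bool:
    """True if the address falls in the calling core's own local memory (full speed)."""
    row, col, _ = split_address(addr)
    return (row, col) == (my_row, my_col)


def mesh_hops(addr: int, my_row: int, my_col: int) -> int:
    """Hops to the owning core, assuming a 2D mesh with one hop per neighbouring core."""
    row, col, _ = split_address(addr)
    return abs(row - my_row) + abs(col - my_col)


# Core (2, 3) reads a word held by core (2, 5): a remote access two hops away.
addr = global_address(2, 5, 0x40)
print(hex(addr), is_local(addr, 2, 3), mesh_hops(addr, 2, 3))
```

Because every address names its owning core, a load or store needs no coherency lookup: the core simply routes the request to the owner, which is the trade-off the paragraph above describes.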
Scalability
The architecture is designed to scale almost indefinitely: four e-links allow multiple chips to be combined in a grid topology, enabling systems with thousands of cores.
Multi-core coprocessors
On August 19, 2012, Adapteva posted some specifications and information about Epiphany multi-core coprocessors:
Specification | E16G301 | E64G401 |
Cores | 16 | 64 |
Core clock (MHz) | 1000 | 800 |
GFLOPS per core | 2 | 1.6 |
Aggregate clock ("sum GHz") | 16 | 51.2 |
Peak GFLOPS (total) | 32 | 102 |
Die area (mm²) | 8.96 | 8.2 |
Process (nm) | 65 | 28 |
Default power (W) | 0.9 | 1.4 |
Maximum power (W) | 2 | 2 |
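The aggregate rows of the table follow from the per-core rows: the "sum GHz" figure is cores × clock, and peak GFLOPS is cores × per-core GFLOPS. Both chips' per-core figures work out to 2 floating-point operations per clock cycle (2 GFLOPS at 1000 MHz, 1.6 GFLOPS at 800 MHz); that ratio is inferred from the table, not stated by it. The Python sketch below recomputes the rows under that assumption; the class and field names are illustrative.

```python
from dataclasses import dataclass

FLOPS_PER_CYCLE = 2  # inferred from the table: 2 GFLOPS/core at 1 GHz, 1.6 at 800 MHz


@dataclass(frozen=True)
class EpiphanyChip:
    name: str
    cores: int
    core_mhz: int

    @property
    def gflops_per_core(self) -> float:
        return self.core_mhz * FLOPS_PER_CYCLE / 1000

    @property
    def sum_ghz(self) -> float:
        # "Sum GHz": total clock cycles per second across all cores
        return self.cores * self.core_mhz / 1000

    @property
    def peak_gflops(self) -> float:
        # Theoretical single-precision peak, not a measured figure
        return self.cores * self.gflops_per_core


E16G301 = EpiphanyChip("E16G301", cores=16, core_mhz=1000)
E64G401 = EpiphanyChip("E64G401", cores=64, core_mhz=800)

for chip in (E16G301, E64G401):
    print(f"{chip.name}: {chip.sum_ghz:g} GHz aggregate, {chip.peak_gflops:g} GFLOPS peak")
```

The E64G401 comes out at 102.4 GFLOPS, which the table rounds to 102; the lower per-core clock is offset by having four times as many cores.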
In September 2012, the 16-core Epiphany-III was produced on a 65 nm process, and engineering samples of the 64-core Epiphany-IV were produced on GlobalFoundries' 28 nm process.
The primary markets for the Epiphany multi-core architecture include:
- Smartphone applications such as real-time facial recognition, speech recognition, translation, and augmented reality.
- Next-generation supercomputers, which require drastically better energy efficiency to scale to exaflop computing levels.
- Floating-point acceleration in embedded systems based on field-programmable gate array architectures.
Parallella project
Size of board is planned to be.
The Kickstarter campaign raised US$898,921. The US$3 million goal was not reached, so no 64-core version of the Parallella will be mass-produced. Backers who pledged more than US$750 will receive the "parallella-64" variant with the 64-core coprocessor.
Epiphany V
By 2016, the firm had taped out a 1,024-core, 64-bit variant of its Epiphany architecture, implemented in the 16 nm process node, featuring larger local stores, 64-bit addressing, double-precision floating-point arithmetic or SIMD single-precision arithmetic, and 64-bit integer instructions. The design also included instruction-set enhancements aimed at deep-learning and cryptography applications. In July 2017, Adapteva's founder became a DARPA program manager and announced that the Epiphany V was "unlikely" to become available as a commercial product.
Performance
Joel Hruska of ExtremeTech offered the following opinion of the 64-core Parallella project, prior to the 1,024-core design: "Adapteva is drastically overselling what the Epiphany IV can actually deliver. 16–64 tiny cores with small amounts of memory, no local caches, and a relatively low clock speed can still be useful in certain workloads, but contributors aren't buying a supercomputer — they're buying the real-world equivalent of a self-sealing stem bolt."
The criticism that the Epiphany chips cannot provide anywhere near the performance of modern supercomputers is nevertheless correct: Epiphany chips with 16 or 64 cores (the latter reaching about 100 GFLOPS in single precision) do not even match the floating-point performance of modern desktop PC processors, a fact that Adapteva acknowledges.
However, the latest Parallella boards with E16 Epiphany chips can be compared to many historic supercomputers in terms of raw performance, and can certainly be used for parallel code development. The architectural similarities to supercomputers make the Parallella a potentially useful development system, compared to traditional SMP machines.
The point is that, within a power envelope of 5 W and in terms of GFLOPS/mm² of die area, the E16 Epiphany chips provide vastly more performance than other chips available at the time, with an architecture designed to scale and applicable to more than the embarrassingly parallel tasks suited to GPUs. The architecture is also suitable for DSP-like tasks in which data can be fed directly on chip, making it well suited to robotics and other intelligent-sensor applications. It also allows Parallella boards to be combined into a cluster over the fast inter-chip eMesh interconnect, extending the logical grid of cores.
The 16-core Parallella achieves roughly 5.0 GFLOPS/W; the 64-core Epiphany-IV, made on a 28 nm process, is estimated at 50 GFLOPS/W; and a 32-board system based on them reaches 15 GFLOPS/W. For comparison, top GPUs from AMD and Nvidia reached 10 GFLOPS/W in single precision in the 2009–2011 timeframe.
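The efficiency and density claims above are ratios of peak throughput to power or to die area. The Python sketch below recomputes chip-level values from the E16G301/E64G401 specification table (peak GFLOPS, maximum power, die area). These are theoretical chip-only peaks: they leave out the host processor, memory and the rest of a board, so they should not be compared directly with board-level figures such as the roughly 5.0 GFLOPS/W quoted for the Parallella. The dictionary layout and the `ratio` helper are illustrative.

```python
# Chip-level figures copied from the E16G301/E64G401 specification table.
CHIPS = {
    "E16G301": {"peak_gflops": 16 * 2.0, "max_watts": 2.0, "die_mm2": 8.96},
    "E64G401": {"peak_gflops": 64 * 1.6, "max_watts": 2.0, "die_mm2": 8.2},
}

GPU_GFLOPS_PER_WATT = 10.0  # 2009-2011 single-precision GPU figure quoted in the text


def ratio(numerator: float, denominator: float) -> float:
    """Peak throughput divided by a cost (watts or mm²)."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


for name, spec in CHIPS.items():
    per_watt = ratio(spec["peak_gflops"], spec["max_watts"])
    per_mm2 = ratio(spec["peak_gflops"], spec["die_mm2"])
    print(f"{name}: {per_watt:.1f} GFLOPS/W, {per_mm2:.1f} GFLOPS/mm², "
          f"{per_watt / GPU_GFLOPS_PER_WATT:.1f}x the GPU figure")
```

At maximum power the 64-core chip works out to about 51 GFLOPS/W, in line with the roughly 50 GFLOPS/W estimate above; using the default power from the table instead would give higher, less conservative numbers.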