Zen (first generation microarchitecture)
Zen is the codename for the first iteration in a family of computer processor microarchitectures of the same name from AMD. It was first used with their Ryzen series of CPUs in February 2017. The first Zen-based preview system was demonstrated at E3 2016, and the design was first substantially detailed at an event hosted a block away from the Intel Developer Forum 2016. The first Zen-based CPUs, codenamed "Summit Ridge", reached the market in early March 2017; Zen-derived Epyc server processors launched in June 2017, and Zen-based APUs arrived in November 2017.
Zen is a clean-sheet design that differs from AMD's previous long-standing Bulldozer architecture. Zen-based processors use a 14 nm FinFET process, are reportedly more energy efficient, and can execute significantly more instructions per cycle. SMT has been introduced, allowing each core to run two threads. The cache system has also been redesigned, making the L1 cache write-back. Zen processors use three different sockets: desktop and mobile Ryzen chips use the AM4 socket, bringing DDR4 support; the high-end desktop Zen-based Threadripper chips support quad-channel DDR4 RAM and offer 64 PCIe 3.0 lanes, using the TR4 socket; and Epyc server processors offer 128 PCIe 3.0 lanes and octa-channel DDR4 using the SP3 socket.
Zen is based on a SoC design. The memory, PCIe, SATA, and USB controllers are incorporated into the same chip as the processor cores. This has advantages in bandwidth and power, at the expense of chip complexity and die area. This SoC design allows the Zen microarchitecture to scale from laptops and small-form factor mini PCs to high-end desktops and servers.
By 2020, AMD had shipped 260 million Zen cores.
Design
According to AMD, the main focus of Zen is on increasing per-core performance. New or improved features include:
- The L1 cache has been changed from write-through to write-back, allowing for lower latency and higher bandwidth.
- SMT architecture allows for two threads per core, a departure from the CMT design used in the previous Bulldozer architecture. This is a feature previously offered in some IBM, Intel and Oracle processors.
- A fundamental building block for all Zen-based CPUs is the Core Complex consisting of four cores and their associated caches. Processors with more than four cores consist of multiple CCXs connected by Infinity Fabric. Processors with non-multiple-of-four core counts have some cores disabled.
- Four ALUs, two AGUs/load–store units, and two floating-point units per core.
- Newly introduced "large" micro-operation cache.
- Each SMT core can dispatch up to six micro-ops per cycle.
- Close to 2× faster L1 and L2 bandwidth, with total L3 cache bandwidth up 5×.
- Clock gating.
- Larger retire, load, and store queues.
- Improved branch prediction using a hashed perceptron system with Indirect Target Array similar to the Bobcat microarchitecture, something that has been compared to a neural network by AMD engineer Mike Clark.
- The branch predictor is decoupled from the fetch stage.
- A dedicated stack engine for modifying the stack pointer, similar to that of Intel Haswell and Broadwell processors.
- Move elimination, a method that reduces physical data movement to reduce power consumption.
- Binary compatibility with Intel's Skylake microarchitecture:
  - RDSEED support, a set of high-performance hardware random number generator instructions introduced in Broadwell.
  - Support for the SMAP, SMEP, XSAVEC/XSAVES/XRSTORS, and CLFLUSHOPT instructions.
  - ADX support.
  - SHA support.
- CLZERO instruction for clearing a cache line. Useful for handling ECC-related Machine-check exceptions.
- PTE (page table entry) coalescing, which combines 4 KiB page tables into a 32 KiB page size.
- "Pure Power".
- Smart Prefetch.
- Precision Boost.
- eXtended Frequency Range, an automated overclocking feature which boosts clock speeds beyond the advertised turbo frequency.
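On a Linux system, the instruction-set extensions named in the list above can be checked against the "flags" line of /proc/cpuinfo. The following is a minimal sketch, not an official AMD tool. It assumes the flag spellings used by the Linux kernel (for example sha_ni for the SHA extensions) and that XRSTORS is reported together with XSAVES under the xsaves flag. The helper names missing_flags and read_flags_line are hypothetical.

```python
# Flag names as spelled in the "flags" line of Linux /proc/cpuinfo (assumption:
# a Linux host; other operating systems expose CPUID features differently).
# XRSTORS has no flag of its own here; it is assumed to be covered by "xsaves".
ZEN_LISTED_FLAGS = (
    "rdseed", "smap", "smep", "xsavec", "xsaves",
    "clflushopt", "adx", "sha_ni", "clzero",
)


def missing_flags(flags_line, wanted=ZEN_LISTED_FLAGS):
    """Return the wanted flags that do not appear in a space-separated flags line."""
    present = set(flags_line.split())
    return [flag for flag in wanted if flag not in present]


def read_flags_line(path="/proc/cpuinfo"):
    """Return the first CPU's flags string, or "" if the file is unavailable."""
    try:
        with open(path) as handle:
            for line in handle:
                if line.startswith("flags"):
                    # Lines look like "flags\t\t: fpu vme de ..."
                    return line.split(":", 1)[1]
    except OSError:
        pass
    return ""


if __name__ == "__main__":
    absent = missing_flags(read_flags_line())
    print("missing:", ", ".join(absent) if absent else "none")
```

An empty result from missing_flags means every listed extension is reported by the kernel; it does not by itself identify the processor as Zen-based, since later AMD and Intel designs report most of the same flags.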
Each Zen core can decode four instructions per clock cycle and includes a micro-op cache which feeds two schedulers, one each for the integer and floating point segments. Each core has two address generation units, four integer units, and four floating point units. Two of the floating point units are adders, and two are multiply-adders. However, using multiply-add operations may prevent a simultaneous add operation in one of the adder units. There are also improvements in the branch predictor. The L1 cache size is 64 KiB for instructions per core and 32 KiB for data per core. The L2 cache size is 512 KiB per core, and the L3 is 1–2 MB per core. L3 caches offer 5× the bandwidth of previous AMD designs.
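The per-core cache figures above can be summed to estimate the total on-chip cache of a given configuration. This is a rough sketch under stated assumptions: the "1–2 MB" of L3 per core is treated as 1–2 MiB, and the function name total_cache_kib is hypothetical.

```python
# Per-core cache sizes in KiB, taken from the paragraph above.
L1_INSTRUCTION_KIB = 64
L1_DATA_KIB = 32
L2_KIB = 512


def total_cache_kib(cores, l3_mib_per_core):
    """Sum L1, L2 and L3 capacity in KiB for the given number of cores.

    Assumption: the L3 figure is interpreted in binary units (1 MiB = 1024 KiB).
    """
    if cores < 1:
        raise ValueError("cores must be positive")
    private_kib = cores * (L1_INSTRUCTION_KIB + L1_DATA_KIB + L2_KIB)
    shared_l3_kib = int(cores * l3_mib_per_core * 1024)
    return private_kib + shared_l3_kib


if __name__ == "__main__":
    # An 8-core part with 2 MiB of L3 per core: 4864 KiB private + 16384 KiB L3.
    print(total_cache_kib(8, 2))  # 21248
```

For an 8-core part with 2 MiB of L3 per core this gives 21,248 KiB, of which 16 MiB is L3.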
History and development
AMD began planning the Zen microarchitecture shortly after re-hiring Jim Keller in August 2012. AMD formally revealed Zen in 2015. The team in charge of Zen was led by Keller and Zen Team Leader Suzanne Plummer. The Chief Architect of Zen was AMD Senior Fellow Michael Clark.
Zen was originally planned for 2017 following the ARM64-based K12 sister core, but on AMD's 2015 Financial Analyst Day it was revealed that K12 was delayed in favor of the Zen design, to allow it to enter the market within the 2016 timeframe, with the release of the first Zen-based processors expected for October 2016.
In November 2015, a source inside AMD reported that Zen microprocessors had been tested and "met all expectations" with "no significant bottlenecks found".
In December 2015, it was rumored that Samsung may have been contracted as a fabricator for AMD's 14 nm FinFET processors, including both Zen and AMD's then-upcoming Polaris GPU architecture. This was clarified by AMD's July 2016 announcement that products had been successfully produced on Samsung's 14 nm FinFET process. AMD stated Samsung would be used "if needed", arguing this would reduce risk for AMD by decreasing dependence on any one foundry.
In December 2019, AMD started putting out first generation Ryzen products built using the second generation Zen+ architecture.
Advantages over predecessors
Manufacturing process
Processors based on Zen use 14 nm FinFET silicon. These processors are reportedly produced at GlobalFoundries. Prior to Zen, AMD's smallest process size was 28 nm, as utilized by their Steamroller and Excavator microarchitectures. The immediate competition, Intel's Skylake and Kaby Lake microarchitectures, are also fabricated on 14 nm FinFET, though Intel planned to begin the release of 10 nm parts later in 2017. In comparison to Intel's 14 nm FinFET, AMD claimed in February 2017 that the Zen cores would be 10% smaller. Intel later announced, in July 2018, that mainstream 10 nm processors should not be expected before the second half of 2019. For identical designs, these die shrinks would use less current at the same frequency. As CPUs are usually power limited, smaller transistors allow for either lower power at the same frequency, or higher frequency at the same power.
Performance
One of Zen's major goals in 2016 was to improve per-core performance, and it was targeting a 40% improvement in instructions per cycle (IPC) over its predecessor. Excavator, in comparison, offered a 4–15% improvement over previous architectures. AMD announced that the final Zen microarchitecture actually achieved a 52% improvement in IPC over Excavator. The inclusion of SMT also allows each core to process up to two threads, increasing processing throughput by better use of available resources. The Zen processors also employ sensors across the chip to dynamically scale frequency and voltage. This allows for the maximum frequency to be dynamically and automatically defined by the processor itself based upon available cooling.
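The IPC figures above translate into relative single-thread throughput only when clock speed is also taken into account. The sketch below uses the common first-order model in which throughput scales with IPC times clock frequency. That model, and the function name relative_performance, are assumptions for illustration rather than AMD's own methodology.

```python
def relative_performance(ipc_gain, clock_ratio=1.0):
    """Throughput relative to the baseline core.

    First-order model (an assumption): performance ~ IPC x clock frequency,
    so the ratio is (1 + IPC gain) x (new clock / old clock).
    """
    return (1.0 + ipc_gain) * clock_ratio


# Figures quoted in the text, both relative to Excavator at equal clocks.
target = relative_performance(0.40)
achieved = relative_performance(0.52)

# How far the achieved IPC exceeds the original target, as a fraction.
overshoot = achieved / target - 1.0

if __name__ == "__main__":
    print(f"target {target:.2f}x, achieved {achieved:.2f}x, "
          f"overshoot {overshoot * 100:.1f}%")
```

Under this model, the announced 52% IPC gain is about 8.6% above the 40% target at equal clock speeds.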
AMD has demonstrated an 8-core/16-thread Zen processor outperforming an equally-clocked Intel Broadwell-E processor in Blender rendering and HandBrake benchmarks.
Zen supports AVX2 but it requires two clock cycles to complete each AVX2 instruction compared to Intel's one. This difference was corrected in Zen 2.
Memory
Zen supports DDR4 memory and ECC. Pre-release reports stated APUs using the Zen architecture would also support High Bandwidth Memory. However, the first demonstrated APU did not use HBM. Previous APUs from AMD relied on shared memory for both the GPU and the CPU.
Power consumption and heat output
Processors built at the 14 nm node on FinFET silicon should show reduced power consumption and therefore heat over their 28 nm and 32 nm non-FinFET predecessors, or be more computationally powerful at equivalent heat output/power consumption. Zen also uses clock gating, reducing the frequency of underutilized portions of the core to save power. This comes from AMD's SenseMI technology, using sensors across the chip to dynamically scale frequency and voltage.
Enhanced security and virtualization support
Zen added support for AMD's Secure Memory Encryption (SME) and AMD's Secure Encrypted Virtualization (SEV). Secure Memory Encryption is real-time memory encryption done per page table entry. Encryption occurs on a hardware AES engine and keys are managed by the onboard "Security" Processor at boot time to encrypt each page, allowing any DDR4 memory to be encrypted. AMD SME also makes the contents of the memory more resistant to memory snooping and cold boot attacks. The Secure Encrypted Virtualization feature allows the memory contents of a virtual machine to be transparently encrypted with a key unique to the guest VM. The memory controller contains a high-performance encryption engine which can be programmed with multiple keys for use by different VMs in the system. The programming and management of these keys is handled by the AMD Secure Processor firmware which exposes an API for these tasks.
Connectivity
Incorporating much of the southbridge into the SoC, the Zen CPU includes SATA, USB, and PCI Express NVMe links. This can be augmented by available Socket AM4 chipsets which add connectivity options including additional SATA and USB connections, and support for AMD's Crossfire and Nvidia's SLI. AMD, in announcing its Radeon Instinct line, argued that the upcoming Zen-based Naples server CPU would be particularly suited for building deep learning systems. The 128 PCIe lanes per Naples CPU allow for eight Instinct cards to connect at PCIe x16 to a single CPU. This compares favorably to the Intel Xeon line, with only 40 PCIe lanes.
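The card counts above follow from dividing the available lanes by the 16 lanes each card uses. A minimal sketch of that arithmetic is shown below. The function name max_x16_cards is hypothetical, and the model assumes every lane is free for accelerator cards, ignoring lanes that storage or networking would consume in a real system.

```python
def max_x16_cards(cpu_lanes, lanes_per_card=16):
    """Number of cards that can each receive a full-width link from one CPU.

    Assumption: all lanes are available to cards; real platforms reserve
    some lanes for storage, networking and chipset links.
    """
    if cpu_lanes < 0 or lanes_per_card < 1:
        raise ValueError("lane counts must be non-negative and the width positive")
    return cpu_lanes // lanes_per_card


if __name__ == "__main__":
    print(max_x16_cards(128))  # 8 (Naples figure from the text)
    print(max_x16_cards(40))   # 2 (Xeon figure from the text)
```

With the lane counts quoted in the text, a Naples CPU can feed eight x16 cards while a 40-lane CPU can feed two.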
Features
CPUs
APUs
Products
The Zen architecture is used in the current-generation desktop Ryzen CPUs. It is also in Epyc server processors, and APUs. The first desktop processors without graphics processing units were initially expected to start selling at the end of 2016, according to an AMD roadmap; with the first mobile and desktop processors of the AMD Accelerated Processing Unit type following in late 2017. AMD officially delayed Zen until Q1 of 2017. In August 2016, an early demonstration of the architecture showed an 8-core/16-thread engineering sample CPU at 3.0 GHz.
In December 2016, AMD officially announced the desktop CPU line under the Ryzen brand for release in Q1 2017. It also confirmed that server processors would be released in Q2 2017, and mobile APUs in H2 2017.
On March 2, 2017, AMD officially launched the first Zen architecture-based octacore Ryzen desktop CPUs. The final clock speeds and TDPs for the 3 CPUs released in Q1 of 2017 demonstrated significant performance-per-watt benefits over the previous K15h architecture. The octacore Ryzen desktop CPUs demonstrated performance-per-watt comparable to Intel's Broadwell octacore CPUs.
In March 2017, AMD also demonstrated an engineering sample of a server CPU based on the Zen architecture. The CPU was configured as a dual-socket server platform with each CPU having 32 cores/64 threads.
Desktop processors
First Generation of Ryzen processors:
Desktop APUs
Ryzen APUs are identified by either the G or GE suffix in their name.
Mobile APUs
Embedded processors
In February 2018, AMD announced the V1000 series of embedded Zen+Vega APUs with four SKUs.
Server processors
AMD announced in March 2017 that it would release a server platform based on Zen, codenamed Naples, in the second quarter of the year. The platform includes 1- and 2-socket systems. The CPUs in multi-processor configurations communicate via AMD's Infinity Fabric. Each chip supports eight channels of memory and 128 PCIe 3.0 lanes, of which 64 lanes are used for CPU-to-CPU communication through Infinity Fabric when installed in a dual-processor configuration. AMD officially revealed Naples under the brand name Epyc in May 2017. On June 20, 2017, AMD officially released the Epyc 7000 series CPUs at a launch event in Austin, Texas.
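The lane figures above imply that a dual-socket system offers the same number of PCIe lanes for devices as a single-socket one, because each chip gives up half of its lanes to the link between the sockets. The sketch below works through that arithmetic. The constant and function names are hypothetical and the model covers only the 1- and 2-socket configurations described in the text.

```python
LANES_PER_CHIP = 128         # PCIe 3.0 lanes per chip (from the text)
INTERSOCKET_LANES = 64       # lanes per chip used for CPU-to-CPU links
MEMORY_CHANNELS_PER_CHIP = 8


def device_lanes(sockets):
    """PCIe lanes left for devices in a 1- or 2-socket system."""
    if sockets == 1:
        return LANES_PER_CHIP
    if sockets == 2:
        # Each chip dedicates 64 of its 128 lanes to Infinity Fabric.
        return sockets * (LANES_PER_CHIP - INTERSOCKET_LANES)
    raise ValueError("only 1- and 2-socket platforms are described")


def memory_channels(sockets):
    """Total DDR4 memory channels across all sockets."""
    if sockets not in (1, 2):
        raise ValueError("only 1- and 2-socket platforms are described")
    return sockets * MEMORY_CHANNELS_PER_CHIP


if __name__ == "__main__":
    for n in (1, 2):
        print(n, "socket(s):", device_lanes(n), "lanes,",
              memory_channels(n), "memory channels")
```

Adding a second socket therefore doubles the memory channels, from 8 to 16, while the lanes available to devices stay at 128.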