ARM Cortex-A76


The ARM Cortex-A76 is a microarchitecture designed by ARM Holdings' Austin design centre that implements the ARMv8.2-A 64-bit instruction set. ARM states a 25% and 35% increase in integer and floating point performance, respectively, over the previous-generation Cortex-A75.

Design

The Cortex-A76 serves as the successor to the ARM Cortex-A73 and ARM Cortex-A75, though it is based on a clean-sheet design.
The Cortex-A76 frontend is a 4-wide decode, out-of-order superscalar design. It can fetch 4 instructions per cycle, and it can rename and dispatch 4 macro-ops (Mops) and 8 micro-ops (µops) per cycle. The out-of-order window size is 128 entries. The backend has 8 execution ports, a pipeline depth of 13 stages, and an execution latency of 11 stages.
The core supports unprivileged 32-bit applications, but privileged applications must use the 64-bit ARMv8-A ISA. It also supports load-acquire instructions, Dot Product instructions, the PSTATE Speculative Store Bypass Safe (SSBS) bit, and the speculation barrier instructions.
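As an illustration of what the Dot Product instructions mentioned above compute, the following is a minimal portable sketch in C, assuming the signed form in which each 32-bit accumulator lane adds the sum of four products of signed 8-bit elements. The function name sdot_emulated is hypothetical; the code emulates the arithmetic in plain C and does not use the hardware instruction or any compiler intrinsic.

```c
#include <stdint.h>

/*
 * Emulates one signed dot-product step in plain C (illustrative only).
 *
 * acc : four 32-bit accumulator lanes, updated in place
 * a, b: sixteen signed 8-bit elements each, treated as four groups of four
 *
 * Each lane adds the sum of its four 8-bit products to its accumulator.
 */
void sdot_emulated(int32_t acc[4], const int8_t a[16], const int8_t b[16])
{
    for (int lane = 0; lane < 4; ++lane) {
        int32_t sum = 0;
        for (int i = 0; i < 4; ++i) {
            /* Widen to 32 bits before multiplying so the product cannot overflow. */
            sum += (int32_t)a[4 * lane + i] * (int32_t)b[4 * lane + i];
        }
        /* Accumulate through unsigned arithmetic so a lane wraps instead of
         * triggering signed-overflow undefined behaviour in C. */
        acc[lane] = (int32_t)((uint32_t)acc[lane] + (uint32_t)sum);
    }
}
```

A single call therefore performs sixteen 8-bit multiplications and folds them into four 32-bit results, which is the pattern used by quantized matrix and convolution kernels.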
Memory bandwidth increased 90% relative to the A75. According to ARM, the A76 is expected to offer twice the performance of an A73 and is targeted beyond mobile workloads. ARM describes the intended performance as "laptop class", including Windows 10 devices, and as competitive with Intel's Kaby Lake.
The Cortex-A76 supports ARM's DynamIQ technology and is expected to serve as the high-performance core when combined with power-efficient Cortex-A55 cores.

Neoverse N1

On February 20, 2019, Arm announced the Neoverse N1 microarchitecture, based on the Cortex-A76 and redesigned for infrastructure/server applications. The reference design supports up to 64 or 128 Neoverse N1 cores.
Notable changes from the Cortex-A76:
The Cortex-A76 is available as a SIP core to licensees, and its design makes it suitable for integration with other SIP cores on one die to form a system on a chip.

Usage

ARM collaborated with Qualcomm on a semi-custom version of the Cortex-A76, used in Qualcomm's high-end Kryo 495 and Kryo 485 CPUs and in its mid-range Kryo 460 and Kryo 470 CPUs. One of Qualcomm's modifications was a larger reorder buffer, which increases the out-of-order window size.
The Cortex-A76 is also used in the Exynos 990 and Exynos Auto V9, and in the MediaTek Helio G90/G90T, Dimensity 800 and Dimensity 820.