FMA4 is supported in AMD processors starting with the Bulldozer architecture. FMA4 was implemented in hardware before FMA3. FMA4 has not been officially supported since Zen 1.
FMA3 and FMA4 instructions have almost identical functionality but are not compatible. Both contain fused multiply–add instructions for floating-point scalar and SIMD operations, but FMA3 instructions have three operands, while FMA4 instructions have four. The FMA operation has the form $d = \operatorname{round}(a \cdot b + c)$, where the round function performs a single rounding so that the result fits within the destination register when the exact value has too many significant bits to fit. The four-operand form allows a, b, c and d to be four different registers, while the three-operand form requires that d be the same register as a, b or c. The three-operand form makes the code shorter and the hardware implementation slightly simpler, while the four-operand form provides more programming flexibility. See the XOP instruction set for more discussion of compatibility issues between Intel and AMD.
Supported instructions include VFMADD, VFMADDSUB, VFMSUBADD, VFMSUB, VFNMADD and VFNMSUB. In FMA3, the mnemonic encodes the explicit order of operands using the numbers "132", "213" and "231", as well as the operand format (packed or scalar) and size (single or double precision).
* Zen: WikiChip's testing shows that FMA4 still appears to work, despite not being officially supported or even reported by CPUID. This has also been confirmed by Agner Fog. However, other tests gave wrong results. A note on AMD's official website regarding FMA4 support lists the following Zen CPUs: AMD Threadripper 1900X, R7 Pro 1800, 1700, R5 Pro 1600, 1500, R3 Pro 1300, 1200, R3 2200G and R5 2400G.
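Because Zen does not report FMA4 through CPUID, software that checks the feature flags will not use FMA4 there even if the instructions happen to execute. A minimal sketch of such a check follows, assuming GCC or Clang (for `<cpuid.h>` and `__get_cpuid`); the function names are hypothetical. FMA3 is reported in ECX bit 12 of leaf 0x00000001 and FMA4 in ECX bit 16 of leaf 0x80000001. A production check would also confirm via OSXSAVE/XGETBV that the operating system saves the AVX register state, which this sketch omits.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

#define FMA3_ECX_BIT (1u << 12)  /* CPUID leaf 0x00000001, ECX bit 12 */
#define FMA4_ECX_BIT (1u << 16)  /* CPUID leaf 0x80000001, ECX bit 16 */

/* Reads one ECX feature bit; returns false if the leaf is unavailable
   or the build target is not x86. */
static bool cpuid_ecx_bit(unsigned int leaf, unsigned int mask) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(leaf, &eax, &ebx, &ecx, &edx))
        return false;            /* leaf not supported by this CPU */
    return (ecx & mask) != 0;
#else
    (void)leaf;
    (void)mask;
    return false;
#endif
}

/* Hypothetical helpers: what the CPU *reports*, not what may happen to work. */
bool cpu_reports_fma3(void) { return cpuid_ecx_bit(0x00000001u, FMA3_ECX_BIT); }
bool cpu_reports_fma4(void) { return cpuid_ecx_bit(0x80000001u, FMA4_ECX_BIT); }
```

On a first-generation Ryzen, a check like this would be expected to report FMA3 but not FMA4, matching the official position described above.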
Intel
* It is uncertain whether future Intel processors will support FMA4, due to Intel's announced change to FMA3.
History
The incompatibility between Intel's FMA3 and AMD's FMA4 is due to both companies changing plans without coordinating coding details with each other. AMD changed their plans from FMA3 to FMA4 while Intel changed their plans from FMA4 to FMA3 almost at the same time. The history can be summarized as follows:
August 2007: AMD announces the SSE5 instruction set, which includes 3-operand FMA instructions. A new coding scheme (DREX) is introduced to allow instructions to have three operands.
April 2008: Intel announces their AVX and FMA instruction sets, including 4-operand FMA instructions. The coding of these instructions uses the new VEX coding scheme, which is more flexible than AMD's DREX scheme.
December 2008: Intel changes the specification for their FMA instructions from 4-operand to 3-operand instructions. The VEX coding scheme is still used.
May 2009: AMD changes the specification of their FMA instructions from the 3-operand DREX form to the 4-operand VEX form, compatible with the April 2008 Intel specification rather than the December 2008 Intel specification.
February 2017: The first generation of AMD Ryzen processors officially supports FMA3 but, according to the CPUID instruction, not FMA4. There has been confusion over whether FMA4 was implemented on this processor, due to an error in the initial patch to the GNU Binutils package that has since been corrected. While the FMA4 instructions seem to work according to some tests, they can also give wrong results. Additionally, the initial Ryzen CPUs could be crashed by a particular sequence of FMA3 instructions; this has since been fixed by a CPU microcode update.
Compiler and assembler support
Different compilers provide different levels of support for FMA:
GCC supports FMA4 with -mfma4 since version 4.5.0 and FMA3 with -mfma since version 4.7.0.