Predication (computer architecture)


In computer science, predication is an architectural feature that provides an alternative to conditional transfer of control, implemented by machine instructions such as conditional branch, conditional call, conditional return, and branch tables. Predication works by executing instructions from both paths of the branch and only permitting those instructions from the taken path to modify architectural state. The instructions from the taken path are permitted to modify architectural state because they have been associated with a predicate, a Boolean value used by the instruction to control whether the instruction is allowed to modify the architectural state or not.

Overview

Most computer programs contain conditional code, which will be executed only under specific conditions depending on factors that cannot be determined beforehand, for example depending on user input. As the majority of processors simply execute the next instruction in a sequence, the traditional solution is to insert branch instructions that allow a program to conditionally branch to a different section of code, thus changing the next step in the sequence. This was sufficient until designers began improving performance by implementing instruction pipelining, a method which is slowed down by branches. For a more thorough description of the problems which arose, and a popular solution, see branch predictor.
Luckily, one of the more common patterns of code that normally relies on branching has a more elegant solution. Consider the following pseudocode:

if condition

else
;

On a system that uses conditional branching, this might translate to machine instructions looking similar to:

branch-if-condition to label1
dosomethingelse
branch-to label2
label1:
dosomething
label2:
...

With predication, all possible branch paths are coded inline, but some instructions execute while others do not. The basic idea is that each instruction is associated with a predicate and that the instruction will only be executed if the predicate is true. The machine code for the above example using predication might look something like this:

dosomething
dosomethingelse

Besides eliminating branches, less code is needed in total, provided the architecture provides predicated instructions. While this does not guarantee faster execution in general, it will if the dosomething and dosomethingelse blocks of code are short enough.
Predication's simplest form is partial predication, where the architecture has conditional move or conditional select instructions. Conditional move instructions write the contents of one register over another only if the predicate's value is true, whereas conditional select instructions choose which of two registers has its contents written to a third based on the predicate's value. A more generalized and capable form is full predication. Full predication has a set of predicate registers for storing predicates and most instructions in the architecture have a register specifier field to specify which predicate register supplies the predicate.

Advantages

The main purpose of predication is to avoid jumps over very small sections of program code, increasing the effectiveness of pipelined execution and avoiding problems with the cache. It also has a number of more subtle benefits:
Predication's primary drawback is in increased encoding space. In typical implementations, every instruction reserves a bitfield for the predicate specifying under what conditions that instruction should have an effect. When available memory is limited, as on embedded devices, this space cost can be prohibitive. However, some architectures such as Thumb-2 are able to avoid this issue. Other detriments are the following:
Predication is most effective when paths are balanced or when the longest path is the most frequently executed, but determining such a path is very difficult at compile time, even in the presence of profiling information.

History

Predicated instructions were popular in European computer designs of the 1950s, including the Mailüfterl, the Zuse Z22, the ZEBRA, and the Electrologica X1. The IBM ACS-1 design of 1967 allocated a "skip" bit in its instruction formats, and the CDC Flexible Processor in 1976 allocated three conditional execution bits in its microinstruction formats.
Hewlett-Packard's PA-RISC architecture had a feature called nullification, which allowed most instructions to be predicated by the previous instruction. IBM's POWER architecture featured conditional move instructions. POWER's successor, PowerPC, dropped these instructions. Digital Equipment Corporation's Alpha architecture also featured conditional move instructions. MIPS gained conditional move instructions in 1994 with the MIPS IV version; and SPARC was extended in Version 9 with conditional move instructions for both integer and floating-point registers.
In the Hewlett-Packard/Intel IA-64 architecture, most instructions are predicated. The predicates are stored in 64 special-purpose predicate registers; and one of the predicate registers is always true so that unpredicated instructions are simply instructions predicated with the value true. The use of predication is essential in IA-64's implementation of software pipelining because it avoids the need for writing separated code for prologs and epilogs.
In the x86 architecture, a family of conditional move instructions were added to the architecture by the Intel Pentium Pro processor. The CMOV instructions copied the contents of the source register to the destination register depending on a predicate supplied by the value of the flag register.
In the ARM architecture, the original 32-bit instruction set provides a feature called conditional execution that allows most instructions to be predicated by one of 13 predicates that are based on some combination of the four condition codes set by the previous instruction. ARM's Thumb instruction set dropped conditional execution to reduce the size of instructions so they could fit in 16 bits, but its successor, Thumb-2 overcame this problem by using a special instruction which has no effect other than to supply predicates for the following four instructions. The 64-bit instruction set introduced in ARMv8-A replaced conditional execution with conditional selection instructions.
Some SIMD instruction sets, like AVX2, have the ability to use a logical mask to conditionally load/store values to memory, a parallel form of the conditional move. This form of prediction is also used in single instruction, multiple threads GPU computing, where a computing core executes the code of a branch with unmet condition but does not commit the results computed in that execution path.