Asynchronous circuit
In digital electronics, an asynchronous circuit, or self-timed circuit, is a sequential digital logic circuit which is not governed by a clock circuit or global clock signal. Instead, it often uses signals that indicate completion of instructions and operations, specified by simple data transfer protocols. This type of circuit is contrasted with synchronous circuits, in which changes to the signal values in the circuit are triggered by repetitive pulses called a clock signal. Most digital devices today use synchronous circuits. However, asynchronous circuits have the potential to be faster, and may also offer lower power consumption, lower electromagnetic interference, and better modularity in large systems. Asynchronous circuits are an active area of research in digital logic design.
Synchronous vs asynchronous logic
Digital logic circuits can be divided into combinational logic, in which the output signals depend only on the current input signals, and sequential logic, in which the output depends both on current input and on past inputs. In other words, sequential logic is combinational logic with memory. Virtually all practical digital devices require sequential logic. Sequential logic can be divided into two types, synchronous logic and asynchronous logic.
- In synchronous logic circuits, an electronic oscillator generates a repetitive series of equally spaced pulses called the clock signal. The clock signal is applied to all the memory elements in the circuit, called flip-flops. The output of the flip-flops only changes when triggered by the edge of the clock pulse, so changes to the logic signals throughout the circuit all begin at the same time, at regular intervals synchronized by the clock. The output of all memory elements in a circuit is called the state of the circuit. The state of a synchronous circuit changes only on the clock pulse. The changes in signal require a certain amount of time to propagate through the combinational logic gates of the circuit. This is called propagation delay. The period of the clock signal is made long enough that the outputs of all the logic gates have time to settle to stable values before the next clock pulse. As long as this condition is met, synchronous circuits will operate stably, so they are easy to design.
- In asynchronous circuits, there is no clock signal, and the state of the circuit changes as soon as the inputs change. Since asynchronous circuits don't have to wait for a clock pulse to begin processing inputs, they can be faster than synchronous circuits, and their speed is theoretically limited only by the propagation delays of the logic gates. However, asynchronous circuits are more difficult to design and subject to problems not found in synchronous circuits. This is because the resulting state of an asynchronous circuit can be sensitive to the relative arrival times of inputs at gates. If transitions on two inputs arrive at almost the same time, the circuit can go into the wrong state depending on slight differences in the propagation delays of the gates. This is called a race condition. In synchronous circuits this problem is less severe because race conditions can only occur due to inputs from outside the synchronous system, called asynchronous inputs. Although some fully asynchronous digital systems have been built, today asynchronous circuits are typically used in a few critical parts of otherwise synchronous systems where speed is at a premium, such as signal processing circuits.
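The race condition described above can be illustrated with a small sketch. It assumes a cross-coupled NOR latch (not named in the text) whose S and R inputs are both released from 1 to 0, with hypothetical path delays `delay_s` and `delay_r`. Gates are re-evaluated to a fixed point after each input transition, so this is a toy model rather than a timing-accurate simulation, and an exact tie is left unresolved.

```python
def nor(a: int, b: int) -> int:
    """Two-input NOR gate on 0/1 values."""
    return int(not (a or b))


def final_q(delay_s: float, delay_r: float):
    """Release S and R (both initially 1) and return the latch's final Q.

    delay_s and delay_r are hypothetical path delays after which the falling
    transitions on S and R reach the latch. Returns None on an exact tie,
    which this toy model does not try to resolve.
    """
    if delay_s == delay_r:
        return None
    s, r, q, qn = 1, 1, 0, 0  # with S = R = 1 both NOR outputs are 0
    # Apply the two input transitions in order of arrival.
    for _, name in sorted([(delay_s, "s"), (delay_r, "r")]):
        if name == "s":
            s = 0
        else:
            r = 0
        # Re-evaluate the cross-coupled gates until the outputs stop changing.
        while True:
            new_q = nor(r, qn)
            new_qn = nor(s, new_q)
            if (new_q, new_qn) == (q, qn):
                break
            q, qn = new_q, new_qn
    return q


print(final_q(1.00, 1.01))  # S arrives first
print(final_q(1.01, 1.00))  # R arrives first
```

The two calls differ only by a small change in path delay, yet the latch settles into opposite states.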
Theoretical foundation
Asynchronous logic is the logic required for the design of asynchronous digital systems. These function without a clock signal and so individual logic elements cannot be relied upon to have a discrete true/false state at any given time. Boolean logic is inadequate for this and so extensions are required. Karl Fant developed a theoretical treatment of this in his work Logically determined design in 2005 which used four-valued logic with null and intermediate being the additional values. This architecture is important because it is quasi-delay-insensitive. Scott Smith and Jia Di developed an ultra-low-power variation of Fant's Null Convention Logic that incorporates multi-threshold CMOS. This variation is termed Multi-threshold Null Convention Logic, or alternatively Sleep Convention Logic. Vadim Vasyukevich developed a different approach based upon a new logical operation which he called venjunction. This takes into account not only the current value of an element, but also its history.
Petri nets are an attractive and powerful model for reasoning about asynchronous circuits. However, Petri nets have been criticized for their lack of physical realism. Since Petri nets were introduced, other models of concurrency that can model asynchronous circuits have been developed, including the Actor model and process calculi.
Benefits
A variety of advantages have been demonstrated by asynchronous circuits, including both quasi-delay-insensitive circuits and less pure forms of asynchronous circuitry which use timing constraints for higher performance and lower area and power:
- Robust handling of metastability of arbiters.
- Higher performance function units, which provide average-case completion rather than worst-case completion. Examples include speculative completion which has been applied to design parallel prefix adders faster than synchronous ones, and a high-performance double-precision floating point adder which outperforms leading synchronous designs.
- Early completion of a circuit when it is known that the inputs which have not yet arrived are irrelevant.
- Lower power consumption because no transistor ever transitions unless it is performing useful computation. Epson has reported 70% lower power consumption compared to synchronous design. Also, clock drivers can be removed which can significantly reduce power consumption. However, when using certain encodings, asynchronous circuits may require more area, which can result in increased power consumption if the underlying process has poor leakage properties.
- "Elastic" pipelines, which achieve high performance while gracefully handling variable input and output rates and mismatched pipeline stage delays.
- Freedom from the ever-worsening difficulties of distributing a high-fan-out, timing-sensitive clock signal.
- Better modularity and composability.
- Far fewer assumptions about the manufacturing process are required.
- Circuit speed adapts to changing temperature and voltage conditions rather than being locked at the speed mandated by worst-case assumptions.
- Immunity to transistor-to-transistor variability in the manufacturing process, which is one of the most serious problems facing the semiconductor industry as dies shrink.
- Less severe electromagnetic interference. Synchronous circuits create a great deal of EMI in the frequency band at their clock frequency and its harmonics; asynchronous circuits generate EMI patterns which are much more evenly spread across the spectrum.
- Local signaling eliminates the need for global synchronization, which gives asynchronous circuits several potential advantages over synchronous ones. They have shown potential for low power consumption, design reuse, improved noise immunity and electromagnetic compatibility. Asynchronous circuits are also more tolerant of process variations and external voltage fluctuations.
- Less stress on the power distribution network. Synchronous circuits tend to draw a large amount of current right at the clock edge and shortly thereafter. The number of nodes switching drops off rapidly after the clock edge, reaching zero just before the next clock edge. In an asynchronous circuit, the switching times of the nodes are not correlated in this manner, so the current draw tends to be more uniform and less bursty.
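The average-case completion benefit listed above can be made concrete with a ripple-carry adder, chosen here purely as an illustration. This sketch measures the longest carry chain for given operands. A clocked adder must budget for a chain as long as the word, while a self-timed adder with completion detection could in principle finish once the actual chain has settled. The function names and the sample count are assumptions of this example.

```python
import random


def longest_carry_chain(a: int, b: int, width: int) -> int:
    """Length of the longest run of bit positions that one carry ripples through."""
    longest = run = 0
    carry = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        if x & y:                  # generate: a new carry chain starts here
            carry, run = 1, 1
        elif (x ^ y) and carry:    # propagate: the incoming carry ripples onward
            run += 1
        else:                      # no carry leaves this position
            carry, run = 0, 0
        longest = max(longest, run)
    return longest


def average_chain(width: int, samples: int, seed: int = 0) -> float:
    """Mean longest carry chain over uniformly random operands."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        total += longest_carry_chain(rng.getrandbits(width), rng.getrandbits(width), width)
    return total / samples


print(longest_carry_chain(0xFFFFFFFF, 1, 32))  # worst case: carry crosses all 32 bits
print(average_chain(32, 2000))                 # typical case is far shorter
```

For random operands the average chain is much shorter than the word width, which is the gap that average-case designs try to exploit.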
Disadvantages
- Area overhead caused by an increase in the number of circuit elements. In some cases an asynchronous design may require up to double the resources of a synchronous design, due to addition of completion detection and design-for-test circuits.
- Fewer people are trained in this style compared to synchronous design.
- Synchronous designs are inherently easier to test and debug than asynchronous designs. However, this position is disputed by Fant, who claims that the apparent simplicity of synchronous logic is an artifact of the mathematical models used by the common design approaches.
- Clock gating in more conventional synchronous designs is an approximation of the asynchronous ideal, and in some cases, its simplicity may outweigh the advantages of a fully asynchronous design.
- Performance of asynchronous circuits may be reduced in architectures that require input-completeness.
- Lack of dedicated, asynchronous design-focused commercial EDA tools.
Communication
Protocols
There are two widely used protocol families which differ in the way communications are encoded:
- two-phase handshake: Communications are represented by any wire transition; transitions from 0 to 1 and from 1 to 0 both count as communications.
- four-phase handshake: Communications are represented by a wire transition followed by a reset; a transition sequence from 0 to 1 and back to 0 counts as a single communication.
Note that these basic distinctions do not account for the wide variety of protocols. These protocols may encode only requests and acknowledgements or also encode the data, which leads to the popular multi-wire data encoding. Many other, less common protocols have been proposed including using a single wire for request and acknowledgment, using several significant voltages, using only pulses or balancing timings in order to remove the latches.
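The difference between the two handshake families can be sketched by listing the wire events each one produces. The example assumes one request wire and one acknowledge wire named `req` and `ack` (hypothetical names) and ignores data and timing entirely.

```python
def two_phase(n_transfers: int):
    """Event trace for n transfers: every transition is a communication."""
    req = ack = 0
    trace = []
    for _ in range(n_transfers):
        req ^= 1                      # sender toggles request
        trace.append(("req", req))
        ack ^= 1                      # receiver toggles acknowledge
        trace.append(("ack", ack))
    return trace


def four_phase(n_transfers: int):
    """Event trace for n transfers: each wire rises and then resets to 0."""
    trace = []
    for _ in range(n_transfers):
        trace += [("req", 1), ("ack", 1), ("req", 0), ("ack", 0)]
    return trace


print(len(two_phase(3)), len(four_phase(3)))  # 6 events versus 12 events
```

In this model the two-phase protocol needs half as many transitions per transfer, while the four-phase protocol always returns both wires to 0 between transfers.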
Data encoding
There are two widely used data encodings in asynchronous circuits: bundled-data encoding and multi-rail encoding. Whereas bundled-data encoding keeps the request separate from the data wires, multi-rail encoding uses multiple wires to encode a single digit: the value is determined by the wire on which the event occurs. This avoids some of the delay assumptions necessary with bundled-data encoding, since the request and the data are no longer separated.
Bundled-data encoding
Bundled-data encoding uses one wire per bit of data with a request and an acknowledge signal; this is the same encoding used in synchronous circuits without the restriction that transitions occur on a clock edge. The request and the acknowledge are sent on separate wires with one of the above protocols. These circuits usually assume a bounded delay model with the completion signals delayed long enough for the calculations to take place.
In operation, the sender signals the availability and validity of data with a request. The receiver then indicates completion with an acknowledgement, indicating that it is able to process new requests. That is, the request is bundled with the data, hence the name "bundled-data".
Bundled-data circuits are often referred to as micropipelines, whether they use a two-phase or four-phase protocol, even if the term was initially introduced for two-phase bundled-data.
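The bounded-delay assumption behind bundled-data encoding can be sketched as follows. The model is deliberately simplified and every name in it is hypothetical: each data wire settles to its new value after its own delay, and the receiver samples all wires at the instant the request arrives.

```python
def latched_bits(old_bits, new_bits, data_delays, request_delay):
    """Bits the receiver captures when the request arrives.

    A wire whose delay exceeds request_delay has not settled yet, so the
    receiver still sees the previous value on that wire.
    """
    return [
        new if delay <= request_delay else old
        for old, new, delay in zip(old_bits, new_bits, data_delays)
    ]


old, new = [0, 0, 0], [1, 0, 1]
delays = [1.0, 2.0, 3.0]
print(latched_bits(old, new, delays, request_delay=3.5))  # request slower than all data
print(latched_bits(old, new, delays, request_delay=2.5))  # request too early for the last wire
```

When the request is delayed past the slowest data wire the receiver sees the new word; when it is not, stale bits are captured, which is why the request path is normally padded with a matched delay.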
Multi-rail encoding
Multi-rail encoding uses multiple wires without a one-to-one relationship between bits and wires and a separate acknowledge signal. Data availability is indicated by the transitions themselves on one or more of the data wires instead of with a request signal as in the bundled-data encoding. This provides the advantage that the data communication is delay-insensitive. Two common multi-rail encodings are one-hot and dual-rail. The one-hot encoding represents a number in base n with a communication on one of the n wires. The dual-rail encoding uses pairs of wires to represent each bit of the data, hence the name "dual-rail"; one wire in the pair represents the bit value of 0 and the other represents the bit value of 1. For example, a dual-rail encoded two-bit number will be represented with two pairs of wires, for four wires in total. During a data communication, communications occur on one of each pair of wires to indicate the data's bits. In the general case, an m × n encoding represents data as m words of base n.
Dual-rail encoding with a four-phase protocol is the most common and is also called three-state encoding, since it has two valid states and a reset state. Another common encoding, which leads to a simpler implementation than one-hot, two-phase dual-rail, is four-state encoding, or level-encoded dual-rail (LEDR), which uses a data bit and a parity bit to achieve a two-phase protocol.
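A minimal sketch of the two dual-rail schemes described above, ignoring the acknowledge wire. It assumes pairs are ordered `(rail0, rail1)` with `(0, 0)` as the four-phase reset state, and it assumes the usual level-encoded convention in which the parity wire equals the data bit XOR an alternating phase. All function names are illustrative.

```python
def dual_rail_encode(value: int, width: int):
    """Encode an integer as (rail0, rail1) pairs, least significant bit first."""
    pairs = []
    for i in range(width):
        bit = (value >> i) & 1
        pairs.append((1 - bit, bit))  # rail0 high means 0, rail1 high means 1
    return pairs


def is_complete(pairs) -> bool:
    """Data is valid once every pair has exactly one rail high."""
    return all(rail0 + rail1 == 1 for rail0, rail1 in pairs)


def dual_rail_decode(pairs) -> int:
    """Recover the integer from a complete dual-rail codeword."""
    if not is_complete(pairs):
        raise ValueError("codeword is not complete (a pair is still in reset)")
    return sum(rail1 << i for i, (_, rail1) in enumerate(pairs))


def ledr_encode(bits):
    """Level-encoded dual-rail: (data, parity) words with alternating phase."""
    words = []
    phase = 0
    for bit in bits:
        words.append((bit, bit ^ phase))  # parity = data XOR phase
        phase ^= 1                        # phase flips on every transfer
    return words


print(dual_rail_encode(2, 2))             # two pairs, four wires in total
print(is_complete([(0, 0), (0, 1)]))      # first pair still in the reset state
print(ledr_encode([1, 1, 0, 0]))
```

In the level-encoded sequence exactly one of the two wires changes between consecutive words, so each transfer is a single transition and no return to a reset state is needed.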
Asynchronous CPU
Unlike a conventional processor, a clockless (asynchronous) processor has no central clock to coordinate the progress of data through the pipeline.
Instead, stages of the CPU are coordinated using logic devices called "pipeline controls" or "FIFO sequencers." Basically, the pipeline controller clocks the next stage of logic when the existing stage is complete. In this way, a central clock is unnecessary. It may actually be even easier to implement high performance devices in asynchronous, as opposed to clocked, logic:
- components can run at different speeds on an asynchronous CPU; all major components of a clocked CPU must remain synchronized with the central clock;
- a traditional CPU cannot "go faster" than the expected worst-case performance of the slowest stage/instruction/component. When an asynchronous CPU completes an operation more quickly than anticipated, the next stage can immediately begin processing the results, rather than waiting for synchronization with a central clock. An operation might finish faster than normal because of attributes of the data being processed, or because of the presence of a higher voltage or bus speed setting, or a lower ambient temperature, than 'normal' or expected.
- lower power dissipation for a given performance level, and
- highest possible execution speeds.
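The worst-case versus actual-case argument above can be sketched with a toy pipeline model. It assumes a linear pipeline, a table of hypothetical per-item, per-stage delays, all items available at time 0, unlimited buffering between stages and zero handshake overhead. None of these simplifications comes from the text.

```python
def clocked_time(delays):
    """Total time when every stage is clocked at the worst-case stage delay."""
    period = max(max(row) for row in delays)
    n_items, n_stages = len(delays), len(delays[0])
    return (n_items + n_stages - 1) * period


def self_timed_time(delays):
    """Total time when each stage starts as soon as its input and the stage are ready."""
    n_stages = len(delays[0])
    stage_free = [0.0] * n_stages  # time at which each stage finished its previous item
    for row in delays:
        t = 0.0                    # time at which this item is ready for the next stage
        for s, d in enumerate(row):
            t = max(t, stage_free[s]) + d
            stage_free[s] = t
    return stage_free[-1]


# Rows are items, columns are stages; values are hypothetical delays.
delays = [[1, 3, 2],
          [2, 1, 1]]
print(clocked_time(delays), self_timed_time(delays))
```

In this model the self-timed total can never exceed the clocked total, because no individual delay is longer than the clock period. Real designs pay handshake overhead that the sketch leaves out.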
Despite the difficulty of doing so, numerous asynchronous CPUs have been built, including:
- the ORDVAC and the ILLIAC I
- the Johnniac
- the WEIZAC
- the ILLIAC II
- The Victoria University of Manchester built Atlas
- The ICL 1906A and 1906S mainframe computers, part of the 1900 series and sold from 1964 for over a decade by ICL
- The Honeywell CPUs 6180 and Series 60 Level 68 upon which Multics ran asynchronously
- Soviet bit-slice microprocessor modules produced as К587, К588 and К1883
- The Caltech Asynchronous Microprocessor, the world's first asynchronous microprocessor;
- the ARM-implementing AMULET;
- the asynchronous implementation of MIPS R3000, dubbed ;
- several versions of the XAP processor experimented with different asynchronous design styles: a bundled data XAP, a 1-of-4 XAP, and a 1-of-2 XAP;
- an ARM-compatible processor designed by Z. C. Yu, S. B. Furber, and L. A. Plana; "designed specifically to explore the benefits of asynchronous design for security sensitive applications";
- the "Network-based Asynchronous Architecture" processor that executes a subset of the MIPS architecture instruction set;
- the ARM996HS processor from Handshake Solutions
- the HT80C51 processor from Handshake Solutions
- the SEAforth multi-core processor from Charles H. Moore.
- the GA144 multi-core processor from Charles H. Moore.
- TAM16: 16-bit asynchronous microcontroller IP core
- the Aspida asynchronous DLX core. The asynchronous open-source DLX processor has been successfully implemented in both ASIC and FPGA versions.
DEC PDP-16 Register Transfer Modules allowed the experimenter to construct asynchronous, 16-bit processing elements. Delays for each module were fixed and based on the module's worst-case timing.
The Caltech Asynchronous Microprocessor was the first asynchronous microprocessor. Caltech designed and manufactured the world's first fully quasi-delay-insensitive processor. During demonstrations, the researchers loaded a simple program which ran in a tight loop, pulsing one of the output lines after each instruction. This output line was connected to an oscilloscope. When a cup of hot coffee was placed on the chip, the pulse rate naturally slowed down to adapt to the worsening performance of the heated transistors. When liquid nitrogen was poured on the chip, the instruction rate shot up with no additional intervention. Additionally, at lower temperatures, the voltage supplied to the chip could be safely increased, which also improved the instruction rate, again with no additional configuration.
In 2004, Epson manufactured the world's first bendable microprocessor called ACT11, an 8-bit asynchronous chip.
Synchronous flexible processors are slower, since bending the material on which a chip is fabricated causes wild and unpredictable variations in the delays of various transistors, for which worst-case scenarios must be assumed everywhere and everything must be clocked at worst-case speed. The processor is intended for use in smart cards, whose chips are currently limited in size to those small enough that they can remain perfectly rigid.
In 2014, IBM announced a SyNAPSE-developed chip that runs in an asynchronous manner, with one of the highest transistor counts of any chip ever produced. IBM's chip consumes orders of magnitude less power than traditional computing systems on pattern recognition benchmarks.