Double compare-and-swap


Double compare-and-swap (DCAS) is an atomic primitive proposed to support certain concurrent programming techniques. DCAS takes two memory locations, which need not be contiguous, and writes new values into them only if both currently hold pre-supplied "expected" values. As such, it is an extension of the much more widely used compare-and-swap (CAS) operation.
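The following is a minimal Python model of DCAS that makes the semantics concrete. It is an illustration under stated assumptions, not a real implementation. The hypothetical `Cell` class stands in for a pointer-sized memory word, and a single global `threading.Lock` stands in for the hardware guarantee that the whole operation is indivisible. The `load` helper is also hypothetical: it reads under the same lock, so no reader can see one location updated and the other not.

```python
import threading

# Model only: one global lock stands in for the hardware's guarantee
# that the whole DCAS happens as a single indivisible step.
_lock = threading.Lock()

class Cell:
    """Hypothetical stand-in for one pointer-sized memory word."""
    def __init__(self, value=None):
        self.value = value

def load(cell):
    """Read a cell under the same lock, so no reader can observe a DCAS
    that has written one location but not yet the other."""
    with _lock:
        return cell.value

def dcas(a, b, expected_a, expected_b, new_a, new_b):
    """If a holds expected_a AND b holds expected_b, write new_a to a and
    new_b to b and return True; otherwise write nothing, return False."""
    with _lock:
        if a.value == expected_a and b.value == expected_b:
            a.value = new_a
            b.value = new_b
            return True
        return False

x, y = Cell(1), Cell(2)
dcas(x, y, 1, 2, 10, 20)   # both values match, so both cells are written
dcas(x, y, 10, 99, 0, 0)   # y does not match, so neither cell is written
```

The two cells are unrelated objects. Nothing requires them to be adjacent in memory, and that is what distinguishes DCAS from the double-width variant.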
DCAS is sometimes confused with the double-width compare-and-swap (DWCAS) implemented by instructions such as x86 CMPXCHG16B. DCAS, as discussed here, operates on two discontiguous memory locations, typically pointer-sized. DWCAS, by contrast, operates on two adjacent pointer-sized locations, which together form a single double-width location.
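The difference can be sketched in the same kind of Python model. The model assumes a hypothetical `Cell` class and a global lock in place of hardware atomicity. DWCAS is modelled as an ordinary CAS on one cell that holds a 2-tuple, which stands for two adjacent words. One common use of such an instruction is to keep a pointer next to a version counter. The strings `"node_a"` and `"node_b"` below are placeholders for pointers.

```python
import threading

_lock = threading.Lock()  # model only: stands in for hardware atomicity

class Cell:
    """Hypothetical memory location. For DWCAS it holds a 2-tuple that
    models two adjacent pointer-sized words."""
    def __init__(self, value):
        self.value = value

def cas(cell, expected, new):
    """Single-location compare-and-swap."""
    with _lock:
        if cell.value == expected:
            cell.value = new
            return True
        return False

def dwcas(cell, expected_pair, new_pair):
    # DWCAS is still a CAS on ONE location. That location is just twice
    # as wide, so both halves must sit side by side in memory.
    return cas(cell, expected_pair, new_pair)

# A placeholder "pointer" packed next to a version counter.
head = Cell(("node_a", 0))
ptr, version = head.value
ok = dwcas(head, (ptr, version), ("node_b", version + 1))
```

Two fields that live in different objects cannot be packed into one such cell. An example is the `next` pointers of two separate list nodes. Updating such fields together is what DCAS, unlike DWCAS, would allow.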
In his doctoral thesis, Michael Greenwald recommended adding DCAS to modern hardware, showing that it could be used to create easy-to-apply yet efficient software transactional memory. Greenwald points out that an advantage of DCAS over CAS concerns the higher-order CASn (compare-and-swap on n locations). CASn can be implemented in O(n) time with DCAS but requires O(p) time with unary CAS, where p is the number of contending processes.
One advantage of DCAS is that it makes atomic double-ended queues (deques) relatively easy to implement.
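The following Python sketch roughly illustrates why DCAS helps here. It implements a bounded, array-based deque in which every operation moves one end index and fills or clears the adjacent slot in a single DCAS. It assumes the same hypothetical `Cell`/`load`/`dcas` model, with a global lock standing in for hardware DCAS. It sketches the general idea rather than any particular published algorithm. With single-location CAS, the index update and the slot update would be separate steps, and other threads could observe the state between them.

```python
import threading

# Model only: one global lock stands in for a hardware DCAS instruction.
_lock = threading.Lock()

class Cell:
    """Hypothetical stand-in for one memory word."""
    def __init__(self, value=None):
        self.value = value

def load(cell):
    with _lock:
        return cell.value

def dcas(a, b, expected_a, expected_b, new_a, new_b):
    with _lock:
        if a.value == expected_a and b.value == expected_b:
            a.value, b.value = new_a, new_b
            return True
        return False

EMPTY = None  # marks a free slot, so None itself cannot be stored

class DcasDeque:
    """Bounded deque. Invariant: slots[i] is not EMPTY exactly when
    lo <= i < hi. Each update moves one index and fills or clears the slot
    beside it in a single DCAS, so the invariant is never broken."""

    def __init__(self, capacity):
        self.slots = [Cell(EMPTY) for _ in range(capacity)]
        self.lo = Cell(capacity // 2)  # index of the leftmost item
        self.hi = Cell(capacity // 2)  # one past the rightmost item

    def push_right(self, v):
        while True:
            hi = load(self.hi)
            if hi == len(self.slots):
                return False  # right end is full
            if dcas(self.hi, self.slots[hi], hi, EMPTY, hi + 1, v):
                return True   # otherwise self.hi moved: retry

    def push_left(self, v):
        while True:
            lo = load(self.lo)
            if lo == 0:
                return False  # left end is full
            if dcas(self.lo, self.slots[lo - 1], lo, EMPTY, lo - 1, v):
                return True

    def pop_right(self):
        while True:
            hi = load(self.hi)
            if hi == 0:
                return EMPTY  # hi == 0 implies lo == 0: empty
            v = load(self.slots[hi - 1])
            if v is EMPTY:
                # Writing back unchanged values confirms that self.hi == hi
                # and slot hi-1 was free at one instant: the deque was empty.
                if dcas(self.hi, self.slots[hi - 1], hi, EMPTY, hi, EMPTY):
                    return EMPTY
            elif dcas(self.hi, self.slots[hi - 1], hi, v, hi - 1, EMPTY):
                return v

    def pop_left(self):
        while True:
            lo = load(self.lo)
            if lo == len(self.slots):
                return EMPTY  # lo == capacity implies hi == capacity
            v = load(self.slots[lo])
            if v is EMPTY:
                if dcas(self.lo, self.slots[lo], lo, EMPTY, lo, EMPTY):
                    return EMPTY
            elif dcas(self.lo, self.slots[lo], lo, v, lo + 1, EMPTY):
                return v
```

Both ends start at the middle of the array, so each end can grow by only about half the capacity. A circular layout could lift that restriction, at the cost of more involved index arithmetic.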
More recently, however, it has been shown that software transactional memory with comparable properties can be implemented using only CAS. More generally, DCAS is not a silver bullet. Implementing lock-free and wait-free algorithms with it is typically just as complex and error-prone as with CAS.
Motorola at one point included DCAS in the instruction set of its 68k series, as the CAS2 instruction introduced with the 68020. However, DCAS was slow relative to other primitives, which led to its avoidance in practice. DCAS is not natively supported by any widespread CPU in production.
The generalization of DCAS to more than two addresses is sometimes called MCAS (multi-word compare-and-swap). MCAS can be implemented with a nestable LL/SC (load-linked/store-conditional), but such a primitive is not directly available in hardware. MCAS can also be implemented in software on top of DCAS, in various ways. In 2013, Trevor Brown, Faith Ellen, and Eric Ruppert implemented in software a multi-address extension of LL/SC. Although more restrictive than MCAS, this extension, combined with some automated code generation, enabled them to implement one of the best-performing concurrent binary search trees, which slightly outperformed the JDK's CAS-based skip list implementation.
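The following Python sketch pins down what MCAS promises. It is a blocking reference model of MCAS semantics. Each hypothetical `Cell` carries its own lock, and `mcas` locks every cell it touches in one fixed global order before comparing and writing (two-phase locking). This is not a lock-free construction from DCAS or CAS like those mentioned above. It only shows what such constructions must guarantee. With two cells, `mcas` reduces to DCAS.

```python
import threading

class Cell:
    """Hypothetical memory word, each guarded by its own lock."""
    def __init__(self, value=None):
        self.value = value
        self.lock = threading.Lock()

def load(cell):
    with cell.lock:
        return cell.value

def mcas(cells, expected, new):
    """If every cells[i] holds expected[i], write new[i] to every cell and
    return True; otherwise write nothing and return False."""
    if not len(cells) == len(expected) == len(new):
        raise ValueError("cells, expected and new must have equal length")
    if len({id(c) for c in cells}) != len(cells):
        raise ValueError("each cell may appear only once")
    # Lock in one global order (by id) so that overlapping calls that list
    # their cells in different orders cannot deadlock.
    ordered = sorted(cells, key=id)
    for c in ordered:
        c.lock.acquire()
    try:
        # All locks are held, so the compare and the writes are indivisible.
        if all(c.value == e for c, e in zip(cells, expected)):
            for c, n in zip(cells, new):
                c.value = n
            return True
        return False
    finally:
        for c in reversed(ordered):
            c.lock.release()
```

Ordering by `id` is an arbitrary but consistent choice. Any fixed total order on cells would equally prevent deadlock.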
More generally, DCAS can be provided by hardware transactional memory, a more expressive mechanism. IBM POWER8 and Intel TSX provide working implementations of transactional memory, and Sun's cancelled Rock processor would have supported it as well.