Transactional Synchronization Extensions
Transactional Synchronization Extensions (TSX), also called Transactional Synchronization Extensions New Instructions (TSX-NI), is an extension to the x86 instruction set architecture that adds hardware transactional memory support, speeding up execution of multi-threaded software through lock elision. According to different benchmarks, TSX/TSX-NI can provide around 40% faster application execution in specific workloads, and 4 to 5 times more database transactions per second.
TSX/TSX-NI was documented by Intel in February 2012, and debuted in June 2013 on selected Intel microprocessors based on the Haswell microarchitecture. Haswell processors below 45xx as well as R-series and K-series SKUs do not support TSX/TSX-NI. In August 2014, Intel announced a bug in the TSX/TSX-NI implementation on current steppings of Haswell, Haswell-E, Haswell-EP and early Broadwell CPUs, which resulted in disabling the TSX/TSX-NI feature on affected CPUs via a microcode update.
In 2016, a side-channel timing attack was found that abuses the way TSX/TSX-NI handles transactional faults in order to break KASLR on all major operating systems.
Support for TSX/TSX-NI emulation is provided as part of the Intel Software Development Emulator. There is also experimental support for TSX/TSX-NI emulation in a QEMU fork.
Features
TSX/TSX-NI provides two software interfaces for designating code regions for transactional execution. Hardware Lock Elision (HLE) is an instruction prefix-based interface designed to be backward compatible with processors without TSX/TSX-NI support. Restricted Transactional Memory (RTM) is a new instruction set interface that provides greater flexibility for programmers.
TSX/TSX-NI enables optimistic execution of transactional code regions. The hardware monitors multiple threads for conflicting memory accesses, aborting and rolling back transactions that cannot be successfully completed. Mechanisms are provided for software to detect and handle failed transactions.
In other words, lock elision through transactional execution uses memory transactions as a fast path where possible, while the slow path is still a normal lock.
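The fast-path/slow-path idea can be sketched with the prefix-based HLE interface. The following is a minimal sketch, not a definitive implementation: it assumes a GCC-compatible compiler, where GCC on x86 documents the `__ATOMIC_HLE_ACQUIRE` and `__ATOMIC_HLE_RELEASE` memory-model flags for emitting the XACQUIRE and XRELEASE prefixes. The names `elided_lock`, `elided_lock_acquire`, `elided_lock_release` and `counter_add` are hypothetical and chosen only for illustration. Where the flags are not defined, the code degrades to an ordinary spinlock, which is also what a processor without HLE effectively executes.

```c
/* HLE-style spinlock sketch (assumes a GCC-compatible compiler). */
#ifdef __ATOMIC_HLE_ACQUIRE             /* defined by GCC when targeting x86 */
#define HLE_ACQUIRE __ATOMIC_HLE_ACQUIRE
#define HLE_RELEASE __ATOMIC_HLE_RELEASE
#else                                   /* other compilers or targets: plain lock */
#define HLE_ACQUIRE 0
#define HLE_RELEASE 0
#endif

typedef struct { int locked; } elided_lock;    /* 0 = free, 1 = held */

static void elided_lock_acquire(elided_lock *l) {
    /* With the HLE flag this exchange carries the XACQUIRE prefix hint. */
    while (__atomic_exchange_n(&l->locked, 1, __ATOMIC_ACQUIRE | HLE_ACQUIRE)) {
        /* The lock is really held: wait with plain loads, then try again. */
        while (__atomic_load_n(&l->locked, __ATOMIC_RELAXED)) { }
    }
}

static void elided_lock_release(elided_lock *l) {
    /* With the HLE flag this store carries the XRELEASE prefix hint. */
    __atomic_store_n(&l->locked, 0, __ATOMIC_RELEASE | HLE_RELEASE);
}

/* Hypothetical critical section protected by the lock. */
static long shared_counter;

static void counter_add(elided_lock *l, long delta) {
    elided_lock_acquire(l);
    shared_counter += delta;
    elided_lock_release(l);
}
```

The calling code is identical whether or not the hardware elides the lock; only the prefixes on the acquire and release instructions differ, which is what makes this interface backward compatible.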
Hardware Lock Elision
Hardware Lock Elision adds two new instruction prefixes, XACQUIRE and XRELEASE. These two prefixes reuse the opcodes of the existing REPNE / REPE prefixes. On processors that do not support TSX/TSX-NI, REPNE / REPE prefixes are ignored on instructions for which the XACQUIRE / XRELEASE are valid, thus enabling backward compatibility.
The XACQUIRE prefix hint can only be used with the following instructions with an explicit LOCK prefix: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. The XCHG instruction can be used without the LOCK prefix as well.
The XRELEASE prefix hint can be used both with the instructions listed above, and with the MOV mem, reg and MOV mem, imm instructions.
HLE allows optimistic execution of a critical section by skipping the write to a lock, so that the lock appears to be free to other threads. A failed transaction results in execution restarting from the XACQUIRE-prefixed instruction, but treating the instruction as if the XACQUIRE prefix were not present.
Restricted Transactional Memory
Restricted Transactional Memory is an alternative implementation to HLE which gives the programmer the flexibility to specify a fallback code path that is executed when a transaction cannot be successfully executed.
RTM adds three new instructions: XBEGIN, XEND and XABORT. The XBEGIN and XEND instructions mark the start and the end of a transactional code region; the XABORT instruction explicitly aborts a transaction. Transaction failure redirects the processor to the fallback code path specified by the XBEGIN instruction, with the abort status returned in the EAX register.
EAX register bit position | Meaning |
0 | Set if abort caused by XABORT instruction. |
1 | If set, the transaction may succeed on a retry. This bit is always clear if bit 0 is set. |
2 | Set if another logical processor conflicted with a memory address that was part of the transaction that aborted. |
3 | Set if an internal buffer overflowed. |
4 | Set if debug breakpoint was hit. |
5 | Set if an abort occurred during execution of a nested transaction. |
23:6 | Reserved. |
31:24 | XABORT argument. |
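The abort status bits in the table can drive a simple retry policy around an RTM region. The sketch below is one possible arrangement under stated assumptions, not a reference implementation: it assumes GCC or Clang, uses the `_xbegin`, `_xend` and `_xabort` intrinsics from `<immintrin.h>`, and detects RTM through CPUID leaf 7 (EBX bit 11) using the `__get_cpuid_count` helper from `<cpuid.h>`. The names `elided_add`, `try_add`, `worth_retrying`, `abort_argument`, `fallback_lock`, the `LOCK_BUSY` abort code and the limit of three attempts are all hypothetical. On processors or targets without RTM, only the lock-based fallback path runs.

```c
#include <stdatomic.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>       /* __get_cpuid_count (GCC/Clang helper) */
#include <immintrin.h>   /* _xbegin, _xend, _xabort, _XBEGIN_STARTED */
#define RTM_BUILD 1
#else
#define RTM_BUILD 0
#endif

/* Abort status bits, following the EAX table above. */
enum {
    ABORT_EXPLICIT = 1u << 0,   /* caused by XABORT */
    ABORT_RETRY    = 1u << 1,   /* transaction may succeed on a retry */
    ABORT_CONFLICT = 1u << 2,   /* another logical processor conflicted */
    ABORT_CAPACITY = 1u << 3,   /* internal buffer overflowed */
    ABORT_DEBUG    = 1u << 4,   /* debug breakpoint hit */
    ABORT_NESTED   = 1u << 5    /* abort inside a nested transaction */
};

#define LOCK_BUSY 0xFFu         /* hypothetical XABORT argument */

/* Bits 31:24 carry the XABORT argument. */
static unsigned abort_argument(uint32_t status) { return (status >> 24) & 0xFFu; }

/* Heuristic only: give up on overflow, retry when the hardware hints it may
   help or when we aborted ourselves because the fallback lock was held. */
static int worth_retrying(uint32_t status) {
    if (status & ABORT_CAPACITY) return 0;
    if ((status & ABORT_EXPLICIT) && abort_argument(status) == LOCK_BUSY) return 1;
    return (status & ABORT_RETRY) != 0;
}

static atomic_int fallback_lock;    /* 0 = free, 1 = held */

#if RTM_BUILD
static int cpu_has_rtm(void) {
    unsigned a, b, c, d;
    return __get_cpuid_count(7, 0, &a, &b, &c, &d) && ((b >> 11) & 1u);
}

/* One transactional attempt; returns _XBEGIN_STARTED if it committed,
   otherwise the abort status. */
__attribute__((target("rtm")))
static uint32_t try_add(long *counter, long delta) {
    uint32_t status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        /* Reading the lock puts it in the read-set, so a later slow-path
           acquisition by another thread aborts this transaction. */
        if (atomic_load_explicit(&fallback_lock, memory_order_relaxed))
            _xabort(LOCK_BUSY);
        *counter += delta;
        _xend();
    }
    return status;
}
#endif

void elided_add(long *counter, long delta) {
#if RTM_BUILD
    if (cpu_has_rtm()) {            /* real code would cache this result */
        for (int attempt = 0; attempt < 3; attempt++) {
            uint32_t status = try_add(counter, delta);
            if (status == _XBEGIN_STARTED) return;   /* fast path committed */
            if (!worth_retrying(status)) break;
        }
    }
#endif
    /* Slow path: an ordinary spinlock. */
    while (atomic_exchange_explicit(&fallback_lock, 1, memory_order_acquire)) { }
    *counter += delta;
    atomic_store_explicit(&fallback_lock, 0, memory_order_release);
}
```

Because bit 1 is always clear when bit 0 is set, an explicit abort has to be recognised through the XABORT argument in bits 31:24, which is why the sketch reserves a dedicated code for the "lock was held" case.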
XTEST instruction
TSX/TSX-NI provides a new XTEST instruction that returns whether the processor is executing a transactional region.
Implementation
Intel's TSX/TSX-NI specification describes how the transactional memory is exposed to programmers, but withholds details on the actual transactional memory implementation. Intel specifies in its developer's and optimization manuals that Haswell maintains both read-sets and write-sets at the granularity of a cache line, tracking addresses in the L1 data cache of the processor. Intel also states that data conflicts are detected through the cache coherence protocol.
Haswell's L1 data cache has an associativity of eight. This means that in this implementation, a transactional execution that writes to nine distinct locations mapping to the same cache set will abort. However, due to micro-architectural implementations, this does not mean that fewer accesses to the same set are guaranteed to never abort. Additionally, in CPU configurations with Hyper-Threading Technology, the L1 cache is shared between the two threads on the same core, so operations in a sibling logical processor of the same core can cause evictions.
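The associativity limit can be illustrated with a small model of the cache-set arithmetic. This is a sketch under assumptions not stated in the text above: it takes the L1 data cache to be 32 KiB with 64-byte lines, which together with the eight ways gives 64 sets, and it models only the "more than eight distinct lines in one set" condition. The names `l1d_set_index` and `write_set_overflows` are hypothetical, and the model says nothing about the other abort causes mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed L1D geometry: 32 KiB, 8-way, 64-byte lines => 64 sets. */
enum {
    LINE_BYTES  = 64,
    WAYS        = 8,
    CACHE_BYTES = 32 * 1024,
    NUM_SETS    = CACHE_BYTES / (LINE_BYTES * WAYS)
};

/* Cache set that an address maps to under the assumed geometry. */
static unsigned l1d_set_index(uintptr_t addr) {
    return (unsigned)((addr / LINE_BYTES) % NUM_SETS);
}

/* Returns 1 if the written addresses touch more than WAYS distinct cache
   lines in any single set, 0 otherwise. */
static int write_set_overflows(const uintptr_t *addrs, size_t n) {
    unsigned lines_in_set[NUM_SETS] = {0};
    for (size_t i = 0; i < n; i++) {
        uintptr_t line = addrs[i] / LINE_BYTES;
        int seen = 0;
        /* Count each cache line once, however many bytes of it are written. */
        for (size_t j = 0; j < i; j++) {
            if (addrs[j] / LINE_BYTES == line) { seen = 1; break; }
        }
        if (seen) continue;
        if (++lines_in_set[line % NUM_SETS] > WAYS) return 1;
    }
    return 0;
}
```

Under these assumptions, addresses spaced 4096 bytes apart (64 sets times 64 bytes) all land in the same set, so the ninth such write exceeds the eight available ways, whereas nine writes to consecutive cache lines spread across nine different sets.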
Independent research points to Haswell's transactional memory most likely being a deferred update system using the per-core caches for transactional data and register checkpoints. In other words, Haswell is more likely to use the cache-based transactional memory system, as it is a much less risky implementation choice. On the other hand, Intel's Skylake or later may combine this cache-based approach with the memory ordering buffer for the same purpose, possibly also providing multi-versioned transactional memory that is more amenable to speculative multithreading.
In August 2014, Intel announced that a bug exists in the TSX/TSX-NI implementation on Haswell, Haswell-E, Haswell-EP and early Broadwell CPUs, which resulted in disabling the TSX/TSX-NI feature on affected CPUs via a microcode update. The bug was fixed in F-0 steppings of the vPro-enabled Core M-5Y70 Broadwell CPU in November 2014.
The bug was found and reported in the course of a diploma thesis at the School of Electrical and Computer Engineering of the National Technical University of Athens.