SSE4

SSE4 is a SIMD CPU instruction set used in the Intel Core microarchitecture and AMD K10. It was announced on September 27, 2006, at the Fall 2006 Intel Developer Forum, with vague details in a white paper; more precise details of 47 instructions became available at the Spring 2007 Intel Developer Forum in Beijing, in the presentation. SSE4 is fully compatible with software written for previous generations of Intel 64 and IA-32 architecture microprocessors. All existing software continues to run correctly without modification on microprocessors that incorporate SSE4, as well as in the presence of existing and new applications that incorporate SSE4.

SSE4 subsets

Intel SSE4 consists of 54 instructions. A subset consisting of 47 instructions, referred to as SSE4.1 in some Intel documentation, is available in Penryn. Additionally, SSE4.2, a second subset consisting of the 7 remaining instructions, is first available in Nehalem-based Core i7. Intel credits feedback from developers as playing an important role in the development of the instruction set.
Starting with Barcelona-based processors, AMD introduced the SSE4a instruction set, which has 4 SSE4 instructions and 4 new SSE instructions. These instructions are not found in Intel's processors supporting SSE4.1 and AMD processors only started supporting Intel's SSE4.1 and SSE4.2 in the Bulldozer-based FX processors. With SSE4a the misaligned SSE feature was also introduced which meant unaligned load instructions were as fast as aligned versions on aligned addresses. It also allowed disabling the alignment check on non-load SSE operations accessing memory. Intel later introduced similar speed improvements to unaligned SSE in their Nehalem processors, but did not introduce misaligned access by non-load SSE instructions until AVX.

Name confusion

What is now known as SSSE3, introduced in the Intel Core 2 processor line, was referred to as SSE4 by some media until Intel came up with the SSSE3 moniker. Internally dubbed Merom New Instructions, Intel originally did not plan to assign a special name to them, which was criticized by some journalists. Intel eventually cleared up the confusion and reserved the SSE4 name for their next instruction set extension.
Intel is using the marketing term HD Boost to refer to SSE4.

New instructions

Unlike all previous iterations of SSE, SSE4 contains instructions that execute operations which are not specific to multimedia applications. It features a number of instructions whose action is determined by a constant field and a set of instructions that take XMM0 as an implicit third operand.
Several of these instructions are enabled by the single-cycle shuffle engine in Penryn.

SSE4.1

These instructions were introduced with Penryn microarchitecture, the 45 nm shrink of Intel's Core microarchitecture. Support is indicated via the CPUID.01H:ECX.SSE41 flag.

Instruction	Description
MPSADBW	Compute eight offset sums of absolute differences, four at a time ; this operation is important for some HD codecs, and allows an 8×8 block difference to be computed in fewer than seven cycles. One bit of a three-bit immediate operand indicates whether y₀.. y₁₀ or y₄.. y₁₄ should be used from the destination operand, the other two whether x₀..x₃, x₄..x₇, x₈..x₁₁ or x₁₂..x₁₅ should be used from the source.
PHMINPOSUW	Sets the bottom unsigned 16-bit word of the destination to the smallest unsigned 16-bit word in the source, and the next-from-bottom to the index of that word in the source.
PMULDQ	Packed signed multiplication on two sets of two out of four packed integers, the 1st and 3rd per packed 4, giving two packed 64-bit results.
PMULLD	Packed signed multiplication, four packed sets of 32-bit integers multiplied to give 4 packed 32-bit results.
DPPS, DPPD	Dot product for AOS data. This takes an immediate operand consisting of four bits to select which of the entries in the input to multiply and accumulate, and another four to select whether to put 0 or the dot-product in the appropriate field of the output.
BLENDPS, BLENDPD, BLENDVPS, BLENDVPD, PBLENDVB, PBLENDW	Conditional copying of elements in one location with another, based on the bits in an immediate operand, and on the bits in register XMM0.
PMINSB, PMAXSB, PMINUW, PMAXUW, PMINUD, PMAXUD, PMINSD, PMAXSD	Packed minimum/maximum for different integer operand types
ROUNDPS, ROUNDSS, ROUNDPD, ROUNDSD	Round values in a floating-point register to integers, using one of four rounding modes specified by an immediate operand
INSERTPS, PINSRB, PINSRD/PINSRQ, EXTRACTPS, PEXTRB, PEXTRD/PEXTRQ	The INSERTPS and PINSR instructions read 8, 16 or 32 bits from an x86 register or memory location and inserts it into a field in the destination register given by an immediate operand. EXTRACTPS and PEXTR read a field from the source register and insert it into an x86 register or memory location. For example, PEXTRD eax, , 1; EXTRACTPS , xmm1, 1 stores the first field of xmm1 in the address given by the first field of xmm0.
PMOVSXBW, PMOVZXBW, PMOVSXBD, PMOVZXBD, PMOVSXBQ, PMOVZXBQ, PMOVSXWD, PMOVZXWD, PMOVSXWQ, PMOVZXWQ, PMOVSXDQ, PMOVZXDQ	Packed sign/zero extension to wider types
PTEST	This is similar to the TEST instruction, in that it sets the Z flag to the result of an AND between its operands: ZF is set, if DEST AND SRC is equal to 0. Additionally it sets the C flag if AND SRC equals zero. This is equivalent to setting the Z flag if none of the bits masked by SRC are set, and the C flag if all of the bits masked by SRC are set.
PCMPEQQ	Quadword compare for equality
PACKUSDW	Convert signed DWORDs into unsigned WORDs with saturation.
MOVNTDQA	Efficient read from write-combining memory area into SSE register; this is useful for retrieving results from peripherals attached to the memory bus.

SSE4.2

SSE4.2 added STTNI, several new instructions that perform character searches and comparison on two operands of 16 bytes at a time. These were designed to speed up the parsing of XML documents. It also added a CRC32 instruction to compute cyclic redundancy checks as used in certain data transfer protocols. These instructions were first implemented in the Nehalem-based Intel Core i7 product line and complete the SSE4 instruction set. Support is indicated via the CPUID.01H:ECX.SSE42 flag.

Instruction	Description
CRC32	Accumulate CRC32C value using the polynomial 0x11EDC6F41.
PCMPESTRI	Packed Compare Explicit Length Strings, Return Index
PCMPESTRM	Packed Compare Explicit Length Strings, Return Mask
PCMPISTRI	Packed Compare Implicit Length Strings, Return Index
PCMPISTRM	Packed Compare Implicit Length Strings, Return Mask
PCMPGTQ	Compare Packed Signed 64-bit data For Greater Than

POPCNT and LZCNT

These instructions operate on integer rather than SSE registers, because they are not SIMD instructions, but appear at the same time and although introduced by AMD with the SSE4a instruction set, they are counted as separate extensions with their own dedicated CPUID bits to indicate support. Intel implements POPCNT beginning with the Nehalem microarchitecture and LZCNT beginning with the Haswell microarchitecture. AMD implements both beginning with the Barcelona microarchitecture.
AMD calls this pair of instructions Advanced Bit Manipulation.

Instruction	Description
POPCNT	Population count. Support is indicated via the CPUID.01H:ECX.POPCNT flag.
LZCNT	Leading zero count. Support is indicated via the CPUID.80000001H:ECX.ABM flag.

The encoding of lzcnt is similar enough to bsr that if lzcnt is performed on a CPU not supporting it such as Intel CPU's prior to Haswell, it will perform the bsr operation instead of raising an invalid instruction error despite the different result values of lzcnt and bsr.
Trailing zeros can be counted using the bsf or tzcnt instructions.

SSE4a

The SSE4a instruction group was introduced in AMD's Barcelona microarchitecture. These instructions are not available in Intel processors. Support is indicated via the CPUID.80000001H:ECX.SSE4A flag.

Instruction	Description
EXTRQ/INSERTQ	Combined mask-shift instructions.
MOVNTSD/MOVNTSS	Scalar streaming store instructions.

Supporting CPUs

Intel
* Silvermont processors
* Goldmont processors
* Goldmont Plus processors
* Tremont processors
* Penryn processors
* Nehalem processors and Westmere processors
* Sandy Bridge processors and newer
* Haswell processors and newer
AMD
* K10-based processors
* "Cat" low-power processors
** Bobcat-based processors
** Jaguar-based processors and newer
** Puma-based processors and newer
* "Heavy Equipment" processors
** Bulldozer-based processors
** Piledriver-based processors
** Steamroller-based processors
** Excavator-based processors and newer
* Zen-based processors
* Zen+-based processors
* Zen2-based processors
VIA
* Nano 3000, X2, QuadCore processors
* Nano QuadCore C4000-series processors
* Eden X4 processors
Zhaoxin
* ZX-C processors and newer

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...