SSSE3
Not to be confused with SSE3.
Supplemental Streaming SIMD Extensions 3 (SSSE3 or SSE3S) is a SIMD instruction set created by Intel and is the fourth iteration of the SSE technology.
History
SSSE3 was first introduced on 26 June 2006 with the "Woodcrest" Xeons, Intel processors based on the Core microarchitecture.
SSSE3 has been referred to by the codenames Tejas New Instructions (TNI) and Merom New Instructions (MNI), after the first processor designs intended to support it.
Functionality
SSSE3 contains 16 new discrete instructions. Because each instruction can act on either 64-bit MMX or 128-bit XMM registers, Intel's materials refer to 32 new instructions.
According to Intel:
SSSE3 provide 32 instructions (represented by 14 mnemonics) to accelerate computations on packed integers. These include:
- Twelve instructions that perform horizontal addition or subtraction operations.
- Six instructions that evaluate absolute values.
- Two instructions that perform multiply and add operations and speed up the evaluation of dot products.
- Two instructions that accelerate packed-integer multiply operations and produce integer values with scaling.
- Two instructions that perform a byte-wise, in-place shuffle according to the second shuffle control operand.
- Six instructions that negate packed integers in the destination operand if the signs of the corresponding element in the source operand is less than zero.
- Two instructions that align data from the composite of two operands.
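Three of the categories in the list above (absolute value, sign-controlled negation, and horizontal addition) can be sketched as a scalar model for 16-bit word lanes. This is only an illustrative reference model in Python, not Intel's definition or an intrinsic API: the function names `wrap16`, `pabsw`, `psignw` and `phaddw` are hypothetical, registers are represented as plain lists of signed integers, and the zeroing of a lane when the source element is exactly zero follows Intel's documented PSIGN behaviour rather than the summary quoted above.

```python
def wrap16(x: int) -> int:
    """Wrap an integer into a signed 16-bit lane (two's complement)."""
    return (x + 0x8000) % 0x10000 - 0x8000


def pabsw(src):
    """Model of the absolute-value category: |s| for each word lane.

    The result is read as unsigned, so -32768 maps to 32768.
    """
    return [abs(s) for s in src]


def psignw(dst, src):
    """Model of the sign category: negate dst lanes where src is negative.

    Assumption taken from Intel's instruction reference: a src lane equal
    to zero clears the corresponding dst lane; a positive one leaves it as is.
    """
    out = []
    for d, s in zip(dst, src):
        if s < 0:
            out.append(wrap16(-d))
        elif s == 0:
            out.append(0)
        else:
            out.append(d)
    return out


def phaddw(a, b):
    """Model of horizontal add: adjacent pairs of a, then adjacent pairs of b."""
    def pairs(v):
        return [wrap16(v[i] + v[i + 1]) for i in range(0, len(v), 2)]
    return pairs(a) + pairs(b)


if __name__ == "__main__":
    print(pabsw([-3, 4, -32768]))
    print(psignw([1, -2, 3, 4], [-1, -1, 0, 5]))
    print(phaddw([1, 2, 3, 4], [10, 20, 30, 40]))
```

The lists here can be any even length; an XMM register would hold eight word lanes and an MMX register four.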
CPUs with SSSE3
New Instructions
In the table below, satsw(X) (read as 'saturate to signed word') takes a signed integer X, and converts it to −32768 if it's less than −32768, to +32767 if it's greater than 32767, and leaves it unchanged otherwise. As normal for the Intel architecture, bytes are 8 bits, words 16 bits, and dwords 32 bits; 'register' refers to an MMX or XMM vector register.
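The satsw definition above amounts to clamping a value to the signed 16-bit range. A minimal sketch in Python, where the constant names `INT16_MIN` and `INT16_MAX` are chosen here for illustration:

```python
INT16_MIN, INT16_MAX = -32768, 32767


def satsw(x: int) -> int:
    """Saturate a signed integer to the signed 16-bit word range."""
    return max(INT16_MIN, min(INT16_MAX, x))


if __name__ == "__main__":
    print(satsw(40000), satsw(-40000), satsw(123))  # 32767 -32768 123
```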
Instruction | Name | Description
PSIGNB, PSIGNW, PSIGND | Packed Sign | Negate the elements of a register of bytes, words or dwords if the sign of the corresponding elements of another register is negative.
PABSB, PABSW, PABSD | Packed Absolute Value | Fill the elements of a register of bytes, words or dwords with the absolute values of the elements of another register.
PALIGNR | Packed Align Right | Take two registers, concatenate their values, and pull out a register-length section from an offset given by an immediate value encoded in the instruction.
PSHUFB | Packed Shuffle Bytes | Takes registers of bytes A = [a0 a1 a2 ...] and B = [b0 b1 b2 ...] and replaces A with [a[b0] a[b1] a[b2] ...], except that it replaces the i-th entry with 0 if the top bit of b[i] is set.
PMULHRSW | Packed Multiply High with Round and Scale | Treat the sixteen-bit words in registers A and B as signed fixed-point numbers with 15 fractional bits, between −1 and 1 (e.g. 0x4000 is treated as 0.5 and 0xa000 as −0.75), and multiply them together with correct rounding.
PMADDUBSW | Multiply and Add Packed Signed and Unsigned Bytes | Take the bytes in registers A (treated as unsigned) and B (treated as signed), multiply them together, add pairs, signed-saturate and store. I.e. [a0 a1 a2 …] pmaddubsw [b0 b1 b2 …] = [satsw(a0b0+a1b1) satsw(a2b2+a3b3) …]
PHSUBW, PHSUBD | Packed Horizontal Subtract (Words or Doublewords) | Takes registers A = [a0 a1 a2 …] and B = [b0 b1 b2 …] and outputs [a0−a1 a2−a3 … b0−b1 b2−b3 …]
PHSUBSW | Packed Horizontal Subtract and Saturate Words | Like PHSUBW, but outputs [satsw(a0−a1) satsw(a2−a3) … satsw(b0−b1) satsw(b2−b3) …]
PHADDW, PHADDD | Packed Horizontal Add (Words or Doublewords) | Takes registers A = [a0 a1 a2 …] and B = [b0 b1 b2 …] and outputs [a0+a1 a2+a3 … b0+b1 b2+b3 …]
PHADDSW | Packed Horizontal Add and Saturate Words | Like PHADDW, but outputs [satsw(a0+a1) satsw(a2+a3) … satsw(b0+b1) satsw(b2+b3) …]
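The less obvious entries in the table (PSHUFB, PALIGNR, PMULHRSW and PMADDUBSW) may be easier to follow as a scalar model. The sketch below is an illustration in Python under stated assumptions, not an implementation of the hardware or of any intrinsic API: the lowercase function names are hypothetical, registers are plain lists whose index 0 is the least significant element, the PSHUFB index is masked to the register length, and the PMULHRSW rounding formula and the PMADDUBSW operand signedness follow Intel's instruction reference.

```python
def satsw(x: int) -> int:
    """Saturate to the signed 16-bit word range, as defined for the table."""
    return max(-32768, min(32767, x))


def wrap16(x: int) -> int:
    """Wrap an integer into a signed 16-bit lane (two's complement)."""
    return (x + 0x8000) % 0x10000 - 0x8000


def pshufb(a, b):
    """Byte shuffle: out[i] = a[b[i]], or 0 when the top bit of b[i] is set.

    a and b hold byte values 0..255; len(a) is 16 (XMM) or 8 (MMX), and
    only the low bits of b[i] needed to index a are used.
    """
    n = len(a)
    return [0 if (bi & 0x80) else a[bi & (n - 1)] for bi in b]


def palignr(dst, src, imm):
    """Concatenate dst (high) and src (low), shift right by imm bytes,
    and keep the low register-length part. Zeros are shifted in."""
    n = len(dst)
    imm = min(imm, 2 * n)              # larger shifts give all zeros
    combined = src + dst + [0] * n     # index 0 = least significant byte
    return combined[imm:imm + n]


def pmulhrsw(a, b):
    """Rounded fixed-point multiply of signed words with 15 fractional bits."""
    return [wrap16((((x * y) >> 14) + 1) >> 1) for x, y in zip(a, b)]


def pmaddubsw(a_unsigned, b_signed):
    """Multiply unsigned bytes of a by signed bytes of b, add adjacent
    products, and saturate each sum to a signed word."""
    return [
        satsw(a_unsigned[i] * b_signed[i] + a_unsigned[i + 1] * b_signed[i + 1])
        for i in range(0, len(a_unsigned), 2)
    ]


if __name__ == "__main__":
    # 0x4000 is 0.5 and 0xa000 (-24576 as a signed word) is -0.75;
    # the product -0.375 is -12288 in the same format.
    print(pmulhrsw([0x4000], [-24576]))  # [-12288]
    print(pmaddubsw([255, 255, 1, 2], [127, 127, -1, 3]))  # [32767, 5]
```

In this model, `palignr(dst, src, 4)` on 16-byte lists returns bytes 4 to 15 of `src` followed by bytes 0 to 3 of `dst`, which matches the table's description of pulling a register-length window out of the concatenation.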