Supported SIMD instruction sets and operations



At present, SIMD support uses a subset of the x86 SSE instruction sets, up to SSE4.1. The subset used depends on the CPU the code is running on.

SSE1 only supports single-precision SIMD (float-4).

SSE2 introduces double-precision SIMD (double-2) and integer SIMD (all types). Integer SIMD is missing a few features; in particular, the vmin and vmax operations only work on uchar-16 and short-8.
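
To illustrate what a missing integer vmin involves, here is a minimal sketch in plain Python (not Factor) of one common way to build a signed int-4 minimum from a lane-wise compare plus bitwise select, both of which SSE2 does provide. The function names are hypothetical, and nothing here is claimed to be how the library's own fallback works.

```python
MASK32 = 0xFFFFFFFF  # one 32-bit lane, all bits set

def to_signed(x):
    """Reinterpret a 32-bit pattern as a signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x

def cmpgt_mask(a, b):
    """Model of a lane-wise signed 'greater than' compare:
    all-ones where a > b, all-zeros elsewhere."""
    return [MASK32 if x > y else 0 for x, y in zip(a, b)]

def vmin_int4(a, b):
    """Hypothetical emulated minimum: keep a where a <= b, else take b."""
    mask = cmpgt_mask(a, b)
    out = []
    for x, y, m in zip(a, b, mask):
        keep_a = (x & MASK32) & (~m & MASK32)  # lanes where a is not greater
        take_b = (y & MASK32) & m              # lanes where a is greater
        out.append(to_signed(keep_a | take_b))
    return out

print(vmin_int4([1, -5, 7, 0], [2, -9, 7, -1]))
```

The compare produces a per-lane mask, and the AND/OR steps pick each lane from one input or the other without branching, which is why this pattern maps well onto vector hardware.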

SSE3 introduces horizontal adds (summing adjacent pairs of components within a vector register), which are useful for computing dot products. Where available, SSE3 operations are used to speed up sum, vdot, norm-sq, norm, and distance.
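
The following sketch, in plain Python rather than Factor, models how a horizontal add can reduce a float-4 product to a dot product. The names haddps_model and dot4 are hypothetical; the lane layout follows the SSE3 HADDPS instruction, and the code illustrates the idea only, not the library's actual code path.

```python
def haddps_model(a, b):
    """Model of a 4-lane horizontal add: adjacent pairs of a fill the
    low two lanes, adjacent pairs of b fill the high two lanes."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]

def dot4(u, v):
    """Dot product of two 4-lane vectors: one lane-wise multiply
    followed by two horizontal adds."""
    prod = [x * y for x, y in zip(u, v)]
    partial = haddps_model(prod, prod)      # [p0+p1, p2+p3, p0+p1, p2+p3]
    total = haddps_model(partial, partial)  # every lane holds the full sum
    return total[0]

print(dot4([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))
```

Two horizontal adds are needed for four lanes because each one only halves the number of distinct partial sums.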

SSSE3 introduces vabs for char-16, short-8 and int-4.

SSE4.1 introduces vmin and vmax for all remaining integer types, a faster instruction for vdot, and a few other things.

On PowerPC, or older x86 chips without SSE, software fallbacks are used for all high-level vector operations. SIMD code can run with no loss in functionality, just decreased performance.

The primitives in the math.vectors.simd.intrinsics vocabulary do not have software fallbacks, but they should not be called directly in any case.