Since SIMD vectors are heap-allocated objects, it is important to write code in a style that lets the compiler inline generic dispatch and eliminate allocation.
If the inputs to a math.vectors word are statically known to be SIMD vectors, the call is converted into an SIMD primitive, and the output is then also known to be an SIMD vector (or scalar, depending on the operation); this information propagates forward within a single word (together with any inlined words and macro expansions). Any intermediate values which are not stored into collections, or returned from the word, are furthermore unboxed.
To check if optimizations are being performed, pass a quotation to the optimizer-report. and optimized. words in the compiler.tree.debugger vocabulary, and look for calls to low-level SIMD primitives as opposed to high-level vector operations.
For example, in the following, no SIMD operations are used at all, because the compiler's propagation pass does not consider dynamic variable usage:
USING: compiler.tree.debugger math.vectors
math.vectors.simd namespaces ;
SYMBOLS: x y ;
[
    float-4{ 1.5 2.0 3.7 0.4 } x set
    float-4{ 1.5 2.0 3.7 0.4 } y set
    x get y get v+
] optimizer-report.
The following word benefits from SIMD optimization, because it begins with an unsafe declaration:
USING: compiler.tree.debugger kernel kernel.private
math.vectors math.vectors.simd ;
IN: simd-demo
: interpolate ( v a b -- w )
    { float-4 float-4 float-4 } declare
    [ v* ] [ [ 1.0 ] dip n-v v* ] bi-curry* bi v+ ;
\ interpolate optimizer-report.
Note that using declare is not recommended. Safer ways of getting type information for the input parameters to a word include defining methods on a generic word (the value being dispatched upon has a statically known type in the method body), as well as using compiler specialization hints and inline declarations.
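As a minimal sketch of the generic-word approach, the following defines a method on float-4. The generic word squared is hypothetical and exists only for this illustration. Because the method dispatches on its input, the compiler knows that input is a float-4 inside the method body, so the call to v* should show up as a SIMD primitive in the output of optimized. without any use of declare.

```factor
USING: compiler.tree.debugger kernel math.vectors math.vectors.simd ;
IN: simd-demo

! Hypothetical generic word, used only for illustration.
GENERIC: squared ( v -- w )

! Inside this method body the input is statically known to be a float-4,
! so both inputs to v* have a known SIMD type.
M: float-4 squared dup v* ;

! Inspect the optimized method body.
M\ float-4 squared optimized.
```

Unlike declare, this stays safe if a value of another type is passed in: the dispatch simply fails with a no-method error instead of running code compiled for the wrong type.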
Here is a better version of the interpolate word above that uses hints:
USING: compiler.tree.debugger hints kernel
math.vectors math.vectors.simd ;
IN: simd-demo
: interpolate ( v a b -- w )
    [ v* ] [ [ 1.0 ] dip n-v v* ] bi-curry* bi v+ ;
HINTS: interpolate float-4 float-4 float-4 ;
\ interpolate optimizer-report. 
This time, the optimizer report lists calls to both SIMD primitives and high-level vector words, because hints cause two code paths to be generated. The optimized. word can be used to make sure that the fast code path consists entirely of calls to primitives.
If the interpolate word were to be used in several places with different types of vectors, it would be best to declare it inline.
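A small sketch of why inlining helps with several vector types: an inline word carries no type information of its own, but it is compiled afresh inside each caller, where the types may be known. The words twice and doubled below are hypothetical names chosen for this illustration; each method should end up with SIMD primitives for its own vector type.

```factor
USING: compiler.tree.debugger kernel math.vectors math.vectors.simd ;
IN: simd-demo

! Hypothetical helper: no declarations or hints, just inline.
: twice ( v -- w ) dup v+ ; inline

! Hypothetical generic word with one method per vector type.
GENERIC: doubled ( v -- w )

! Each method body sees a statically known input type, and the
! inlined copy of twice is optimized for that type.
M: float-4 doubled twice ;
M: double-2 doubled twice ;

! Compare the two optimized method bodies.
M\ float-4 doubled optimized.
M\ double-2 doubled optimized.
```

The trade-off is code size: every call site gets its own copy of the inlined body, so this suits small words more than large ones.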
In the interpolate word, there is still a call to the <tuple-boa> primitive, because the return value at the end is being boxed on the heap. In the next example, no memory allocation occurs at all because the SIMD vectors are stored inside a struct class (see Struct classes); also note the use of inlining:
USING: accessors alien.c-types classes.struct compiler.tree.debugger
kernel math math.vectors math.vectors.simd ;
IN: simd-demo
STRUCT: actor
{ id int }
{ position float-4 }
{ velocity float-4 }
{ acceleration float-4 } ;
GENERIC: advance ( dt object -- )
: update-velocity ( dt actor -- )
    [ acceleration>> n*v ] [ velocity>> v+ ] [ ] tri
    velocity<< ; inline
: update-position ( dt actor -- )
    [ velocity>> n*v ] [ position>> v+ ] [ ] tri
    position<< ; inline
M: actor advance ( dt actor -- )
    [ >float ] dip
    [ update-velocity ] [ update-position ] 2bi ;
M\ actor advance optimized.
The compiler.cfg.debugger vocabulary can give a lower-level picture of the generated code, including register assignments and other low-level details. To look at low-level optimizer output, call regs. on a word or quotation:
USE: compiler.cfg.debugger
M\ actor advance regs.
Examples of high-performance algorithms that use SIMD primitives can be found in the following vocabularies: