[patch] Generic SIMD rscvmul

Wed Mar 29 13:09:03 UTC 2006

This patch implements rscvmul (element-wise multiply of a real scalar 
by a complex vector) using our generic SIMD framework, and adds 
expression evaluators that use it.
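
For readers unfamiliar with the operation, here is a minimal scalar sketch 
of what rscvmul computes. It is not the code from the patch: the function 
names rscvmul_ref and rscvmul_interleaved are hypothetical, and an 
interleaved (re, im, re, im, ...) float layout is assumed for the second 
variant.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical reference kernel: out[i] = alpha * in[i] for a real alpha.
// A real scalar needs only two real multiplies per complex element,
// versus four multiplies and two adds for a complex * complex product.
inline void rscvmul_ref(float alpha,
                        const std::vector<std::complex<float>>& in,
                        std::vector<std::complex<float>>& out)
{
  assert(in.size() == out.size());
  for (std::size_t i = 0; i < in.size(); ++i)
    out[i] = std::complex<float>(alpha * in[i].real(), alpha * in[i].imag());
}

// Same operation on an assumed interleaved layout. Because the scalar is
// real, the complex vector can be walked as a plain float vector of twice
// the length, which is the property that makes the loop easy to vectorize.
inline void rscvmul_interleaved(float alpha, const float* in, float* out,
                                std::size_t n_complex)
{
  for (std::size_t i = 0; i < 2 * n_complex; ++i)
    out[i] = alpha * in[i];
}
```

Both variants produce the same values; the second only makes the 
"twice as many floats" view of the data explicit.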

On the GTRI Xeon machines, this boosts the performance of float rscvmul 
from ~140 MFLOPS to ~2500 MFLOPS.  Since rscvmul is used for scaling 
with the FFTW backend, this also boosts FFT-with-scaling performance 
from ~2100 MFLOPS to ~5000 MFLOPS.
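
To illustrate why scaling shows up in the FFT path: FFTW computes 
unnormalized transforms, so a forward transform followed by an inverse one 
returns the input multiplied by N, and the 1/N correction is a real-scalar 
times complex-vector multiply. The sketch below uses a naive O(N^2) DFT as 
a stand-in for the FFT backend; dft_unnormalized and inverse_with_scaling 
are hypothetical names, not library functions.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<float>>;

// Naive unnormalized DFT standing in for an FFT backend.
// sign = -1 is the forward transform, sign = +1 the unscaled inverse.
inline cvec dft_unnormalized(const cvec& x, int sign)
{
  assert(sign == 1 || sign == -1);
  const std::size_t n = x.size();
  const double two_pi = 6.283185307179586;
  cvec y(n);
  for (std::size_t k = 0; k < n; ++k) {
    std::complex<double> acc(0.0, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
      const double phase = sign * two_pi * double((j * k) % n) / double(n);
      acc += std::complex<double>(x[j]) *
             std::complex<double>(std::cos(phase), std::sin(phase));
    }
    y[k] = std::complex<float>(acc);
  }
  return y;
}

// Inverse transform with scaling: the final loop is the rscvmul step,
// multiplying every complex element by the real scalar 1/N.
inline cvec inverse_with_scaling(const cvec& X)
{
  cvec x = dft_unnormalized(X, +1);
  const float scale = 1.0f / static_cast<float>(x.size());
  for (auto& v : x)
    v *= scale;
  return x;
}
```

Since the scaling pass touches every output element, a slow rscvmul can 
take a noticeable share of the total FFT-with-scaling time.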

This patch also reverts to using non-streaming SIMD stores in the vmul 
routine.  Streaming stores are ~10% faster for very large vectors that 
do not fit in cache, but non-streaming stores are roughly twice as fast 
(~100% better) for vectors that do fit in cache.
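
The two store flavors can be sketched with SSE intrinsics as follows. This 
is an illustration only, not the vmul routine from the patch: scale_floats 
and its use_streaming flag are hypothetical, the buffers are assumed to be 
16-byte aligned with a length that is a multiple of 4, and a scalar 
fallback is included for targets without SSE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

// Hypothetical kernel: out[i] = alpha * in[i] over n floats.
inline void scale_floats(float alpha, const float* in, float* out,
                         std::size_t n, bool use_streaming)
{
  assert(n % 4 == 0);
#if SKETCH_HAVE_SSE
  assert(reinterpret_cast<std::uintptr_t>(in) % 16 == 0);
  assert(reinterpret_cast<std::uintptr_t>(out) % 16 == 0);
  const __m128 va = _mm_set1_ps(alpha);
  for (std::size_t i = 0; i < n; i += 4) {
    const __m128 v = _mm_mul_ps(va, _mm_load_ps(in + i));
    if (use_streaming)
      _mm_stream_ps(out + i, v);  // non-temporal store, avoids filling the cache
    else
      _mm_store_ps(out + i, v);   // regular store, result stays in the cache
  }
  if (use_streaming)
    _mm_sfence();  // order the non-temporal stores before later stores
#else
  (void)use_streaming;  // no SSE available: plain scalar loop
  for (std::size_t i = 0; i < n; ++i)
    out[i] = alpha * in[i];
#endif
}
```

Both paths write identical results; they differ only in whether the output 
lines are left in the cache, which is why the regular store wins when a 
later operation reads a result that still fits there.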

Patch applied.

				-- Jules

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705

Attachment: rscvmul.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060329/16134284/attachment.ksh>