[patch] Generic SIMD rscvmul
Jules Bergmann
jules at codesourcery.com
Wed Mar 29 13:09:03 UTC 2006
This patch implements rscvmul (the element-wise multiply of a real scalar
by a complex vector, whew!) using our generic SIMD framework, and adds
expression evaluators that dispatch to it.
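For readers unfamiliar with the operation, here is a minimal sketch of what rscvmul computes, z[i] = alpha * x[i] for real alpha and complex x. It is not the code in the patch and does not use VSIPL++'s generic SIMD framework; the names rscvmul_ref and rscvmul_simd are hypothetical, and the SIMD path assumes raw SSE intrinsics (guarded by __SSE__, with a scalar fallback). The point it illustrates is that a complex<float> array is stored as interleaved (re, im) floats, so scaling it by a real number is just a broadcast multiply over 2n floats.

```cpp
#include <complex>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical reference version: z[i] = alpha * x[i],
// alpha real, x and z complex.
void rscvmul_ref(float alpha, const std::complex<float>* x,
                 std::complex<float>* z, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    z[i] = alpha * x[i];
}

// Hypothetical SIMD sketch (not the patch's implementation).
// std::complex<float> arrays are laid out as consecutive (re, im) float
// pairs, so a complex array of length n is a float array of length 2n,
// and multiplying by a real scalar scales every float by alpha.
void rscvmul_simd(float alpha, const std::complex<float>* x,
                  std::complex<float>* z, std::size_t n)
{
  const float* xf = reinterpret_cast<const float*>(x);
  float* zf = reinterpret_cast<float*>(z);
  const std::size_t len = 2 * n;
  std::size_t i = 0;
#if defined(__SSE__)
  const __m128 va = _mm_set1_ps(alpha);     // broadcast alpha to 4 lanes
  for (; i + 4 <= len; i += 4)              // 4 floats = 2 complex values
  {
    __m128 vx = _mm_loadu_ps(xf + i);       // unaligned load
    _mm_storeu_ps(zf + i, _mm_mul_ps(va, vx));
  }
#endif
  for (; i < len; ++i)                      // scalar tail / non-SSE fallback
    zf[i] = alpha * xf[i];
}
```

Because the real scalar multiplies the real and imaginary parts independently, no shuffles are needed, unlike a complex-by-complex multiply.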
On the GTRI Xeon machines, this boosts the performance of float rscvmul
from ~140 MFLOPS to ~2500 MFLOPS. Since the FFTW backend uses rscvmul
for scaling, FFT-with-scaling performance also rises, from ~2100 MFLOPS
to ~5000 MFLOPS.
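To show why an FFT with scaling ends up calling rscvmul: FFTW computes unnormalized transforms, so a forward transform followed by a backward one returns N times the input, and the 1/N normalization is exactly a real scalar times a complex vector. The sketch below is an illustration under assumptions, not the VSIPL++ FFTW backend: a naive O(N^2) DFT (the hypothetical dft) stands in for FFTW, and rscvmul and inverse_scaled are hypothetical helpers.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<float>>;

// Hypothetical helper: z[i] = alpha * x[i] (real scalar * complex vector).
void rscvmul(float alpha, const cvec& x, cvec& z)
{
  z.resize(x.size());
  for (std::size_t i = 0; i < x.size(); ++i)
    z[i] = alpha * x[i];
}

// Naive O(N^2) DFT standing in for an FFT backend. Like FFTW it is
// unnormalized: sign = -1 for forward, +1 for backward, no 1/N factor.
cvec dft(const cvec& x, int sign)
{
  const std::size_t n = x.size();
  const double pi = std::acos(-1.0);
  cvec y(n);
  for (std::size_t k = 0; k < n; ++k)
  {
    std::complex<double> acc(0.0, 0.0);
    for (std::size_t j = 0; j < n; ++j)
    {
      double theta = sign * 2.0 * pi * double(j * k % n) / double(n);
      acc += std::complex<double>(x[j]) * std::polar(1.0, theta);
    }
    y[k] = std::complex<float>(acc);
  }
  return y;
}

// Scaled inverse: unnormalized backward DFT, then rscvmul by 1/N,
// so inverse_scaled(dft(x, -1)) recovers x up to rounding.
cvec inverse_scaled(const cvec& X)
{
  cvec y = dft(X, +1);
  cvec out;
  rscvmul(1.0f / float(X.size()), y, out);
  return out;
}
```

Since the scaling pass touches every element once, a slow rscvmul can dominate an otherwise fast FFT, which is consistent with the end-to-end speedup reported above.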
This patch also reverts the vmul routine to non-streaming SIMD stores.
Streaming stores are faster (~10%) for very large vectors that do not
fit in cache, while non-streaming stores are much faster (~100%) for
vectors that do fit in cache.
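The difference between the two store flavors can be sketched with SSE intrinsics. This is a hypothetical float vmul (z[i] = a[i] * b[i]) with a streaming flag, not the patch's vmul or its generic SIMD framework. _mm_stream_ps is a non-temporal store that bypasses the cache, which pays off only when the result is too large to stay cached anyway; _mm_store_ps leaves the result in cache, which is a large win when the next operation reads it back. Both require a 16-byte-aligned destination, so the sketch peels scalar iterations until z is aligned.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical float vmul, z[i] = a[i] * b[i], with a choice of store.
void vmul(const float* a, const float* b, float* z, std::size_t n,
          bool streaming)
{
  std::size_t i = 0;
#if defined(__SSE__)
  // Peel scalar iterations until z + i is 16-byte aligned, since both
  // _mm_store_ps and _mm_stream_ps need an aligned destination.
  while (i < n && (reinterpret_cast<std::uintptr_t>(z + i) & 15u) != 0)
  {
    z[i] = a[i] * b[i];
    ++i;
  }
  for (; i + 4 <= n; i += 4)
  {
    __m128 v = _mm_mul_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i));
    if (streaming)
      _mm_stream_ps(z + i, v);  // non-temporal: bypasses the cache
    else
      _mm_store_ps(z + i, v);   // regular: result stays in cache
  }
  if (streaming)
    _mm_sfence();               // order streaming stores before later accesses
#endif
  for (; i < n; ++i)            // remaining elements / non-SSE fallback
    z[i] = a[i] * b[i];
}
```

The results are identical either way; only where the written cache lines end up differs, which is why the better choice depends on vector size relative to cache.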
Patch applied.
-- Jules
--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
Attachment: rscvmul.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20060329/16134284/attachment.ksh>