[patch] Fast convolution enhancements
Don McCoy
don at codesourcery.com
Sat Apr 7 23:27:33 UTC 2007
The attached patch adds support for interleaved-complex fast convolution
with unique coefficients for each row of input/output. This matches
the way the problem is framed for the HPEC Challenge benchmarks.
It also supports coefficients that are already transformed from the time
domain into the frequency domain. The benchmarks may be run either
way. As expected, transforming the coefficients ahead of time is a big
performance win (30-40%).
The good news is that the performance of
  out = inv_fftm_(vmmul<0>(weights_, for_fftm_(in)));
should match that of
  out = inv_fftm_(weights_ * for_fftm_(in));
even though the latter transfers about twice as much data to the SPEs as
the former, due to the fact that it transfers one row of input data and
one row of weights for each row of output. Fortunately, the DMA
bandwidth limit has not yet been reached, so this has little or no
impact on performance.
Support for the second expression will be posted in a separate patch.
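
For readers without the library at hand, here is a rough, self-contained
sketch of what the two expressions compute. It is not VSIPL++ code and not
the Cell/SPE implementation in the patch: a naive O(n^2) DFT stands in for
the for_fftm_ and inv_fftm_ objects, and every name in it (dft,
to_frequency, convolve_row, fast_conv_shared, fast_conv_per_row) is made up
for illustration. The weights are assumed to be in the frequency domain
already, which is the pre-transformed case described above.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;
using Row = std::vector<cplx>;
using Matrix = std::vector<Row>;

// Naive O(n^2) DFT of one row (a stand-in for a real FFT).
// sign = -1: forward transform; sign = +1: inverse transform, scaled by 1/n.
Row dft(const Row& x, int sign) {
    const std::size_t n = x.size();
    const double two_pi = 2.0 * std::acos(-1.0);
    Row y(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx acc(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = sign * two_pi *
                static_cast<double>((k * j) % n) / static_cast<double>(n);
            acc += x[j] * std::polar(1.0, angle);
        }
        y[k] = (sign > 0) ? acc / static_cast<double>(n) : acc;
    }
    return y;
}

// Transform time-domain coefficients once, up front, one row at a time.
Matrix to_frequency(const Matrix& time_weights) {
    Matrix freq;
    for (const Row& w : time_weights) freq.push_back(dft(w, -1));
    return freq;
}

// One row of fast convolution: forward DFT, element-wise multiply, inverse DFT.
Row convolve_row(const Row& in_row, const Row& freq_weights) {
    assert(in_row.size() == freq_weights.size());
    Row f = dft(in_row, -1);
    for (std::size_t k = 0; k < f.size(); ++k) f[k] *= freq_weights[k];
    return dft(f, +1);
}

// Shared weights: the same coefficient row is applied to every input row
// (roughly the vmmul<0> form).
Matrix fast_conv_shared(const Matrix& in, const Row& freq_weights) {
    Matrix out;
    for (const Row& r : in) out.push_back(convolve_row(r, freq_weights));
    return out;
}

// Unique weights: row i of the input is paired with row i of the weights
// (roughly the weights_ * for_fftm_(in) form).
Matrix fast_conv_per_row(const Matrix& in, const Matrix& freq_weights) {
    assert(in.size() == freq_weights.size());
    Matrix out;
    for (std::size_t i = 0; i < in.size(); ++i)
        out.push_back(convolve_row(in[i], freq_weights[i]));
    return out;
}
```

The per-row form reads one row of weights for every row of input, which is
where the roughly doubled transfer volume mentioned above comes from.
Calling to_frequency once before the main loop corresponds to running the
benchmark with pre-transformed coefficients.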
Regards,
--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fcmc2.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070407/43916a14/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fcmc2.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070407/43916a14/attachment-0001.ksh>