[patch] Fast convolution enhancements

Don McCoy don at codesourcery.com
Sat Apr 7 23:27:33 UTC 2007


The attached patch adds support for interleaved-complex fast convolution 
with unique coefficients for each row of input/output.   This matches 
the way the problem is framed for the HPEC Challenge benchmarks.

It also supports coefficients that have already been transformed from 
the time domain into the frequency domain.  The benchmarks may be run 
either way.  As expected, transforming the coefficients ahead of time is 
a big performance win (30-40%).
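
To show where the saved work comes from, here is a minimal standalone 
sketch of one row of fast convolution.  It is not the code in the patch 
and does not use the VSIPL++ API: a naive O(n^2) DFT stands in for the 
FFT, and the names dft, fast_convolve_row and 
fast_convolve_row_time_domain are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Naive O(n^2) DFT, a stand-in for a real FFT (assumption for brevity).
// sign = -1 gives the forward transform, sign = +1 the unscaled inverse.
cvec dft(const cvec& x, int sign)
{
  const std::size_t n = x.size();
  const double pi = std::acos(-1.0);
  cvec y(n);
  for (std::size_t k = 0; k < n; ++k) {
    std::complex<double> acc(0.0, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
      const double angle =
        sign * 2.0 * pi * static_cast<double>(j * k) / static_cast<double>(n);
      acc += x[j] * std::polar(1.0, angle);
    }
    y[k] = acc;
  }
  return y;
}

// One row with coefficients already in the frequency domain:
// forward transform, element-wise multiply, inverse transform.
cvec fast_convolve_row(const cvec& in, const cvec& freq_weights)
{
  assert(in.size() == freq_weights.size());
  cvec spectrum = dft(in, -1);
  for (std::size_t i = 0; i < spectrum.size(); ++i)
    spectrum[i] *= freq_weights[i];
  cvec out = dft(spectrum, +1);
  for (auto& v : out)
    v /= static_cast<double>(out.size());  // scale the inverse transform
  return out;
}

// Same row with time-domain coefficients: one extra forward transform
// is paid on every call.
cvec fast_convolve_row_time_domain(const cvec& in, const cvec& time_weights)
{
  return fast_convolve_row(in, dft(time_weights, -1));
}
```

Both functions return the same circular convolution; for example, a 
kernel that is a unit impulse delayed by one sample shifts the row 
circularly by one.  The second function simply repeats a transform that 
could have been done once up front, which is the work that 
pre-transformed coefficients avoid.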

The good news is that the performance of

    out = inv_fftm_(vmmul<0>(weights_, for_fftm_(in)));

should match that of

    out = inv_fftm_(weights_ * for_fftm_(in));

even though the latter transfers about twice as much data to the SPEs as 
the former: it transfers one row of input data and one row of weights 
for each row of output.  Fortunately, the DMA bandwidth limit has not 
yet been reached, so this has little or no impact on performance.

Support for the second expression will be posted in a separate patch.
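
For anyone who wants the difference between the two expressions spelled 
out, here is a rough standalone sketch of the frequency-domain multiply 
step in each case, plus a count of the elements that have to be sent in.  
It assumes that vmmul<0> applies the same coefficient vector to every 
row and that the shared vector only needs to be sent once; the function 
names are hypothetical and nothing here is taken from the patch.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<float>;
using row_t = std::vector<cplx>;
using matrix_t = std::vector<row_t>;

// Shared coefficients (the vmmul<0> form, as assumed above):
// every row of m is multiplied element-wise by the same vector.
matrix_t scale_rows_shared(const row_t& weights, const matrix_t& m)
{
  matrix_t out = m;
  for (auto& row : out) {
    assert(row.size() == weights.size());
    for (std::size_t j = 0; j < row.size(); ++j)
      row[j] *= weights[j];
  }
  return out;
}

// Unique coefficients (the weights_ * ... form):
// row i of m is multiplied element-wise by row i of weights.
matrix_t scale_rows_unique(const matrix_t& weights, const matrix_t& m)
{
  assert(weights.size() == m.size());
  matrix_t out = m;
  for (std::size_t i = 0; i < out.size(); ++i) {
    assert(out[i].size() == weights[i].size());
    for (std::size_t j = 0; j < out[i].size(); ++j)
      out[i][j] *= weights[i][j];
  }
  return out;
}

// Inbound elements for the whole matrix, under the assumption that the
// shared vector is sent once and unique weights are sent once per row.
std::size_t elements_in_shared(std::size_t rows, std::size_t cols)
{
  return rows * cols + cols;
}

std::size_t elements_in_unique(std::size_t rows, std::size_t cols)
{
  return 2 * rows * cols;
}
```

Taking 64 rows of 2048 points purely as an illustrative size, the counts 
are 133120 and 262144 elements, a ratio of about 1.97, which is in line 
with the "about twice" figure above.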

Regards,

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fcmc2.changes
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070407/43916a14/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fcmc2.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070407/43916a14/attachment-0001.ksh>

