[vsipl++] [patch] Fast convolution enhancements

Mon Apr 9 15:37:28 UTC 2007

Don McCoy wrote:
 > The attached patch adds support for interleaved-complex fast convolution
 > with unique coefficients for each row of input/output.   This matches
 > the way the problem is framed for the HPEC Challenge benchmarks.

Don,

This looks good.  I have a couple of minor comments below, but
otherwise, please check it in.

				thanks,
				-- Jules

 > Index: src/vsip/opt/cbe/ppu/fastconv.cpp
 > ===================================================================

 > +    // Note: for a matrix of coefficients, unique rows are transferred.
 > +    // For the normal case, the address is constant because the same
 > +    // vector is sent repeatedly.

Is a single vector really sent repeatedly?  Shouldn't this be:

"... the address is constant because a single vector is sent once and
used repeatedly."

 > +    params.ea_kernel += (dim == 1 ? 0 : sizeof(T) * my_rows * length);
 >      params.ea_input  += sizeof(T) * my_rows * length;
 >      params.ea_output += sizeof(T) * my_rows * length;
 >    }
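
For reference, the kernel-address stepping in the hunk above can be sketched in isolation as follows. This is a minimal illustration only: kernel_advance is a hypothetical helper that is not part of the patch, and dim, my_rows and length are assumed to mean what they mean in the quoted code. With a vector of coefficients (dim == 1) the offset is zero, so the kernel address stays put; with a matrix of coefficients it advances in step with the input and output addresses.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not in the patch): byte offset by which the kernel
// effective address advances after a chunk of my_rows rows is dispatched.
//   dim == 1: one coefficient vector is shared by all rows, so no advance.
//   dim == 2: each row has its own coefficients, so advance past my_rows rows
//             of length elements each.
template <typename T>
std::size_t kernel_advance(int dim, std::size_t my_rows, std::size_t length)
{
  return dim == 1 ? 0 : sizeof(T) * my_rows * length;
}
```

For example, kernel_advance<float>(1, 4, 8) is 0, while kernel_advance<float>(2, 4, 8) is sizeof(float) * 32.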

 > Index: src/vsip/opt/cbe/ppu/fastconv.hpp
 > ===================================================================

 >  public:
 >    template <typename Block>
 > -  Fastconv_base(Vector<T, Block> coeffs, length_type input_size,
 > +  Fastconv_base(Vector<T, Block> coeffs, Domain<dim> input_size,

It should be more efficient to pass Domains by const reference.  This
avoids the need to call Domain's copy constructor.
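
A small sketch of the difference, using a hypothetical stand-in type rather than the real Domain<dim> (Counted_domain and the helper functions below are assumptions made purely for illustration):

```cpp
#include <cassert>

// Hypothetical stand-in for Domain<dim>: it only counts how many times its
// copy constructor runs.
struct Counted_domain
{
  Counted_domain() {}
  Counted_domain(Counted_domain const&) { ++copies(); }
  static int& copies() { static int n = 0; return n; }
};

// Pass by value: an lvalue argument is copy-constructed into the parameter.
inline void by_value(Counted_domain) {}

// Pass by const reference: the parameter binds to the caller's object.
inline void by_const_ref(Counted_domain const&) {}

// Number of copy-constructor calls caused by one by-value call.
inline int copies_for_by_value()
{
  Counted_domain d;
  int const before = Counted_domain::copies();
  by_value(d);
  return Counted_domain::copies() - before;
}

// Number of copy-constructor calls caused by one const-reference call.
inline int copies_for_by_const_ref()
{
  Counted_domain d;
  int const before = Counted_domain::copies();
  by_const_ref(d);
  return Counted_domain::copies() - before;
}
```

Applied to the constructor above, the parameter would read "Domain<dim> const& input_size".  The member input_size_, if it is kept, would still hold its own copy.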

 > +  template <typename Block>
 > +  Fastconv_base(Matrix<T, Block> coeffs, Domain<dim> input_size,

Here too: input_size could be passed by const reference.

 > +  // Member data.
 > +  Domain<dim> input_size_;

Is input_size_ used?

 > +  kernel_view_type kernel_;
 >    bool transform_kernel_;
 >    length_type size_;
 >    aligned_array<T> twiddle_factors_;

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705