[vsipl++] [patch] Fast convolution enhancements
Jules Bergmann
jules at codesourcery.com
Mon Apr 9 15:37:28 UTC 2007
Don McCoy wrote:
> The attached patch adds support for interleaved-complex fast convolution
> with unique coefficients for each row of input/output. This matches
> the way the problem is framed for the HPEC Challenge benchmarks.
Don,
This looks good. I have a couple of minor comments below, but
otherwise, please check it in.
thanks,
-- Jules
> Index: src/vsip/opt/cbe/ppu/fastconv.cpp
> ===================================================================
> + // Note: for a matrix of coefficients, unique rows are transferred.
> + // For the normal case, the address is constant because the same
> + // vector is sent repeatedly.
Is a single vector really sent repeatedly? Shouldn't this be:
"... the address is constant because a single vector is sent once and
used repeatedly."
> + params.ea_kernel += (dim == 1 ? 0 : sizeof(T) * my_rows * length);
> params.ea_input += sizeof(T) * my_rows * length;
> params.ea_output += sizeof(T) * my_rows * length;
> }
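To make the intended stepping concrete, here is a minimal sketch of how the three addresses advance from one chunk of rows to the next. It is only an illustration under stated assumptions: Params_sketch and advance_addresses are hypothetical names, not the actual parameter struct or code in fastconv.cpp, and coeff_dim stands in for the dimensionality of the coefficient view (1 for a shared vector, 2 for a matrix with one row per input row).

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the parameter block; only the three
// effective-address fields that appear in the patch are modeled.
struct Params_sketch
{
  std::uint64_t ea_kernel;
  std::uint64_t ea_input;
  std::uint64_t ea_output;
};

// Advance the addresses past a chunk of `my_rows` rows of `length` elements.
//   coeff_dim == 1: one shared coefficient vector, so the kernel address
//                   stays constant.
//   coeff_dim != 1: one coefficient row per input row, so the kernel
//                   address steps along with input and output.
template <typename T>
void advance_addresses(Params_sketch& p, int coeff_dim,
                       std::size_t my_rows, std::size_t length)
{
  std::uint64_t const step = sizeof(T) * my_rows * length;
  if (coeff_dim != 1)
    p.ea_kernel += step;
  p.ea_input  += step;
  p.ea_output += step;
}
```

With a shared vector only the input and output addresses move; with a coefficient matrix all three move by the same number of bytes.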
> Index: src/vsip/opt/cbe/ppu/fastconv.hpp
> ===================================================================
> public:
> template <typename Block>
> - Fastconv_base(Vector<T, Block> coeffs, length_type input_size,
> + Fastconv_base(Vector<T, Block> coeffs, Domain<dim> input_size,
It should be more efficient to pass Domains as const references. This
avoids the need to call Domain's copy constructor.
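A small sketch of the difference, assuming a hypothetical Domain_sketch type that stands in for Domain<dim> and counts its own copy-constructor calls (the real Domain does not do this; the counter is only there to make the copy visible):

```cpp
#include <cassert>

// Hypothetical stand-in for Domain<dim>; counts copy-constructor calls.
struct Domain_sketch
{
  static int copies;
  unsigned long size;

  explicit Domain_sketch(unsigned long n) : size(n) {}
  Domain_sketch(Domain_sketch const& other) : size(other.size) { ++copies; }
};

int Domain_sketch::copies = 0;

// Pass by value: the argument is copy-constructed on every call.
unsigned long size_by_value(Domain_sketch d) { return d.size; }

// Pass by const reference: the caller's object is used directly, no copy.
unsigned long size_by_cref(Domain_sketch const& d) { return d.size; }
```

Calling size_by_value with an existing object bumps the counter once per call, while size_by_cref leaves it unchanged.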
> + template <typename Block>
> + Fastconv_base(Matrix<T, Block> coeffs, Domain<dim> input_size,
Here too: please pass the Domain as a const reference.
> + // Member data.
> + Domain<dim> input_size_;
Is input_size_ actually used anywhere?
> + kernel_view_type kernel_;
> bool transform_kernel_;
> length_type size_;
> aligned_array<T> twiddle_factors_;
--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705