[vsipl++] [patch] SSAR optimizations for HPEC

Tue Nov 18 23:04:16 UTC 2008

Jules Bergmann wrote:
> This patch includes the optimizations done for HPEC
>  - use half-fast convolution
>  - 128-byte alignment of data
>  - place large data sets in huge page memory
>  - remove fftshifts from compute sections
>
Some minor comments below.

> +  typedef vsip::impl::Layout<2, row2_type, 
> vsip::impl::Stride_unit_align<128>,
> +                     vsip::impl::dense_complex_type>
> +        row_layout_type;
> +  typedef vsip::impl::Layout<2, col2_type, 
> vsip::impl::Stride_unit_align<128>,
> +                 vsip::impl::dense_complex_type>
> +        col_layout_type;

Is this solely to get the desired alignment?  Aren't we getting that 
anyway by specifying  --with-alignment=128 at configure time?  Or is 
there another reason to use Fast_block<>?

> @@ -98,16 +131,19 @@
>  
>  template <typename T>
>  Kernel1_base<T>::Kernel1_base(scalar_f scale, length_type n,
> ...
> {
>    using vsip_csl::matlab::fftshift;
> +  using vsip_csl::matlab::fftshift_0;
> +  using vsip_csl::matlab::fd_fftshift_1;
>    using vsip_csl::load_view_as;
It was not clear to me what these were until I saw the comments in the 
matlab file.  I'm not sure if we shouldn't either rename them, or just 
add a comment to make it clear why we are doing half of the shift here 
(and where the other half of the shift is taking place).  Maybe both?

For example, would it be less confusing to say

    fftshift<row>(...)

to do a swap of the top half of rows with the bottom half?  Likewise 
with fftshift<col>.  We could fix it so that fftshift() (with no 
template parameter) would default to doing both (aka a 'normal' fftshift).

Just a suggestion.  If you don't think it more clear that way, then I 
think a simple comment would suffice.

> @@ -253,12 +291,15 @@
>    typedef Fft<const_Vector, complex<T>, complex<T>, fft_fwd, 
> by_reference> row_fft_type;
>    typedef Fft<const_Vector, complex<T>, complex<T>, fft_inv, 
> by_reference> inv_fft_type;
>    typedef Fftm<complex<T>, complex<T>, row, fft_fwd, by_reference> 
> row_fftm_type;
> +  typedef Fftm<complex<T>, complex<T>, row, fft_fwd, by_value> 
> val_row_fftm_type;
>    typedef Fftm<complex<T>, complex<T>, col, fft_fwd, by_reference> 
> col_fftm_type;
> +  typedef Fftm<complex<T>, complex<T>, col, fft_fwd, by_value> 
> val_col_fftm_type;
>    typedef Fftm<complex<T>, complex<T>, row, fft_inv, by_reference> 
> inv_row_fftm_type;
> +  typedef Fftm<complex<T>, complex<T>, row, fft_inv, by_value> 
> val_inv_row_fftm_type;
>    typedef Fftm<complex<T>, complex<T>, col, fft_inv, by_reference> 
> inv_col_fftm_type;
>  
If we are not using the by_ref types, can we just get rid of them?

>    // 62. (n by mc array of complex numbers) signal compressed along
>    //     slow-time (note that to view 'sCompr' it will need to be
>    //     fftshifted first.)
> -  s_compr_ = s_filt_ * s_compr_filt_;
> +  {
> +    Scope<user> scope("ft-halfast",  fast_time_filter_ops_);
I think this should be "half-fast" ... as in a few other places.

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712