[vsipl++] [patch] SSAR optimizations for HPEC
Don McCoy
don at codesourcery.com
Tue Nov 18 23:04:16 UTC 2008
Jules Bergmann wrote:
> This patch includes the optimizations done for HPEC
> - use half-fast convolution
> - 128-byte alignment of data
> - place large data sets in huge page memory
> - remove fftshifts from compute sections
>
Some minor comments below.
> + typedef vsip::impl::Layout<2, row2_type,
> vsip::impl::Stride_unit_align<128>,
> + vsip::impl::dense_complex_type>
> + row_layout_type;
> + typedef vsip::impl::Layout<2, col2_type,
> vsip::impl::Stride_unit_align<128>,
> + vsip::impl::dense_complex_type>
> + col_layout_type;
Is this solely to get the desired alignment? Aren't we getting that
anyway by specifying --with-alignment=128 at configure time? Or is
there another reason to use Fast_block<>?
> @@ -98,16 +131,19 @@
>
> template <typename T>
> Kernel1_base<T>::Kernel1_base(scalar_f scale, length_type n,
> ...
> {
> using vsip_csl::matlab::fftshift;
> + using vsip_csl::matlab::fftshift_0;
> + using vsip_csl::matlab::fd_fftshift_1;
> using vsip_csl::load_view_as;
It was not clear to me what these were until I saw the comments in the
matlab file. I'm not sure if we shouldn't either rename them, or just
add a comment to make it clear why we are doing half of the shift here
(and where the other half of the shift is taking place). Maybe both?
For example, would it be less confusing to say
fftshift<row>(...)
to do a swap of the top half of rows with the bottom half? Likewise
with fftshift<col>. We could fix it so that fftshift() (with no
template parameter) would default to doing both (aka a 'normal' fftshift).
Just a suggestion. If you don't think it more clear that way, then I
think a simple comment would suffice.
> @@ -253,12 +291,15 @@
> typedef Fft<const_Vector, complex<T>, complex<T>, fft_fwd,
> by_reference> row_fft_type;
> typedef Fft<const_Vector, complex<T>, complex<T>, fft_inv,
> by_reference> inv_fft_type;
> typedef Fftm<complex<T>, complex<T>, row, fft_fwd, by_reference>
> row_fftm_type;
> + typedef Fftm<complex<T>, complex<T>, row, fft_fwd, by_value>
> val_row_fftm_type;
> typedef Fftm<complex<T>, complex<T>, col, fft_fwd, by_reference>
> col_fftm_type;
> + typedef Fftm<complex<T>, complex<T>, col, fft_fwd, by_value>
> val_col_fftm_type;
> typedef Fftm<complex<T>, complex<T>, row, fft_inv, by_reference>
> inv_row_fftm_type;
> + typedef Fftm<complex<T>, complex<T>, row, fft_inv, by_value>
> val_inv_row_fftm_type;
> typedef Fftm<complex<T>, complex<T>, col, fft_inv, by_reference>
> inv_col_fftm_type;
>
If we are not using the by_ref types, can we just get rid of them?
> // 62. (n by mc array of complex numbers) signal compressed along
> // slow-time (note that to view 'sCompr' it will need to be
> // fftshifted first.)
> - s_compr_ = s_filt_ * s_compr_filt_;
> + {
> + Scope<user> scope("ft-halfast", fast_time_filter_ops_);
I think this should be "half-fast" ... as in a few other places.
--
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712
More information about the vsipl++
mailing list