[vsipl++] patch: support sal-fft
Jules Bergmann
jules at codesourcery.com
Fri Feb 17 17:26:36 UTC 2006
Stefan Seefeld wrote:
> The attached patch adds support for SAL as a new backend for our FFT
> engine.
> This new backend does not support 3D ffts, and block sizes that are not
> powers of two.
> I therefor (conditionally) masked those tests in tests/fft.cpp that SAL
> does not
> support. This can be reverted as soon as the fft_dispatch framework is
> in place,
> i.e. multiple fft backends are supported in parallel.
That sounds fine.
>
> I believe the code in src/vsip/impl/sal/fft.h can be tidied up quite a bit,
> though I'd prefer to do that only once the code is refactored, i.e. for now
> focus on feature completeness.
Stefan,
This looks good.
We need to avoid copying data, unless it is necessary to pack/unpack it,
align it, or transpose it. In particular,for complex to complex FFTs we
should use fft_coptx() instead of fft_copx().
-- Jules
>
> Regards,
> Stefan
>
>
> ------------------------------------------------------------------------
> +#elif defined(VSIP_IMPL_SAL_FFT)
> + // In some contexts, SAL destroys the input data itself, and sometimes
> + // we have to modify it to 'pack' data into the format SAL expects
> + // (see SAL Tutorial for details).
> + // Therefor, we always copy the input.
> + static const bool must_copy = true;
For complex-to-complex FFTs, there are non-clobbering variants of some
of the functions (fft_coptx instead of fft_copx) that we should use
instead of copying the data. For real-to-complex and complex-to-real it
still might be necessary to copy.
> + // SAL cannot handle non-unit strides properly as 'complex' isn't
> + // a real (packed) datatype, so the stride would be applied to the real/imag
> + // offset, too.
> + static const vsip::dimension_type transpose_target = 1;
> #else
> static const bool must_copy = false;
> static const vsip::dimension_type transpose_target = axis;
> #endif
> -
> typename impl::Maybe_force_copy<
> must_copy,typename local_type::block_type,
> typename impl::Maybe_transpose<2,axis,transpose_target>::type>::type
> @@ -833,7 +890,7 @@
> impl::Ext_data<in_block_type>(this->in_temp_).data());
>
> const bool native_order = (axis == transpose_target);
> -
> +
> this->core_->scale_ = this->scale_;
> this->core_->runs_ = local_inout.size(1-axis);
> this->core_->stride_ = 1;
> Index: src/vsip/impl/sal/fft.hpp
> +
> +template <vsip::dimension_type D, typename inT, typename outT, bool doFftm>
> +inline void
> +from_to(Fft_core<D, inT, outT, doFftm>& self, inT const* in, outT* out)
> + VSIP_NOTHROW
> +{
> + assert(0 && "Sorry, operation not yet supported for this type !");
> + // TBD
This shouldn't make it out into a release, but to be on the safe size,
you should really say:
VSIP_IMPL_THROW(vsip::impl::unimplemented("..."));
> +}
> +
> +// 1D real -> complex forward fft
> +
> +inline void
> +from_to(Fft_core<1, float, std::complex<float>, false>& self,
> + float const* in, std::complex<float>* out)
> + VSIP_NOTHROW
> +{
> + FFT_setup setup = reinterpret_cast<FFT_setup>(self.plan_);
> + float *out_ = reinterpret_cast<float*>(out);
We reserve the "_" suffix for member variables. Perhaps you could call
the parameter "arg_out" and then use "out" for the cast?
> + fft_ropx(&setup, const_cast<float*>(in), 1, out_, 1,
> + self.size_[0], FFT_FORWARD, sal::ESAL);
> + // unpack the data (see SAL reference for details).
> + int const N = (1 << self.size_[0]) + 2;
> + out_[N - 2] = out_[1];
> + out_[1] = 0.f;
> + out_[N - 1] = 0.f;
> + // forward fft_ropx is scaled up by 2.
> + float scale = 0.5f;
> + vsmulx(out_, 1, &scale, out_, 1, N, sal::ESAL);
> +}
> +inline void
> +from_to(Fft_core<1, std::complex<float>, std::complex<float>, false>& self,
> + std::complex<float> const *in, std::complex<float> *out) VSIP_NOTHROW
> +{
> + FFT_setup setup = reinterpret_cast<FFT_setup>(self.plan_);
> + COMPLEX *in_ = reinterpret_cast<COMPLEX *>(const_cast<std::complex<float>*>(in));
> + COMPLEX *out_ = reinterpret_cast<COMPLEX *>(out);
> + long stride = 2;
> + long direction = self.is_forward_ ? FFT_FORWARD : FFT_INVERSE;
> + fft_copx(&setup, in_, stride, out_, stride, self.size_[0],
> + direction, sal::ESAL);
We should use fft_coptx() instead to avoid clobbering the input.
Likewise for fft_copdtx.
More information about the vsipl++
mailing list