[vsipl++] patch: FFT refactored

Fri Apr 14 16:31:11 UTC 2006

Stefan Seefeld wrote:
> Please find attached a patch containing a first step towards a refactored
> FFT implementation. This patch factors out different backend into their
> respective implementation (and subdirectory, for simpler maintenance).
> Once finished, different backends can be enabled via configure at the same
> time, and a compile-/runtime-dispatcher will instantiate the appropriate
> backend for a given FFT(M) object.
> 
> Here is a short list of the new files:
> 
> src/vsip/impl/fft.hpp : Contains the new public Fft(m) API.
> src/vsip/impl/fft/backend.hpp : Contains the backend interface definition.
> src/vsip/impl/fft/factory.hpp : Contains the generic backend factory bits.
> src/vsip/impl/fft/util.hpp : Contains some utility templates.
> src/vsip/impl/fft/workspace.hpp : Contains the code responsible for 
> temporary buffers.
> src/vsip/impl/fftw3/ : Directory containing the fftw3 bridge (eventually).
> src/vsip/impl/ipp/ : Directory containing IPP glue code (eventually).
> src/vsip/impl/sal/ : Directory containing SAL glue code (eventually).
> 
> The SAL binding is complete as far as the fft.cpp and fftm.cpp tests are
> concerned (these new bindings directly support split complex transforms).
> 
> However, a number of stubs are still empty, or even wrong. To fill / fix
> them I would prefer to start by writing more tests to get better coverage
> of all the supported parameters (non-square matrixes, notably, as well as
> subviews where strides differ from sizes), before moving forward.
> 
> This new code is mostly independent of existing files, i.e. it can coexist
> and even be tested with minimal changes to the existing sources / build 
> system.
> 
> Thanks,
>         Stefan

Stefan,

The big picture here looks good.  I have a few comments below, but they 
are very minor.  I'm looking forward to the updated patch!

Also, here are the ideas on using Ext_data that I promised

For a 1-dimensional FFT, you might have something like this:

    template <typename In_LP,
              typename Out_LP,
              typename InBlockT,
              typename OutBlockT>
    actual_by_ref(InBlockT const& in, OutBlockT& out)
    {
      Ext_data<InBlockT, In_LP> in_ext(in);
      Ext_data<OutBlockT, Out_LP> out_ext(out);

      backend->by_reference(in_ext.data(), in_ext.stride(0),
                            out_ext.data(), out_ext.stride(0),
                            in.size(1, 0));
    }

    template <typename T,
              typename Block1,
              typename Block2>
    by_ref(const_Vector<T, Block1> in, Vector<T, Block2> out)
    {
      // Assumptions
      //  - if backend supports non-unit-stride, input and output
      //    can have different strides
      //  - input and output must have same split/interleaved format.

      // Get layout policies for in and out:
      typedef typename Block_layout<Block1>::type in_LP;
      typedef typename Block_layout<Block2>::type out_LP;

      if (backend_->supports_split_or_interleaved())
      {
        // Backend supports either split or interleaved, but
        // requires both input and output to have same complex
        // format.  We'll use the output complex format for both:
        typedef typename out_LP::complex_type complex_type;

        // If the backend requires data to be unit-stride, force
        //    unit-stride.
        // Also, if the blocks do not support direct-access, then
        //    were going to copy, so force unit-stride.
        if (backend_->requires_unit_stride() ||
            Ext_data_cost<Block1>::value != 0 ||
            Ext_data_cost<Block2>::value != 0)
        {
          // Use same layout for both in and out
          typedef Layout<1, row1_type, Stride_unit, complex_type>
                  use_LP;

          actual_by_ref<use_LP, use_LP>(in.block(), out.block());
        }
        else
        {
          // We want to use the input block's existing layout, except we
          // want to force the complex_type to match the output block's.
          typedef typename Adjust_layout<
                  Layout<1, row1_type, Any_type, complex_type>,
                  in_LP>::type
              use_in_LP;

          typedef typename out_LP use_out_LP;

          actual_by_ref<use_in_LP, use_out_LP>(in.block(), out.block());
        }
      else if (backend->supports_only_split())
      {
        typedef Cmplx_split_fmt complex_type;

        ... repeat everything else ... ugly ...
      }
      else // if (backend->supports_only_interleaved())
      {
        typedef Cmplx_inter_fmt complex_type;

        ... repeat everything else ... ugly ...
      }
    }

This extends to 2-dim by checking whether the backend requires row-major 
or column-major, but the number of code paths explodes.

If we had a Runtime_ext_data, we could do:

    // assume Runtime_layout looks something like this:
    struct Runtime_layout
    {
      dimension_type Dim;
      enum_dim_order_type dim_order;
      enum_packing_type   packing;
      enum_complex_type   cmplx;
    };

    template <typename T,
              typename Block1,
              typename Block2>
    by_ref(const_Vector<T, Block1> in, Vector<T, Block2> out)
    {
      Runtime_layout in_layout  = layout(in.block());
      Runtime_layout out_layout = layout(out.block());

      // Setup complex format: split or interleaved.
      if (backend_->supports_only_split())
      {
        in_layout.cmplx  = cmplx_split_format;
        out_layout.cmplx = cmplx_split_format;
      }
      else if (backend_->supports_only_inter())
      {
        in_layout.cmplx  = cmplx_inter_format;
        out_layout.cmplx = cmplx_inter_format;
      }
      else // backend_->supports_split_or_inter())
      {
        // force both to have same format:
        in_layout.cmplx = out_layout.cmplx;
      }

      if (backend->supports_only_row_major())
      {
        in_layout.dim_order = row2_value;
        ...
      }
      else if (backend->supports_only_col_major())
      {
        ...
      }
      else
      {
        Q: do we require input and output to have same dim-order?
      }

      ... check unit-stride & adjust in similar way ...

      // Finally, use Runtime_ext_data
      Runtime_ext_data<Block1> in_ext(in.block(), in_layout);
      Runtime_ext-data<Block2> out_ext(out.block(), out_layout);

      backend->by_reference(in_ext.data(), in_ext.stride(0), ...);
    }

The wrinkle here is that the complex format (split vs interleaved) 
actually changes the type returned by Ext_data::data(), so we probably 
need to leave that handled at compile-time.

You could envision pushing the updating of runtime type into the backend:

    template <typename T,
              typename Block1,
              typename Block2>
    by_ref(const_Vector<T, Block1> in, Vector<T, Block2> out)
    {
      Runtime_layout in_layout  = layout(in.block());
      Runtime_layout out_layout = layout(out.block());

      // let backend modify these to its liking ...
      // i.e. maybe it supports different dim-order for input and output,
      //      maybe it doesn't.
      backend_->adjust_layout(in_layout, out_layout);

      Runtime_ext_data<Block1> in_ext(in.block(), in_layout);
      Runtime_ext-data<Block2> out_ext(out.block(), out_layout);

      backend->by_reference(in_ext.data(), in_ext.stride(0), ...);
    }

> Index: src/vsip/impl/fft.hpp
> ===================================================================
> RCS file: src/vsip/impl/fft.hpp
> diff -N src/vsip/impl/fft.hpp
> --- /dev/null	1 Jan 1970 00:00:00 -0000
> +++ src/vsip/impl/fft.hpp	7 Mar 2006 03:55:20 -0000
> @@ -0,0 +1,284 @@
> +/* Copyright (c) 2006 by CodeSourcery, LLC.  All rights reserved. */
> +
> +/** @file    vsip/impl/fft.hpp
> +    @author  Stefan Seefeld
> +    @date    2006-02-20
> +    @brief   VSIPL++ Library: Fft & Fftm class definitions.
> +*/
> +
> +#ifndef VSIP_IMPL_FFT_HPP
> +#define VSIP_IMPL_FFT_HPP
> +
> +#include <vsip/support.hpp>
> +#include <vsip/impl/config.hpp>
> +#include <vsip/impl/signal-types.hpp>
> +#include <vsip/impl/type_list.hpp>
> +#include <vsip/impl/fft/backend.hpp>
> +#include <vsip/impl/fft/factory.hpp>
> +#include <vsip/impl/fft/workspace.hpp>
> +#ifdef VSIP_IMPL_HAVE_SAL
> +#include <vsip/impl/sal/fft.hpp>
> +#endif
> +#ifdef VSIP_IMPL_HAVE_IPP
> +#include <vsip/impl/ipp/fft.hpp>
> +#endif
> +#if defined(VSIP_IMPL_FFTW3)
> +#include <vsip/impl/fftw3/fft.hpp>
> +#endif
> +
> +namespace vsip
> +{
> +
> +const int fft_fwd = -2;
> +const int fft_inv = -1;

Why not use fft_fwd = -1 and fft_inv = 1 ?  Those are more standard, we 
might be able to pass those values directly to some libraries.

> +
> +namespace impl
> +{
> +namespace fft
> +{

> +
> +template <typename inT,
> +	  typename outT,
> +	  int A,
> +	  int D,
> +	  unsigned nT,
> +	  alg_hint_type ahT>
> +class Fftm<inT, outT, A, D, by_value, nT, ahT>
> +  : public impl::Fft_base<2, inT, outT, 1 - A, by_value>

I'm not convinced that sharing the same base between Fft and Fftm really 
gains us much.  That said, if it is working and you are happy with it, 
that is fine, there is no need to change.

Also, it looks like Fft_base doesn't do as much (the backend really does 
the heavy lifting), making this less of an issue.

> Index: src/vsip/impl/metaprogramming.hpp
> ===================================================================
> RCS file: /home/cvs/Repository/vpp/src/vsip/impl/metaprogramming.hpp,v
> retrieving revision 1.11
> diff -u -r1.11 metaprogramming.hpp
> --- src/vsip/impl/metaprogramming.hpp	11 Jan 2006 16:22:45 -0000	1.11
> +++ src/vsip/impl/metaprogramming.hpp	7 Mar 2006 03:55:20 -0000
> @@ -125,6 +125,9 @@
>  struct Int_type
>  {};
>  
> +struct false_type { static const bool value = false; };
> +struct true_type  { static const bool value = true; };
> +

Shouldn't we name these 'False_type' and 'True_type' to avoid confusion 
with typedefs?

> Index: src/vsip/impl/fft/workspace.hpp
> ===================================================================
> RCS file: src/vsip/impl/fft/workspace.hpp
> diff -N src/vsip/impl/fft/workspace.hpp
> --- /dev/null	1 Jan 1970 00:00:00 -0000
> +++ src/vsip/impl/fft/workspace.hpp	7 Mar 2006 03:55:20 -0000
> @@ -0,0 +1,240 @@
> +/* Copyright (c) 2006 by CodeSourcery, LLC.  All rights reserved. */
> +
> +/** @file    vsip/impl/fft/workspace.cpp
> +    @author  Stefan Seefeld
> +    @date    2006-02-21
> +    @brief   VSIPL++ Library: FFT common infrastructure used by all 
> +    implementations.
> +*/

> +
> +/// workspace for column-wise FFTMs (and column-first 2D FFTs). As all backends
> +/// support unit-stride in the major dimension, this is optimized for col-major
> +/// storage.
> +template <typename inT, typename outT>
> +class workspace<2, inT, outT, 0>
> +{
> +  // TODO: Does this really have to be a block ?
> +  // A raw array ought to be sufficient...

Yes, a raw array would work fine.  The way input_buffer_ and 
output_buffer_ are used to allocate buffers for the in_ext and out_ext 
objects is a bit of a hack.  The pointer returned by Ext_data::data() is 
only guaranteed to be valid during the lifetime of the Ext_data object.

Two benefits of this approach are: the blocks provide exception handling 
for bad_alloc, and the block will get the size of the buffer right, in 
particular if you use Stride_unit_align<> packing format.

I would probably use a raw array here.

> +  typedef Fast_block<2, inT,
> +		     Layout<2, tuple<1,0,2>, Stride_unit_dense, Cmplx_inter_fmt>,
> +		     Local_map> in_buffer_type;
> +  typedef Fast_block<2, outT,
> +		     Layout<2, tuple<1,0,2>, Stride_unit_dense, Cmplx_inter_fmt>,
> +		     Local_map> out_buffer_type;
> +
> +public:
> +  workspace(Domain<2> const &in, Domain<2> const &out)
> +    : input_buffer_(in), output_buffer_(out) {}
> +  
> +  template <typename BE, typename Block0, typename Block1>
> +  void by_reference(BE *backend, Block0 const &in, Block1 &out)
> +  {
> +    typedef typename Block_layout<Block0>::layout_type in_l;
> +    typedef typename Block_layout<Block1>::layout_type out_l;
> +
> +    typedef Layout<2, tuple<1,0,2>, Stride_unit, typename in_l::complex_type>
> +      in_trans_layout;
> +    typedef Layout<2, tuple<1,0,2>, Stride_unit, typename out_l::complex_type>
> +      out_trans_layout;
> +
> +    typedef typename Adjust_layout<typename Block0::value_type,
> +      in_trans_layout, in_l>::type
> +      in_layout;
> +    typedef typename Adjust_layout<typename Block0::value_type,
> +      out_trans_layout, out_l>::type
> +      out_layout;
> +
> +    Ext_data<Block0, in_layout,No_count_policy,Copy_access_tag>
> +      in_ext(in, SYNC_IN, Ext_data<in_buffer_type>(input_buffer_).data());
> +    Ext_data<Block1, out_layout,No_count_policy,Copy_access_tag>
> +      out_ext(out, SYNC_OUT, Ext_data<out_buffer_type>(output_buffer_).data());
> +    // If this is a real FFT we need to make sure we pass N, not N/2+1 as size.
> +    length_type rows = std::max(in_ext.size(0), out_ext.size(0));
> +    length_type cols = std::max(in_ext.size(1), out_ext.size(1));
> +    // These blocks are col-major, so we always accept them if their rows have
> +    // unit-stride.
> +    if (in_ext.stride(0) == 1 && out_ext.stride(0) == 1)
> +      backend->by_reference(in_ext.data(), in_ext.stride(0), in_ext.stride(1),
> +			    out_ext.data(), out_ext.stride(0), out_ext.stride(1),
> +			    rows, cols);
> +  }
> +  template <typename BE, typename BlockT>
> +  void in_place(BE *backend, BlockT &inout)
> +  {
> +    typedef typename Block_layout<BlockT>::layout_type l;
> +    typedef Layout<2, tuple<1,0,2>, Stride_unit, typename l::complex_type>
> +      trans_layout;
> +    typedef typename Adjust_layout<typename BlockT::value_type,
> +      trans_layout, l>::type
> +      layout;
> +    Ext_data<BlockT, layout, No_count_policy,Copy_access_tag> 
> +      inout_ext(inout, SYNC_INOUT, Ext_data<in_buffer_type>(input_buffer_).data());
> +    // This block is col-major, so we always accept it if its rows have
> +    // unit-stride.
> +    if (inout_ext.stride(0) == 1)
> +      backend->in_place(inout_ext.data(), inout_ext.stride(0), inout_ext.stride(1),
> +			inout_ext.size(0), inout_ext.size(1));
> +  }
> +private:
> +  in_buffer_type input_buffer_;
> +  out_buffer_type output_buffer_;
> +};

> Index: src/vsip/impl/sal/fft.cpp
> ===================================================================
> RCS file: src/vsip/impl/sal/fft.cpp
> diff -N src/vsip/impl/sal/fft.cpp
> --- /dev/null	1 Jan 1970 00:00:00 -0000
> +++ src/vsip/impl/sal/fft.cpp	7 Mar 2006 03:55:21 -0000
> @@ -0,0 +1,1050 @@
> +/* Copyright (c) 2006 by CodeSourcery, LLC.  All rights reserved. */
> +
> +/** @file    vsip/impl/sal/fft.cpp
> +    @author  Stefan Seefeld
> +    @date    2006-02-20
> +    @brief   VSIPL++ Library: FFT wrappers and traits to bridge with 
> +             Mercury's SAL.
> +*/
> +

> +
> +template <dimension_type D, bool single> struct fft_base;
> +
> +template <dimension_type D> struct fft_base<D, true /* single precision */>
> +{
> +  typedef float rtype;
> +  typedef COMPLEX ctype;
> +  typedef COMPLEX_SPLIT ztype;
> +
> +  fft_base(Domain<D> const &dom, long options, rtype scale)
> +    : scale_(scale)
> +  {
> +    length_type size = get_sizes(dom, size_, l2size_);
> +    unsigned long nbytes = 0;
> +    fft_setup(size, options, &setup_, &nbytes);
> +    buffer_ = alloc_align<ctype>(32, dom.size());
> +  }
> +  ~fft_base() 
> +  {
> +    free_align(buffer_);
> +    fft_free(&setup_);
> +  }
> +  
> +  void scale(std::complex<rtype> *data, length_type size, rtype s)
> +  {
> +    rtype *d = reinterpret_cast<rtype*>(data);
> +    vsmulx(d, 1, &s, d, 1, 2 * size, ESAL);
> +  }
> +  void scale(rtype *data, length_type size, rtype s)
> +  {
> +    vsmulx(data, 1, &s, data, 1, size, ESAL);
> +  }

You could add a scale for split complex data.  It wouldn't remove any of 
the functions in class fft below, but it would make them more similar.

> +
> +template <typename T, int Axis, bool Fwd>
> +class impl<1, std::complex<T>, std::complex<T>, Axis, Fwd>
> +  : private fft_base<1, precision<T>::single>,
> +    public fft::backend<1, std::complex<T>, std::complex<T>, Axis, Fwd>
> +{
> +public:
> +  impl(Domain<1> const &dom, T scale)
> +    : fft_base<1, precision<T>::single>(dom, 0, scale) 
> +  {
> +  }
> +
> +  virtual void in_place(std::complex<T> *data,
> +			stride_type stride, length_type size)
> +  {
> +    assert(stride == 1);
> +    assert(size == this->size_[0]);
> +    cip(data, Fwd ? FFT_FORWARD : FFT_INVERSE);
> +    if (!almost_equal(this->scale_, T(1.)))
> +      scale(data, this->size_[0], this->scale_);
> +  }
> +
> +  virtual void in_place(std::pair<T *, T *> data,
> +			stride_type stride, length_type size)
> +  {
> +    assert(size == this->size_[0]);
> +    zip(data, stride, Fwd ? FFT_FORWARD : FFT_INVERSE);
> +    if (!almost_equal(this->scale_, T(1.)))

If fft_base provided a scale' for split, this would be identical to the 
previous function.

> +    {
> +      scale(data.first, this->size_[0], this->scale_);
> +      scale(data.second, this->size_[0], this->scale_);
> +    }
> +  }
> +

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705