[vsipl++] [patch] Support for Cell FFT's up to 4K points.

Mon Feb 12 22:22:01 UTC 2007

Don McCoy wrote:
> The attached patch provides support for 1-D complex-complex FFTs up to
> 4K points in length.  This implementation limits it to 4K to save stack
> space, even though the underlying SPE library routine (libfft) allows up
> to 8K points.
> 
> Regards,

> -  Fft_impl(Domain<1> const &dom)
> +  Fft_impl(Domain<1> const &dom, rtype scale) VSIP_THROW((std::bad_alloc))

Since any throw specifier other than 'throw ()' will lead to worse code, I
think we should not use it if we can, in particular in non-public-API code.

> +      : scale_(scale),
> +        W_(alloc_align<ctype>(128, dom.size()/4))
>    {
> -    // TBD

I believe it would be good to compute the twiddle factors here, see below.

>    }
> -  virtual void in_place(ctype *inout, stride_type s, length_type l)
> +  virtual ~Fft_impl()
>    {
> -    // TBD
> +    delete(W_);
>    }
> +
> +  virtual bool supports_scale() { return true;}
> +  virtual void in_place(ctype *inout, stride_type stride, length_type length)
> +  {
> +    assert(stride == 1);
> +
> +    compute_twiddle_factors(W_, length);

Since length is known at construction time, why can't we compute the twiddle
factors only once, as opposed to each time we call operator() ?

> +    fft_8K<T, E>(inout, inout, W_, length, this->scale_);
> +  }
>    virtual void in_place(ztype, stride_type, length_type)
>    {
>    }
> @@ -61,13 +113,21 @@
>  			    ctype *out, stride_type out_stride,
>  			    length_type length)
>    {
> -    // TBD
> +    assert(in_stride == 1);
> +    assert(out_stride == 1);
> +
> +    compute_twiddle_factors(W_, length);
> +    fft_8K<T, E>(out, in, W_, length, this->scale_);
>    }
>    virtual void by_reference(ztype, stride_type,
>  			    ztype, stride_type,
>  			    length_type)
>    {
>    }
> +
> +private:
> +  rtype scale_;
> +  ctype* W_;
>  };

What's the reason for you using a raw pointer here, instead of aligned_array<> ?
I believe we should avoid raw pointers if possible, as that's less error-prone.
(Though this particular use is exception-safe.)

> Index: src/vsip/opt/cbe/ppu/alf.hpp
> ===================================================================
> --- src/vsip/opt/cbe/ppu/alf.hpp	(revision 163034)
> +++ src/vsip/opt/cbe/ppu/alf.hpp	(working copy)
> @@ -64,25 +64,28 @@
>    template <typename P>
>    void set_parameters(P const &p) 
>    {
> -    alf_wb_add_param(impl_, const_cast<P *>(&p), sizeof(p), ALF_DATA_BYTE, 0);
> +    assert( alf_wb_add_param(impl_, const_cast<P *>(&p),  
> +        sizeof(p), ALF_DATA_BYTE, 0) >= 0 );

Don't enclose any function doing actual work into assert(), as that will be removed
when compiling with -DNDEBUG. Also, let's be careful (and explicit) with possible
return values: some values may be impossible with correct code (and thus should lead
to an abort(), while others may not, and thus should result in an exception. So, it
would be best to explicitely list (named) return values, and possibly even add a comment
that explains what values we check for and what not. (Who knows, may be ALF's own API
evolves, so we have to carefully make adjustments...)

>    }
>    template <typename D>
>    void add_input(D const *d, unsigned int length)
>    {
>      // The data size is doubled in the case of complex values, because
>      // ALF only understands floats and doubles.
> -    alf_wb_add_io_buffer(impl_, ALF_BUFFER_INPUT, const_cast<D *>(d),
> -                         length * (Is_complex<D>::value ? 2 : 1),
> -                         alf_data_type<D>::value);
> +    assert( alf_wb_add_io_buffer(impl_, ALF_BUFFER_INPUT, 
> +        const_cast<D *>(d),
> +        length * (Is_complex<D>::value ? 2 : 1), 
> +        alf_data_type<D>::value) >= 0 );

Same here.

>    }
>    template <typename D>
>    void add_output(D *d, unsigned int length)
>    {
>      // The data size is doubled in the case of complex values, because
>      // ALF only understands floats and doubles.
> -    alf_wb_add_io_buffer(impl_, ALF_BUFFER_OUTPUT, d,
> -                         length * (Is_complex<D>::value ? 2 : 1),
> -                         alf_data_type<D>::value);
> +    assert( alf_wb_add_io_buffer(impl_, ALF_BUFFER_OUTPUT, 
> +        d,
> +        length * (Is_complex<D>::value ? 2 : 1),
> +        alf_data_type<D>::value) >= 0 );

...and here.

>    }
>  
>  private:
> @@ -148,7 +151,7 @@
>      alf_task_info_t info;
>      alf_task_info_t_CBEA spe_tsk;
>      spe_tsk.spe_task_image = image;
> -    spe_tsk.max_stack_size = 4096; // compute good value !
> +    spe_tsk.max_stack_size = 80*1024;

It would be good to add comments explaining where such numbers are coming from,
so it is easy to understand them in the future.

> Index: src/vsip/opt/cbe/spu/GNUmakefile.inc.in
> ===================================================================
> --- src/vsip/opt/cbe/spu/GNUmakefile.inc.in	(revision 163034)
> +++ src/vsip/opt/cbe/spu/GNUmakefile.inc.in	(working copy)
> @@ -37,7 +37,7 @@
>  CC_SPU_FLAGS := 
>  LD_SPU_FLAGS := -Wl,-N -L$(CBE_SDK_PREFIX)/sysroot/usr/spu/lib
>  CC_EMBED_SPU := ppu-embedspu -m32
> -SPU_LIBS := -lalf
> +SPU_LIBS := -lalf -lfft

Depending on how many other libs we expect to link to, such additions may
be best to make per target, just for clarity.

> Index: src/vsip/opt/cbe/spu/alf_fft_c.c
> ===================================================================
> --- src/vsip/opt/cbe/spu/alf_fft_c.c	(revision 0)
> +++ src/vsip/opt/cbe/spu/alf_fft_c.c	(revision 0)
> @@ -0,0 +1,84 @@
> +/* Copyright (c) 2007 by CodeSourcery.  All rights reserved.
> +
> +   This file is available for license from CodeSourcery, Inc. under the terms
> +   of a commercial license and under the GPL.  It is not part of the VSIPL++
> +   reference implementation and is not available under the BSD license.
> +*/
> +/** @file    vsip/opt/cbe/spu/alf_fft_c.c
> +    @author  Don McCoy
> +    @date    2007-02-03
> +    @brief   VSIPL++ Library: Kernel to compute complex float FFT's.
> +*/
> +
> +#include <stdio.h>
> +#include <alf_accel.h>
> +#include <assert.h>
> +#include <libfft.h>
> +#include "../common.h"

I'd suggest we avoid such relative paths in include directives. <vsip/opt/cbe/common.h>
would be safer, I believe. Else we have to be extra careful in our next attempt to
move things around.

> +  // Perform the FFT, 
> +  //   -- 'in' may be the same as 'out'
> +  if (fftp->direction == fwd_fft)
> +    fft_1d_r2(out, in, W, log2_size);
> +  else
> +    fft_1d_r2_inv(out, in,  W, log2_size, fftp->scale);

Out of curiosity: do these two functions really share all the
essential code ? I'm wondering whether putting them into two
separate kernels would help us cut down the code / stack size.

> +
> +  return 0;
> +}

Thanks,
		Stefan

-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718