[vsipl++] [patch] Support for Cell FFT's up to 4K points.

Tue Feb 13 07:56:01 UTC 2007

Jules Bergmann wrote:
> This looks good.  I have a couple of comments below.  Once those are
> addressed, please check it in.
>
Checked in as attached.  I may have to make changes, depending on your 
replies to my questions below...

> [1] It doesn't look like any advantage is being gained by having
> E as a template parameter.  It would reduce code-size to pass
> the FFT direction as a run-time parameter instead.
>
I modified this by doing the latter.

> [2] 'in' and 'out' are good names.  'W' should probably be
> 'twiddle_factors' or something more descriptive.
>
> Also, it would be good to document the expected sizes, since 'in'
> and 'out' are of size 'length', while 'twiddle_factors' is of
> size 'length/4'.
>
Done.

> [3] From VSIPL++ API, it is permissible to scale on both forward and
> inverse FFTs. I suspect this should just pass 'scale' through.
>
Done.  Required a change in the kernel to honor the scaling in both 
cases.  Note the conditional, paraphrased here as 'if (scale == 
(double)1.f)'.  Elsewhere in the library, almost_equal() is used 
instead.  Is that really necessary since the scale factor is normally 
set to that literal value when the user does not desire scaling (as 
opposed to it being the result of a computation, as in other cases)?

> [4] There are other ways to compute twiddle factors iteratively that
> may avoid the 'cos()' call, and hence may be more efficient.
>
Not addressed at this time.

> [5] 128 is probably a good alignment.  However, it is kind of a magic
> number that should be a macro (VSIP_IMPL_CELL_DMAALIGNMENT, pick a
> good name) to call it out.
>
Changed to VSIP_IMPL_ALLOC_ALIGNMENT, and am compiling with 
--with-alignment=128.  I can't think of a compelling reason to make a 
new macro here, but if someone else can, please identify it.

> Was this the alignment problem that was causing the bad twiddle factors?
>
Yes.

> [6] Stefan is right on.  This would be a good time to compute the twiddle
> factors.  Early binding!
>
Done.

> [7] Q for Stefan: how do we handle FFT assignment?  Is there a problem 
> with
> W_ being freed multiple times?
Not addressed at this time. 

> [8] Do we really have a real->complex FFT on the SPE?
>
No, but we could implement these variations fairly easily by putting in 
an extra copy in the dispatcher.  I'll address this soon.

> [9] Should rt_valid also check that the size is a power of 2 (or does
> libfft handle non-power-of-two sizes?)
>
Added.  Thanks.

> Also, I don't see how you could check for unit-stride here since there
> is no layout info.  I need to refresh my memory on rt_valid for FFT
> dispatch a bit.
>
I was looking at the serial expression dispatch code, which of course is 
different.  I'd like to understand how this is handled in the FFT code 
though.

> [10] Do we have an ALF implementation of vmul, or was this our only impl?
>
We've replaced it with the ALF version.

> > +  end = start + 2 * n / 4;  // two complex values for each n, four 
> per vector
>
> [11] While it's not strictly necessary (because this is C, not C++; and
> because the SPEs don't go fast for double), I would put the magic value
> '4' into a const variable (local to the function):
>
Done.

> [12] Are 's' and 'e' being used?
>
Not so far as I can tell.  I removed those statements, they were debug 
statements I inadvertently left in.  Good catch.

Although this did remind me that we are not passing any flags to the SPU 
compiler like -Wall.  I changed this to do the following:

    CC_SPU_FLAGS := @CXXFLAGS@

Which passes any PPU flags from the configure line on to the SPU 
compiler.  Is this ok, at least for now?

> > +    *start++ = spu_mul(spu_sel(e0, e1, mask), vscale);
> > +    *end = spu_mul(spu_sel(s0, s1, mask), vscale);
>
> [13] Can you describe what this loop is doing?  It looks like it is (a)
> scaling by vmul and (b) reversing the vector (which includes using
> spu_sel to swap the two complex values in a SIMD registers).  Looks
> good though!
>
I added a note describing where I got the code from plus a brief 
description.

> [14] we need to allow scaling for forward FFTs.
>
Done.

> [15] why is scale a double?
>
> 'float' should be enough for single-precision FFTs.
>
> Also, while it seems reasonable to use a 'double' to scale
> double-precision FFTs (and we may want to actually do it that
> way when implementing), the VSIPL++ spec defines scale to be
> a 'float' regardless of the FFT precision.  I need to check how
> the C-VSIPL spec defines that because IIRC there was some confusion
> between the two specs here.
>
The reason is so that the parameter struct is the right size (a multiple 
of 16 bytes).

My reasoning here was that if we implement double-precision in the 
future, it would stay the same for both cases.  If the above is true, 
then I'll change it to float and add 4 bytes of padding.

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fft1d_2.diff
URL: <http://sourcerytools.com/pipermail/vsipl++/attachments/20070213/ff5b2d66/attachment.ksh>