[vsipl++] [patch] Support for Cell FFT's up to 4K points.

Mon Feb 12 22:55:27 UTC 2007

Stefan Seefeld wrote:
>> -  Fft_impl(Domain<1> const &dom)
>> +  Fft_impl(Domain<1> const &dom, rtype scale) VSIP_THROW((std::bad_alloc))
>>     
>
> Since any throw specifier other than 'throw ()' will lead to worse code, I
> think we should not use it if we can, in particular in non-public-API code.
>
>   
The reason for the above is the the alloc below.  I guess I'm not sure 
what the right thing to do here is.

>> +      : scale_(scale),
>> +        W_(alloc_align<ctype>(128, dom.size()/4))
>>    {
>> -    // TBD
>>     
>
> I believe it would be good to compute the twiddle factors here, see below.
>
>   
I agree -- or move them into the kernel to save time DMA'ing them.  I 
will put them in the constructor for now.

>> +
>> +private:
>> +  rtype scale_;
>> +  ctype* W_;
>>  };
>>     
>
> What's the reason for you using a raw pointer here, instead of aligned_array<> ?
> I believe we should avoid raw pointers if possible, as that's less error-prone.
> (Though this particular use is exception-safe.)
>
>   
Probably no other reason than the way the code evolved.

> Don't enclose any function doing actual work into assert(), as that will be removed
> when compiling with -DNDEBUG. Also, let's be careful (and explicit) with possible
> return values: some values may be impossible with correct code (and thus should lead
> to an abort(), while others may not, and thus should result in an exception. So, it
> would be best to explicitely list (named) return values, and possibly even add a comment
> that explains what values we check for and what not. (Who knows, may be ALF's own API
> evolves, so we have to carefully make adjustments...)
>
>   
Yes, that's true.  I was trying to get more informative messages out of 
assert(), but that can better be done by naming the return value 
something informative.  Is that what you meant?

>>    }
>>  
>>  private:
>> @@ -148,7 +151,7 @@
>>      alf_task_info_t info;
>>      alf_task_info_t_CBEA spe_tsk;
>>      spe_tsk.spe_task_image = image;
>> -    spe_tsk.max_stack_size = 4096; // compute good value !
>> +    spe_tsk.max_stack_size = 80*1024;
>>     
>
> It would be good to add comments explaining where such numbers are coming from,
> so it is easy to understand them in the future.
>
>   
I will add a comment.

>   
>> Index: src/vsip/opt/cbe/spu/alf_fft_c.c
>> ...
>> +#include "../common.h"
>>     
>
> I'd suggest we avoid such relative paths in include directives. <vsip/opt/cbe/common.h>
> would be safer, I believe. Else we have to be extra careful in our next attempt to
> move things around.
>
>   
I knew you'd catch that!

I think we need something in the makefile to specify the path, because 
<vsip/opt/cbe/common.h> does not work IIRC.  I'll fix it.

>> +  // Perform the FFT, 
>> +  //   -- 'in' may be the same as 'out'
>> +  if (fftp->direction == fwd_fft)
>> +    fft_1d_r2(out, in, W, log2_size);
>> +  else
>> +    fft_1d_r2_inv(out, in,  W, log2_size, fftp->scale);
>>     
>
> Out of curiosity: do these two functions really share all the
> essential code ? I'm wondering whether putting them into two
> separate kernels would help us cut down the code / stack size.
>
>
>   
fft_1d_r2_inv() does the scaling and reordering needed after calling 
fft_1d_r2() (that is called directly for the forward cases).  It is very 
little additional code, and necessary to have for fast convolution in 
order to avoid reloading the kernel.

That doesn't stop us from putting the individual ones in separate 
kernels if we choose, but I don't think it will make much of a difference.

-- 
Don McCoy
don (at) CodeSourcery
(888) 776-0262 / (650) 331-3385, x712