[vsipl++] [patch] Fast convolution expression templates

Fri Apr 13 19:31:45 UTC 2007

Don McCoy wrote:
> Jules Bergmann wrote:
>> > +  static bool const ct_valid = Type_equal<T, std::complex<float> >::value;
>>
>> [1] We should enforce that MatBlockT::value_type == complex<float>
>>
> What about CoeffsMatBlockT?  And isn't the type of MatBlockT at least 
> captured somehow as part of fft::Fft_return_functor?

Good questions.

CoeffsMatBlockT should have value_type of T, since the Binary_expr_block 
captures both the block type and value type as template parameters. 
Making the check explicit wouldn't hurt; it would just slow the compiler 
down a tad.

The Fft_return_functor explicitly captures the output type (that is the 
second template parameter), but the input type is implicit from 
MatBlockT's value_type.  I.e. it is possible to capture a real->complex 
FFT as a Return_expr_block / Fft_return_functor combination, so the 
functor alone does not guarantee complex<float> input.

> 
> Come to think of it, what about VecBlockT in the previous expression?  
> These are a little tricky -- I could stand to solidify my understanding 
> a bit here.  :)

Good catch!  We should check VecBlockT in the previous expression.  It 
could be a scalar, or a complex<double>, etc.
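
A rough sketch of what such a compile-time check could look like.  This 
uses stand-ins only: Type_equal here is a local trait assumed to behave 
like std::is_same, and Demo_block and Both_complex_float are 
hypothetical names for illustration, not anything in the library.

```cpp
#include <complex>
#include <type_traits>

// Stand-in for the library's Type_equal trait.  Assumption: it behaves
// like std::is_same.  This is not the real VSIPL++ header.
template <typename A, typename B>
struct Type_equal : std::is_same<A, B> {};

// Hypothetical block type that only carries a value_type.
template <typename T>
struct Demo_block
{
  typedef T value_type;
};

// Hypothetical helper: true only if both the coefficient block (vector
// or matrix) and the data matrix block hold complex<float>.
template <typename CoeffBlockT, typename MatBlockT>
struct Both_complex_float
{
  typedef std::complex<float> cf_type;

  static bool const value =
    Type_equal<typename CoeffBlockT::value_type, cf_type>::value &&
    Type_equal<typename MatBlockT::value_type, cf_type>::value;
};
```

An evaluator's ct_valid could then be and-ed with something like 
Both_complex_float<VecBlockT, MatBlockT>::value, so a scalar or 
complex<double> coefficient block would fall through to another 
evaluator instead of matching by accident.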

> 
> 
>> > +
>> > +  static bool rt_valid(DstBlock& dst, SrcBlock const& /*src*/)
>> > +  {
>> > +    return fconv_type::rt_valid_size(dst.size(2, 1));
>>
>> [2] Do we need to enforce any other run-time constraints?  Ext_data 
>> access OK?  Unit-stride?  etc.
>>
>> Or are those handled by Fastconv_base?
> Probably both, and no.  The second is through Ext_cost or similar?

The general rule of thumb is we only want a special evaluator to apply if:

1) the blocks all support direct access,

    i.e. check at compile time that:

	Ext_data_cost<BlockT>::value == 0

2) the data is in the format we require (usually lowest order dimension
    unit stride), i.e. check at run time that:

	Ext_data<BlockT> ext(block);
	...
	ext.stride(lowest_order_dim) == 1;

Otherwise, it will be necessary to allocate a temporary and copy data, 
which is usually expensive enough to outweigh the benefit of using the 
evaluator.
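
Putting (1) and (2) together, the shape of the evaluator checks could 
be sketched roughly as follows.  Everything here is a stand-in so the 
example compiles on its own: Demo_ext_data_cost and Demo_ext_data only 
mimic the Ext_data_cost / Ext_data usage shown above, the two block 
types are hypothetical, and a row-major 2-D layout (lowest order 
dimension is 1) is assumed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical block with direct data access (cost 0).
struct Dense_demo_block
{
  static int const access_cost = 0;
  std::ptrdiff_t strides[2];   // strides for dimension 0 and dimension 1
};

// Hypothetical expression-like block: access needs a temporary and copy.
struct Expr_demo_block
{
  static int const access_cost = 2;
  std::ptrdiff_t strides[2];
};

// Stand-in for Ext_data_cost<BlockT>::value (not the real trait).
template <typename BlockT>
struct Demo_ext_data_cost
{
  static int const value = BlockT::access_cost;
};

// Stand-in for Ext_data<BlockT>: only exposes stride().
template <typename BlockT>
class Demo_ext_data
{
public:
  explicit Demo_ext_data(BlockT const& block) : block_(block) {}

  std::ptrdiff_t stride(int dim) const
  {
    assert(dim >= 0 && dim < 2);
    return block_.strides[dim];
  }

private:
  BlockT const& block_;
};

template <typename DstBlock, typename SrcBlock>
struct Demo_evaluator
{
  // (1) compile time: every block must support direct access.
  static bool const ct_valid =
    Demo_ext_data_cost<DstBlock>::value == 0 &&
    Demo_ext_data_cost<SrcBlock>::value == 0;

  // (2) run time: lowest order dimension must be unit stride.
  static bool rt_valid(DstBlock& dst, SrcBlock const& src)
  {
    int const lowest_order_dim = 1;   // assumption: row-major, 2-D
    Demo_ext_data<DstBlock> ext_dst(dst);
    Demo_ext_data<SrcBlock> ext_src(src);
    return ext_dst.stride(lowest_order_dim) == 1 &&
           ext_src.stride(lowest_order_dim) == 1;
  }
};
```

If either check fails, dispatch simply moves on to the next evaluator 
in the list, so being conservative here costs nothing in correctness.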

Obviously, in some cases we may want to break that rule of thumb.  For 
the "original" (non-CBE) Fastconv evaluator, neither (1) nor (2) is 
checked.  However, the problem is broken down into smaller problems for 
single rows that are redispatched back to Fft and vmul.  If given data 
with non-optimal layout, the Fft may choose to reorganize a row at a 
time, while vmul will fall back to loop fusion.  Arguably, this should 
be semi-efficient, especially compared to evaluating everything with 
loop fusion.

For the Cbe evaluator, we only want to use the evaluator when all the 
stars line up correctly.

> 
>>
>> We should definitely check FFT scaling (see ifdef'd out check in
>> opt/expr/eval_fastconv).  IIRC that check was expensive for some
>> reason, although I believe it shouldn't be.  If it proves to be
>> expensive here, we can leave it out for the time being.
>>
> So do we need those checks in *all* evaluators then?

Yes, we should add the check to the FFTM/vmmul/FFTM Cbe evaluator.

> And on that note,
> do we want to add evaluators for the Fc_expr_tag as well (so it will 
> work for non Cell/B.E. platforms)?

Yes!  Excellent, that would be good.

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705