[vsipl++] [patch] CML bindings for matrix transpose operations

Thu Jun 5 00:39:46 UTC 2008

Don,

I have some small comments and questions (one for Jules). Overall this 
looks good.

Don McCoy wrote:
> This is patterned after the existing serial evaluator for transpose
> operations using SIMD instructions, except that it dispatches operations
> to CML.  One important difference is that it handles split complex as
> well as interleaved.  Matrix copies are also performed (when the block
> layouts match), but only if the strides are unit in the smallest
> dimension, so as to have the potential to use the SPU's.

> +// These macros support scalar and interleaved complex types

I don't find these macros very useful. Most of them you use exactly 
once. And even the ones you use twice would result in simpler code if 
spelled out. For example:

> +
> +#define VSIP_IMPL_CML_TRANS(T, FCN, CML_FCN)    \
> +  inline void                                   \
> +  FCN(                                          \
> +    T* a, ptrdiff_t rsa, ptrdiff_t csa,         \
> +    T* z, ptrdiff_t rsz, ptrdiff_t csz,         \
> +    size_t m, size_t n)                         \
> +  {                                             \
> +    typedef Scalar_of<T>::type CML_T;           \
> +    CML_FCN(                                    \
> +      reinterpret_cast<CML_T*>(a), rsa, csa,    \
> +      reinterpret_cast<CML_T*>(z), rsz, csz,    \
> +      m, n );                                   \
> +  }
> +
> +VSIP_IMPL_CML_TRANS(float,               transpose, cml_mtrans_f)
> +VSIP_IMPL_CML_TRANS(std::complex<float>, transpose, cml_cmtrans_f)
> +#undef VSIP_IMPL_CML_TRANS
> +

actually boils down to

inline void
transpose(float *a, ptrdiff_t rsa, ptrdiff_t csa,
           float *z, ptrdiff_t rsz, ptrdiff_t csz,
           size_t m, size_t n)
{
   cml_mtrans_f(a, rsa, csa, z, rsz, csz, m, n);
}

for the first case, which I find much more readable than the macro
code above.

> +  static bool rt_valid(DstBlock& dst, SrcBlock const& src)
> +  { 
> +    bool rt = true;
> +
> +    // If performing a copy, both source and destination blocks
> +    // must be unit stride.
> +    if (Type_equal<src_order_type, dst_order_type>::value)
> +    {
> +      Ext_data<DstBlock> dst_ext(dst, SYNC_OUT);
> +      Ext_data<SrcBlock> src_ext(src, SYNC_IN);

These objects only exist to check the strides, right ? I'm aware that we 
don't have any SYNC enumerators to indicate 'no-copy', but shouldn't we 
? Using SYNC_OUT and SYNC_IN looks a bit misleading to me, in this 
context. Jules ?

> +
> +      dimension_type const s_dim1 = src_order_type::impl_dim1;
> +      dimension_type const d_dim1 = src_order_type::impl_dim1;

Why two constants, if they hold the same value ?

> +      if (dst_ext.stride(d_dim1) != 1 || src_ext.stride(s_dim1) != 1)
> +        rt = false;
> +    }
> +
> +    return rt; 
> +  }

> +
> +  static void exec(DstBlock& dst, SrcBlock const& src, row2_type, row2_type)
> +  {
> +    vsip::impl::Ext_data<DstBlock> dst_ext(dst, vsip::impl::SYNC_OUT);
> +    vsip::impl::Ext_data<SrcBlock> src_ext(src, vsip::impl::SYNC_IN);
> +

Why the full qualification here (but not above) ? (I know, this is 
really picky, but I like compact and concise code. ;-) )

Thanks,
		Stefan

-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718