[vsipl++] [patch] CML bindings for matrix transpose operations
Stefan Seefeld
stefan at codesourcery.com
Thu Jun 5 00:39:46 UTC 2008
Don,
I have some small comments and questions (one for Jules). Overall this
looks good.
Don McCoy wrote:
> This is patterned after the existing serial evaluator for transpose
> operations using SIMD instructions, except that it dispatches operations
> to CML. One important difference is that it handles split complex as
> well as interleaved. Matrix copies are also performed (when the block
> layouts match), but only if the strides are unit in the smallest
> dimension, so as to have the potential to use the SPU's.
> +// These macros support scalar and interleaved complex types
I don't find these macros very useful. Most of them you use exactly
once. And even the ones you use twice would result in simpler code if
spelled out. For example:
> +
> +#define VSIP_IMPL_CML_TRANS(T, FCN, CML_FCN) \
> + inline void \
> + FCN( \
> + T* a, ptrdiff_t rsa, ptrdiff_t csa, \
> + T* z, ptrdiff_t rsz, ptrdiff_t csz, \
> + size_t m, size_t n) \
> + { \
> + typedef Scalar_of<T>::type CML_T; \
> + CML_FCN( \
> + reinterpret_cast<CML_T*>(a), rsa, csa, \
> + reinterpret_cast<CML_T*>(z), rsz, csz, \
> + m, n ); \
> + }
> +
> +VSIP_IMPL_CML_TRANS(float, transpose, cml_mtrans_f)
> +VSIP_IMPL_CML_TRANS(std::complex<float>, transpose, cml_cmtrans_f)
> +#undef VSIP_IMPL_CML_TRANS
> +
actually boils down to
inline void
transpose(float *a, ptrdiff_t rsa, ptrdiff_t csa,
float *z, ptrdiff_t rsz, ptrdiff_t csz,
size_t m, size_t n)
{
cml_mtrans_f(a, rsa, csa, z, rsz, csz, m, n);
}
for the first case, which I find much more readable than the macro
code above.
> + static bool rt_valid(DstBlock& dst, SrcBlock const& src)
> + {
> + bool rt = true;
> +
> + // If performing a copy, both source and destination blocks
> + // must be unit stride.
> + if (Type_equal<src_order_type, dst_order_type>::value)
> + {
> + Ext_data<DstBlock> dst_ext(dst, SYNC_OUT);
> + Ext_data<SrcBlock> src_ext(src, SYNC_IN);
These objects only exist to check the strides, right ? I'm aware that we
don't have any SYNC enumerators to indicate 'no-copy', but shouldn't we
? Using SYNC_OUT and SYNC_IN looks a bit misleading to me, in this
context. Jules ?
> +
> + dimension_type const s_dim1 = src_order_type::impl_dim1;
> + dimension_type const d_dim1 = src_order_type::impl_dim1;
Why two constants, if they hold the same value ?
> + if (dst_ext.stride(d_dim1) != 1 || src_ext.stride(s_dim1) != 1)
> + rt = false;
> + }
> +
> + return rt;
> + }
> +
> + static void exec(DstBlock& dst, SrcBlock const& src, row2_type, row2_type)
> + {
> + vsip::impl::Ext_data<DstBlock> dst_ext(dst, vsip::impl::SYNC_OUT);
> + vsip::impl::Ext_data<SrcBlock> src_ext(src, vsip::impl::SYNC_IN);
> +
Why the full qualification here (but not above) ? (I know, this is
really picky, but I like compact and concise code. ;-) )
Thanks,
Stefan
--
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718
More information about the vsipl++
mailing list