[vsipl++] [patch] CML bindings for matrix transpose operations
Jules Bergmann
jules at codesourcery.com
Thu Jun 5 14:06:13 UTC 2008
Don McCoy wrote:
> This is patterned after the existing serial evaluator for transpose
> operations using SIMD instructions, except that it dispatches operations
> to CML. One important difference is that it handles split complex as
> well as interleaved. Matrix copies are also performed (when the block
> layouts match), but only if the strides are unit in the smallest
> dimension, so as to have the potential to use the SPU's.
Don,
This looks good. I have a couple of minor comments below, but please
check it in.
-- Jules
> +#define VSIP_IMPL_CML_COPY_UNIT(T, FCN, CML_FCN) \
> + inline void \
> + FCN( \
> + T* a, ptrdiff_t rsa, \
> + T* z, ptrdiff_t rsz, \
> + size_t n) \
> + { \
> + typedef Scalar_of<T>::type CML_T; \
> + CML_FCN( \
> + reinterpret_cast<CML_T*>(a), rsa, \
> + reinterpret_cast<CML_T*>(z), rsz, \
> + n * (Is_complex<T>::value ? 2 : 1)); \
> + }
> +
> +VSIP_IMPL_CML_COPY_UNIT(float, copy_unit, cml_vcopy_f)
> +VSIP_IMPL_CML_COPY_UNIT(complex<float>, copy_unit, cml_vcopy_f)
What does the '_unit' suffix indicate? 'copy_unit' appears to take a
vector with a stride. It could easily handle a non-unit stride
vector.
For naming, you might call this function 'vcopy' to indicate that it
copies a vector (as opposed to a matrix).
> + if (s == 'r' && d == 'r') return "Expr_Trans (rr copy)";
> + else if (s == 'r' && d == 'c') return "Expr_Trans (rc trans)";
> + else if (s == 'c' && d == 'r') return "Expr_Trans (cr trans)";
> + else /* (s == 'c' && d == 'c') */ return "Expr_Trans (cc copy)";
You could rename these ^^^^^^^^^^
to something CML specific "Cml_tag matrix (rr copy)" etc.
> + static bool rt_valid(DstBlock& dst, SrcBlock const& src)
> + {
...
> + dimension_type const s_dim1 = src_order_type::impl_dim1;
> + dimension_type const d_dim1 = src_order_type::impl_dim1;
As Stefan points out, this looks suspicious. Did you really mean
*dst*_order_type for the second?
Also, because the copy sub-evaluators require the block to be dense
('assert(dst_ext.stride(0) == dst.size(2, 1))' etc), shouldn't
rt_valid enfore this restriction too?
> +
> + if (dst_ext.stride(d_dim1) != 1 || src_ext.stride(s_dim1) != 1)
> + rt = false;
> + }
> +
> + return rt;
> + }
--
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705
More information about the vsipl++
mailing list