[vsipl++] [patch] SAL dispatch for matrix and vector products

Thu Oct 27 13:28:59 UTC 2005

Don,

Don McCoy wrote:
> I am working on BLAS dispatch as well.  Patch to follow.  This one just 
> includes SAL.  Hopefully I've separated them well.
> 
> Two issues worth pointing out:
> 
>    1) The exec() function checks for row-major ordering because it calls 
> the newer SAL functions (mat_mul) that allow the column-stride to be 
> specified.   I believe the rows must be unit stride.  This is a little 
> less general than the older ones (mulx) which allow non-unit strides.  
> Recently, we heard back from Mercury that the older ones perform better 
> (at this time).  Given that the older ones handle non-unit strides and 
> are faster, should we revert to using those?  If Mercury changes in the 
> future, then we can follow.

Yes, we should revert to the old ones for now.

Because the old and new functions have different dispatch requirements 
(for supported strides and mixing of dimension orderings), we should 
have separate evaluators for each (as opposed to trying to hide the 
different in sal::mmul).

We almost need a Venn diagram to represent the non-overlapping sets of 
functionality:

The old matrix-multiply
  - required all operands to have the same dimension-ordering
  - supported non-unit stride in the minor dimension
  - required the major dimension to be "dense".  I.e. the major dimension
    stride == minor dimenson size * minor dimension stride.
  - provided a special case for unit-stride minor dimension (so called
    "fast" versions)

The new matrix-multiply
  - supports mixing of dimension-ordering (via the transpose flags)
  - requires unit-stride in the minor dimensions
  - allows major dimensions to be non-dense, via the column stride.

> 
>    2) Split-complex products (other than vector-vector) are not handled 
> at this time.  Just a reminder that we were going to discuss how to 
> address this issue sometime.

We should be able to handle this by:

  - providing overloads of sal::mmul for std::pair<T*, T*>, and
  - checking that all the matrices have the same complex format in ct_valid.

Granted, we wont be able to fully exercise this until we get prod 
integrated into the expression templates.

More comments below on the matrix-matrix evaluator.  Some may apply to 
the matrix-vector and vector-matrix evaluators too.

				-- Jules

> ------------------------------------------------------------------------
>

> + 
> + 
> + // SAL evaluator for matrix-matrix products.
> + 
> + template <typename Block0,
> + 	  typename Block1,
> + 	  typename Block2>
> + struct Evaluator<Op_prod_mm, Block0, Op_list_2<Block1, Block2>,
> + 		 Mercury_sal_tag>
> + {
> +   typedef typename Block0::value_type T;

You could move these typedefs here so the new ct_valid conditions below 
fit on a single line:
     typedef typename Block_layout<Block0>::order_type order0_type;
     ...

     typedef typename Block_layout<Block0>::complex_type complex0_type;
     ...

> + 
> +   static bool const ct_valid = 
> +     impl::sal::Sal_traits<T>::valid &&
> +     Type_equal<T, typename Block1::value_type>::value &&
> +     Type_equal<T, typename Block2::value_type>::value &&
> +     // check that direct access is supported
> +     Ext_data_cost<Block0>::value == 0 &&
> +     Ext_data_cost<Block1>::value == 0 &&
> +     Ext_data_cost<Block2>::value == 0;

Assuming that we're going to modify this evaluator to handle the old 
matrix multiply, the ct_valid should also check that all three matrices 
have the same dimension ordering.

Also check that all the matrices have the same complex format (this 
applies to both the old and new multiply).

> + 
> +   static bool rt_valid(Block0& r, Block1 const& a, Block2 const& b)
> +   {
> +     typedef typename Block_layout<Block0>::order_type order0_type;
> +     typedef typename Block_layout<Block1>::order_type order1_type;
> +     typedef typename Block_layout<Block2>::order_type order2_type;
> + 
> +     Ext_data<Block0> ext_r(const_cast<Block0&>(r));
> +     Ext_data<Block1> ext_a(const_cast<Block1&>(a));
> +     Ext_data<Block2> ext_b(const_cast<Block2&>(b));
> + 
> +     bool is_r_row = Type_equal<order0_type, row2_type>::value;
> +     bool is_a_row = Type_equal<order1_type, row2_type>::value;
> +     bool is_b_row = Type_equal<order2_type, row2_type>::value;
> + 
> +     bool unit_stride = false;
> +     if ( is_r_row && is_a_row && is_b_row )
> +       unit_stride = (ext_a.stride(1) == 1) && (ext_b.stride(1) == 1);
> + 
> +     return unit_stride;

For the old matrix multiply, we should check that

	dimension_type const dim0 = order0_type::impl_dim0;
	dimension_type const dim1 = order0_type::impl_dim1;

	stride(dim0) == size(dim1) * stride(dim1)

for each matrix (using impl_dim0 and impl_dim1 should make this work for 
both the row- and column-major cases).

> +   }
> + 
> +   static void exec(Block0& r, Block1 const& a, Block2 const& b)
> +   {
> +     typedef typename Block0::value_type RT;
> + 
> +     typedef typename Block_layout<Block0>::order_type order0_type;
> +     typedef typename Block_layout<Block1>::order_type order1_type;
> +     typedef typename Block_layout<Block2>::order_type order2_type;
> + 
> +     Ext_data<Block0> ext_r(const_cast<Block0&>(r));
> +     Ext_data<Block1> ext_a(const_cast<Block1&>(a));
> +     Ext_data<Block2> ext_b(const_cast<Block2&>(b));
> + 

(For the old matrix multiply) At this point, we can assume the three 
matrices are either all row-major or all column-major.

> +     if (Type_equal<order0_type, row2_type>::value)
> +     {
If row major, compute r = a b
> +       // SAL supports column-stride parameter only (rows must be unit-stride)
> +       sal::mmul( a.size(2, 0), // M
> +                  b.size(2, 1), // N
> +                  a.size(2, 1), // P
> +                  ext_a.data(), ext_a.stride(0),
> +                  ext_b.data(), ext_b.stride(0),
> +                  ext_r.data(), ext_r.stride(0) );
> +     }

If column major, we can use relation:

	(r = a b) <=> (trans(r) = trans(b) trans(a))	

To compute r.