[vsipl++] [patch] Serial Expression Profiling

Mon Aug 14 18:14:30 UTC 2006

Don,

This patch looks good.  There are 2 things I would like to change:

  - First, I would like to move the profiling code from the evaluator
    class specializations into the Dispatch_assign class.  This requires
    some changes to the Evaluator framework, and some help from the
    evaluators themselves, but not much.

    Doing this will reduce the amount of duplication, making it easier
    to add a new evaluator.  It will also give us visibility into
    distributed expressions before they are reduced.

  - Second, the expression name generator is pretty cool, but the
    psuedo-postfix notation seems unintuitive.  Since the framework you
    have is fairly general, it should not be too hard to generate a
    name with proper prefix or infix notation.

This email only discusses the second bullet.  I need to take a look at the
evaluator framework before discussing the first in more detail.

				-- Jules

 > All expressions are tagged in the profiler output with "Expr[/type/]",
 > where type is LF, Dense or Trans.  Following that is the dimensionality
 > (1D, 2D or 3D), a compact representation of the expression and finally
 > the size(s).  For example,  the following expression (where all are the
 > same size and of type Vector<T>):
 >
 >    r = v1 * v2;
 >
 > Gets logged as:
 >
 >    Expr[LF] 1D *SS 262144 : 66929535 : 1 : 262144 : 14.0664
 >
 > The expression is represented as "*SS", meaning "the binary multiply
 > operator applied to two single-precision real values" (again using the
 > BLAS/LAPACK convention of S/D/C/Z for operand types).
 > In general, operators are designated with a 'u', 'b' or 't' for unary,
 > binary and ternary operators respectively, with the exception of the
 > common binary operators, shown in their more familiar +-*/ form.
 > Multiple operators are evaluated in order, therefore
 >
 >    v1 * T(4) + v2 / v3
 >
 > is tagged as:
 >
 >    Expr[LF] 1D *SS/SS+SS 2048 : 1527534 : 1 : 6144 : 14.4451

I think it would be easier to read the expression name if
   - it used prefix or infix notation
   - treated sub-expressions differently from leaves

I.e. the above expression could be:

  prefix: +(*(S,S), /(S,S))
  infix:  (S*S)+(S/S)

I would suggest doing infix first, even though it is harder to read,
and then adding support for infix, since we'll have to support
operators (such as 'hypot') that don't have infix equivalents.

 >
 > Changing it to
 >
 >    (v1 * T(4) + v2) / v3
 >
 > yields:
 >
 >    Expr[LF] 1D *SS+SS/SS 2048 : 1536309 : 1 : 6144 : 14.3626

As an example of how this notation breaks down, (v1 * T(4)) / (v2 + v3) also
has the same name: '*SS+SS/SS'.

An alternative to generating the name in this way is to use the standard
C++ typeinfo (i.e. 'typeid(ExprBlockType).name()').  This is *much* more
verbose and difficult to read than the above, but it would be possible to
clean up in a post-processing step.

 > Index: src/vsip/impl/expr_op_names.hpp
 > ===================================================================
 > +template <template <typename, typename> class BinaryOp,
 > +          typename                            T1,
 > +          typename                            T2>
 > +struct Binary_op_tag
 > +{
 > +  static std::string tag()
 > +  {
 > +    std::ostringstream   st;
 > +    st << Binary_op_name<BinaryOp>::value
 > +       << Type_name<T1>::value
 > +       << Type_name<T2>::value;

You can determine the parameter value types from the block type
and have transform() roll them up, getting rid of the T1 and T2
parameters (see below)

 > +
 > +    return st.str();
 > +  }
 > +};

 > +/// Reduction to generate a tag for the entire expression tree
 > +
 > +struct Reduce_expr_op_name
 > +{
 > +public:
 > +
 > +  template <typename BlockT>
 > +  struct transform
 > +  {
 > +    // Leaf nodes get empty tags
 > +    static std::string tag() { return std::string(); }

Leaf nodes can figure out what they are: 'S', 'C', etc.

	return Type_name<typename BlockT::value_type>::value;

Also, we should specialize Scalar_block so that scalars can be
distinguished from vectors (perhaps 's' for scalar float, 'S' for vector
float?).

 > +  };
 > +
 > +  template <dimension_type            Dim0,
 > +	    template <typename> class Op,
 > +	    typename                  Block,
 > +	    typename                  Type>
 > +  struct transform<Unary_expr_block<Dim0, Op,
 > +                                    Block, Type> const>
 > +  {
 > +    static std::string tag()
 > +    {
 > +      return transform<Block>::tag() + Unary_op_tag<Op, Type>::tag();
 > +    }
 > +  };
 > +
 > +  template <dimension_type                Dim0,
 > +	    template <typename, typename> class Op,
 > +	    typename                      LBlock,
 > +	    typename                      LType,
 > +	    typename                      RBlock,
 > +	    typename                      RType>
 > +  struct transform<Binary_expr_block<Dim0, Op,
 > +                                     LBlock, LType,
 > +                                     RBlock, RType> const>
 > +  {
 > +    static std::string tag()
 > +    {
 > +      return transform<LBlock>::tag() + transform<RBlock>::tag() +
 > +        Binary_op_tag<Op, LType, RType>::tag();

To do prefix notation:

	 return Binary_op_tag<Op>::tag() + std::string("(")
               + transform(RBlock>::tag() + std::string(",")
               + transform<LBlock>::tag() + std::string(")");

To handle a mix of prefix and infix

	  if (Binary_op_tag<Op>::is_infix)
	    return string("(")
                  + transform(RBlock>::tag()
                  + Binary_op_tag<Op>::tag()
		 + transform<LBlock>::tag()
                  + string(")");
           else
	    return Binary_op_tag<Op>::tag() + std::string("(")
		 + transform(RBlock>::tag() + std::string(",")
		 + transform<LBlock>::tag() + std::string(")");

(Bonus points for figuring out how to avoid unnecessary parenthesis
for infix! :)

 > +    }
 > +  };

 > Index: src/vsip/impl/expr_ops_per_point.hpp
 > ===================================================================

 > +//UNARY_OPS_FUNCTOR(bnot)
-> 1 op

 > +//UNARY_OPS_FUNCTOR(ceil)
-> 1 op

 > +//UNARY_OPS_FUNCTOR(conj)
-> 1 op

 > +UNARY_OPS_FUNCTOR(cos,   T,            1);

I think that sin, cos, and tan of a float are more expnsive than 1
floating-point op

 > +UNARY_OPS_FUNCTOR(cos,   complex<T>,  12);

 > +//UNARY_OPS_FUNCTOR(floor)
-> 1 op

 > +//UNARY_OPS_FUNCTOR(imag)
-> 0 op

 > +//UNARY_OPS_FUNCTOR(lnot)
-> 1 op

 > +//UNARY_OPS_FUNCTOR(mag)

mag -> sqrt(R^2 + I^2) -> 3 + sqrt? ops

 > +//UNARY_OPS_FUNCTOR(magsq)

magsq -> R^2 + I^2 -> 3 ops

 > +//UNARY_OPS_FUNCTOR(neg)

neg -> 1 op

 > +//UNARY_OPS_FUNCTOR(real)

real -> 0 op

 > +UNARY_OPS_FUNCTOR(sqrt,  T,            1);
-> sqrt for T is probably more like 8 flops

 > +//BINARY_OPS_FUNCTOR(band)
-> 1 op

 > +//BINARY_OPS_FUNCTOR(bor)
-> 1 op

 > +//BINARY_OPS_FUNCTOR(bxor)
-> 1 op

 > +//BINARY_OPS_FUNCTOR(eq)
-> 1 op

 > +//BINARY_OPS_FUNCTOR(ge)
-> 1 op

 > +//BINARY_OPS_FUNCTOR(gt)
-> 1 op

 > +//BINARY_OPS_FUNCTOR(land)
-> 1 op

 > +//BINARY_OPS_FUNCTOR(le)
-> 1 op

 > +//BINARY_OPS_FUNCTOR(lt)
-> 1 op

 > +//BINARY_OPS_FUNCTOR(lor)
-> 1 op

 > +//BINARY_OPS_FUNCTOR(lxor)
-> 1 op

 > +//BINARY_OPS_FUNCTOR(max)
-> 1 op

 > +//BINARY_OPS_FUNCTOR(maxmgsq)
-> 3+1 ops

 > +//BINARY_OPS_FUNCTOR(min)
-> 1 op

 > +//BINARY_OPS_FUNCTOR(minmgsq)
-> 3+1 ops

 > +//BINARY_OPS_FUNCTOR(ne)
-> 1 op

 > +// The cost is computed by adding the costs for pure real, mixed 
real-complex and
 > +// pure complex adds and multiples for the given equation:
 > +
 > +//  (t1 + t2) * t3
 > +//                                   <  adds  >    <   muls   >
 > +//                                   R   M   C     R   M     C
 > +TERNARY_OPS_FUNCTOR(am, T1, T2, T3,  1 + 0 + 0*2 + 1 + 0*2 + 0*6)
 > +TERNARY_OPS_FUNCTOR(am, T1, T2, C3,  1 + 0 + 0*2 + 0 + 0*2 + 0*6)
 > +TERNARY_OPS_FUNCTOR(am, T1, C2, T3,  0 + 1 + 0*2 + 0 + 1*2 + 0*6)
 > +TERNARY_OPS_FUNCTOR(am, T1, C2, C3,  0 + 1 + 0*2 + 0 + 0*2 + 1*6)
 > +TERNARY_OPS_FUNCTOR(am, C1, T2, T3,  0 + 1 + 0*2 + 0 + 1*2 + 0*6)
 > +TERNARY_OPS_FUNCTOR(am, C1, T2, C3,  0 + 1 + 0*2 + 0 + 0*2 + 1*6)
 > +TERNARY_OPS_FUNCTOR(am, C1, C2, T3,  0 + 0 + 1*2 + 0 + 1*2 + 0*6)
 > +TERNARY_OPS_FUNCTOR(am, C1, C2, C3,  0 + 0 + 1*2 + 0 + 0*2 + 1*6)

good stuff!

-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705