[vsipl++] parallel Generator_expr_block

Jules Bergmann jules at codesourcery.com
Tue Apr 3 21:06:51 UTC 2007


Assem Salama wrote:
 > Everyone,
 >  This patch allows a Generator_expr_block to act as a distributed
 > vector. This allows the user to do dist_vector=ramp(0.0, 1.0, size);

Assem,

Can you also

  - Create a vramp benchmark?  It should cover both local and distributed
    vectors.  Also, for comparison it should cover using an explicit loop.

    case 1: Local_map  vector = ramp(...);
    case 2: Map        vector = ramp(...);
    case 3: Global_map vector = ramp(...);

    case 4: Local_map  for (i=0; i<size; ++i) vector.put(i, ...);
    case 5: Map        for (i=0; i<size; ++i) vector.put(i, ...);
    case 6: Global_map for (i=0; i<size; ++i) vector.put(i, ...);

  - Create a test to cover using ramp in a distributed expression?

    It should cover assignment

      Local_map  vector = ramp(...);
      Map        vector = ramp(...);
      Global_Map vector = ramp(...);

    use in a simple operation

      Local_map  vector += ramp(...);
      Map        vector += ramp(...);
      Global_Map vector += ramp(...);

    and use in more complex operations

    for example (but not limited too):

      Map<...> map_root(1);
      Map<...> map_all(num_processors());

      map_all_vector = map_root_vector + ramp(...);

    and any other tests you can think of.  Bonus points if you break
    it!


 >
 > Thanks,
 > Assem
 >
 >
 > ------------------------------------------------------------------------
 >
 > Index: src/vsip/core/expr/generator_block.hpp
 > ===================================================================
 > --- src/vsip/core/expr/generator_block.hpp	(revision 165174)
 > +++ src/vsip/core/expr/generator_block.hpp	(working copy)
 > @@ -89,8 +89,79 @@
 >    map_type    map_;
 >  };
 >
 > +template <dimension_type Dim>
 > +Index<Dim>
 > +convert_index(Index<Dim> idx, Domain<Dim> const& dom)
 > +{
 > +  Index<Dim> res_idx;
 > +  index_type i;
 > +  for(i=0;i<Dim;i++) {
 > +    res_idx[i] = dom.first() + idx[i]*dom.stride();
 > +  }
 > +  return res_idx;
 > +}

[1] There is already a function that does this called 'domain_nth' in
domain_utils.hpp.  Can you use that instead?

The name derives from the 'impl_nth' member function of Domain<1> that
returns the 'nth' element in the domain (first * n * stride).

Also, in future when creating functions that go into header files,
they should be 'inline'.  Non-template non-inline functions will
cause the function to be defined multiple times if the header is
included by different object files, resulting in a link-time error.
Template non-inline functions are handled OK but the GNU toolchain,
but not so well by other toolchains (in particular with GreenHills/MCOE
you're forced to explicitly specify the instantiatiations you want).
So it is best to avoid template non-inline functions.

 >
 > +template <dimension_type Dim,
 > +	  typename       Generator>
 > +class Subset_block<Generator_expr_block<Dim, Generator> const>
 > +  : public Non_assignable
 > +{

[2] Why do we need to specialize Subset_block for
Generator_expr_block?


 > @@ -158,7 +229,15 @@
 >  }
 >
 >
 > +template <dimension_type Dim, typename Generator>
 > +struct Choose_peb<Generator_expr_block<Dim, Generator> const>
 > +{ typedef Peb_remap_tag type; };
 >
 > +template <dimension_type Dim, typename Generator>
 > +struct Choose_peb<Generator_expr_block<Dim, Generator> >
 > +{ typedef Peb_remap_tag type; };
 > +
 > +

[3] Looks good.

 > @@ -166,7 +245,7 @@
 >  			   Generator_expr_block<Dim, Generator> const>
 >  {
 >  #if 1
 > -  typedef Generator_expr_block<Dim, Generator> const block_type;
 > +  typedef Generator_expr_block<Dim, Generator> block_type;
 >    typedef typename CombineT::template return_type<block_type>::type
 >  		type;
 >    typedef typename CombineT::template tree_type<block_type>::type
 > @@ -208,7 +287,7 @@
 >    Generator_expr_block<Dim, Generator> const& block)
 >  {
 >  #if 1
 > -  return combine.apply_const(block);
 > +  return combine.apply(block);
 >  #else
 >    typedef typename Combine_return_type<
 >      CombineT,

[4] This looks good.  While you're in here fixing things, can you take
out the old #ifdef as well.



 > Index: src/vsip/core/parallel/expr.hpp
 > ===================================================================
 > --- src/vsip/core/parallel/expr.hpp	(revision 165174)
 > +++ src/vsip/core/parallel/expr.hpp	(working copy)
 > @@ -177,8 +177,67 @@
 >    typename View_block_storage<BlockT>::expr_type blk_;
 >  };
 >
 > +template <dimension_type Dim,
 > +	  typename       MapT,
 > +	  typename       BlockT>
 > +class Par_expr_block<Dim, MapT, BlockT, Peb_remap_tag> : Non_copyable
 > +{
 > +public:
 > +  static dimension_type const dim = Dim;
 >
 > +  typedef typename BlockT::value_type           value_type;
 > +  typedef typename BlockT::reference_type       reference_type;
 > +  typedef typename BlockT::const_reference_type const_reference_type;
 > +  typedef MapT                                  map_type;
 >
 > +
 > +  typedef Subset_block<BlockT const>                local_block_type;
 > +  typedef Distributed_block<BlockT, MapT> dst_block_type;
 > +
 > +  typedef typename View_of_dim<Dim, value_type, dst_block_type>::type
 > +		dst_view_type;
 > +  typedef typename View_of_dim<Dim, value_type, BlockT>::const_type
 > +		src_view_type;

[5] src_view_type, dst_view_type, and dst_block_type are not used.
You can delete them.

 > +
 > +
 > +public:
 > +  Par_expr_block(MapT const& map, BlockT const& block)
 > +    : map_ (map),
 > +      blk_ (const_cast<BlockT&>(block))

[6] Why do you need to strip the const off?  blk_ is a 'BlockT
const&'.

 > +  {}
 > +
 > +  ~Par_expr_block() {}
 > +
 > +  void exec() {}
 > +
 > +  // Accessors.
 > +public:
 > +  length_type size() const VSIP_NOTHROW { return blk_.size(); }
 > +  length_type size(dimension_type blk_dim, dimension_type d) const 
VSIP_NOTHROW
 > +  { return blk_.size(blk_dim, d); }
 > +
 > +  void increment_count() const VSIP_NOTHROW {}
 > +  void decrement_count() const VSIP_NOTHROW {}
 > +
 > +  // Distributed Accessors
 > +public:
 > +  local_block_type get_local_block() const
 > +    {
 > +      Domain<Dim> my_local_domain =
 > +        map_.template impl_global_domain<Dim>(map_.subblock(), 0);
 > +
 > +      Subset_block<BlockT const> subblock(my_local_domain,blk_);
 > +      return subblock;
 > +    }

[7] If possible 'subblock' should be created during the constructor,
and just returned here.  That way if this parallel expression is
"captured" in a Setup_assign statement, the overhead of creating the
subblock (and doing the reference counting) is done just once, not
each time the expression is evalauted.

 > +
 > +
 > +  // Member data.
 > +private:
 > +  MapT const&     map_;
 > +  BlockT const& blk_;

[8] You should rarely store a block directly as a reference.

First, some background.  Some blocks are stored by reference, others
by value.  In particular, blocks that own memory are stored by
reference.  This way they can be reference counted and their memory
deallocted when they are no longer needed.  Blocks that don't own
memory (and just refer to other blocks) are stored by value.  This
avoids the overhead of reference counting, which is important to get
good performance with loop fusion.

The impact of this is:

a) If you hold a reference to a by-reference reference counted block,
you may have to manually increment/decrement the reference count.
This is both painful and difficult to make exception safe.  (In some
cases it is OK to hold a reference without doing reference counting,
in particular in expression blocks).

b) If you hold a reference to a by-value block, there is a good chance
that the original block lives on the stack and that the reference will
become invalid.

The right thing to do is use View_block_storage to give you the
appropriate type.

If you want to do reference counting, you would use:

	View_block_storage<BlockT>::type

For reference counted BlockT's this results in a special reference
holder type (Ref_counted_ptr) that handles reference counting for you
automatically.

If you're not doing reference counting, you would use:

	View_block_storage<BlockT>::expr_type

or
	
	View_block_storage<BlockT>::plain_type

(The difference is that expr_type throws in a const.)

In this case, you're nearly in an expression block, so

	View_block_storage<BlockT>::expr_type

is the way to go.

 > +};
 > +
 > +
 >  /// 'Combine' functor to construct an expression of Par_expr_blocks 
from an
 >  /// expression of distributed blockes.
 >
 > @@ -441,7 +500,7 @@
 >  	  typename       MapT,
 >  	  typename       BlockT,
 >  	  typename       ImplTag>
 > -typename Par_expr_block<Dim, MapT, BlockT, ImplTag>::local_block_type&
 > +typename Par_expr_block<Dim, MapT, BlockT, ImplTag>::local_block_type

[9] Similar to above comment about passing blocks by reference and by
value.

Passing a value happens to be the right thing when local_block_type is
a subset block, but it is probably not OK for other local_block_types.
Since this function is used for all ImplTags, not just Peb_remap_tag,
this will break the other ImplTags.

The right thing to do is use View_block_storage to massage
local_block_type.  Here you would want to use plain_type since you
don't want the const.


 >  get_local_block(
 >    Par_expr_block<Dim, MapT, BlockT, ImplTag> const& block)
 >  {


-- 
Jules Bergmann
CodeSourcery
jules at codesourcery.com
(650) 331-3385 x705



More information about the vsipl++ mailing list