Example of parallel processing

Tue Jul 13 20:24:31 UTC 2010

Hi all,

I was just wondering if there is a good example available of the
foreach_vector method for parallel processing.

In particular I am thinking of a case with the following constraints:

(1)    A very large matrix.  Each row or block of rows should be
processed by a single processor.  The assumption is there will be
multiple processors and that using a parallel processing scheme makes
"sense".

(2)    The primary thread or, in MPI terminology, the root process
will initialize or otherwise acquire the data.

(3)    There are one or more secondary threads or, in MPI terms,
non-root processes; that is, mpirun was started with -np greater than 1.

(4)    The secondary threads do the "work" on the matrix, a row or group
of rows at a time.

(5)    The main thread waits until all processing is complete.
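
To make the pattern in (1)-(5) concrete, here is a minimal sketch that uses plain std::thread rather than VSIPL++ or MPI, so it only illustrates the shape of what I am after, not the library's way of doing it. The names scale_rows and process_in_row_blocks are hypothetical. The calling thread owns the data, each worker gets one contiguous block of rows, and the call returns only after every worker has finished.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Scale every element of rows [first_row, last_row) of a row-major matrix.
void scale_rows(std::vector<float>& matrix, std::size_t num_columns,
                std::size_t first_row, std::size_t last_row, float factor)
{
  for (std::size_t r = first_row; r < last_row; ++r)
    for (std::size_t c = 0; c < num_columns; ++c)
      matrix[r * num_columns + c] *= factor;
}

// Hypothetical helper: the calling ("main") thread hands each worker a
// contiguous block of rows and returns only after all workers are done.
void process_in_row_blocks(std::vector<float>& matrix, std::size_t num_rows,
                           std::size_t num_columns, std::size_t num_workers,
                           float factor)
{
  assert(num_workers > 0);
  assert(matrix.size() == num_rows * num_columns);

  // Ceiling division so that every row is assigned to exactly one worker.
  std::size_t const rows_per_worker =
      (num_rows + num_workers - 1) / num_workers;

  std::vector<std::thread> workers;
  for (std::size_t w = 0; w < num_workers; ++w)
  {
    std::size_t const first = std::min(w * rows_per_worker, num_rows);
    std::size_t const last = std::min(first + rows_per_worker, num_rows);
    if (first == last)
      continue; // more workers than row blocks; nothing to do
    workers.emplace_back(scale_rows, std::ref(matrix), num_columns,
                         first, last, factor);
  }

  // The main thread blocks here until every worker has completed.
  for (std::thread& t : workers)
    t.join();
}
```

With 5 rows and 4 workers this assigns rows 0-1, 2-3 and 4 to three workers and leaves the fourth idle. The workers write to disjoint rows, so no locking is needed. The join loop is the "wait until all processing is complete" step that I am trying to find the distributed equivalent of.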

I have searched the VSIPL++ distribution and have a working example of
more than one thread doing "work".  I am having trouble understanding
how the main thread waits until all other processing is done.  In MPI
terminology: how does one determine that the rank of the process is 0,
and so that it is the root process, and if so, have it wait?

Example program:

template <typename T>
class matrix_walker
{
  Vector<T> my_replica;
  //Vector<T> my_tmp;

public:
  template <typename Block>
  matrix_walker(Vector<T, Block> replica) : my_replica(replica)
  {
  }

  // Invoked by foreach_vector for each vector it visits:
  // multiply the input element-wise by the replicated vector.
  template <typename Block1, typename Block2, dimension_type Dim>
  void operator()(Vector<T, Block1> in, Vector<T, Block2> out, Index<Dim>)
  {
    out = in * my_replica;
  }
};

int main(int argc, char** argv)
{
  // Initialize the library.
  vsipl init(argc, argv);

  unsigned int num_rows = 5;
  unsigned int num_columns = 10;

  vsip_csl::impl::Communicator comm = vsip_csl::impl::default_communicator();

  typedef float value_type;
  typedef Map<Block_dist, Whole_dist> map_type;
  typedef Dense<2, value_type, row2_type, map_type> block_type;
  typedef Dense<1, value_type, row1_type, Replicated_map<1> > replica_block_type;
  typedef Vector<value_type, replica_block_type> VEC;
  typedef Matrix<value_type, block_type> MATRIX;

  Replicated_map<1> replica_map;
  boost::shared_ptr<VEC> vec(new VEC(num_columns, replica_map));

  map_type map = map_type(num_processors(), 1);
  boost::shared_ptr<MATRIX> matrix(new MATRIX(num_rows, num_columns, map));
  boost::shared_ptr<MATRIX> tmp_matrix(new MATRIX(num_rows, num_columns, map));

  // Fill the matrix with 10, 20, 30, ...
  value_type idx = 0;
  for (index_type r = 0; r < matrix->size(0); ++r)
  {
    for (index_type c = 0; c < matrix->size(1); ++c)
    {
      matrix->put(r, c, idx += 10);
    }
  }

  (*vec) = 5;
  (*tmp_matrix) = 0;

  matrix_walker<value_type> mw(vec->local());
  foreach_vector<tuple<0, 1> >(mw, (*matrix));

  // This prints out for as many threads as I started.  What I really want
  // is a way of determining when the worker threads are complete.
  // The process was started as mpirun -np 4.
  std::cout << (*matrix) << std::endl;

  // A second version of the program used the communicator's rank, as
  // below, but then the process only prints out the values that were
  // assigned to that processor:
  //
  // if (comm.rank() == 0)
  // {
  //   std::cout << (*matrix) << std::endl;
  // }
}

Thanks,

Bill
