[vsipl++] Example of parallel processing

Tue Jul 13 22:04:37 UTC 2010

Hi Bill,

On 07/13/2010 04:24 PM, Cassanova, Bill wrote:
> Hi all,
>
> I was just wondering if there is a good example available of the
> foreach_vector method for parallel processing.

Please be aware that the foreach_vector function is not part of the 
public Sourcery VSIPL++ API, nor is it part of the VSIPL++ 
specification.

We don't recommend using functions or types from the vsip::impl 
namespace, as we can't make any guarantees about their stability or support.

That said, we are currently experimenting with new APIs to address 
similar problems, and expect to publish them soon.

> In particular I am thinking of a case with the following constraints:
>
> (1) A very large matrix. Each row or block of rows should be processed
> by a single processor. The assumption is there will be multiple
> processors and that using a parallel processing scheme makes “sense”.

OK.

> (2) The primary thread, or using MPI terminology, the root process will
> initialize or otherwise acquire the data.

OK.

> (3) There is a secondary thread or threads, assuming again, in MPI terms,
> that mpirun was started with -np greater than 1.
>
> (4) The secondary threads do the “work” on the matrix, a row or group of
> rows at a time.
>
> (5) The main thread waits until all processing is complete.

You are using vocabulary from multi-threading that is not quite 
accurate in this context. While you may designate a single process as the 
"main" process (typically the one with rank 0), there is nothing 
special about it as far as its workflow is concerned.

All processes normally execute the exact same code. This is the "Single 
Program, Multiple Data" (SPMD) model, which is different from the 
worker-thread or thread-pool pattern.

Thus, the line

     foreach_vector< tuple<0,1> >( mw, (*matrix) );

is executed by all processes, and there is typically no need to "wait" 
for other processes to reach the same point.

> I have searched the VSIPL++ distribution and have a working example of
> more than one thread doing “work”. I am having trouble understanding how
> the main thread waits until all other processing is done. Using MPI
> terminology, how does one determine when the rank of the process is 0
> and considered the root process and if so wait.

[...]

> foreach_vector< tuple<0,1> >( mw, (*matrix) );

After this line you can assume that all processes have finished this 
function. You may insert a barrier, but that shouldn't actually be 
needed in most cases:

   comm.barrier();

To print out the result, you should indeed use the "if (comm.rank() == 0)"
idiom.

I'm not sure whether I actually answered any of your questions. If not, 
let me know.

Thanks,
		Stefan

-- 
Stefan Seefeld
CodeSourcery
stefan at codesourcery.com
(650) 331-3385 x718