[PATCH] Fix reductions for MPI operation

Jeffrey D. Oldham oldham at codesourcery.com
Mon Aug 23 23:59:39 UTC 2004


Richard Guenther wrote:

>
> This patch fixes (works around) a previously discovered problem 
> (remember the WaitingIterate).  I'm sure there is a real problem
> to fix (at least for MPI - I'm not sure about Cheetah), and this
> is the least intrusive way of fixing it until the right idea for
> a cross-context csem-like mechanism pops up.
>
> Without this patch, random lockups may occur during reductions.
>
> Ok?
>
> Richard.
>
>
> 2004Aug21  Richard Guenther <richard.guenther at uni-tuebingen.de>
>
>     * src/Engine/RemoteEngine.h: For MPI avoid doing blocking
>     operation during reductions while iterates are still pending.

Yes, this is fine.

>------------------------------------------------------------------------
>
>Index: src/Engine/RemoteEngine.h
>===================================================================
>RCS file: /home/pooma/Repository/r2/src/Engine/RemoteEngine.h,v
>retrieving revision 1.42
>diff -u -u -r1.42 RemoteEngine.h
>--- src/Engine/RemoteEngine.h	19 Jan 2004 22:04:33 -0000	1.42
>+++ src/Engine/RemoteEngine.h	21 Aug 2004 20:10:06 -0000
>@@ -2065,6 +2065,11 @@
>     Pooma::scheduler().endGeneration();
> 
>     csem.wait();
>+#if POOMA_MPI
>+    // The single-threaded wait above has the same problem as the
>+    // MultiPatch variant, so fix it the same way.
>+    Pooma::blockAndEvaluate();
>+#endif
> 
>     RemoteProxy<T> globalRet(ret, computationContext);
>     ret = globalRet;  
>@@ -2186,6 +2191,27 @@
> 
>     Pooma::scheduler().endGeneration();
>     csem.wait();
>+#if POOMA_MPI
>+    // We need to wait for reductions on _all_ contexts to complete
>+    // here, as otherwise we may fail to issue an igc update send
>+    // iterate that a remote context waits for.  Consider the 2-patch setup
>+    //  a,b     |         g|  |          g|
>+    // with the expressions
>+    //  a(I) = b(I+1);
>+    //  bool res = all(a(I) == 0);
>+    // Here we issue the following iterates:
>+    //  0: guard receive from 1 (write request b)
>+    //  1: guard send to 0      (read request b)
>+    //  0/1: expression iterate (read request b, write request a)
>+    //  0/1: reduction (read request a)
>+    //  0/1: blocking MPI_XXX
>+    // Here the guard send from 1 to 0 can be skipped, so context 1
>+    // starts the blocking MPI operation prematurely while context 0
>+    // still waits for this send to complete to execute the expression.
>+    //
>+    // The easiest (and only available) fix is to blockAndEvaluate().
>+    Pooma::blockAndEvaluate();
>+#endif
> 
>     if (n > 0)
>       {
>  
>
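The ordering argument in the patch comment can be illustrated with a small toy model. This is a sketch under stated assumptions, not POOMA code: the names Iterate, Context, collectDeps, executedBeforeCollective and mayDeadlock are hypothetical, MPI and the POOMA scheduler are not modeled, and "csem.wait() alone" is assumed to guarantee only that the reduction and its local dependencies have run (the real scheduler may or may not happen to run the guard send too). Under that assumption, context 1 may enter the blocking MPI call without having run its guard send, which context 0 needs before it can finish its own reduction; draining all pending iterates first, as blockAndEvaluate() does, removes that possibility.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model of the 2-patch scenario; not POOMA code.
// Each context holds pending iterates with purely local dependencies.
struct Iterate {
  std::vector<std::string> deps;  // iterates that must run first (locally)
};
using Context = std::map<std::string, Iterate>;

// Add iterate `name` and everything it transitively depends on to `out`.
void collectDeps(const Context& ctx, const std::string& name,
                 std::set<std::string>& out) {
  if (!out.insert(name).second) return;  // already collected
  for (const std::string& d : ctx.at(name).deps) collectDeps(ctx, d, out);
}

// Iterates a context has executed when it enters the blocking MPI call.
// Assumption: csem.wait() alone only guarantees the reduction and its
// dependencies; blockAndEvaluate() drains every pending iterate.
std::set<std::string> executedBeforeCollective(const Context& ctx,
                                               bool blockAndEvaluate) {
  std::set<std::string> ran;
  if (blockAndEvaluate) {
    for (const auto& kv : ctx) ran.insert(kv.first);
  } else {
    collectDeps(ctx, "reduction", ran);
  }
  return ran;
}

// True if context 1 may enter the collective while context 0 still needs
// context 1's guard send to finish its reduction (a possible lockup).
bool mayDeadlock(bool blockAndEvaluate) {
  // Context 0: the guard receive writes b, so the expression (which
  // reads b) must wait for it; the reduction reads a.
  const Context ctx0 = {
      {"guardReceive", Iterate{}},
      {"expression", Iterate{{"guardReceive"}}},
      {"reduction", Iterate{{"expression"}}},
  };
  // Context 1: the guard send and the expression both only read b, so
  // nothing orders them; the reduction reads a.
  const Context ctx1 = {
      {"guardSend", Iterate{}},
      {"expression", Iterate{}},
      {"reduction", Iterate{{"expression"}}},
  };
  std::set<std::string> need0;
  collectDeps(ctx0, "reduction", need0);
  const bool ctx0NeedsRemoteSend = need0.count("guardReceive") == 1;

  const std::set<std::string> ran1 =
      executedBeforeCollective(ctx1, blockAndEvaluate);
  assert(ran1.count("reduction") == 1);  // local reduction always done
  return ctx0NeedsRemoteSend && ran1.count("guardSend") == 0;
}
```

In this model mayDeadlock(false) is true and mayDeadlock(true) is false: the brute-force drain trades some lost overlap for the guarantee that no send a remote context waits on is still pending when the blocking call starts.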


-- 
Jeffrey D. Oldham
oldham at codesourcery.com
