[pooma-dev] [RFC] Specialize (internal) guard cell handling, more data-flow analysis

Wed Jul 7 17:34:37 UTC 2004

It's closing in on four years since the POOMA team left the lab, so my memory
is rusty in this regard, but...

Smarts did apply data-flow analysis to filling the guards. However, it wasn't
perfectly efficient, as there was only one Smarts DataObject associated with
each patch. More parallelism could have been achieved by having a data
object for each guard region; this idea was thrown around but never
seriously studied. (This would allow multiple parts of the brick to be
written in parallel - obviously care would have to be taken in
implementing this.)
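
To make the granularity point concrete, here is a rough sketch, not actual
Smarts or POOMA code. DataObject below is a hypothetical stand-in that only
models a write lock, and CoarsePatch, FinePatch and concurrentGuardWriters
are names invented for this illustration. It counts how many guard-face
writers could proceed at once under each scheme:

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a Smarts data object: it only records
// whether a writer currently holds it (no real scheduling here).
struct DataObject {
    bool writeLocked = false;
    bool tryAcquireWrite() {
        if (writeLocked) return false;
        writeLocked = true;
        return true;
    }
    void release() { writeLocked = false; }
};

// One data object for the whole patch: every guard face maps to it.
struct CoarsePatch {
    DataObject whole;
    DataObject& objectFor(std::size_t /*face*/) { return whole; }
};

// One data object per guard face (4 faces, as in a 2D brick),
// plus one for the interior.
struct FinePatch {
    DataObject interior;
    std::array<DataObject, 4> guardFaces;
    DataObject& objectFor(std::size_t face) { return guardFaces.at(face); }
};

// Count how many guard-face writers can hold their data object at once.
template <class Patch>
std::size_t concurrentGuardWriters(Patch& p, std::size_t faces) {
    std::size_t granted = 0;
    for (std::size_t f = 0; f < faces; ++f)
        if (p.objectFor(f).tryAcquireWrite()) ++granted;
    return granted;
}
```

With the single object only one of the four writers gets through; with one
object per face all four do. The care mentioned above would go into corner
cells shared between faces, which this sketch ignores.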

We also discussed aggregating all of the guard updates for a single brick
into a single iterate - the current behavior creates lots of small iterates,
and the per-iterate overhead can kill you. This may be more along the lines
of what you're considering.
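
A minimal sketch of the aggregation idea, under heavy assumptions: Scheduler,
GuardUpdate, issuePerUpdate and issuePerPatch are hypothetical names made up
for this example and do not correspond to the real Smarts or POOMA
interfaces. It only shows the difference in the number of iterates handed to
the scheduler:

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

using Work = std::function<void()>;

// Hypothetical scheduler: each queued entry stands for one iterate,
// so queue.size() is a proxy for scheduling overhead.
struct Scheduler {
    std::vector<Work> queue;
    void handOff(Work w) { queue.push_back(std::move(w)); }
    void run() {
        for (auto& w : queue) w();
        queue.clear();
    }
};

// One guard-region copy destined for the patch with the given id.
struct GuardUpdate {
    int patchId;
    Work copy;
};

// One small iterate per guard update.
void issuePerUpdate(Scheduler& s, const std::vector<GuardUpdate>& updates) {
    for (const auto& u : updates) s.handOff(u.copy);
}

// One iterate per destination patch, running all of its copies in turn.
void issuePerPatch(Scheduler& s, const std::vector<GuardUpdate>& updates) {
    std::map<int, std::vector<Work>> byPatch;
    for (const auto& u : updates) byPatch[u.patchId].push_back(u.copy);
    for (auto& entry : byPatch) {
        std::vector<Work> batch = std::move(entry.second);
        s.handOff([batch]() {
            for (const auto& w : batch) w();
        });
    }
}
```

The trade-off is that an aggregated iterate cannot start until all of its
source patches are ready, so fewer iterates may also mean less overlap.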

I personally don't like the idea of putting anything to do with guards,
dirty-ness, etc. in BrickEngine. It is a clean abstraction. If you need an
enhanced abstraction for MultiPatchEngine to work with, then I'd build it on
top of BrickEngine. That's my 2 pfennigs anyway. :)
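
Roughly what I mean by building on top, as a sketch only: PlainEngine below
is a hypothetical stand-in for a storage engine, not the real BrickEngine
interface, and GuardTracking is a name invented for this example. The
wrapper carries the dirty state while the wrapped engine stays unaware of
guards:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a plain storage engine: it knows nothing
// about guards or dirty flags.
struct PlainEngine {
    std::vector<double> data;
    explicit PlainEngine(std::size_t n) : data(n, 0.0) {}
    double read(std::size_t i) const { return data[i]; }
    double& write(std::size_t i) { return data[i]; }
};

// Layer that adds guard bookkeeping on top of any engine offering
// read(i) and write(i); the engine itself is left unchanged.
template <class Engine>
class GuardTracking {
public:
    explicit GuardTracking(Engine e) : engine_(std::move(e)) {}

    double read(std::size_t i) const { return engine_.read(i); }

    // Any write may invalidate the guard copies, so mark them dirty.
    void assign(std::size_t i, double v) {
        engine_.write(i) = v;
        guardsDirty_ = true;
    }

    bool guardsDirty() const { return guardsDirty_; }
    void markGuardsFilled() { guardsDirty_ = false; }

private:
    Engine engine_;
    bool guardsDirty_ = true;  // guards start out unfilled
};
```

A MultiPatchEngine-like client would then hold GuardTracking<PlainEngine>
patches and consult guardsDirty() before scheduling an update.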

	Jim

------------------------------------------------------------------------
James A. Crotinger                           email:     jimc at numerix.com
NumeriX, LLC                                 phone:  (505) 424-4477 x104
2960 Rodeo Park Dr. W.
Santa Fe, NM 87505

> -----Original Message-----
> From: Richard Guenther [mailto:rguenth at tat.physik.uni-tuebingen.de]
> Sent: Wednesday, July 07, 2004 11:23 AM
> To: pooma-dev at pooma.codesourcery.com
> Subject: [pooma-dev] [RFC] Specialize (internal) guard cell handling, more
> data-flow analysis
> 
> I'm at the point of thinking about how to improve MPI scalability even more.
> And again the obvious point is that we're doing way too much (unnecessary)
> communication.
> 
> The problem is we "lower" the representation of guard cells and
> necessary updates of them too early (in the intersectors) and create
> usual iterates out of them.  A better approach would be to compute
> necessary guards at intersection time only (as I introduced in the
> previous performance improvement patches), _not_ update them, but store
> this information in the iterates.  We can then, before finally issuing
> the iterates, do data-flow analysis of the necessary guard cells and
> insert optimal update iterates at optimal places (in principle).
> 
> As iterates are per-patch, in the process of getting the above done, I'd
> suggest moving the dirty flag and its handling from the MultiPatchEngine
> down to the BrickEngine, together with more accurate information about
> the up-to-date-ness of the guards (use e.g. a GuardLayer<> to count the
> up-to-date cells).
> 
> Were there any previous plans for improving this situation within POOMA?
> Did you ever experience performance problems with the communication?
> How did SMARTS improve the situation with the guard updates (did it?)?
> 
> Thanks for any input (before I start hacking this up)!
> 
> Richard.